Sample records for analysis pipeline called

  1. Unipro UGENE NGS pipelines and components for variant calling, RNA-seq and ChIP-seq data analyses.

    PubMed

    Golosova, Olga; Henderson, Ross; Vaskin, Yuriy; Gabrielian, Andrei; Grekhov, German; Nagarajan, Vijayaraj; Oler, Andrew J; Quiñones, Mariam; Hurt, Darrell; Fursov, Mikhail; Huyen, Yentram

    2014-01-01

    The advent of Next Generation Sequencing (NGS) technologies has opened new possibilities for researchers. However, the more biology becomes a data-intensive field, the more biologists have to learn how to process and analyze NGS data with complex computational tools. Even with the availability of common pipeline specifications, it is often a time-consuming and cumbersome task for a bench scientist to install and configure the pipeline tools. We believe that a unified, biologist-friendly desktop front end to NGS data analysis tools will substantially improve productivity in this field. Here we present the NGS pipelines "Variant Calling with SAMtools", "Tuxedo Pipeline for RNA-seq Data Analysis" and "Cistrome Pipeline for ChIP-seq Data Analysis" integrated into the Unipro UGENE desktop toolkit. We describe the available UGENE infrastructure that helps researchers run these pipelines on different datasets, store and investigate the results and re-run the pipelines with the same parameters. These pipeline tools are included in the UGENE NGS package. Individual blocks of these pipelines are also available for expert users to create their own advanced workflows.
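
    The "Variant Calling with SAMtools" workflow named above chains standard command-line tools. A minimal stand-alone sketch of that idea (not UGENE's actual implementation; file names are hypothetical, and bwa, samtools and bcftools are assumed to be installed):

    ```python
    # Hedged sketch of a SAMtools-style variant calling pipeline.
    import subprocess

    def run(cmd):
        print("+", cmd)
        subprocess.run(cmd, shell=True, check=True)

    ref, r1, r2 = "ref.fa", "reads_1.fastq", "reads_2.fastq"  # hypothetical inputs
    run(f"bwa index {ref}")
    run(f"bwa mem {ref} {r1} {r2} | samtools sort -o aln.bam -")
    run("samtools index aln.bam")
    # bcftools now carries the calling step formerly done via 'samtools mpileup'
    run(f"bcftools mpileup -f {ref} aln.bam | bcftools call -mv -o variants.vcf")
    ```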

  2. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms.

    PubMed

    Puritz, Jonathan B; Hollenbeck, Christopher M; Gold, John R

    2014-01-01

    Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both reads of paired-end RADseq data to efficiently produce population-informative variant calls, especially for non-model organisms with large effective population sizes and high levels of genetic polymorphism. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is because dDocent quality trims instead of filtering, and incorporates both forward and reverse reads (including reads with INDEL polymorphisms) in assembly, mapping, and SNP calling. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com.

  3. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

    PubMed Central

    Hollenbeck, Christopher M.; Gold, John R.

    2014-01-01

    Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both reads of paired-end RADseq data to efficiently produce population-informative variant calls, especially for non-model organisms with large effective population sizes and high levels of genetic polymorphism. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is because dDocent quality trims instead of filtering, and incorporates both forward and reverse reads (including reads with INDEL polymorphisms) in assembly, mapping, and SNP calling. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com. PMID:24949246

  4. ToTem: a tool for variant calling pipeline optimization.

    PubMed

    Tom, Nikola; Tom, Ondrej; Malcikova, Jitka; Pavlova, Sarka; Kubesova, Blanka; Rausch, Tobias; Kolarik, Miroslav; Benes, Vladimir; Bystry, Vojtech; Pospisilova, Sarka

    2018-06-26

    High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process and with the possibility of plugging almost any tool or code. To prevent an over-fitting of pipeline parameters, ToTem ensures the reproducibility of these by using cross validation techniques that penalize the final precision, recall and F-measure. The results are interpreted as interactive graphs and tables allowing an optimal pipeline to be selected, based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. ToTem is a tool for automated pipeline optimization which is freely available as a web application at  https://totem.software .
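
    The abstract describes the optimization loop without code. A toy sketch of that idea, scoring each parameter setting by F-measure against a truth set (all data and parameter names hypothetical; ToTem's cross-validation machinery is omitted):

    ```python
    # Grid search over variant-calling parameters, scored by F-measure.
    from itertools import product

    def f_measure(calls, truth):
        """Harmonic mean of precision and recall for a set of variant calls."""
        tp = len(calls & truth)
        if not calls or not truth or tp == 0:
            return 0.0
        p, r = tp / len(calls), tp / len(truth)
        return 2 * p * r / (p + r)

    truth = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr2", 50, "G")}

    def call_variants(min_qual, min_depth):
        # Hypothetical stand-in for running a real caller with these settings
        # (min_depth is ignored in this toy).
        calls = {("chr1", 100, "A"), ("chr2", 50, "G")}
        if min_qual <= 20:
            calls |= {("chr1", 200, "T"), ("chr1", 999, "C")}  # one true, one false call
        return calls

    best = max(product([10, 20, 30], [5, 10]),
               key=lambda params: f_measure(call_variants(*params), truth))
    print("best (min_qual, min_depth):", best)
    ```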

  5. Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes

    PubMed Central

    Shringarpure, Suyash S.; Carroll, Andrew; De La Vega, Francisco M.; Bustamante, Carlos D.

    2015-01-01

    Population scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remains a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in a parallel manner, the data was processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future. PMID:26110529
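
    The reported figures are internally consistent; a quick check of the arithmetic:

    ```python
    # Checking the reported cost figures (simple multiplication).
    samples, cost_per_sample = 2535, 7.33
    print(f"${samples * cost_per_sample:,.0f}")  # $18,582, close to the reported $18,590
    # the small gap is consistent with rounding in the quoted per-sample cost
    ```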

  6. Ames Stereo Pipeline Derived DEM Accuracy Experiment Using LROC-NAC Stereopairs and Weighted Spatial Dependence Simulation for Lunar Site Selection

    NASA Astrophysics Data System (ADS)

    Laura, J. R.; Miller, D.; Paul, M. V.

    2012-03-01

    An accuracy assessment of Ames Stereo Pipeline derived DEMs for lunar site selection using weighted spatial dependence simulation, and a call for outside Ames-derived DEMs to facilitate a statistical precision analysis.

  7. Systematic comparison of variant calling pipelines using gold standard personal exome variants

    PubMed Central

    Hwang, Sohyun; Kim, Eiru; Lee, Insuk; Marcotte, Edward M.

    2015-01-01

    The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, which show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confidence variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners—BWA-MEM, Bowtie2, and Novoalign—and four variant callers—Genome Analysis Toolkit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes and Ion Proton Variant Caller (TVC)—for twelve data sets for the NA12878 genome sequenced by different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes. PMID:26639839
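
    A minimal sketch of this kind of benchmarking against a gold standard call set (toy variant keys, not the paper's actual evaluation code):

    ```python
    # Toy benchmarking of a call set against a gold standard such as the GIAB
    # calls (variants keyed by chromosome, position and substitution; invented data).
    def benchmark(calls, gold):
        tp = len(calls & gold)  # true positives: calls confirmed by the gold standard
        return {"precision": tp / len(calls), "recall": tp / len(gold)}

    gold  = {("chr1", 12345, "A>G"), ("chr1", 22222, "C>T"), ("chr2", 100, "G>A")}
    calls = {("chr1", 12345, "A>G"), ("chr2", 100, "G>A"), ("chr2", 555, "T>C")}
    print(benchmark(calls, gold))  # precision 0.67, recall 0.67
    ```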

  8. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    PubMed Central

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the analysis of large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information are especially challenging for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  9. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    PubMed

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the analysis of large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information are especially challenging for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.

  10. ChIA-PET2: a versatile and flexible pipeline for ChIA-PET data analysis

    PubMed Central

    Li, Guipeng; Chen, Yang; Snyder, Michael P.; Zhang, Michael Q.

    2017-01-01

    ChIA-PET2 is a versatile and flexible pipeline for analyzing different types of ChIA-PET data from raw sequencing reads to chromatin loops. ChIA-PET2 integrates all steps required for ChIA-PET data analysis, including linker trimming, read alignment, duplicate removal, peak calling and chromatin loop calling. It supports different kinds of ChIA-PET data generated from different ChIA-PET protocols and also provides quality controls for different steps of ChIA-PET analysis. In addition, ChIA-PET2 can use phased genotype data to call allele-specific chromatin interactions. We applied ChIA-PET2 to different ChIA-PET datasets, demonstrating its significantly improved performance as well as its ability to easily process ChIA-PET raw data. ChIA-PET2 is available at https://github.com/GuipengLi/ChIA-PET2. PMID:27625391

  11. Identification and validation of loss of function variants in clinical contexts.

    PubMed

    Lescai, Francesco; Marasco, Elena; Bacchelli, Chiara; Stanier, Philip; Mantovani, Vilma; Beales, Philip

    2014-01-01

    The choice of an appropriate variant calling pipeline for exome sequencing data is becoming increasingly important in translational medicine projects and clinical contexts. Within GOSgene, which facilitates genetic analysis as part of a joint effort of University College London and the Great Ormond Street Hospital, we aimed to optimize a variant calling pipeline suitable for our clinical context. We implemented the GATK/Queue framework and evaluated the performance of its two callers: the classical UnifiedGenotyper and the new variant discovery tool HaplotypeCaller. We performed an experimental validation of the loss-of-function (LoF) variants called by the two methods using Sequenom technology. UnifiedGenotyper showed a total validation rate of 97.6% for LoF single-nucleotide polymorphisms (SNPs) and 92.0% for insertions or deletions (INDELs), whereas HaplotypeCaller showed 91.7% for SNPs and 55.9% for INDELs. We confirm that GATK/Queue is a reliable pipeline in translational medicine and clinical contexts. We conclude that in our working environment, UnifiedGenotyper is the caller of choice, being an accurate method with a high validation rate for error-prone calls like LoF variants. We finally highlight the importance of experimental validation, especially for INDELs, as part of a standard pipeline in clinical environments.

  12. NGSANE: a lightweight production informatics framework for high-throughput data analysis.

    PubMed

    Buske, Fabian A; French, Hugh J; Smith, Martin A; Clark, Susan J; Bauer, Denis C

    2014-05-15

    The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot-swappable modular components, as opposed to the more rigid wrapping of program calls by higher-level languages implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for set-up and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. NGSANE is implemented in bash and publicly available under the BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane. Contact: Denis.Bauer@csiro.au. Supplementary data are available at Bioinformatics online.
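
    NGSANE itself is written in bash. Purely as an illustration of the hot-swappable-module idea it describes, a pipeline can be modeled as an ordered list of interchangeable stage functions (stage names hypothetical):

    ```python
    # A pipeline as an ordered list of stage callables: any stage can be
    # swapped without touching the others.
    def trim(data):      return data + ["trimmed"]
    def align(data):     return data + ["aligned with bwa"]
    def align_alt(data): return data + ["aligned with bowtie2"]
    def call(data):      return data + ["variants called"]

    def run_pipeline(stages, data):
        for stage in stages:
            data = stage(data)
        return data

    print(run_pipeline([trim, align, call], []))
    print(run_pipeline([trim, align_alt, call], []))  # one module hot-swapped
    ```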

  13. Identification of missing variants by combining multiple analytic pipelines.

    PubMed

    Ren, Yingxue; Reddy, Joseph S; Pottier, Cyril; Sarangi, Vivekananda; Tian, Shulan; Sinnwell, Jason P; McDonnell, Shannon K; Biernacka, Joanna M; Carrasquillo, Minerva M; Ross, Owen A; Ertekin-Taner, Nilüfer; Rademakers, Rosa; Hudson, Matthew; Mainzer, Liudmila Sergeevna; Asmann, Yan W

    2018-04-16

    After decades of identifying risk factors using array-based genome-wide association studies (GWAS), genetic research of complex diseases has shifted to sequencing-based rare variant discovery. This requires large sample sizes for statistical power and has raised questions about whether current variant calling practices are adequate for large cohorts. It is well known that there are discrepancies between variants called by different pipelines, and that using a single pipeline always misses true variants exclusively identifiable by other pipelines. Nonetheless, it is common practice today to call variants with one pipeline due to computational cost and to assume that false negative calls are a small percentage of the total. We analyzed 10,000 exomes from the Alzheimer's Disease Sequencing Project (ADSP) using multiple analytic pipelines consisting of different read aligners and variant calling strategies. We compared variants identified by using two aligners in 50, 100, 200, 500, 1000, and 1952 samples, and compared variants identified by adding single-sample genotyping to the default multi-sample joint genotyping in 50, 100, 500, 2000, 5000 and 10,000 samples. We found that using a single pipeline missed increasing numbers of high-quality variants as sample size grew. By combining two read aligners and two variant calling strategies, we rescued 30% of pass-QC variants at a sample size of 2000, and 56% at 10,000 samples. The rescued variants had higher proportions of low frequency (minor allele frequency [MAF] 1-5%) and rare (MAF < 1%) variants, which are the very type of variants of interest. In 660 Alzheimer's disease cases with earlier onset ages of ≤65, 4 out of 13 (31%) previously published rare pathogenic and protective mutations in the APP, PSEN1, and PSEN2 genes were undetected by the default one-pipeline approach but recovered by the multi-pipeline approach. Identification of the complete variant set from sequencing data is the prerequisite of genetic association analyses. The current analytic practice of calling genetic variants from sequencing data using a single bioinformatics pipeline is no longer adequate for increasingly large projects. The number and percentage of quality variants that pass quality filters but are missed by the one-pipeline approach rapidly increases with sample size.
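
    The rescue effect comes from taking a set union of call sets across pipelines; in miniature (toy variant keys):

    ```python
    # Variants missed by one pipeline are rescued by the union across pipelines.
    pipeline_a = {("chr1", 100), ("chr1", 250), ("chr2", 40)}
    pipeline_b = {("chr1", 100), ("chr2", 40), ("chr3", 77), ("chr3", 90)}

    union = pipeline_a | pipeline_b
    rescued = union - pipeline_a  # found only by adding the second pipeline
    print(f"rescued {len(rescued)} of {len(union)} variants "
          f"({100 * len(rescued) / len(union):.0f}%)")
    ```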

  14. 49 CFR 198.39 - Qualifications for operation of one-call notification system.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    Title 49 (Transportation), Volume 3, revised 2010-10-01: PIPELINE SAFETY REGULATIONS FOR GRANTS TO AID STATE PIPELINE SAFETY PROGRAMS, Adoption of One-Call Damage Prevention Program, § 198.39 Qualifications for operation of one-call notification system. A one-call...

  15. Uplifting behavior of shallow buried pipe in liquefiable soil by dynamic centrifuge test.

    PubMed

    Huang, Bo; Liu, Jingwen; Lin, Peng; Ling, Daosheng

    2014-01-01

    Underground pipelines are widely used in so-called lifeline engineering. Seismic surveys show that damage to underground pipelines from soil liquefaction was the most serious, with failures mainly taking the form of pipeline uplifting. In the present study, dynamic centrifuge model tests were conducted to study the uplifting behavior of shallow-buried pipelines subjected to seismic vibration in liquefied sites. The uplifting mechanism was discussed through the responses of the pore water pressure and earth pressure around the pipeline. Additionally, an analysis of the forces acting on the pipeline before and during vibration was introduced and proved reasonable by comparison of the measured and calculated results. The uplifting behavior of the pipe is the combined effect of multiple forces, and is highly dependent on the excess pore pressure.

  16. Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.

    PubMed

    Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis; Krampis, Konstantinos

    2017-08-01

    Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and, from a user perspective, running a pipeline is as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. © The Authors 2017. Published by Oxford University Press.
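
    The single input/output endpoint pattern can be sketched as one mounted directory per container run (image name and paths hypothetical; the real meta-script drives Galaxy through BioBlend rather than calling Docker directly):

    ```python
    # One host directory is mounted into the container; the pipeline inside
    # reads from and writes to that single data endpoint.
    import subprocess

    def run_bio_docklet(image, host_dir):
        subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{host_dir}:/data",   # the single input/output endpoint
             image],
            check=True)

    # run_bio_docklet("bio-docklet/rnaseq", "/home/user/experiment1")
    ```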

  17. Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

    PubMed Central

    Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis

    2017-01-01

    Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and, from a user perspective, running a pipeline is as simple as running a single bioinformatics tool. This is achieved using a “meta-script” that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. PMID:28854616

  18. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  19. MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics.

    PubMed

    Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas

    2016-01-01

    Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease-causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided, and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease-causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.

  20. MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics

    PubMed Central

    Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas

    2016-01-01

    Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease-causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided, and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease-causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid. PMID:26840129

  21. Uplifting Behavior of Shallow Buried Pipe in Liquefiable Soil by Dynamic Centrifuge Test

    PubMed Central

    Liu, Jingwen; Ling, Daosheng

    2014-01-01

    Underground pipelines are widely used in so-called lifeline engineering. Seismic surveys show that damage to underground pipelines from soil liquefaction was the most serious, with failures mainly taking the form of pipeline uplifting. In the present study, dynamic centrifuge model tests were conducted to study the uplifting behavior of shallow-buried pipelines subjected to seismic vibration in liquefied sites. The uplifting mechanism was discussed through the responses of the pore water pressure and earth pressure around the pipeline. Additionally, an analysis of the forces acting on the pipeline before and during vibration was introduced and proved reasonable by comparison of the measured and calculated results. The uplifting behavior of the pipe is the combined effect of multiple forces, and is highly dependent on the excess pore pressure. PMID:25121140

  22. 75 FR 69428 - Enbridge Pipelines (North Texas) L.P.; Notice of Baseline Filing

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-12

    ... Pipelines (North Texas) L.P.; Notice of Baseline Filing November 4, 2010. Take notice that on November 3, 2010, Enbridge Pipelines (North Texas) L.P. submitted a revised baseline filing of its Statement of... , or call (866) 208-3676 (toll free). For TTY, call (202) 502-8659. Comment Date: 5 p.m. Eastern Time...

  23. 75 FR 57748 - Combined Notice of Filings No. 2

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-22

    ...: Cameron Interstate Pipeline, LLC. Description: Cameron Interstate Pipeline, LLC submits an eTariff XML...-mail FERCOnlineSupport@ferc.gov , or call (866) 208-3676 (toll free). For TTY, call (202) 502-8659...

  24. esATAC: An Easy-to-use Systematic pipeline for ATAC-seq data analysis.

    PubMed

    Wei, Zheng; Zhang, Wei; Fang, Huan; Li, Yanda; Wang, Xiaowo

    2018-03-07

    ATAC-seq is rapidly emerging as one of the major experimental approaches to probe chromatin accessibility genome-wide. Here, we present "esATAC", a highly integrated, easy-to-use R/Bioconductor package for systematic ATAC-seq data analysis. It covers the essential steps of the full analysis procedure, including raw data processing, quality control and downstream statistical analysis such as peak calling, enrichment analysis and transcription factor footprinting. esATAC supports single-command execution of preset pipelines, and provides flexible interfaces for building customized pipelines. The esATAC package is open source under the GPL-3.0 license. It is implemented in R and C++. Source code and binaries for Linux, Mac OS X and Windows are available through Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/esATAC.html). Contact: xwwang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online.

  25. The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows.

    PubMed

    Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P; Zijdenbos, Alex P; Evans, Alan C

    2012-01-01

    The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install, and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources.
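
    Services (1) and (3) amount to dependency-ordered execution that skips up-to-date jobs. A minimal Python illustration of that engine idea (PSOM itself is Matlab/Octave; the job graph and file names below are invented):

    ```python
    # Dependency-ordered execution that skips jobs whose outputs exist.
    import os

    jobs = {
        "preprocess": {"deps": [], "out": "pre.dat"},
        "analyze":    {"deps": ["preprocess"], "out": "stats.dat"},
        "report":     {"deps": ["analyze"], "out": "report.txt"},
    }

    def run_job(name):
        out = jobs[name]["out"]
        if os.path.exists(out):      # restarted pipeline: nothing to redo
            print(f"skip {name} (up to date)")
            return
        print(f"run {name} -> {out}")
        open(out, "w").close()       # stand-in for the real command

    done = set()
    while len(done) < len(jobs):
        for name, job in jobs.items():
            if name not in done and all(d in done for d in job["deps"]):
                run_job(name)
                done.add(name)
    ```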

  26. The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows

    PubMed Central

    Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P.; Zijdenbos, Alex P.; Evans, Alan C.

    2012-01-01

    The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install, and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources. PMID:22493575

  27. GrigoraSNPs: Optimized Analysis of SNPs for DNA Forensics.

    PubMed

    Ricke, Darrell O; Shcherbina, Anna; Michaleas, Adam; Fremont-Smith, Philip

    2018-04-16

    High-throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) enables additional DNA forensic capabilities not attainable using traditional STR panels. However, the inclusion of sets of loci selected for mixture analysis, extended kinship, phenotype, biogeographic ancestry prediction, etc., can result in large panel sizes that are difficult to analyze in a rapid fashion. GrigoraSNPs was developed to address the allele-calling bottleneck encountered when analyzing SNP panels with more than 5000 loci using HTS. GrigoraSNPs uses MapReduce-style parallel data processing on multiple computational threads plus a novel locus-identification hashing strategy leveraging target sequence tags. This tool optimizes the SNP calling module of the DNA analysis pipeline, with runtimes that scale linearly with the number of HTS reads. Results are compared with SNP analysis pipelines implemented with SAMtools and GATK. GrigoraSNPs removes a computational bottleneck for processing forensic samples with large HTS SNP panels. Published 2018. This article is a U.S. Government work and is in the public domain in the USA.
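
    The hashing strategy is not spelled out in the abstract, but the general idea of tag-based locus identification can be sketched as a dictionary lookup on a read's leading bases (toy tags, loci, and reads; the real tool shards this work across threads):

    ```python
    # A dictionary keyed on a short target sequence tag routes each read to
    # its locus in O(1), with no alignment step.
    locus_tags = {"ACGTACGT": "rs123", "TTGCAAGC": "rs456"}
    TAG_LEN = 8

    def assign_locus(read):
        return locus_tags.get(read[:TAG_LEN])  # None if no tag matches

    for read in ["ACGTACGTGGGTTCA", "TTGCAAGCCCATGGA", "NNNNNNNNAAAACCC"]:
        print(read[:TAG_LEN], "->", assign_locus(read))
    ```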

  28. 78 FR 10689 - Pipeline Safety: Public Forum State One-Call Exemptions

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-14

    ... DEPARTMENT OF TRANSPORTATION Pipeline and Hazardous Materials Safety Administration [Docket No... Safety, Pipeline and Hazardous Materials Safety Administration, DOT. ACTION: Notice; public forum. SUMMARY: The Pipeline and Hazardous Materials Safety Administration will sponsor a public forum on state...

  29. Cake: a bioinformatics pipeline for the integrated analysis of somatic variants in cancer genomes

    PubMed Central

    Rashid, Mamunur; Robles-Espinoza, Carla Daniela; Rust, Alistair G.; Adams, David J.

    2013-01-01

    Summary: We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a stand-alone application. Availability: Cake is open source and is available from http://cakesomatic.sourceforge.net/. Contact: da1@sanger.ac.uk. Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:23803469
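
    One common way to integrate several callers, and a plausible reading of what such a pipeline does, is m-of-n consensus voting (toy call sets; Cake's actual integration rules may differ):

    ```python
    # Keep a variant when at least min_votes callers report it.
    from collections import Counter

    callers = [
        {("chr1", 100), ("chr1", 200), ("chr2", 50)},   # caller A
        {("chr1", 100), ("chr2", 50), ("chr2", 75)},    # caller B
        {("chr1", 100), ("chr1", 200)},                 # caller C
        {("chr1", 100), ("chr2", 50)},                  # caller D
    ]

    def consensus(call_sets, min_votes=2):
        votes = Counter(v for calls in call_sets for v in calls)
        return {v for v, n in votes.items() if n >= min_votes}

    print(sorted(consensus(callers)))
    ```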

  30. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

    PubMed Central

    Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406
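
    The single-pass architecture can be illustrated by composing per-record preparation steps into one streaming function, instead of one full pass (and one intermediate file) per step as in a SAMtools/Picard-style chain (toy records and step names):

    ```python
    # Compose per-record steps so the data is streamed exactly once.
    def compose(steps):
        def prepared(rec):
            for step in steps:
                rec = step(rec)
                if rec is None:        # a filtering step dropped the record
                    return None
            return rec
        return prepared

    drop_unmapped = lambda r: r if r["mapped"] else None
    add_tag = lambda r: {**r, "prepared": True}

    pipeline = compose([drop_unmapped, add_tag])
    records = [{"pos": 10, "mapped": True}, {"pos": 11, "mapped": False}]
    print([out for r in records if (out := pipeline(r)) is not None])
    ```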

  31. OTG-snpcaller: An Optimized Pipeline Based on TMAP and GATK for SNP Calling from Ion Torrent Data

    PubMed Central

    Huang, Wenpan; Xi, Feng; Lin, Lin; Zhi, Qihuan; Zhang, Wenwei; Tang, Y. Tom; Geng, Chunyu; Lu, Zhiyuan; Xu, Xun

    2014-01-01

    Because the new Proton platform from Life Technologies produced markedly different data from that of the Illumina platform, the conventional Illumina data analysis pipeline could not be used directly. We developed an optimized SNP calling method using TMAP and GATK (OTG-snpcaller). This method combined our own optimized processes, Remove Duplicates According to AS Tag (RDAST) and Alignment Optimize Structure (AOS), together with TMAP and GATK, to call SNPs from Proton data. We sequenced four sets of exomes, captured with the Agilent SureSelect and NimbleGen SeqCap EZ kits, using Life Technologies’ Ion Proton sequencer. Then we applied OTG-snpcaller and compared our results with the results from Torrent Variant Caller. The results indicated that OTG-snpcaller can reduce both false positive and false negative rates. Moreover, we compared our results with Illumina results generated by GATK best practices, and we found that the results of these two platforms were comparable. The good performance in variant calling using GATK best practices can be primarily attributed to the high quality of the Illumina sequences. PMID:24824529
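
    RDAST's internals are not given in the abstract; a hedged sketch of duplicate removal keyed on the alignment score (AS) tag, keeping the best-scoring read per mapped position (toy records):

    ```python
    # Among reads mapped to the same position, keep the one with the highest
    # alignment score (AS) rather than relying on base qualities alone.
    reads = [
        {"pos": ("chr1", 100), "AS": 58, "name": "r1"},
        {"pos": ("chr1", 100), "AS": 72, "name": "r2"},  # duplicate, better AS
        {"pos": ("chr2", 40),  "AS": 65, "name": "r3"},
    ]

    best = {}
    for r in reads:
        if r["pos"] not in best or r["AS"] > best[r["pos"]]["AS"]:
            best[r["pos"]] = r
    print([r["name"] for r in best.values()])  # ['r2', 'r3']
    ```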

  32. OTG-snpcaller: an optimized pipeline based on TMAP and GATK for SNP calling from ion torrent data.

    PubMed

    Zhu, Pengyuan; He, Lingyu; Li, Yaqiao; Huang, Wenpan; Xi, Feng; Lin, Lin; Zhi, Qihuan; Zhang, Wenwei; Tang, Y Tom; Geng, Chunyu; Lu, Zhiyuan; Xu, Xun

    2014-01-01

    Because the new Proton platform from Life Technologies produced markedly different data from that of the Illumina platform, the conventional Illumina data analysis pipeline could not be used directly. We developed an optimized SNP calling method using TMAP and GATK (OTG-snpcaller). This method combined our own optimized processes, Remove Duplicates According to AS Tag (RDAST) and Alignment Optimize Structure (AOS), together with TMAP and GATK, to call SNPs from Proton data. We sequenced four sets of exomes, captured with the Agilent SureSelect and NimbleGen SeqCap EZ kits, using Life Technologies' Ion Proton sequencer. Then we applied OTG-snpcaller and compared our results with the results from Torrent Variant Caller. The results indicated that OTG-snpcaller can reduce both false positive and false negative rates. Moreover, we compared our results with Illumina results generated by GATK best practices, and we found that the results of these two platforms were comparable. The good performance in variant calling using GATK best practices can be primarily attributed to the high quality of the Illumina sequences.

  33. 76 FR 44985 - Pipeline Safety: Potential for Damage to Pipeline Facilities Caused by Flooding

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-27

    ... Bulletin. SUMMARY: PHMSA is issuing this advisory bulletin to all owners and operators of gas and hazardous..., Nevada, Oregon, Utah, Washington, and Wyoming, call 720-963-3160. Intrastate pipeline operators should... failure near...

  34. 78 FR 44558 - Stingray Pipeline Company, L.L.C.; Notice of Request Under Blanket Authorization

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-07-24

    ... Pipeline Company, L.L.C.; Notice of Request Under Blanket Authorization Take notice that on July 3, 2013, Stingray Pipeline Company, L.L.C. (Stingray), 1100 Louisiana Street, Houston, Texas 77002, filed in Docket... Compliance, Stingray Pipeline Company, L.L.C., 1100 Louisiana, Suite 3300, Houston, Texas 77002, or call (832...

  35. PANGEA: pipeline for analysis of next generation amplicons

    PubMed Central

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-01-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525

  36. PANGEA: pipeline for analysis of next generation amplicons.

    PubMed

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-07-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi(2) step, are joined into one program called the 'backbone'.

  37. Bioinformatics Pipelines for Targeted Resequencing and Whole-Exome Sequencing of Human and Mouse Genomes: A Virtual Appliance Approach for Instant Deployment

    PubMed Central

    Saeed, Isaam; Wong, Stephen Q.; Mar, Victoria; Goode, David L.; Caramia, Franco; Doig, Ken; Ryland, Georgina L.; Thompson, Ella R.; Hunter, Sally M.; Halgamuge, Saman K.; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G.; Papenfuss, Anthony T.; McArthur, Grant A.; Tothill, Richard W.

    2014-01-01

    Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/. PMID:24752294

  38. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

    PubMed

    Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min

    2015-06-01

    The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.

  39. Reproducibility of neuroimaging analyses across operating systems

    PubMed Central

    Glatard, Tristan; Lewis, Lindsay B.; Ferreira da Silva, Rafael; Adalat, Reza; Beck, Natacha; Lepage, Claude; Rioux, Pierre; Rousseau, Marc-Etienne; Sherif, Tarek; Deelman, Ewa; Khalili-Mahani, Najmeh; Evans, Alan C.

    2015-01-01

    Neuroimaging pipelines are known to generate different results depending on the computing platform where they are compiled and executed. We quantify these differences for brain tissue classification, fMRI analysis, and cortical thickness (CT) extraction, using three of the main neuroimaging packages (FSL, Freesurfer and CIVET) and different versions of GNU/Linux. We also identify some causes of these differences using library and system call interception. We find that these packages use mathematical functions based on single-precision floating-point arithmetic whose implementations in operating systems continue to evolve. While these differences have little or no impact on simple analysis pipelines such as brain extraction and cortical tissue classification, their accumulation creates important differences in longer pipelines such as subcortical tissue classification, fMRI analysis, and cortical thickness extraction. With FSL, most Dice coefficients between subcortical classifications obtained on different operating systems remain above 0.9, but values as low as 0.59 are observed. Independent component analyses (ICA) of fMRI data differ between operating systems in one third of the tested subjects, due to differences in motion correction. With Freesurfer and CIVET, in some brain regions we find an effect of build or operating system on cortical thickness. A first step to correct these reproducibility issues would be to use more precise representations of floating-point numbers in the critical sections of the pipelines. The numerical stability of pipelines should also be reviewed. PMID:25964757
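
    The single-precision mechanism is easy to demonstrate: float32 arithmetic is not associative, so implementation or ordering differences change individual results, and those differences then accumulate over the millions of operations in a long pipeline (numpy assumed available):

    ```python
    # float32 non-associativity: the same sum, two evaluation orders.
    import numpy as np

    big, small = np.float32(1e8), np.float32(1.0)
    print((big + small) - big)  # 0.0 -- 'small' is absorbed at float32 precision
    print((big - big) + small)  # 1.0 -- a different order preserves it
    ```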

  40. Reproducibility of neuroimaging analyses across operating systems.

    PubMed

    Glatard, Tristan; Lewis, Lindsay B; Ferreira da Silva, Rafael; Adalat, Reza; Beck, Natacha; Lepage, Claude; Rioux, Pierre; Rousseau, Marc-Etienne; Sherif, Tarek; Deelman, Ewa; Khalili-Mahani, Najmeh; Evans, Alan C

    2015-01-01

    Neuroimaging pipelines are known to generate different results depending on the computing platform where they are compiled and executed. We quantify these differences for brain tissue classification, fMRI analysis, and cortical thickness (CT) extraction, using three of the main neuroimaging packages (FSL, Freesurfer and CIVET) and different versions of GNU/Linux. We also identify some causes of these differences using library and system call interception. We find that these packages use mathematical functions based on single-precision floating-point arithmetic whose implementations in operating systems continue to evolve. While these differences have little or no impact on simple analysis pipelines such as brain extraction and cortical tissue classification, their accumulation creates important differences in longer pipelines such as subcortical tissue classification, fMRI analysis, and cortical thickness extraction. With FSL, most Dice coefficients between subcortical classifications obtained on different operating systems remain above 0.9, but values as low as 0.59 are observed. Independent component analyses (ICA) of fMRI data differ between operating systems in one third of the tested subjects, due to differences in motion correction. With Freesurfer and CIVET, in some brain regions we find an effect of build or operating system on cortical thickness. A first step to correct these reproducibility issues would be to use more precise representations of floating-point numbers in the critical sections of the pipelines. The numerical stability of pipelines should also be reviewed.

  41. Heterogeneous Optimization Framework: Reproducible Preprocessing of Multi-Spectral Clinical MRI for Neuro-Oncology Imaging Research.

    PubMed

    Milchenko, Mikhail; Snyder, Abraham Z; LaMontagne, Pamela; Shimony, Joshua S; Benzinger, Tammie L; Fouke, Sarah Jost; Marcus, Daniel S

    2016-07-01

    Neuroimaging research often relies on clinically acquired magnetic resonance imaging (MRI) datasets that can originate from multiple institutions. Such datasets are characterized by high heterogeneity of modalities and variability of sequence parameters. This heterogeneity complicates the automation of image processing tasks such as spatial co-registration and physiological or functional image analysis. Given this heterogeneity, conventional processing workflows developed for research purposes are not optimal for clinical data. In this work, we describe an approach called Heterogeneous Optimization Framework (HOF) for developing image analysis pipelines that can handle the high degree of clinical data non-uniformity. HOF provides a set of guidelines for configuration, algorithm development, deployment, interpretation of results and quality control for such pipelines. At each step, we illustrate the HOF approach using the implementation of an automated pipeline for Multimodal Glioma Analysis (MGA) as an example. The MGA pipeline computes tissue diffusion characteristics of diffusion tensor imaging (DTI) acquisitions, hemodynamic characteristics using a perfusion model of susceptibility contrast (DSC) MRI, and spatial cross-modal co-registration of available anatomical, physiological and derived patient images. Developing MGA within HOF enabled the processing of neuro-oncology MR imaging studies to be fully automated. MGA has been successfully used to analyze over 160 clinical tumor studies to date within several research projects. Introduction of the MGA pipeline improved image processing throughput and, most importantly, effectively produced co-registered datasets that were suitable for advanced analysis despite high heterogeneity in acquisition protocols.

  2. DPARSF: A MATLAB Toolbox for "Pipeline" Data Analysis of Resting-State fMRI.

    PubMed

    Chao-Gan, Yan; Yu-Feng, Zang

    2010-01-01

    Resting-state functional magnetic resonance imaging (fMRI) has attracted more and more attention because of its effectiveness, simplicity and non-invasiveness in exploring the intrinsic functional architecture of the human brain. However, a user-friendly toolbox for "pipeline" data analysis of resting-state fMRI is still lacking. Based on functions in Statistical Parametric Mapping (SPM) and the Resting-State fMRI Data Analysis Toolkit (REST), we have developed a MATLAB toolbox called Data Processing Assistant for Resting-State fMRI (DPARSF) for "pipeline" data analysis of resting-state fMRI. After the user arranges the Digital Imaging and Communications in Medicine (DICOM) files and clicks a few buttons to set parameters, DPARSF produces all the preprocessed data (slice timing correction, realignment, normalization, smoothing) and results for functional connectivity, regional homogeneity, amplitude of low-frequency fluctuation (ALFF), and fractional ALFF. DPARSF can also create a report for excluding subjects with excessive head motion and generate a set of pictures for easily checking the effect of normalization. In addition, users can use DPARSF to extract time courses from regions of interest.

  3. High-throughput bioinformatics with the Cyrille2 pipeline system

    PubMed Central

    Fiers, Mark WEJ; van der Burgt, Ate; Datema, Erwin; de Groot, Joost CW; van Ham, Roeland CHJ

    2008-01-01

    Background: Modern omics research involves the application of high-throughput technologies that generate vast volumes of data. These data need to be pre-processed, analyzed and integrated with existing knowledge through the use of diverse sets of software tools, models and databases. The analyses are often interdependent and chained together to form complex workflows or pipelines. Given the volume of the data used and the multitude of computational resources available, specialized pipeline software is required to make high-throughput analysis of large-scale omics datasets feasible. Results: We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web-based, graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system, tracks what data enter the system and determines what jobs must be scheduled for execution; and 3) the Executor, which searches for scheduled jobs and executes them on a compute cluster. Conclusion: The Cyrille2 system is an extensible, modular system implementing the stated requirements. Cyrille2 enables easy creation and execution of high-throughput, flexible bioinformatics pipelines. PMID:18269742

  4. A-Track: A new approach for detection of moving objects in FITS images

    NASA Astrophysics Data System (ADS)

    Atay, T.; Kaplan, M.; Kilic, Y.; Karapinar, N.

    2016-10-01

    We have developed a fast, open-source, cross-platform pipeline, called A-Track, for detecting moving objects (asteroids and comets) in sequential telescope images in FITS format. The pipeline is coded in Python 3. Moving objects are detected using a modified line detection algorithm, called MILD. We tested the pipeline on astronomical data acquired with an SI-1100 CCD on a 1-meter telescope. We found that A-Track performs very well in terms of detection efficiency, stability, and processing time. The code is hosted on GitHub under the GNU GPL v3 license.
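
    The record does not describe the MILD algorithm itself, so the sketch below shows only the generic first stage such pipelines share: stacking registered FITS frames, estimating the static sky, and flagging residual sources to be linked into tracks. It assumes astropy is installed and the frames are already aligned; detect_transients is a hypothetical name, not an A-Track function.

    import numpy as np
    from astropy.io import fits

    def detect_transients(paths, nsigma=5.0):
        """Flag pixels that deviate from the per-pixel median across frames.

        A crude stand-in for the detection stage of a moving-object pipeline:
        real pipelines also align frames, extract sources, and link detections
        into linear trajectories (the step MILD performs in A-Track).
        """
        frames = np.stack([fits.getdata(p).astype(np.float64) for p in paths])
        reference = np.median(frames, axis=0)     # static-sky estimate
        residuals = frames - reference            # moving objects survive
        noise = np.std(residuals, axis=0) + 1e-9
        candidates = []
        for frame in residuals:
            ys, xs = np.where(frame > nsigma * noise)
            candidates.append(list(zip(xs.tolist(), ys.tolist())))
        return candidates  # per-frame (x, y) pixel candidates to link into tracks

    # candidates = detect_transients(["img1.fits", "img2.fits", "img3.fits"])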

  5. A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference

    PubMed Central

    2015-01-01

    High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are the best at analyzing this data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs the best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle with none of the tools achieving an average sensitivity higher than 33% or a Positive Predictive Value (PPV) higher than 53%. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as variant callers themselves. PMID:26539496

  6. FliPer: checking the reliability of global seismic parameters from automatic pipelines

    NASA Astrophysics Data System (ADS)

    Bugnet, L.; García, R. A.; Davies, G. R.; Mathur, S.; Corsaro, E.

    2017-12-01

    Our understanding of stars through asteroseismic data analysis is limited by our ability to take advantage of the huge number of observed stars provided by space missions such as CoRoT, Kepler, K2, and soon TESS and PLATO. Global seismic pipelines provide global stellar parameters such as mass and radius using the mean seismic parameters, as well as the effective temperature. These pipelines are commonly used automatically on thousands of stars observed by K2 for 3 months (and soon TESS for at least ~1 month). However, pipelines are not immune from misidentifying noise peaks and stellar oscillations. Therefore, new validation techniques are required to assess the quality of these results. We present a new metric called FliPer (Flicker in Power), which takes into account the average variability at all measured time scales. The proper calibration of FliPer enables us to obtain good estimations of global stellar parameters, such as surface gravity, that are robust against the influence of noise peaks and hence provides an excellent way to find faults in asteroseismic pipelines.
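
    As a rough illustration of the idea behind the metric (average power across all measured time scales tracks surface gravity), the sketch below computes a FliPer-like statistic for a toy light curve. The exact filtering and noise-floor calibration of the published metric are not reproduced; fliper_like is a hypothetical simplification.

    import numpy as np

    def fliper_like(flux):
        """Toy FliPer-style statistic: average power over all measured
        frequencies of a light curve. The published metric additionally
        subtracts a calibrated photon-noise floor, omitted here."""
        flux = np.asarray(flux, dtype=float)
        rel = flux / np.mean(flux) - 1.0              # relative flux variations
        psd = np.abs(np.fft.rfft(rel)) ** 2 / rel.size
        return float(np.mean(psd[1:]))                # drop the zero-frequency bin

    # A noisy sinusoid stands in for stellar variability (90 d, 30-min cadence).
    t = np.arange(0, 90 * 86400, 1800.0)
    flux = 1.0 + 1e-4 * np.sin(2 * np.pi * t / 5e5) + 1e-5 * np.random.randn(t.size)
    print(f"FliPer-like statistic: {fliper_like(flux):.3e}")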

  7. 75 FR 24938 - Questar Pipeline Company; Notice of Intent to Prepare an Environmental Assessment for the Planned...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-06

    .../receivers; A "pig" is a tool that is inserted into and moves through the pipeline, and is used for... http://www.ferc.gov using the link called "eLibrary" or from the Commission's Public Reference Room, 888... feature, which is located at http://www.ferc.gov under the link called "Documents and Filings". A Quick...

  8. JGI Plant Genomics Gene Annotation Pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David

    2014-07-14

    Plant genomes vary in size and are highly complex, with abundant repeats, genome duplications and tandem duplications. Genes encode a wealth of information useful in studying organisms, and high-quality, stable gene annotation is critical. Thanks to advances in sequencing technology, the genomes and transcriptomes of many plant species have been sequenced. To turn these vast amounts of sequence data into gene annotations or re-annotations in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs (see Methods for details). Here we present genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for Chlamydomonas, which was annotated by a third party. The genome annotations of these species and others are used in our gene-family build pipeline and are accessible via the JGI Phytozome portal.

  9. A survey of the sorghum transcriptome using single-molecule long reads

    DOE PAGES

    Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...

    2016-06-24

    Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale, with over 11,000 novel splice isoforms. Additionally, we uncover APA of ~11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.

  10. From sequencer to supercomputer: an automatic pipeline for managing and processing next generation sequencing data.

    PubMed

    Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun

    2012-01-01

    Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource-intensive nature of NGS secondary analysis, built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network-attached storage device expandable up to 40 TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.

  11. A survey of the sorghum transcriptome using single-molecule long reads

    PubMed Central

    Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.

    2016-01-01

    Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290

  12. MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing

    PubMed Central

    Diroma, Maria Angela; Santorsola, Mariangela; Guttà, Cristiano; Gasparre, Giuseppe; Picardi, Ernesto; Pesole, Graziano; Attimonelli, Marcella

    2014-01-01

    Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this purpose, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy, and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied on 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25028726

  13. SeqMule: automated pipeline for analysis of human exome/genome sequencing data.

    PubMed

    Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai

    2015-09-18

    Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software incompatibility, complicated configuration, and no access to a high-performance computing facility. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate a consensus set with high confidence. SeqMule integrates 5 alignment tools and 5 variant calling algorithms and accepts various combinations, all by a one-line command, therefore allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and allows quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.

  14. Crack detection and leakage monitoring on reinforced concrete pipe

    NASA Astrophysics Data System (ADS)

    Feng, Qian; Kong, Qingzhao; Huo, Linsheng; Song, Gangbing

    2015-11-01

    Reinforced concrete underground pipelines are among the most widely used structures in water transportation systems. Cracks and leakage are the leading causes of pipeline structural failures, which directly result in economic losses and environmental hazards. In this paper, the authors propose a piezoceramic-based active sensing approach to detect cracks and subsequent leakage in concrete pipelines. Owing to its piezoelectric properties, piezoceramic material can serve as both the actuator and the sensor in the active sensing approach. The piezoceramic patch, sandwiched between protective materials to form a so-called 'smart aggregate,' can be safely embedded into concrete structures. Circumferential and axial cracks were investigated. A wavelet packet-based energy analysis was developed to distinguish the type of crack and to detect subsequent leakage, based on how stress-wave energy attenuates as it propagates through the cracks.
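
    A minimal version of the wavelet packet-based energy analysis can be sketched with PyWavelets: decompose the received stress-wave record into frequency bands and compare per-band energies between a healthy baseline and a damaged state. The signals, wavelet choice, and attenuation figure below are synthetic stand-ins, not data or parameters from the study.

    import numpy as np
    import pywt  # PyWavelets

    def band_energies(signal, wavelet="db4", level=4):
        """Wavelet-packet energy vector of a stress-wave record: decompose
        the signal into 2**level frequency bands and return per-band energy."""
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
        nodes = wp.get_level(level, order="freq")     # low to high frequency
        return np.array([np.sum(np.square(n.data)) for n in nodes])

    # Hypothetical healthy vs. cracked responses: a crack attenuates the wave.
    t = np.linspace(0, 1e-3, 4096)
    healthy = np.sin(2 * np.pi * 50e3 * t) * np.exp(-3e3 * t)
    cracked = 0.4 * healthy + 0.01 * np.random.randn(t.size)

    e_healthy, e_cracked = band_energies(healthy), band_energies(cracked)
    attenuation = 1.0 - e_cracked.sum() / e_healthy.sum()
    print(f"total stress-wave energy attenuation: {attenuation:.1%}")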

  15. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing

    PubMed Central

    Alioto, Tyler S.; Buchhalter, Ivo; Derdak, Sophia; Hutter, Barbara; Eldridge, Matthew D.; Hovig, Eivind; Heisler, Lawrence E.; Beck, Timothy A.; Simpson, Jared T.; Tonon, Laurie; Sertier, Anne-Sophie; Patch, Ann-Marie; Jäger, Natalie; Ginsbach, Philip; Drews, Ruben; Paramasivam, Nagarajan; Kabbe, Rolf; Chotewutmontri, Sasithorn; Diessl, Nicolle; Previti, Christopher; Schmidt, Sabine; Brors, Benedikt; Feuerbach, Lars; Heinold, Michael; Gröbner, Susanne; Korshunov, Andrey; Tarpey, Patrick S.; Butler, Adam P.; Hinton, Jonathan; Jones, David; Menzies, Andrew; Raine, Keiran; Shepherd, Rebecca; Stebbings, Lucy; Teague, Jon W.; Ribeca, Paolo; Giner, Francesc Castro; Beltran, Sergi; Raineri, Emanuele; Dabad, Marc; Heath, Simon C.; Gut, Marta; Denroche, Robert E.; Harding, Nicholas J.; Yamaguchi, Takafumi N.; Fujimoto, Akihiro; Nakagawa, Hidewaki; Quesada, Víctor; Valdés-Mas, Rafael; Nakken, Sigve; Vodák, Daniel; Bower, Lawrence; Lynch, Andrew G.; Anderson, Charlotte L.; Waddell, Nicola; Pearson, John V.; Grimmond, Sean M.; Peto, Myron; Spellman, Paul; He, Minghui; Kandoth, Cyriac; Lee, Semin; Zhang, John; Létourneau, Louis; Ma, Singer; Seth, Sahil; Torrents, David; Xi, Liu; Wheeler, David A.; López-Otín, Carlos; Campo, Elías; Campbell, Peter J.; Boutros, Paul C.; Puente, Xose S.; Gerhard, Daniela S.; Pfister, Stefan M.; McPherson, John D.; Hudson, Thomas J.; Schlesner, Matthias; Lichter, Peter; Eils, Roland; Jones, David T. W.; Gut, Ivo G.

    2015-01-01

    As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here, using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy. PMID:26647970

  16. MICRA: an automatic pipeline for fast characterization of microbial genomes from high-throughput sequencing data.

    PubMed

    Caboche, Ségolène; Even, Gaël; Loywick, Alexandre; Audebert, Christophe; Hot, David

    2017-12-19

    The increase in available sequence data has advanced the field of microbiology; however, making sense of these data without bioinformatics skills is still problematic. We describe MICRA, an automatic pipeline, available as a web interface, for microbial identification and characterization through reads analysis. MICRA uses iterative mapping against reference genomes to identify genes and variations. Additional modules allow prediction of antibiotic susceptibility and resistance and comparing the results of several samples. MICRA is fast, producing few false-positive annotations and variant calls compared to current methods, making it a tool of great interest for fully exploiting sequencing data.

  17. Low Base-Substitution Mutation Rate in the Germline Genome of the Ciliate Tetrahymena thermophila

    DTIC Science & Technology

    2016-09-15

    generations of mutation accumulation (MA). We applied an existing mutation-calling pipeline and developed a new probabilistic mutation detection approach...noise introduced by mismapped reads. We used both our new method and an existing mutation-calling pipeline (Sung, Tucker, et al. 2012) to analyse the...and larger MA experiments will be required to confidently estimate the mutational spectrum of a species with such a low mutation rate. Materials and

  18. A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection.

    PubMed

    Carey, Michelle; Ramírez, Juan Camilo; Wu, Shuang; Wu, Hulin

    2018-07-01

    A biological host response to an external stimulus or intervention such as a disease or infection is a dynamic process, which is regulated by an intricate network of many genes and their products. Understanding the dynamics of this gene regulatory network allows us to infer the mechanisms involved in a host response to an external stimulus, and hence aids the discovery of biomarkers of phenotype and biological function. In this article, we propose a modeling/analysis pipeline for dynamic gene expression data, called Pipeline4DGEData, which consists of a series of statistical modeling techniques to construct dynamic gene regulatory networks from the large volumes of high-dimensional time-course gene expression data that are freely available in the Gene Expression Omnibus repository. This pipeline has a consistent and scalable structure that allows it to simultaneously analyze a large number of time-course gene expression data sets, and then integrate the results across different studies. We apply the proposed pipeline to influenza infection data from nine studies and demonstrate that interesting biological findings can be discovered with its implementation.
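
    The abstract does not spell out the statistical machinery of Pipeline4DGEData, so the sketch below illustrates only the general class of model it fits: a dynamic network estimated from time-course expression data. Here a linear ODE system dx/dt = Ax is recovered by finite differences and least squares; fit_linear_network is a hypothetical simplification, not the pipeline's method.

    import numpy as np

    def fit_linear_network(X, dt):
        """Estimate a linear dynamic network dx/dt ≈ A x from time-course data.

        X has shape (timepoints, genes). A[i, j] is the estimated effect of
        gene j on the rate of change of gene i."""
        dX = np.gradient(X, dt, axis=0)               # finite-difference derivatives
        B, *_ = np.linalg.lstsq(X, dX, rcond=None)    # solves X @ B ≈ dX
        return B.T                                    # so dx/dt ≈ (B.T) @ x

    # Simulate a toy two-gene system and recover its interaction matrix.
    true_A = np.array([[-0.5, 0.3], [0.2, -0.4]])
    dt, steps = 0.1, 200
    X = np.zeros((steps, 2)); X[0] = [1.0, 0.5]
    for k in range(steps - 1):
        X[k + 1] = X[k] + dt * true_A @ X[k]          # forward-Euler integration
    print(np.round(fit_linear_network(X, dt), 2))     # ≈ true_A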

  19. 75 FR 49917 - Enbridge Pipelines (North Texas) L.P.; Notice of Baseline Filing

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-16

    ... Pipelines (North Texas) L.P.; Notice of Baseline Filing August 6, 2010. Take notice that on July 29, 2010, Enbridge Pipelines (North Texas) L.P. submitted a revised baseline filing of its Statement of Operating... (toll free). For TTY, call (202) 502-8659. Comment Date: 5 p.m. Eastern Time on Monday, August 16, 2010...

  20. 77 FR 2716 - Questar Pipeline Company; Notice of Application

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-01-19

    [email protected] , or to Tad M. Taylor, Division Counsel, Questar Pipeline Company, 180 East 100 South, P.O. Box 45360, Salt Lake City, Utah 84145-0360, or by calling (801) 324-5531 (telephone) tad.taylor...

  1. viGEN: An Open Source Pipeline for the Detection and Quantification of Viral RNA in Human Tumors.

    PubMed

    Bhuvaneshwar, Krithika; Song, Lei; Madhavan, Subha; Gusev, Yuriy

    2018-01-01

    An estimated 17% of cancers worldwide are associated with infectious causes. The extent and biological significance of viral presence/infection in actual tumor samples is generally unknown but could be measured using human transcriptome (RNA-seq) data from tumor samples. We present an open-source bioinformatics pipeline, viGEN, which allows for the detection and quantification not only of viral RNA, but also of variants in the viral transcripts. The pipeline includes 4 major modules: the first module aligns and filters out human RNA sequences; the second module maps and counts the remaining unaligned reads against the reference genomes of all known and sequenced human viruses; the third module quantifies read counts at the individual viral-gene level, allowing for downstream differential expression analysis of viral genes between case and control groups; and the fourth module calls variants in these viruses. To the best of our knowledge, no publicly available pipelines or packages provide this type of complete analysis in one open-source package. In this paper, we applied the viGEN pipeline to two case studies. We first demonstrate the working of our pipeline on a large public dataset, the TCGA cervical cancer cohort. In the second case study, we performed an in-depth analysis on a small focused study of TCGA liver cancer patients. In the latter cohort, we performed viral-gene quantification, viral-variant extraction and survival analysis. This allowed us to find differentially expressed viral transcripts and viral variants between the groups of patients, and to connect them to clinical outcome. From our analyses, we show that we were able to successfully detect human papillomavirus among the TCGA cervical cancer patients. We compared the viGEN pipeline with two metagenomics tools and demonstrate similar sensitivity/specificity. We were also able to quantify viral transcripts and extract viral variants using the liver cancer dataset. The results presented corresponded with published literature in terms of rate of detection and impact of several known variants of the HBV genome. This pipeline is generalizable and can be used to provide novel biological insights into microbial infections in complex diseases and tumorigenesis. Our viral pipeline could be used in conjunction with additional types of immuno-oncology analysis based on RNA-seq data of host RNA for cancer immunology applications. The source code, with example data and tutorial, is available at: https://github.com/ICBI/viGEN/.

  2. Identification of pathogen genomic variants through an integrated pipeline

    PubMed Central

    2014-01-01

    Background: Whole-genome sequencing represents a powerful experimental tool for pathogen research. We present methods for the analysis of small eukaryotic genomes, including a streamlined system (called Platypus) for finding single nucleotide and copy number variants as well as recombination events. Results: We have validated our pipeline using four sets of Plasmodium falciparum drug resistant data containing 26 clones from 3D7 and Dd2 background strains, identifying an average of 11 single nucleotide variants per clone. We also identify 8 copy number variants with contributions to resistance, and report for the first time that all analyzed amplification events are in tandem. Conclusions: The Platypus pipeline provides malaria researchers with a powerful tool to analyze short read sequencing data. It provides an accurate way to detect SNVs using known software packages, and a novel methodology for detection of CNVs, though it does not currently support detection of small indels. We have validated that the pipeline detects known SNVs in a variety of samples while filtering out spurious data. We bundle the methods into a freely available package. PMID:24589256

  3. Adaptation of a program for nonlinear finite element analysis to the CDC STAR 100 computer

    NASA Technical Reports Server (NTRS)

    Pifko, A. B.; Ogilvie, P. L.

    1978-01-01

    The conversion of a nonlinear finite element program to the CDC STAR-100 pipeline computer is discussed. The program, called DYCAST, was developed for the crash simulation of structures. Initial results with the STAR-100 computer indicated that significant gains in computation time are possible for operations on global arrays. However, for element-level computations that do not lend themselves easily to long vector processing, the STAR-100 was slower than comparable scalar computers. On this basis it is concluded that for pipeline computers to make large nonlinear analyses economically feasible, it is essential that algorithms be devised to improve the efficiency of element-level computations.
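
    The contrast the abstract reports, fast whole-array ("global") operations versus slow element-level work, has a direct modern analogue in vectorized versus interpreted loops. The sketch below is illustrative only and has nothing to do with the DYCAST code itself.

    import time
    import numpy as np

    # One long vector operation maps well onto vector/pipeline hardware;
    # short, scalar-style element work does not -- the same contrast the
    # abstract reports for the CDC STAR-100.
    n = 1_000_000
    a, b = np.random.rand(n), np.random.rand(n)

    t0 = time.perf_counter()
    c_vec = 2.0 * a + b                 # whole-array ("global") operation
    t_vec = time.perf_counter() - t0

    t0 = time.perf_counter()
    c_loop = np.empty(n)
    for i in range(n):                  # element-level computation
        c_loop[i] = 2.0 * a[i] + b[i]
    t_loop = time.perf_counter() - t0

    assert np.allclose(c_vec, c_loop)
    print(f"vectorized: {t_vec:.4f}s, element loop: {t_loop:.4f}s")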

  4. 49 CFR 198.37 - State one-call damage prevention program.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 49 Transportation 3 2010-10-01 2010-10-01 false State one-call damage prevention program. 198.37... REGULATIONS FOR GRANTS TO AID STATE PIPELINE SAFETY PROGRAMS Adoption of One-Call Damage Prevention Program § 198.37 State one-call damage prevention program. A State must adopt a one-call damage prevention...

  5. GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

    PubMed

    Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

    2013-04-10

    Sequencing of microbial genomes is important because of the antibiotic-related and pathogenic activities that microbes carry. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenesis- or antibiotic-related genes are carried on genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes in ongoing sequencing projects. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI-predicting module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). The draft genomes of ongoing genome projects, in contigs or scaffolds, can be submitted to our Web server, which returns functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analyses, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.
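
    The record names the method (an SVM over genomic profiles) without detailing the features, so the sketch below shows only the general shape of such a scanner: windows of sequence are converted to k-mer composition vectors and classified with scikit-learn's SVC. The k-mer features, the GC-content toy data, and the helper names are assumptions for illustration, not GI-GPS itself.

    import numpy as np
    from itertools import product
    from sklearn.svm import SVC

    def kmer_profile(seq, k=3):
        """Normalized k-mer composition of a DNA window, a simple stand-in
        for the 'genomic profile' features an island scanner might use."""
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        index = {km: i for i, km in enumerate(kmers)}
        v = np.zeros(len(kmers))
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k])
            if j is not None:
                v[j] += 1
        return v / max(v.sum(), 1.0)

    # Hypothetical training windows labeled 1 (island) / 0 (backbone):
    # islands often differ in composition, mimicked here via GC content.
    rng = np.random.default_rng(1)
    def fake_window(gc):
        p = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
        return "".join(rng.choice(list("ACGT"), size=500, p=p))

    X = np.array([kmer_profile(fake_window(0.35)) for _ in range(50)] +
                 [kmer_profile(fake_window(0.60)) for _ in range(50)])
    y = np.array([0] * 50 + [1] * 50)

    clf = SVC(kernel="rbf").fit(X, y)   # scan new windows with clf.predict
    print("training accuracy:", clf.score(X, y))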

  6. IMPACT: a whole-exome sequencing analysis pipeline for integrating molecular profiles with actionable therapeutics in clinical samples

    PubMed Central

    Hintzsche, Jennifer; Kim, Jihye; Yadav, Vinod; Amato, Carol; Robinson, Steven E; Seelenfreund, Eric; Shellman, Yiqun; Wisell, Joshua; Applegate, Allison; McCarter, Martin; Box, Neil; Tentler, John; De, Subhajyoti

    2016-01-01

    Objective: Currently, there is a disconnect between finding a patient's relevant molecular profile and predicting actionable therapeutics. Here we develop and implement the Integrating Molecular Profiles with Actionable Therapeutics (IMPACT) analysis pipeline, linking variants detected from whole-exome sequencing (WES) to actionable therapeutics. Methods and materials: The IMPACT pipeline contains 4 analytical modules: detecting somatic variants, calling copy number alterations, predicting drugs against deleterious variants, and analyzing tumor heterogeneity. We tested the IMPACT pipeline on whole-exome sequencing data from The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples with known EGFR mutations. We also used IMPACT to analyze melanoma patient tumor samples before treatment, after BRAF-inhibitor treatment, and after BRAF- and MEK-inhibitor treatment. Results: IMPACT correctly identified known EGFR mutations in the TCGA lung adenocarcinoma samples and linked them to the appropriate Food and Drug Administration (FDA)-approved EGFR inhibitors. For the melanoma patient samples, we identified NRAS p.Q61K as an acquired resistance mutation to BRAF-inhibitor treatment. We also identified CDKN2A deletion as a novel acquired resistance mutation to BRAFi/MEKi inhibition. The IMPACT analysis pipeline links these somatic variants to actionable therapeutics. We observed the clonal dynamics in the tumor samples after the various treatments. We showed that IMPACT not only helped in successful prioritization of clinically relevant variants but also linked these variants to possible targeted therapies. Conclusion: IMPACT provides a new bioinformatics strategy to delineate candidate somatic variants and actionable therapies. This approach can be applied to other patient tumor samples to discover effective drug targets for personalized medicine. IMPACT is publicly available at http://tanlab.ucdenver.edu/IMPACT. PMID:27026619

  7. IMPACT: a whole-exome sequencing analysis pipeline for integrating molecular profiles with actionable therapeutics in clinical samples.

    PubMed

    Hintzsche, Jennifer; Kim, Jihye; Yadav, Vinod; Amato, Carol; Robinson, Steven E; Seelenfreund, Eric; Shellman, Yiqun; Wisell, Joshua; Applegate, Allison; McCarter, Martin; Box, Neil; Tentler, John; De, Subhajyoti; Robinson, William A; Tan, Aik Choon

    2016-07-01

    Currently, there is a disconnect between finding a patient's relevant molecular profile and predicting actionable therapeutics. Here we develop and implement the Integrating Molecular Profiles with Actionable Therapeutics (IMPACT) analysis pipeline, linking variants detected from whole-exome sequencing (WES) to actionable therapeutics. The IMPACT pipeline contains 4 analytical modules: detecting somatic variants, calling copy number alterations, predicting drugs against deleterious variants, and analyzing tumor heterogeneity. We tested the IMPACT pipeline on whole-exome sequencing data from The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples with known EGFR mutations. We also used IMPACT to analyze melanoma patient tumor samples before treatment, after BRAF-inhibitor treatment, and after BRAF- and MEK-inhibitor treatment. IMPACT correctly identified known EGFR mutations in the TCGA lung adenocarcinoma samples and linked them to the appropriate Food and Drug Administration (FDA)-approved EGFR inhibitors. For the melanoma patient samples, we identified NRAS p.Q61K as an acquired resistance mutation to BRAF-inhibitor treatment. We also identified CDKN2A deletion as a novel acquired resistance mutation to BRAFi/MEKi inhibition. The IMPACT analysis pipeline links these somatic variants to actionable therapeutics. We observed the clonal dynamics in the tumor samples after the various treatments. We showed that IMPACT not only helped in successful prioritization of clinically relevant variants but also linked these variants to possible targeted therapies. IMPACT provides a new bioinformatics strategy to delineate candidate somatic variants and actionable therapies. This approach can be applied to other patient tumor samples to discover effective drug targets for personalized medicine. IMPACT is publicly available at http://tanlab.ucdenver.edu/IMPACT. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.

    PubMed

    D'Antonio, Mattia; D'Onorio De Meo, Paolo; Pallocca, Matteo; Picardi, Ernesto; D'Erchia, Anna Maria; Calogero, Raffaele A; Castrignanò, Tiziana; Pesole, Graziano

    2015-01-01

    The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms, allowing massive and cheap sequencing of selected RNA fractions and also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulative pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with TopHat, Cufflinks and HTSeq), and detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). The pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and to detect statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Through a user-friendly web interface, the RAP workflow can be customized by the user, and it is automatically executed on our cloud computing environment. This strategy allows access to bioinformatics tools and computational resources without requiring specific bioinformatics or IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data according to the user's needs.

  9. ALMA Pipeline: Current Status

    NASA Astrophysics Data System (ADS)

    Shinnaga, H.; Humphreys, E.; Indebetouw, R.; Villard, E.; Kern, J.; Davis, L.; Miura, R. E.; Nakazato, T.; Sugimoto, K.; Kosugi, G.; Akiyama, E.; Muders, D.; Wyrowski, F.; Williams, S.; Lightfoot, J.; Kent, B.; Momjian, E.; Hunter, T.; ALMA Pipeline Team

    2015-12-01

    The ALMA Pipeline is the automated data reduction tool that runs on ALMA data. The current version of the ALMA Pipeline produces science-quality data products for standard interferometric observing modes, up to the calibration process. The ALMA Pipeline comprises (1) heuristics, in the form of Python scripts, that select the best processing parameters, and (2) contexts, which are kept for book-keeping of the data processing. The ALMA Pipeline produces a "weblog" with detailed plots that let users judge how each calibration step was handled. The ALMA Interferometric Pipeline was conditionally accepted in March 2014 after processing Cycle 0 and Cycle 1 data sets. Since Cycle 2, the ALMA Pipeline has been used for ALMA data reduction and quality assurance for projects whose observing modes it supports. Pipeline tasks are available based on CASA version 4.2.2, and the first public pipeline release, called CASA 4.2.2-pipe, has been available since October 2014. One can reduce ALMA data with both CASA tasks and pipeline tasks using CASA version 4.2.2-pipe.

  10. Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.

    PubMed

    Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan

    2017-01-01

    Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
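
    Halvade-RNA's key idea is the MapReduce shape of the computation: partition the read stream, run the same per-partition work concurrently, then merge partial results. The sketch below reproduces that shape with Python's multiprocessing rather than Hadoop, and counts toy per-bin base frequencies instead of running STAR/GATK; it is a structural illustration, not the pipeline's code.

    from multiprocessing import Pool
    from collections import Counter

    def map_partition(reads):
        """Map stage: independent per-partition work (toy base counts per
        1 kb genomic bin, standing in for alignment + variant calling)."""
        counts = Counter()
        for pos, base in reads:          # toy reads: (position, base) pairs
            counts[(pos // 1000, base)] += 1
        return counts

    def reduce_counts(partials):
        """Reduce stage: merge the per-partition partial results."""
        total = Counter()
        for c in partials:
            total.update(c)
        return total

    if __name__ == "__main__":
        reads = [(i * 37 % 5000, "ACGT"[i % 4]) for i in range(100_000)]
        partitions = [reads[i::4] for i in range(4)]   # split the input stream
        with Pool(4) as pool:
            partials = pool.map(map_partition, partitions)
        print("bins counted:", len(reduce_counts(partials)))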

  11. A-Track: A New Approach for Detection of Moving Objects in FITS Images

    NASA Astrophysics Data System (ADS)

    Kılıç, Yücel; Karapınar, Nurdan; Atay, Tolga; Kaplan, Murat

    2016-07-01

    Small planet and asteroid observations are important for understanding the origin and evolution of the Solar System. In this work, we have developed a fast and robust pipeline, called A-Track, for detecting asteroids and comets in sequential telescope images. The moving objects are detected using a modified line detection algorithm, called ILDA. We have coded the pipeline in Python 3, making use of various scientific modules in Python to process the FITS images. We tested the code on photometric data taken by an SI-1100 CCD with a 1-meter telescope at TUBITAK National Observatory, Antalya. The pipeline can be used to analyze large data archives or daily sequential data. The code is hosted on GitHub under the GNU GPL v3 license.

  12. 30 CFR 250.198 - Documents incorporated by reference.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... (Identical), Petroleum and natural gas industries—Pipeline transportation systems—Pipeline valves; Product No... GAS AND SULPHUR OPERATIONS IN THE OUTER CONTINENTAL SHELF General References § 250.198 Documents... Administration (NARA). For information on the availability of this material at NARA, call 202-741-6030, or go to...

  13. UGbS-Flex, a novel bioinformatics pipeline for imputation-free SNP discovery in polyploids without a reference genome: finger millet as a case study.

    PubMed

    Qi, Peng; Gimode, Davis; Saha, Dipnarayan; Schröder, Stephan; Chakraborty, Debkanta; Wang, Xuewen; Dida, Mathews M; Malmberg, Russell L; Devos, Katrien M

    2018-06-15

    Research on orphan crops is often hindered by a lack of genomic resources. With the advent of affordable sequencing technologies, genotyping an entire genome or, for large-genome species, a representative fraction of the genome has become feasible for any crop. Nevertheless, most genotyping-by-sequencing (GBS) methods are geared towards obtaining large numbers of markers at low sequence depth, which excludes their application in heterozygous individuals. Furthermore, bioinformatics pipelines often lack the flexibility to deal with paired-end reads or to be applied in polyploid species. UGbS-Flex combines publicly available software with in-house Python and Perl scripts to efficiently call SNPs from genotyping-by-sequencing reads irrespective of the species' ploidy level, breeding system and availability of a reference genome. Noteworthy features of the UGbS-Flex pipeline are an ability to use paired-end reads as input, an effective approach to cluster reads across samples with enhanced outputs, and maximization of SNP calling. We demonstrate use of the pipeline for the identification of several thousand high-confidence SNPs with high representation across samples in an F3-derived F2 population in the allotetraploid finger millet. Robust high-density genetic maps were constructed using the time-tested mapping program MAPMAKER, which we upgraded to run efficiently and in a semi-automated manner in a Windows Command Prompt Environment. We exploited comparative GBS with one of the diploid ancestors of finger millet to assign linkage groups to subgenomes and demonstrate the presence of chromosomal rearrangements. The paper combines GBS protocol modifications, a novel flexible GBS analysis pipeline, UGbS-Flex, recommendations to maximize SNP identification, updated genetic mapping software, and the first high-density maps of finger millet. The modules used in the UGbS-Flex pipeline and for genetic mapping were applied to finger millet, an allotetraploid selfing species without a reference genome, as a case study. The UGbS-Flex modules, which can be run independently, are easily transferable to species with other breeding systems or ploidy levels.

  14. Turning off the School-to-Prison Pipeline

    ERIC Educational Resources Information Center

    Wilson, Harry

    2014-01-01

    The causal link between educational exclusion and criminalization of youth is called the "school-to-prison pipeline." This is a byproduct of "zero tolerance" polices that have been widely discredited by research (APA, 2008; Skiba, 2014). However, these practices are still widespread in the United States and have been exported…

  15. Digital Mapping of Buried Pipelines with a Dual Array System

    DOT National Transportation Integrated Search

    2005-03-01

    The project carried out under this agreement, which was informally called the "Dual Array Project" (the term we will use in this report), was part of the research efforts at the Office of Pipeline Safety at U.S. DOT, and was one of seven contracts aw...

  16. Application of the actor model to large scale NDE data analysis

    NASA Astrophysics Data System (ADS)

    Coughlin, Chris

    2018-03-01

    The Actor model of concurrent computation discretizes a problem into a series of independent units or actors that interact only through the exchange of messages. Without direct coupling between individual components, an Actor-based system is inherently concurrent and fault-tolerant. These traits lend themselves to so-called "Big Data" applications in which the volume of data to analyze requires a distributed multi-system design. For a practical demonstration of the Actor computational model, a system was developed to assist with the automated analysis of Nondestructive Evaluation (NDE) datasets using the open source Myriad Data Reduction Framework. A machine learning model trained to detect damage in two-dimensional slices of C-Scan data was deployed in a streaming data processing pipeline. To demonstrate the flexibility of the Actor model, the pipeline was deployed on a local system and re-deployed as a distributed system without recompiling, reconfiguring, or restarting the running application.
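
    A minimal actor, in the sense used above, is just private state plus a mailbox drained by a message loop. The sketch below implements that with a queue and a thread and wires two actors into a toy detection pipeline; it illustrates the model only and is unrelated to the Myriad framework's actual implementation.

    import queue
    import threading
    import time

    class Actor:
        """Minimal actor: a mailbox and a message loop; interaction happens
        only through send(), so actors stay decoupled and concurrent."""

        def __init__(self, handler):
            self._mailbox = queue.Queue()
            self._handler = handler
            threading.Thread(target=self._run, daemon=True).start()

        def send(self, message):
            self._mailbox.put(message)        # the only way to interact

        def _run(self):
            while True:
                message = self._mailbox.get()
                if message is None:           # poison pill shuts the actor down
                    break
                self._handler(message)

    # Two-stage toy pipeline: a 'detector' actor forwards hits to a 'reporter'.
    reporter = Actor(lambda m: print("damage flagged in slice", m))
    detector = Actor(lambda s: reporter.send(s["id"]) if s["score"] > 0.9 else None)

    for i, score in enumerate([0.2, 0.95, 0.5, 0.99]):
        detector.send({"id": i, "score": score})

    time.sleep(0.2)                           # let daemon threads drain mailboxes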

  17. Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines.

    PubMed

    Ellrott, Kyle; Bailey, Matthew H; Saksena, Gordon; Covington, Kyle R; Kandoth, Cyriac; Stewart, Chip; Hess, Julian; Ma, Singer; Chiotti, Kami E; McLellan, Michael; Sofia, Heidi J; Hutter, Carolyn; Getz, Gad; Wheeler, David; Ding, Li

    2018-03-28

    The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  18. 78 FR 25264 - Southern Star Central Gas Pipeline, Inc.; Notice of Request Under Blanket Authorization

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-04-30

    ... DEPARTMENT OF ENERGY Federal Energy Regulatory Commission [Docket No. CP13-179-000] Southern Star..., 2013, Southern Star Central Gas Pipeline, Inc. (Southern Star), 4700 Highway 56, Owensboro, Kentucky... free). For TTY, call (202) 502-8659. Specifically, Southern Star proposes to abandon in place four...

  19. Beyond School-to-Prison Pipeline and toward an Educational and Penal Realism

    ERIC Educational Resources Information Center

    Fasching-Varner, Kenneth J.; Mitchell, Roland W.; Martin, Lori L.; Bennett-Haron, Karen P.

    2014-01-01

    Much scholarly attention has been paid to the school-to-prison pipeline and the sanitized discourse of "death by education," called the achievement gap. Additionally, there exists a longstanding discourse surrounding the alleged crisis of educational failure. This article offers no solutions to the crisis and suggests instead that the…

  20. The structure of clinical translation: efficiency, information, and ethics.

    PubMed

    Kimmelman, Jonathan; London, Alex John

    2015-01-01

    The so-called drug pipeline is not really about drugs and not much like a pipeline. It is really about the production and dissemination of information, and it is much more like a web. The misunderstanding leads to a poor understanding of what's wrong with clinical translation and how it can be improved.

  1. The measurement of substance use among adolescents: when is the 'bogus pipeline' method needed?

    PubMed

    Murray, D M; Perry, C L

    1987-01-01

    The use of objective measures to assess cigarette smoking among adolescents has become commonplace in research studies in recent years. This trend is based on evidence that this so-called pipeline methodology can increase the disclosure of socially proscribed behaviors in a setting where adolescents might otherwise feel pressure to deny that they smoke. This paper examines the effects of the pipeline methodology, alone and in combination with procedures designed to ensure anonymity, on the disclosure of tobacco, alcohol, and marijuana use by young adolescents. The data indicate that the pipeline procedures significantly increase disclosure of tobacco and marijuana use when students are promised confidentiality but not anonymity. However, when anonymity was assured, disclosure of cigarette use was just as high without the pipeline; for marijuana use, disclosure was higher without the pipeline. No effects were observed for alcohol disclosure. These data are interpreted for their implications for prospective and cross-sectional studies.

  2. Gravitational Wave Detection of Compact Binaries Through Multivariate Analysis

    NASA Astrophysics Data System (ADS)

    Atallah, Dany Victor; Dorrington, Iain; Sutton, Patrick

    2017-01-01

    The first detection of gravitational waves (GW), GW150914, produced by a binary black hole merger, has ushered in the era of GW astronomy. The detection technique used to find GW150914 considered only a fraction of the information available about the candidate event: mainly the detector signal-to-noise ratios and chi-squared values. In hopes of greatly increasing detection rates, we want to take advantage of all the information available about candidate events. We employ a technique called Multivariate Analysis (MVA) to improve LIGO's sensitivity to GW signals. MVA techniques are efficient ways to scan high-dimensional data spaces for signal/noise classification. Our goal is to use MVA to classify compact-object binary coalescence (CBC) events composed of any combination of black holes and neutron stars. CBC waveforms are modeled through numerical relativity, and templates of the modeled waveforms are used to search for CBCs and quantify candidate events. Different MVA pipelines are under investigation for CBC signals and unmodelled signals, with promising results. One such MVA pipeline, used for the unmodelled search, can theoretically analyze far more data than the MVA pipelines currently explored for CBCs, potentially making it a more powerful classifier. In principle, this extra information could improve the sensitivity to GW signals. We will present the results from our efforts to adapt an MVA pipeline used in the unmodelled search to classify candidate events from the CBC search.

  3. 75 FR 63452 - Lobo Pipeline Company L.P.; Notice of Baseline Filing

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-15

    ... Company L.P.; Notice of Baseline Filing October 7, 2010. Take notice that on October 1, 2010, Lobo Pipeline Company L.P. submitted a revised baseline filing of its Statement of Operating Conditions for... free). For TTY, call (202) 502-8659. Comment Date: 5 p.m. Eastern time on Wednesday, October 20, 2010...

  4. Bat detective-Deep learning tools for bat acoustic signal detection.

    PubMed

    Mac Aodha, Oisin; Gibb, Rory; Barlow, Kate E; Browning, Ella; Firman, Michael; Freeman, Robin; Harder, Briana; Kinsey, Libby; Mead, Gary R; Newson, Stuart E; Pandourski, Ivan; Parsons, Stuart; Russ, Jon; Szodoray-Paradi, Abigel; Szodoray-Paradi, Farkas; Tilova, Elena; Girolami, Mark; Brostow, Gabriel; Jones, Kate E

    2018-03-01

    Passive acoustic sensing has emerged as a powerful tool for quantifying anthropogenic impacts on biodiversity, especially for echolocating bat species. To better assess bat population trends there is a critical need for accurate, reliable, and open source tools that allow the detection and classification of bat calls in large collections of audio recordings. The majority of existing tools are commercial or have focused on the species classification task, neglecting the important problem of first localizing echolocation calls in audio which is particularly problematic in noisy recordings. We developed a convolutional neural network based open-source pipeline for detecting ultrasonic, full-spectrum, search-phase calls produced by echolocating bats. Our deep learning algorithms were trained on full-spectrum ultrasonic audio collected along road-transects across Europe and labelled by citizen scientists from www.batdetective.org. When compared to other existing algorithms and commercial systems, we show significantly higher detection performance of search-phase echolocation calls with our test sets. As an example application, we ran our detection pipeline on bat monitoring data collected over five years from Jersey (UK), and compared results to a widely-used commercial system. Our detection pipeline can be used for the automatic detection and monitoring of bat populations, and further facilitates their use as indicator species on a large scale. Our proposed pipeline makes only a small number of bat specific design decisions, and with appropriate training data it could be applied to detecting other species in audio. A crucial novelty of our work is showing that with careful, non-trivial, design and implementation considerations, state-of-the-art deep learning methods can be used for accurate and efficient monitoring in audio.
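
    The paper's detector is a convolutional network scoring spectrogram windows for search-phase calls. The sketch below shows that architectural shape with a tiny PyTorch model slid across a synthetic spectrogram; the layer sizes, window length, and threshold are arbitrary stand-ins, not the published network.

    import torch
    import torch.nn as nn

    class CallDetector(nn.Module):
        """Tiny CNN over spectrogram windows, emitting P(search-phase call).
        Illustrative of the architecture class only; the trained network and
        post-processing from the paper are not reproduced here."""

        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16 * 16, 1))

        def forward(self, x):              # x: (batch, 1, 64, 64) spectrogram
            return torch.sigmoid(self.head(self.features(x)))

    # Sliding-window detection over a long recording's spectrogram.
    model = CallDetector().eval()
    spectrogram = torch.randn(1, 1, 64, 64 * 100)   # stand-in for real audio
    windows = spectrogram.unfold(3, 64, 32).squeeze(0).permute(2, 0, 1, 3)
    with torch.no_grad():
        scores = model(windows.reshape(-1, 1, 64, 64))
    print("windows above threshold:", int((scores > 0.5).sum()))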

  5. Bat detective—Deep learning tools for bat acoustic signal detection

    PubMed Central

    Barlow, Kate E.; Firman, Michael; Freeman, Robin; Harder, Briana; Kinsey, Libby; Mead, Gary R.; Newson, Stuart E.; Pandourski, Ivan; Russ, Jon; Szodoray-Paradi, Abigel; Tilova, Elena; Girolami, Mark; Jones, Kate E.

    2018-01-01

    Passive acoustic sensing has emerged as a powerful tool for quantifying anthropogenic impacts on biodiversity, especially for echolocating bat species. To better assess bat population trends there is a critical need for accurate, reliable, and open source tools that allow the detection and classification of bat calls in large collections of audio recordings. The majority of existing tools are commercial or have focused on the species classification task, neglecting the important problem of first localizing echolocation calls in audio which is particularly problematic in noisy recordings. We developed a convolutional neural network based open-source pipeline for detecting ultrasonic, full-spectrum, search-phase calls produced by echolocating bats. Our deep learning algorithms were trained on full-spectrum ultrasonic audio collected along road-transects across Europe and labelled by citizen scientists from www.batdetective.org. When compared to other existing algorithms and commercial systems, we show significantly higher detection performance of search-phase echolocation calls with our test sets. As an example application, we ran our detection pipeline on bat monitoring data collected over five years from Jersey (UK), and compared results to a widely-used commercial system. Our detection pipeline can be used for the automatic detection and monitoring of bat populations, and further facilitates their use as indicator species on a large scale. Our proposed pipeline makes only a small number of bat specific design decisions, and with appropriate training data it could be applied to detecting other species in audio. A crucial novelty of our work is showing that with careful, non-trivial, design and implementation considerations, state-of-the-art deep learning methods can be used for accurate and efficient monitoring in audio. PMID:29518076

  6. 18 CFR 357.3 - FERC Form No. 73, Oil Pipeline Data for Depreciation Analysis.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... Pipeline Data for Depreciation Analysis. 357.3 Section 357.3 Conservation of Power and Water Resources... No. 73, Oil Pipeline Data for Depreciation Analysis. (a) Who must file. Any oil pipeline company.... 73, Oil Pipeline Data for Depreciation Analysis, available for review at the Commission's Public...

  7. On the construction of a new stellar classification template library for the LAMOST spectral analysis pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, Peng; Luo, Ali; Li, Yinbi

    2014-05-01

    The LAMOST spectral analysis pipeline, called the 1D pipeline, aims to classify and measure the spectra observed in the LAMOST survey. Through this pipeline, the observed stellar spectra are classified into different subclasses by matching with template spectra. Consequently, the performance of the stellar classification greatly depends on the quality of the template spectra. In this paper, we construct a new LAMOST stellar spectral classification template library, which is supposed to improve the precision and credibility of the present LAMOST stellar classification. About one million spectra are selected from LAMOST Data Release One to construct the new stellar templates, and they are gathered in 233 groups by two criteria: (1) pseudo g – r colors obtained by convolving the LAMOST spectra with the Sloan Digital Sky Survey ugriz filter response curve, and (2) the stellar subclass given by the LAMOST pipeline. In each group, the template spectra are constructed using three steps. (1) Outliers are excluded using the Local Outlier Probabilities algorithm, and then the principal component analysis method is applied to the remaining spectra of each group. About 5% of the one million spectra are ruled out as outliers. (2) All remaining spectra are reconstructed using the first principal components of each group. (3) The weighted average spectrum is used as the template spectrum in each group. Using the previous 3 steps, we initially obtain 216 stellar template spectra. We visually inspect all template spectra, and 29 spectra are abandoned due to low spectral quality. Furthermore, the MK classification for the remaining 187 template spectra is manually determined by comparing with 3 template libraries. Meanwhile, 10 template spectra whose subclass is difficult to determine are abandoned. Finally, we obtain a new template library containing 183 LAMOST template spectra with 61 different MK classes by combining it with the current library.
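
    The per-group template construction translates into a short numerical recipe, sketched below under simplifying assumptions: spectra are taken to share a common wavelength grid, a plain distance-to-centroid cut stands in for the Local Outlier Probabilities algorithm, and the averaging weights (e.g., signal-to-noise) are left to the caller.

        import numpy as np
        from sklearn.decomposition import PCA

        def build_template(spectra, n_components=10, outlier_sigma=3.0, weights=None):
            # 1. Exclude outliers: drop spectra unusually far from the centroid.
            centroid = spectra.mean(axis=0)
            dist = np.linalg.norm(spectra - centroid, axis=1)
            keep = dist < dist.mean() + outlier_sigma * dist.std()
            spectra = spectra[keep]
            # 2. Reconstruct every remaining spectrum from the first principal
            #    components, suppressing noise outside the common subspace.
            pca = PCA(n_components=n_components).fit(spectra)
            reconstructed = pca.inverse_transform(pca.transform(spectra))
            # 3. Weighted average of the reconstructions gives the template.
            w = np.ones(len(reconstructed)) if weights is None else np.asarray(weights)[keep]
            return np.average(reconstructed, axis=0, weights=w)

        group = np.random.rand(500, 3000)  # 500 spectra x 3000 wavelength bins
        template = build_template(group)
        print(template.shape)              # (3000,)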

  8. Oman-India pipeline sets survey challenges. Crossing involves most rugged terrain, water depths four times greater than previous attempts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Flynn, J.

    1995-02-01

    Decisions concerning the route for the world's deepest pipeline call for some of the most challenging commercial oceanographic and engineering surveys ever undertaken. Oman Oil Co.'s 1,170-kilometer pipeline will carry 2 billion cubic feet of gas daily across the Arabian Sea from Oman to the northern coast of India at the Gulf of Kutch. Not only will the project be in water depths four times greater than any previous pipeline, but it will cross some of the world's most rugged seabed terrain, traversing ridges and plunging into deep canyons. Project costs are likely to approach $5 billion.

  9. Detection of leaks in buried rural water pipelines using thermal infrared images

    USGS Publications Warehouse

    Eidenshink, Jeffery C.

    1985-01-01

    Leakage is a major problem in many pipelines. Minor leaks called 'seeper leaks', which generally range from 2 to 10 m3 per day, are common and are difficult to detect using conventional ground surveys. The objective of this research was to determine whether airborne thermal-infrared remote sensing could be used in detecting leaks and monitoring rural water pipelines. This study indicates that such leaks can be detected using low-altitude 8.7- to 11.5-micrometer wavelength thermal infrared images collected under proper conditions.

  10. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application

    PubMed Central

    2015-01-01

    Background The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS application, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges to efficiently analyze and store sequence data and results. Methods In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Results Through a user-friendly web interface, the RAP workflow can be suitably customized by the user and is automatically executed on our cloud computing environment. This strategy allows access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data according to the user's needs. PMID:26046471
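
    The modular workflow idea, where each stage wraps an existing tool and stages can be swapped or customized, can be sketched as a simple sequential runner; the commands and file names below are placeholders showing the shape of such a pipeline, not RAP's actual configuration.

        import subprocess
        import sys

        STAGES = [
            ("quality check",  ["fastqc", "sample.fastq"]),
            ("alignment",      ["tophat", "genome_index", "sample.fastq"]),
            ("quantification", ["cufflinks", "tophat_out/accepted_hits.bam"]),
        ]

        def run_workflow(stages):
            # Run each stage in order and stop the workflow on the first failure.
            for name, cmd in stages:
                print(f"stage '{name}': {' '.join(cmd)}")
                if subprocess.run(cmd).returncode != 0:
                    sys.exit(f"stage '{name}' failed; stopping the workflow")

        if __name__ == "__main__":
            run_workflow(STAGES)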

  11. Comparison of Ion Personal Genome Machine Platforms for the Detection of Variants in BRCA1 and BRCA2.

    PubMed

    Hwang, Sang Mee; Lee, Ki Chan; Lee, Min Seob; Park, Kyoung Un

    2018-01-01

    Transition to next-generation sequencing (NGS) for BRCA1/BRCA2 analysis in clinical laboratories is ongoing, but different platforms and/or data analysis pipelines give different results, causing difficulties in implementation. We evaluated the Ion Personal Genome Machine (PGM) platforms (Ion PGM and Ion PGM Dx, Thermo Fisher Scientific) for the analysis of BRCA1/2. The results of Ion PGM with OTG-snpcaller, a pipeline based on the Torrent mapping alignment program and the Genome Analysis Toolkit, from 75 clinical samples and 14 reference DNA samples were compared with Sanger sequencing for BRCA1/BRCA2. Ten clinical samples and 14 reference DNA samples were additionally sequenced by Ion PGM Dx with Torrent Suite. Fifty types of variants, including 18 pathogenic variants or variants of unknown significance, were identified from the 75 clinical samples, and known variants of the reference samples were confirmed by Sanger sequencing and/or NGS. One false-negative result was present for Ion PGM/OTG-snpcaller: an indel variant misidentified as a single-nucleotide variant. However, eight discordant results were present for Ion PGM Dx/Torrent Suite, with both false-positive and false-negative results: a 40-bp deletion, a 4-bp deletion, and a 1-bp deletion were not called; a false-positive deletion was identified; and four other variants were misidentified as another variant. Ion PGM/OTG-snpcaller showed acceptable performance, with good concordance with Sanger sequencing. However, Ion PGM Dx/Torrent Suite showed many discrepant results and is not suitable for use in a clinical laboratory without further optimization of the data analysis for calling variants.
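
    A concordance check of this kind reduces to set operations over normalized variant keys, as in the sketch below with invented calls keyed by (chromosome, position, ref, alt); the indel/SNV misidentification reported above corresponds to the same site appearing under two different keys.

        def concordance(confirmed, pipeline_calls):
            # Compare a pipeline's call set against confirmed (e.g., Sanger) calls.
            confirmed, pipeline_calls = set(confirmed), set(pipeline_calls)
            return {
                "concordant":     sorted(confirmed & pipeline_calls),
                "false_negative": sorted(confirmed - pipeline_calls),  # missed
                "false_positive": sorted(pipeline_calls - confirmed),  # unconfirmed
            }

        # Invented example: the second site is an indel represented differently
        # by the two methods, so it shows up as both a FN and a FP.
        sanger = {("13", 32339912, "A", "G"), ("17", 43057110, "CT", "C")}
        ngs    = {("13", 32339912, "A", "G"), ("17", 43057110, "C", "CA")}
        for category, variants in concordance(sanger, ngs).items():
            print(category, variants)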

  12. An improved ChIP-seq peak detection system for simultaneously identifying post-translational modified transcription factors by combinatorial fusion, using SUMOylation as an example.

    PubMed

    Cheng, Chia-Yang; Chu, Chia-Han; Hsu, Hung-Wei; Hsu, Fang-Rong; Tang, Chung Yi; Wang, Wen-Ching; Kung, Hsing-Jien; Chang, Pei-Ching

    2014-01-01

    Post-translational modification (PTM) of transcription factors and chromatin remodelling proteins is recognized as a major mechanism by which transcriptional regulation occurs. Chromatin immunoprecipitation (ChIP) in combination with high-throughput sequencing (ChIP-seq) is applied as a gold standard when studying the genome-wide binding sites of transcription factors (TFs). This has greatly improved our understanding of protein-DNA interactions on a genome-wide scale. However, current ChIP-seq peak calling tools are not sufficiently sensitive and are unable to simultaneously identify post-translationally modified TFs based on ChIP-seq analysis; this is largely due to the widespread presence of multiple modified TFs. Using SUMO-1 modification as an example, we describe here an improved approach that allows the simultaneous identification of the particular genomic binding regions of all TFs with SUMO-1 modification. Traditional peak calling methods are inadequate when identifying multiple TF binding sites that involve long genomic regions, and we therefore designed a ChIP-seq processing pipeline for the detection of peaks via a combinatorial fusion method. The peaks are then annotated with known transcription factor binding sites (TFBS) using the Transfac Matrix Database (v7.0), which predicts potential SUMOylated TFs. Next, the peak calling results are further analyzed based on promoter proximity, TFBS annotation, and a literature review, and are validated by ChIP real-time quantitative PCR (qPCR) and ChIP-reChIP real-time qPCR. The results clearly show that SUMOylated TFs can be pinpointed using our pipeline. A methodology is presented that analyzes SUMO-1 ChIP-seq patterns and predicts related TFs. Our analysis uses three peak calling tools; the fusion of these different tools increases the precision of the peak calling results. The TFBS annotation method is able to predict potential SUMOylated TFs. Here, we offer a new approach that enhances ChIP-seq data analysis and allows the identification of multiple SUMOylated TF binding sites simultaneously, which can then be utilized for other functional PTM binding site prediction in the future.
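
    The combinatorial-fusion idea, keeping only regions supported by a minimum number of peak callers, can be sketched as an interval sweep; the caller names and coordinates below are illustrative, and the paper's actual fusion method is more involved.

        def fuse_peaks(peak_sets, min_support=2):
            # Tag every interval with its caller, then sweep left to right,
            # merging overlaps and recording which callers support each region.
            tagged = sorted((s, e, i) for i, peaks in enumerate(peak_sets)
                            for s, e in peaks)
            fused, cur_start, cur_end, callers = [], None, None, set()
            for s, e, i in tagged:
                if cur_start is None or s > cur_end:   # gap: close current region
                    if len(callers) >= min_support:
                        fused.append((cur_start, cur_end))
                    cur_start, cur_end, callers = s, e, {i}
                else:                                  # overlap: extend region
                    cur_end = max(cur_end, e)
                    callers.add(i)
            if len(callers) >= min_support:
                fused.append((cur_start, cur_end))
            return fused

        macs  = [(100, 250), (900, 1100)]
        sissr = [(120, 300), (2000, 2100)]
        spp   = [(110, 280), (905, 1050)]
        print(fuse_peaks([macs, sissr, spp]))  # [(100, 300), (900, 1100)]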

  13. Data Pre-Processing for Label-Free Multiple Reaction Monitoring (MRM) Experiments

    PubMed Central

    Chung, Lisa M.; Colangelo, Christopher M.; Zhao, Hongyu

    2014-01-01

    Multiple Reaction Monitoring (MRM) conducted on a triple quadrupole mass spectrometer allows researchers to quantify the expression levels of a set of target proteins. Each protein is often characterized by several unique peptides that can be detected by monitoring predetermined fragment ions, called transitions, for each peptide. Concatenating large numbers of MRM transitions into a single assay enables simultaneous quantification of hundreds of peptides and proteins. In recognition of the important role that MRM can play in hypothesis-driven research and its increasing impact on clinical proteomics, targeted proteomics such as MRM was recently selected as the Nature Method of the Year. However, there are many challenges in MRM applications, especially data pre-processing where many steps still rely on manual inspection of each observation in practice. In this paper, we discuss an analysis pipeline to automate MRM data pre-processing. This pipeline includes data quality assessment across replicated samples, outlier detection, identification of inaccurate transitions, and data normalization. We demonstrate the utility of our pipeline through its applications to several real MRM data sets. PMID:24905083

  14. Data Pre-Processing for Label-Free Multiple Reaction Monitoring (MRM) Experiments.

    PubMed

    Chung, Lisa M; Colangelo, Christopher M; Zhao, Hongyu

    2014-06-05

    Multiple Reaction Monitoring (MRM) conducted on a triple quadrupole mass spectrometer allows researchers to quantify the expression levels of a set of target proteins. Each protein is often characterized by several unique peptides that can be detected by monitoring predetermined fragment ions, called transitions, for each peptide. Concatenating large numbers of MRM transitions into a single assay enables simultaneous quantification of hundreds of peptides and proteins. In recognition of the important role that MRM can play in hypothesis-driven research and its increasing impact on clinical proteomics, targeted proteomics such as MRM was recently selected as the Nature Method of the Year. However, there are many challenges in MRM applications, especially data pre-processing where many steps still rely on manual inspection of each observation in practice. In this paper, we discuss an analysis pipeline to automate MRM data pre-processing. This pipeline includes data quality assessment across replicated samples, outlier detection, identification of inaccurate transitions, and data normalization. We demonstrate the utility of our pipeline through its applications to several real MRM data sets.
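
    The pre-processing steps above (quality assessment, outlier detection, normalization) can be sketched on a peak-area matrix of transitions by replicate runs; the data are simulated, and the 3-sigma flagging rule and median normalization are illustrative stand-ins for the paper's procedures.

        import numpy as np

        rng = np.random.default_rng(0)
        areas = rng.lognormal(mean=10, sigma=0.3, size=(200, 6))  # 200 transitions x 6 runs

        # Quality assessment: coefficient of variation of each transition across runs.
        cv = areas.std(axis=1) / areas.mean(axis=1)

        # Outlier / inaccurate-transition detection: flag transitions whose CV is
        # extreme relative to the rest of the assay (3-sigma rule as an assumption).
        flagged = cv > cv.mean() + 3 * cv.std()
        print(f"flagged {flagged.sum()} of {len(cv)} transitions")

        # Normalization: equalize run-level medians on the log scale so that
        # between-run intensity shifts do not masquerade as expression changes.
        log_areas = np.log2(areas)
        log_norm = log_areas - np.median(log_areas, axis=0) + np.median(log_areas)
        print(np.median(log_norm, axis=0))  # per-run medians now identical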

  15. An Analysis Pipeline with Statistical and Visualization-Guided Knowledge Discovery for Michigan-Style Learning Classifier Systems

    PubMed Central

    Urbanowicz, Ryan J.; Granizo-Mackenzie, Ambrose; Moore, Jason H.

    2014-01-01

    Michigan-style learning classifier systems (M-LCSs) represent an adaptive and powerful class of evolutionary algorithms which distribute the learned solution over a sizable population of rules. However, their application to complex real-world data mining problems, such as genetic association studies, has been limited. Traditional knowledge discovery strategies for M-LCS rule populations involve sorting and manual rule inspection. While this approach may be sufficient for simpler problems, the confounding influence of noise and the need to discriminate between predictive and non-predictive attributes call for additional strategies. Additionally, tests of significance must be adapted to M-LCS analyses in order to make them a viable option within fields that require such analyses to assess confidence. In this work we introduce an M-LCS analysis pipeline that combines uniquely applied visualizations with objective statistical evaluation for the identification of predictive attributes and reliable rule generalizations in noisy single-step data mining problems. This work considers an alternative paradigm for knowledge discovery in M-LCSs, shifting the focus from individual rules to a global, population-wide perspective. We demonstrate the efficacy of this pipeline applied to the identification of epistasis (i.e., attribute interaction) and heterogeneity in noisy simulated genetic association data. PMID:25431544

  16. MaPLE: A MapReduce Pipeline for Lattice-based Evaluation and Its Application to SNOMED CT

    PubMed Central

    Zhang, Guo-Qiang; Zhu, Wei; Sun, Mengmeng; Tao, Shiqiang; Bodenreider, Olivier; Cui, Licong

    2015-01-01

    Non-lattice fragments are often indicative of structural anomalies in ontological systems and, as such, represent possible areas of focus for subsequent quality assurance work. However, extracting the non-lattice fragments in large ontological systems is computationally expensive if not prohibitive, using a traditional sequential approach. In this paper we present a general MapReduce pipeline, called MaPLE (MapReduce Pipeline for Lattice-based Evaluation), for extracting non-lattice fragments in large partially ordered sets and demonstrate its applicability in ontology quality assurance. Using MaPLE in a 30-node Hadoop local cloud, we systematically extracted non-lattice fragments in 8 SNOMED CT versions from 2009 to 2014 (each containing over 300k concepts), with an average total computing time of less than 3 hours per version. With dramatically reduced time, MaPLE makes it feasible not only to perform exhaustive structural analysis of large ontological hierarchies, but also to systematically track structural changes between versions. Our change analysis showed that the average change rates on the non-lattice pairs are up to 38.6 times higher than the change rates of the background structure (concept nodes). This demonstrates that fragments around non-lattice pairs exhibit significantly higher rates of change in the process of ontological evolution. PMID:25705725

  17. MaPLE: A MapReduce Pipeline for Lattice-based Evaluation and Its Application to SNOMED CT.

    PubMed

    Zhang, Guo-Qiang; Zhu, Wei; Sun, Mengmeng; Tao, Shiqiang; Bodenreider, Olivier; Cui, Licong

    2014-10-01

    Non-lattice fragments are often indicative of structural anomalies in ontological systems and, as such, represent possible areas of focus for subsequent quality assurance work. However, extracting the non-lattice fragments in large ontological systems is computationally expensive if not prohibitive, using a traditional sequential approach. In this paper we present a general MapReduce pipeline, called MaPLE (MapReduce Pipeline for Lattice-based Evaluation), for extracting non-lattice fragments in large partially ordered sets and demonstrate its applicability in ontology quality assurance. Using MaPLE in a 30-node Hadoop local cloud, we systematically extracted non-lattice fragments in 8 SNOMED CT versions from 2009 to 2014 (each containing over 300k concepts), with an average total computing time of less than 3 hours per version. With dramatically reduced time, MaPLE makes it feasible not only to perform exhaustive structural analysis of large ontological hierarchies, but also to systematically track structural changes between versions. Our change analysis showed that the average change rates on the non-lattice pairs are up to 38.6 times higher than the change rates of the background structure (concept nodes). This demonstrates that fragments around non-lattice pairs exhibit significantly higher rates of change in the process of ontological evolution.
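
    The lattice test itself is compact: a pair of concepts is a non-lattice pair when its set of common descendants has more than one maximal element, so no unique greatest lower bound exists. The sketch below checks this on a toy hierarchy; the MapReduce distribution that makes the computation tractable at SNOMED CT scale is omitted.

        from itertools import combinations

        parents = {"C": ["A", "B"], "D": ["A", "B"], "E": ["C", "D"]}
        nodes = {"A", "B", "C", "D", "E"}

        def ancestors(node):
            out = set()
            for p in parents.get(node, []):
                out |= {p} | ancestors(p)
            return out

        descendants = {n: {m for m in nodes if n in ancestors(m)} for n in nodes}

        def maximal(cands):
            # Candidates not strictly below any other candidate.
            return {c for c in cands
                    if not any(c in descendants[o] for o in cands - {c})}

        for a, b in combinations(sorted(nodes), 2):
            bounds = maximal(descendants[a] & descendants[b])
            if len(bounds) > 1:
                # (A, B) is non-lattice here: C and D are both maximal lower bounds.
                print(f"non-lattice pair ({a}, {b}): maximal lower bounds {sorted(bounds)}")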

  18. Leaks in the Chicana and Chicano Educational Pipeline. Latino Policy & Issues Brief. Number 13

    ERIC Educational Resources Information Center

    Yosso, Tara J.; Solorzano, Daniel G.

    2006-01-01

    Academic institutions facilitate the flow of knowledge, skills, and students through the educational pipeline. Yet, no matter how one measures educational outcomes, Chicana/os suffer the lowest educational attainment of any major racial or ethnic group in the United States. This brief calls for the repair of the serious and persistent leaks in the…

  19. White Teachers' Role in Sustaining the School-to-Prison Pipeline: Recommendations for Teacher Education

    ERIC Educational Resources Information Center

    Bryan, Nathaniel

    2017-01-01

    Educational scholarship has called attention to the disproportionate ways Black males are disciplined in schools, which has become the catalyst to their entry into the school-to-prison pipeline through which they are funneled from K-12 classrooms into the criminal justice system. Since the majority of teachers are White, it may be insightful to…

  20. Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline Workflows

    PubMed Central

    Torri, Federica; Dinov, Ivo D.; Zamanyan, Alen; Hobel, Sam; Genco, Alex; Petrosyan, Petros; Clark, Andrew P.; Liu, Zhizhong; Eggert, Paul; Pierce, Jonathan; Knowles, James A.; Ames, Joseph; Kesselman, Carl; Toga, Arthur W.; Potkin, Steven G.; Vawter, Marquis P.; Macciardi, Fabio

    2012-01-01

    Whole-genome and exome sequencing have already proven to be essential and powerful methods to identify genes responsible for simple Mendelian inherited disorders. These methods can be applied to complex disorders as well, and have been adopted as one of the current mainstream approaches in population genetics. These achievements have been made possible by next generation sequencing (NGS) technologies, which require substantial bioinformatics resources to analyze the dense and complex sequence data. The huge analytical burden of data from genome sequencing might be seen as a bottleneck slowing the publication of NGS papers at this time, especially in psychiatric genetics. We review the existing methods for processing NGS data, to place into context the rationale for the design of a computational resource. We describe our method, the Graphical Pipeline for Computational Genomics (GPCG), to perform the computational steps required to analyze NGS data. The GPCG implements flexible workflows for basic sequence alignment, sequence data quality control, single nucleotide polymorphism analysis, copy number variant identification, annotation, and visualization of results. These workflows cover all the analytical steps required for NGS data, from processing the raw reads to variant calling and annotation. The current version of the pipeline is freely available at http://pipeline.loni.ucla.edu. These applications of NGS analysis may gain clinical utility in the near future (e.g., identifying miRNA signatures in diseases) when the bioinformatics approach is made feasible. Taken together, the annotation tools and strategies that have been developed to retrieve information and test hypotheses about the functional role of variants present in the human genome will help to pinpoint the genetic risk factors for psychiatric disorders. PMID:23139896

  1. MSP-HTPrimer: a high-throughput primer design tool to improve assay design for DNA methylation analysis in epigenetics.

    PubMed

    Pandey, Ram Vinay; Pulverer, Walter; Kallmeyer, Rainer; Beikircher, Gabriel; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas

    2016-01-01

    Bisulfite (BS) conversion-based and methylation-sensitive restriction enzyme (MSRE)-based PCR methods have been the most commonly used techniques for locus-specific DNA methylation analysis. However, both methods have advantages and limitations. Thus, an integrated approach would be extremely useful to quantify DNA methylation status with high sensitivity and specificity. Designing specific and optimized primers for target regions is the most critical and challenging step in obtaining adequate DNA methylation results using PCR-based methods. Currently, no integrated, optimized, and high-throughput methylation-specific primer design software is available for both BS- and MSRE-based methods. Therefore, an integrated, powerful, and easy-to-use methylation-specific primer design pipeline with high accuracy and a high success rate would be very useful. We have developed a new web-based pipeline, called MSP-HTPrimer, to design primer pairs for MSP, BSP, pyrosequencing, COBRA, and MSRE assays on both genomic strands. First, our pipeline converts all target sequences into bisulfite-treated templates for both the forward and reverse strands and designs all possible primer pairs, followed by filtering for single nucleotide polymorphisms (SNPs) and known repeat regions. Next, each primer pair is annotated with the upstream and downstream RefSeq genes, CpG islands, and cut sites (for COBRA and MSRE). Finally, MSP-HTPrimer selects specific primers from both strands based on custom and user-defined hierarchical selection criteria. MSP-HTPrimer produces a primer pair summary output table in TXT and HTML format for display and UCSC custom tracks for the resulting primer pairs in GTF format. MSP-HTPrimer is an integrated, web-based, and high-throughput pipeline, has no limitation on the number or size of target sequences, and designs MSP, BSP, pyrosequencing, COBRA, and MSRE assays. It is the only pipeline that automatically designs primers on both genomic strands to increase the success rate. It is a standalone web-based pipeline, fully configured within a virtual machine, and thus can be readily used without any configuration. We have experimentally validated primer pairs designed by our pipeline and shown a very high success rate: out of 66 BSP primer pairs, 63 were successfully validated without any further optimization step, using the same qPCR conditions. The MSP-HTPrimer pipeline is freely available from http://sourceforge.net/p/msp-htprimer.
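
    The first pipeline step, converting each target into bisulfite-treated templates for both strands, can be sketched as below; full conversion outside CpG context is assumed, with IUPAC codes Y/R marking CpG positions whose conversion depends on methylation state.

        def bisulfite_templates(seq):
            seq = seq.upper()
            fwd, rev = [], []
            for i, base in enumerate(seq):
                nxt = seq[i + 1] if i + 1 < len(seq) else ""
                prv = seq[i - 1] if i > 0 else ""
                # Forward strand: every C converts to T, except a CpG C, which
                # may stay methylated (Y = C or T).
                fwd.append("Y" if base == "C" and nxt == "G" else
                           "T" if base == "C" else base)
                # Reverse strand, written on forward coordinates: the G of a CpG
                # may stay G (R = A or G); every other G reads as A.
                rev.append("R" if base == "G" and prv == "C" else
                           "A" if base == "G" else base)
            return "".join(fwd), "".join(rev)

        print(bisulfite_templates("ACGGTCCGA"))  # ('AYGGTTYGA', 'ACRATCCRA')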

  2. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages.

    PubMed

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-03-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase our knowledge of DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. 'dada2' performs trimming of the high-throughput sequencing data. 'QuasR' and 'mosaics' perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, 'ChIPseeker' performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git.

  3. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages

    PubMed Central

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-01-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase our knowledge of DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. ‘dada2’ performs trimming of the high-throughput sequencing data. ‘QuasR’ and ‘mosaics’ perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, ‘ChIPseeker’ performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git. PMID:28416945

  4. Rice SNP-seek database update: new SNPs, indels, and queries.

    PubMed

    Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L; Alexandrov, Nickolai

    2017-01-04

    We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDBMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
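
    The HDF5/relational hybrid reflects a common design: the dense genotype matrix sits in HDF5 for fast array slicing while variety metadata stays in a relational store for flexible querying. The sketch below illustrates the pattern with an invented schema and toy data; it is not SNP-Seek's actual middleware.

        import sqlite3
        import h5py
        import numpy as np

        # Dense genotype matrix (varieties x SNPs, coded 0/1/2) lives in HDF5.
        genotypes = np.random.randint(0, 3, size=(300, 10000), dtype=np.int8)
        with h5py.File("genotypes.h5", "w") as f:
            f.create_dataset("snp", data=genotypes, chunks=(64, 1024))

        # Variety metadata lives in a relational store for flexible queries.
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE variety (row INTEGER PRIMARY KEY, name TEXT, subpop TEXT)")
        db.execute("INSERT INTO variety VALUES (0, 'IR 64', 'indica')")
        db.commit()

        # Query pattern: resolve matrix rows in SQL, then slice the HDF5 matrix.
        rows = [r for (r,) in db.execute("SELECT row FROM variety WHERE subpop = 'indica'")]
        with h5py.File("genotypes.h5", "r") as f:
            subset = f["snp"][rows, :1000]
        print(subset.shape)  # (1, 1000)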

  5. 3D Visualization for Phoenix Mars Lander Science Operations

    NASA Technical Reports Server (NTRS)

    Edwards, Laurence; Keely, Leslie; Lees, David; Stoker, Carol

    2012-01-01

    Planetary surface exploration missions present considerable operational challenges in the form of substantial communication delays, limited communication windows, and limited communication bandwidth. 3D visualization software was developed and delivered to the 2008 Phoenix Mars Lander (PML) mission. The components of the system include an interactive 3D visualization environment called Mercator, terrain reconstruction software called the Ames Stereo Pipeline, and a server providing distributed access to terrain models. The software was successfully utilized during the mission for science analysis, site understanding, and science operations activity planning. A terrain server was implemented that provided distribution of terrain models from a central repository to clients running the Mercator software. The Ames Stereo Pipeline generates accurate, high-resolution, texture-mapped, 3D terrain models from stereo image pairs. These terrain models can then be visualized within the Mercator environment. The central cross-cutting goal for these tools is to provide an easy-to-use, high-quality, full-featured visualization environment that enhances the mission science team's ability to develop low-risk, productive science activity plans. In addition, for the Mercator and Viz visualization environments, extensibility and adaptability to different missions and application areas are key design goals.

  6. Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study

    PubMed Central

    Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu

    2017-01-01

    A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059
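
    The reported asymmetry between sensitivity and specificity can be reproduced qualitatively with a small simulation: perturb true rare-variant genotypes with given calling error rates, then test case/control burden differences. All parameter values below are illustrative, and this is not the GWAS.PC implementation.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        def simulate_power(sens, spec, n=500, m=20, maf=0.005, effect=1.5, reps=200):
            hits = 0
            for _ in range(reps):
                controls = rng.binomial(1, maf, size=(n, m))
                cases = rng.binomial(1, maf * effect, size=(n, m))
                for g in (controls, cases):
                    carrier = g == 1
                    # Miss true carriers with probability 1 - sensitivity ...
                    g[carrier] = rng.binomial(1, sens, size=carrier.sum())
                    # ... and create false calls with probability 1 - specificity.
                    g[~carrier] = rng.binomial(1, 1 - spec, size=(~carrier).sum())
                # Burden test: compare per-subject rare-allele counts.
                t = stats.ttest_ind(cases.sum(axis=1), controls.sum(axis=1))
                hits += t.pvalue < 0.05
            return hits / reps

        for spec_val in (1.0, 0.999, 0.99):
            print(f"specificity {spec_val}: power ~ {simulate_power(0.95, spec_val):.2f}")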

  7. ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification.

    PubMed

    Bao, Riyue; Hernandez, Kyle; Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge

    2015-01-01

    Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud.

  8. Development of single-copy nuclear intron markers for species-level phylogenetics: Case study with Paullinieae (Sapindaceae).

    PubMed

    Chery, Joyce G; Sass, Chodon; Specht, Chelsea D

    2017-09-01

    We developed a bioinformatic pipeline that leverages a publicly available genome and published transcriptomes to design primers in conserved coding sequences flanking targeted introns of single-copy nuclear loci. Paullinieae (Sapindaceae) is used to demonstrate the pipeline. Transcriptome reads phylogenetically closer to the lineage of interest are aligned to the closest genome. Single-nucleotide polymorphisms are called, generating a "pseudoreference" closer to the lineage of interest. Several filters are applied to meet the criteria of single-copy nuclear loci with introns of a desired size. Primers are designed in conserved coding sequences flanking introns. Using this pipeline, we developed nine single-copy nuclear intron markers for Paullinieae. This pipeline is highly flexible and can be used for any group with available genomic and transcriptomic resources. This pipeline led to the development of nine variable markers for phylogenetic study without generating sequence data de novo.

  9. Pipeline to Pathways: New Directions for Improving the Status of Women on Campus

    ERIC Educational Resources Information Center

    White, Judith S.

    2005-01-01

    For the past thirty years, much of the effort to improve the status of women in higher education has focused on the so-called "pipeline" theory, which held that a large number of women undergraduates and graduate students would, over time, yield larger numbers of women at the highest academic ranks. In other words, getting more women into college,…

  10. Reliability-based management of buried pipelines considering external corrosion defects

    NASA Astrophysics Data System (ADS)

    Miran, Seyedeh Azadeh

    Corrosion is one of the main deterioration mechanisms that degrade energy pipeline integrity, since pipelines transfer corrosive fluids or gas and interact with a corrosive environment. Corrosion defects are usually detected by periodic inspections using in-line inspection (ILI) methods. To ensure pipeline safety, this study develops a cost-effective maintenance strategy with three components: corrosion growth model development using ILI data, time-dependent performance evaluation, and optimal inspection interval determination. The proposed approach is applied to a cathodically protected buried steel pipeline located in Mexico. First, a time-dependent power-law formulation is adopted to probabilistically characterize the growth of the maximum depth and length of external corrosion defects. Dependency between defect depth and length is considered in the model development, and the generation of corrosion defects over time is characterized by a homogeneous Poisson process. The unknown parameters of the growth models are evaluated from the ILI data through Bayesian updating with the Markov Chain Monte Carlo (MCMC) simulation technique. The proposed corrosion growth models can be used when either matched or non-matched defects are available, and they account for defects newly generated since the last inspection. Results of this part of the study show that both the depth and length growth models predict damage quantities reasonably well, and a strong correlation between defect depth and length is found. Next, time-dependent system failure probabilities are evaluated using the developed corrosion growth models under the prevailing uncertainties, where three failure modes are considered: small leak, large leak, and rupture. Pipeline performance is evaluated through the failure probability per km (termed a sub-system), where each sub-system is treated as a series system of the detected and newly generated defects within it. A sensitivity analysis is also performed to determine the growth-model parameters to which the reliability of the studied pipeline is most sensitive. The reliability results suggest that newly generated defects should be considered when calculating failure probability, especially for predicting the long-term performance of the pipeline, and that the statistical uncertainty in the model parameters has a significant impact that should be included in the reliability analysis. Finally, with the evaluated time-dependent failure probabilities, a life-cycle cost analysis is conducted to determine the optimal inspection interval for the studied pipeline. The expected total life-cycle cost consists of the construction cost and the expected costs of inspection, repair, and failure. A repair is conducted when, after an inspection, the failure probability for any of the described failure modes exceeds a pre-defined probability threshold. This study also investigates the impact of repair threshold values and the unit costs of inspection and failure on the expected total life-cycle cost and optimal inspection interval through a parametric study. The analysis suggests that a smaller inspection interval leads to higher inspection costs but can lower the failure cost, and that the repair cost is less significant than the inspection and failure costs.
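
    As a concrete illustration of the reliability calculation, the sketch below grows defect depths by the power law d(t) = a(t - t0)^b with uncertain parameters and estimates the small-leak probability by Monte Carlo as the fraction of trials in which depth exceeds 80% of the wall thickness; the distributions, constants, and failure criterion are illustrative, not the study's fitted values.

        import numpy as np

        rng = np.random.default_rng(42)
        n_sims, wall = 100_000, 9.5      # Monte Carlo trials; wall thickness (mm)

        a = rng.lognormal(mean=-1.2, sigma=0.4, size=n_sims)       # growth coefficient
        b = np.clip(rng.normal(0.9, 0.1, size=n_sims), 0.1, None)  # growth exponent
        t0 = rng.uniform(0.0, 5.0, size=n_sims)                    # initiation year

        for t in (10, 15, 20, 25):
            depth = a * np.clip(t - t0, 0.0, None) ** b   # d(t) = a (t - t0)^b, in mm
            p_leak = np.mean(depth > 0.8 * wall)          # small-leak criterion
            print(f"year {t}: Pr[small leak] = {p_leak:.4f}")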

  11. 30 CFR 291.102 - May I call the BSEE Hotline to informally resolve an allegation that open and nondiscriminatory...

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 30 Mineral Resources 2 2014-07-01 2014-07-01 false May I call the BSEE Hotline to informally... the BSEE Hotline to informally resolve an allegation that open and nondiscriminatory access was denied... open and nondiscriminatory access by calling the toll-free BSEE Pipeline Open Access Hotline at 1-888...

  12. 30 CFR 291.102 - May I call the BSEE Hotline to informally resolve an allegation that open and nondiscriminatory...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 30 Mineral Resources 2 2013-07-01 2013-07-01 false May I call the BSEE Hotline to informally... the BSEE Hotline to informally resolve an allegation that open and nondiscriminatory access was denied... open and nondiscriminatory access by calling the toll-free BSEE Pipeline Open Access Hotline at 1-888...

  13. 30 CFR 291.102 - May I call the BSEE Hotline to informally resolve an allegation that open and nondiscriminatory...

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 30 Mineral Resources 2 2012-07-01 2012-07-01 false May I call the BSEE Hotline to informally... the BSEE Hotline to informally resolve an allegation that open and nondiscriminatory access was denied... open and nondiscriminatory access by calling the toll-free BSEE Pipeline Open Access Hotline at 1-888...

  14. Phylogenetic Conflict in Bears Identified by Automated Discovery of Transposable Element Insertions in Low-Coverage Genomes

    PubMed Central

    Gallus, Susanne; Janke, Axel

    2017-01-01

    Phylogenetic reconstruction from transposable elements (TEs) offers an additional perspective to study evolutionary processes. However, detecting phylogenetically informative TE insertions requires tedious experimental work, limiting the power of phylogenetic inference. Here, we analyzed the genomes of seven bear species using high-throughput sequencing data to detect thousands of TE insertions. The newly developed pipeline for TE detection called TeddyPi (TE detection and discovery for Phylogenetic Inference) identified 150,513 high-quality TE insertions in the genomes of ursine and tremarctine bears. By integrating different TE insertion callers and using a stringent filtering approach, the TeddyPi pipeline produced highly reliable TE insertion calls, which were confirmed by extensive in vitro validation experiments. Analysis of single nucleotide substitutions in the flanking regions of the TEs shows that these substitutions correlate with the phylogenetic signal from the TE insertions. Our phylogenomic analyses show that TEs are a major driver of genomic variation in bears and enabled phylogenetic reconstruction of a well-resolved species tree, despite strong signals for incomplete lineage sorting and introgression. The analyses show that the Asiatic black, sun, and sloth bear form a monophyletic clade, in which phylogenetic incongruence originates from incomplete lineage sorting. TeddyPi is open source and can be adapted to various TE and structural variation callers. The pipeline makes it possible to confidently extract thousands of TE insertions even from low-coverage genomes (∼10×) of nonmodel organisms. This opens new possibilities for biologists to study phylogenies and evolutionary processes as well as rates and patterns of (retro-)transposition and structural variation. PMID:28985298
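
    The integration step, accepting an insertion only when independent callers agree within a positional tolerance, can be sketched as below; the caller names, coordinates, and two-caller threshold are illustrative rather than TeddyPi's actual rules.

        def consensus(calls_by_tool, tol=15):
            # Keep a call only when at least two tools report an insertion on the
            # same chromosome within `tol` bp, de-duplicating nearby positions.
            all_calls = [(c, p, tool) for tool, calls in calls_by_tool.items()
                         for c, p in calls]
            kept = []
            for c, p, tool in all_calls:
                support = {t for cc, pp, t in all_calls
                           if cc == c and abs(pp - p) <= tol}
                already = any(cc == c and abs(pp - p) <= tol for cc, pp in kept)
                if len(support) >= 2 and not already:
                    kept.append((c, p))
            return sorted(kept)

        calls = {"retroseq": [("chr1", 1000), ("chr2", 500)],
                 "mobster":  [("chr1", 1008), ("chr3", 70)]}
        print(consensus(calls))  # [('chr1', 1000)] - only the chr1 call is shared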

  15. Frequency Spectrum Method-Based Stress Analysis for Oil Pipelines in Earthquake Disaster Areas

    PubMed Central

    Wu, Xiaonan; Lu, Hongfang; Huang, Kun; Wu, Shijuan; Qiao, Weibiao

    2015-01-01

    When a long-distance oil pipeline crosses an earthquake disaster area, inertial force and strong ground motion can cause the pipeline stress to exceed the failure limit, resulting in bending and deformation failure. To date, researchers have performed few safety analyses of oil pipelines in earthquake disaster areas that include stress analysis. Therefore, using the spectrum method and the theory of one-dimensional beam units, CAESAR II is used to perform a dynamic earthquake analysis for an oil pipeline in the XX earthquake disaster area, determining whether the displacement and stress of the pipeline meet the standards when subjected to a strong earthquake. The numerical analysis locates the primary directions of seismic action (axial, longitudinal, and horizontal displacement) and the critical section of the pipeline. Feasible project enhancement suggestions based on the analysis results are proposed. Designers can use this stress analysis method to perform an ultimate design for oil pipelines in earthquake disaster areas, thereby improving the safe operation of the pipeline. PMID:25692790

  16. Frequency spectrum method-based stress analysis for oil pipelines in earthquake disaster areas.

    PubMed

    Wu, Xiaonan; Lu, Hongfang; Huang, Kun; Wu, Shijuan; Qiao, Weibiao

    2015-01-01

    When a long-distance oil pipeline crosses an earthquake disaster area, inertial force and strong ground motion can cause the pipeline stress to exceed the failure limit, resulting in bending and deformation failure. To date, researchers have performed few safety analyses of oil pipelines in earthquake disaster areas that include stress analysis. Therefore, using the spectrum method and the theory of one-dimensional beam units, CAESAR II is used to perform a dynamic earthquake analysis for an oil pipeline in the XX earthquake disaster area, determining whether the displacement and stress of the pipeline meet the standards when subjected to a strong earthquake. The numerical analysis locates the primary directions of seismic action (axial, longitudinal, and horizontal displacement) and the critical section of the pipeline. Feasible project enhancement suggestions based on the analysis results are proposed. Designers can use this stress analysis method to perform an ultimate design for oil pipelines in earthquake disaster areas, thereby improving the safe operation of the pipeline.
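
    The response-spectrum idea behind such an analysis can be illustrated in a few lines: each vibration mode contributes a peak response read from a design spectrum, and the modal peaks are combined by the square root of the sum of squares (SRSS). The spectrum shape and modal data below are invented for illustration and are not the paper's model.

        import numpy as np

        def design_spectrum(period, pga=0.3):
            # Simplified flat-then-decaying pseudo-acceleration spectrum (in g).
            return 2.5 * pga if period < 0.4 else 2.5 * pga * 0.4 / period

        periods = np.array([0.9, 0.35, 0.18])   # mode periods (s)
        gamma = np.array([1.30, 0.45, 0.20])    # modal participation factors
        phi = np.array([1.00, -0.85, 0.60])     # mode-shape values at the section

        sa = np.array([design_spectrum(t) for t in periods]) * 9.81  # m/s^2
        omega = 2 * np.pi / periods
        u = gamma * phi * sa / omega**2         # peak modal displacements
        print(f"SRSS displacement at the checked section: {np.sqrt(np.sum(u**2)):.4f} m")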

  17. DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.

    PubMed

    Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

    2013-08-01

    High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.

  18. DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

    PubMed Central

    Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

    2013-01-01

    High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. PMID:23657089

  19. Anaconda: AN automated pipeline for somatic COpy Number variation Detection and Annotation from tumor exome sequencing data.

    PubMed

    Gao, Jianing; Wan, Changlin; Zhang, Huan; Li, Ao; Zang, Qiguang; Ban, Rongjun; Ali, Asim; Yu, Zhenghua; Shi, Qinghua; Jiang, Xiaohua; Zhang, Yuanwei

    2017-10-03

    Copy number variations (CNVs) are the main genetic structural variations in the cancer genome. Detecting CNVs in exome regions is efficient and cost-effective for identifying cancer-associated genes. Many tools have been developed accordingly, yet they lack reliability because of high false-negative rates, which are intrinsically caused by exonic bias in the genome. To provide an alternative option, we report here Anaconda, a comprehensive pipeline that allows flexible integration of multiple CNV-calling methods and systematic annotation of CNVs in analyzing WES data. With a single command, Anaconda can generate CNV detection results from up to four CNV-calling tools. Coupled with comprehensive annotation of genes involved in shared CNV regions, Anaconda delivers a more reliable and useful report to assist CNV-associated cancer research. The Anaconda package and manual can be freely accessed at http://mcg.ustc.edu.cn/bsc/ANACONDA/.

  20. VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans

    NASA Astrophysics Data System (ADS)

    Wang, Song; Gupta, Chetan; Mehta, Abhay

    There are data streams all around us that can be harnessed for tremendous business and personal advantage. For an enterprise-level stream processing system such as CHAOS [1] (Continuous, Heterogeneous Analytic Over Streams), handling of complex query plans with resource constraints is challenging. While several scheduling strategies exist for stream processing, efficient scheduling of complex DAG query plans is still largely unsolved. In this paper, we propose a novel execution scheme for scheduling complex directed acyclic graph (DAG) query plans with meta-data enriched stream tuples. Our solution, called Virtual Pipelined Chain (or VPipe Chain for short), effectively extends the "Chain" pipelining scheduling approach to complex DAG query plans.
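
    One way to picture chain-based pipelining of a DAG plan is to decompose the operator graph into linear chains that can each run as one pipelined unit. The sketch below does this with a greedy pass over a topological order on an invented operator graph; it is not the VPipe algorithm itself.

        from graphlib import TopologicalSorter

        edges = {                     # operator -> downstream operators
            "scan_a": ["join"], "scan_b": ["join"],
            "join": ["agg", "filter"], "agg": ["sink"],
            "filter": ["sink"], "sink": [],
        }

        # graphlib expects node -> predecessors, so invert the edge map first.
        preds = {n: [] for n in edges}
        for n, outs in edges.items():
            for o in outs:
                preds[o].append(n)

        chains, used = [], set()
        for node in TopologicalSorter(preds).static_order():
            if node in used:
                continue
            chain = [node]
            # Extend while the tail has exactly one successor whose other
            # inputs are already satisfied: a pipelineable linear segment.
            while True:
                tail_outs = edges[chain[-1]]
                ready = [o for o in tail_outs if o not in used and
                         all(p in chain or p in used for p in preds[o])]
                if len(tail_outs) == 1 and len(ready) == 1:
                    used.add(ready[0])
                    chain.append(ready[0])
                else:
                    break
            used.update(chain)
            chains.append(chain)
        print(chains)  # e.g. [['scan_a'], ['scan_b', 'join'], ['agg'], ['filter', 'sink']]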

  1. EBEX: A Balloon-Borne Telescope for Measuring Cosmic Microwave Background Polarization

    NASA Astrophysics Data System (ADS)

    Chapman, Daniel

    2015-05-01

    EBEX is a long-duration balloon-borne (LDB) telescope designed to probe polarization signals in the cosmic microwave background (CMB). It is designed to measure or place an upper limit on the inflationary B-mode signal, a signal predicted by inflationary theories to be imprinted on the CMB by gravitational waves, to detect the effects of gravitational lensing on the polarization of the CMB, and to characterize polarized Galactic foreground emission. The payload consists of a pointed gondola that houses the optics, polarimetry, detectors and detector readout systems, as well as the pointing sensors, control motors, telemetry systems, and data acquisition and flight control computers. Polarimetry is achieved with a rotating half-wave plate and wire grid polarizer. The detectors are sensitive to frequency bands centered on 150, 250, and 410 GHz. EBEX was flown in 2009 from New Mexico as a full system test, and then flown again in December 2012 / January 2013 over Antarctica in a long-duration flight to collect scientific data. In the instrumentation part of this thesis we discuss the pointing sensors and attitude determination algorithms. We also describe the real-time map making software, "QuickLook", that was custom-designed for EBEX. We devote special attention to the design and construction of the primary pointing sensors, the star cameras, and their custom-designed flight software package, "STARS" (the Star Tracking Attitude Reconstruction Software). In the analysis part of this thesis we describe the current status of the post-flight analysis procedure. We discuss the data structures used in analysis and the pipeline stages related to attitude determination and map making. We also discuss a custom-designed software framework called "LEAP" (the LDB EBEX Analysis Pipeline) that supports most of the analysis pipeline stages.

  2. Argentine gas system underway for Gas del Estado

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bosch, H.

    Gas del Estado's giant 1074-mile Centro-Oeste pipeline project - designed to ultimately transport over 350 million CF/day of natural gas from the Neuquen basin to the Campo Duran-Buenos Aires pipeline system - is now underway. The COGASCO consortium of Dutch and Argentine companies awarded the construction project will also operate and maintain the system for 15 years after its completion. In addition to the 30-in. pipelines, the agreement calls for a major compressor station at the gas field, three intermediate compressor stations, a gas-treatment plant, liquids-recovery facilities, and the metering, control, communications, and maintenance equipment for the system. Fabricated in Holland, the internally and externally coated pipe will be double-jointed to 80-ft lengths after shipment to Argentina; welders will use conventional manual-arc techniques to weld the pipeline in the field.

  3. Pipeline enhances Norman Wells potential

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    Approval of an oil pipeline from halfway down Canada's MacKenzie River Valley at Norman Wells to N. Alberta has raised the potential for development of large reserves along with controversy over native claims. The project involves 2 closely related proposals. One, by Esso Resources, the exploration and production unit of Imperial Oil, will increase oil production from the Norman Wells field from 3000 bpd currently to 25,000 bpd. The other proposal, by Interprovincial Pipeline (N.W) Ltd., calls for construction of an underground pipeline to transport the additional production from Norman Wells to Alberta. The 560-mile, 12-in. pipeline will extend from Norman Wells, which is 90 miles south of the Arctic Circle on the north shore of the Mackenzie River, south to the end of an existing line at Zama in N. Alberta. There will be 3 pumping stations en route. This work also discusses recovery, potential, drilling limitations, the processing plant, positive impact, and further development of the Norman Wells project.

  4. ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification

    PubMed Central

    Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge

    2015-01-01

    Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud. PMID:26271043

  5. TIGER: A data analysis pipeline for testing the strong-field dynamics of general relativity with gravitational wave signals from coalescing compact binaries

    NASA Astrophysics Data System (ADS)

    Agathos, M.; Del Pozzo, W.; Li, T. G. F.; Van Den Broeck, C.; Veitch, J.; Vitale, S.

    2014-04-01

    The direct detection of gravitational waves with upcoming second-generation gravitational wave observatories such as Advanced LIGO and Advanced Virgo will allow us to probe the genuinely strong-field dynamics of general relativity (GR) for the first time. We have developed a data analysis pipeline called TIGER (test infrastructure for general relativity), which uses signals from compact binary coalescences to perform a model-independent test of GR. In this paper we focus on signals from coalescing binary neutron stars, for which sufficiently accurate waveform models are already available and can be generated fast enough to be used in Bayesian inference. By performing numerical experiments in stationary, Gaussian noise, we show that for such systems, TIGER is robust against a number of unmodeled fundamental, astrophysical, and instrumental effects, such as differences between waveform approximants, a limited number of post-Newtonian phase contributions being known, the effects of neutron star tidal deformability on the orbital motion, neutron star spins, and instrumental calibration errors.
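
    For orientation, TIGER's odds ratio for a GR-violating hypothesis combines the Bayes factors of subhypotheses in which subsets of the N_T post-Newtonian "testing parameters" deviate from GR. With equal prior odds among the 2^{N_T} - 1 subhypotheses it takes roughly the following form (a sketch based on the TIGER literature, not quoted from this abstract):

      O^{\mathrm{modGR}}_{\mathrm{GR}}
        = \frac{P(\mathcal{H}_{\mathrm{modGR}} \mid d, I)}
               {P(\mathcal{H}_{\mathrm{GR}} \mid d, I)}
        = \frac{1}{2^{N_T} - 1}
          \sum_{k=1}^{N_T} \; \sum_{i_1 < \dots < i_k} B^{i_1 \dots i_k}_{\mathrm{GR}}

    where each B^{i_1...i_k}_{GR} is the Bayes factor comparing the subhypothesis in which testing parameters i_1, ..., i_k deviate from their GR values against the GR hypothesis, given data d and background information I.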

  6. Phylogenetic Conflict in Bears Identified by Automated Discovery of Transposable Element Insertions in Low-Coverage Genomes.

    PubMed

    Lammers, Fritjof; Gallus, Susanne; Janke, Axel; Nilsson, Maria A

    2017-10-01

    Phylogenetic reconstruction from transposable elements (TEs) offers an additional perspective to study evolutionary processes. However, detecting phylogenetically informative TE insertions requires tedious experimental work, limiting the power of phylogenetic inference. Here, we analyzed the genomes of seven bear species using high-throughput sequencing data to detect thousands of TE insertions. The newly developed pipeline for TE detection called TeddyPi (TE detection and discovery for Phylogenetic Inference) identified 150,513 high-quality TE insertions in the genomes of ursine and tremarctine bears. By integrating different TE insertion callers and using a stringent filtering approach, the TeddyPi pipeline produced highly reliable TE insertion calls, which were confirmed by extensive in vitro validation experiments. Analysis of single nucleotide substitutions in the flanking regions of the TEs shows that these substitutions correlate with the phylogenetic signal from the TE insertions. Our phylogenomic analyses show that TEs are a major driver of genomic variation in bears and enabled phylogenetic reconstruction of a well-resolved species tree, despite strong signals for incomplete lineage sorting and introgression. The analyses show that the Asiatic black, sun, and sloth bear form a monophyletic clade, in which phylogenetic incongruence originates from incomplete lineage sorting. TeddyPi is open source and can be adapted to various TE and structural variation callers. The pipeline makes it possible to confidently extract thousands of TE insertions even from low-coverage genomes (∼10×) of nonmodel organisms. This opens new possibilities for biologists to study phylogenies and evolutionary processes as well as rates and patterns of (retro-)transposition and structural variation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  7. Mobile Genome Express (MGE): A comprehensive automatic genetic analyses pipeline with a mobile device.

    PubMed

    Yoon, Jun-Hee; Kim, Thomas W; Mendez, Pedro; Jablons, David M; Kim, Il-Jin

    2017-01-01

    The development of next-generation sequencing (NGS) technology makes it possible to sequence whole exomes or genomes. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analyses, which translates into a delay and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there is high demand for an automatic, easy-to-use NGS data analysis system. We developed a comprehensive, automatic genetic analysis controller named Mobile Genome Express (MGE) that works on smartphones or other mobile devices. MGE can handle all the steps for genetic analyses, such as: sample information submission, sequencing run quality check from the sequencer, secured data transfer and results review. We sequenced an Actrometrix control DNA containing multiple proven human mutations using a targeted sequencing panel, and the whole analysis was managed by MGE and its data reviewing program called ELECTRO. All steps were processed automatically except for the final sequencing review procedure with ELECTRO to confirm mutations. The data analysis process was completed within several hours. We confirmed that the mutations we identified were consistent with our previous results obtained by using multi-step, manual pipelines.

  8. Integrated sequence analysis pipeline provides one-stop solution for identifying disease-causing mutations.

    PubMed

    Hu, Hao; Wienker, Thomas F; Musante, Luciana; Kalscheuer, Vera M; Kahrizi, Kimia; Najmabadi, Hossein; Ropers, H Hilger

    2014-12-01

    Next-generation sequencing has greatly accelerated the search for disease-causing defects, but even for experts the data analysis can be a major challenge. To facilitate the data processing in a clinical setting, we have developed a novel medical resequencing analysis pipeline (MERAP). MERAP assesses the quality of sequencing, and has optimized capacity for calling variants, including single-nucleotide variants, insertions and deletions, copy-number variation, and other structural variants. MERAP identifies polymorphic and known causal variants by filtering against public domain databases, and flags nonsynonymous and splice-site changes. MERAP uses a logistic model to estimate the causal likelihood of a given missense variant. MERAP considers the relevant information such as phenotype and interaction with known disease-causing genes. MERAP compares favorably with GATK, one of the widely used tools, because of its higher sensitivity for detecting indels, its easy installation, and its economical use of computational resources. Upon testing more than 1,200 individuals with mutations in known and novel disease genes, MERAP proved highly reliable, as illustrated here for five families with disease-causing variants. We believe that the clinical implementation of MERAP will expedite the diagnostic process of many disease-causing defects. © 2014 WILEY PERIODICALS, INC.
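
    Since the abstract mentions a logistic model for the causal likelihood of a missense variant, here is a minimal Python sketch of that kind of scoring; the feature names, weights, and bias are invented placeholders, not MERAP's trained model.

      import math

      WEIGHTS = {"conservation": 2.1,          # hypothetical coefficients
                 "allele_freq": -3.5,
                 "interacts_known_gene": 1.4}
      BIAS = -1.0

      def causal_probability(features):
          """Logistic score: sigmoid of a weighted feature sum."""
          z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
          return 1.0 / (1.0 + math.exp(-z))

      print(causal_probability({"conservation": 0.9,
                                "allele_freq": 0.001,
                                "interacts_known_gene": 1.0}))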

  9. 75 FR 80300 - Five-Year Review of Oil Pipeline Pricing Index

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-12-22

    .... On September 24, 2010, the U.S. Department of Transportation, Pipeline and Hazardous Materials Safety... pipeline cost changes for the 2004-2009 period: \\12\\ AOPL states that Dr. Shehadeh began his analysis using... typical pipeline operator. Valero states that Mr. O'Loughlin's analysis applied an objective filter which...

  10. Increased Sensitivity of Diagnostic Mutation Detection by Re-analysis Incorporating Local Reassembly of Sequence Reads.

    PubMed

    Watson, Christopher M; Camm, Nick; Crinnion, Laura A; Clokie, Samuel; Robinson, Rachel L; Adlard, Julian; Charlton, Ruth; Markham, Alexander F; Carr, Ian M; Bonthron, David T

    2017-12-01

    Diagnostic genetic testing programmes based on next-generation DNA sequencing have resulted in the accrual of large datasets of targeted raw sequence data. Most diagnostic laboratories process these data through an automated variant-calling pipeline. Validation of the chosen analytical methods typically depends on confirming the detection of known sequence variants. Despite improvements in short-read alignment methods, current pipelines are known to be comparatively poor at detecting large insertion/deletion mutations. We performed clinical validation of a local reassembly tool, ABRA (assembly-based realigner), through retrospective reanalysis of a cohort of more than 2000 hereditary cancer cases. ABRA enabled detection of a 96-bp deletion, 4-bp insertion mutation in PMS2 that had been initially identified using a comparative read-depth approach. We applied an updated pipeline incorporating ABRA to the entire cohort of 2000 cases and identified one previously undetected pathogenic variant, a 23-bp duplication in PTEN. We demonstrate the effect of read length on the ability to detect insertion/deletion variants by comparing HiSeq2500 (2 × 101-bp) and NextSeq500 (2 × 151-bp) sequence data for a range of variants and thereby show that the limitations of shorter read lengths can be mitigated using appropriate informatics tools. This work highlights the need for ongoing development of diagnostic pipelines to maximize test sensitivity. We also draw attention to the large differences in computational infrastructure required to perform day-to-day versus large-scale reprocessing tasks.

  11. A continuous operating protection system called COPS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chaperon, G.

    1987-01-01

    The continuous operating protection system called COPS is a diverless solution to achieve the stabilization and protection of subsea pipelines and cables: the system is based on the use of a continuous fabric formwork mattress which is spread on the seabed over the pipeline or cable to be protected by a remotely controlled underwater crawler and simultaneously filled with cement grout. The method has been successfully used in the GULLFAKS field where about 3.6 km of grout mattresses having a cross section of 2 meters by 0.2 meters have been laid. The performances of the system are presented as well as a trade-off comparison with the other stabilization and protection methods currently used: burying, rock dumping or placement of covers.

  12. AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae

    PubMed Central

    Song, Giltae; Dickins, Benjamin J. A.; Demeter, Janos; Engel, Stacia; Dunn, Barbara; Cherry, J. Michael

    2015-01-01

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community. PMID:25781462

  13. A Pipelined Non-Deterministic Finite Automaton-Based String Matching Scheme Using Merged State Transitions in an FPGA

    PubMed Central

    Choi, Kang-Il

    2016-01-01

    This paper proposes a pipelined non-deterministic finite automaton (NFA)-based string matching scheme using field programmable gate array (FPGA) implementation. The characteristics of the NFA such as shared common prefixes and no failure transitions are considered in the proposed scheme. In the implementation of the automaton-based string matching using an FPGA, each state transition is implemented with a look-up table (LUT) for the combinational logic circuit between registers. In addition, multiple state transitions between stages can be performed in a pipelined fashion. In this paper, it is proposed that multiple one-to-one state transitions, called merged state transitions, can be performed with an LUT. By cutting down the number of used LUTs for implementing state transitions, the hardware overhead of combinational logic circuits is greatly reduced in the proposed pipelined NFA-based string matching scheme. PMID:27695114

  14. A Pipelined Non-Deterministic Finite Automaton-Based String Matching Scheme Using Merged State Transitions in an FPGA.

    PubMed

    Kim, HyunJin; Choi, Kang-Il

    2016-01-01

    This paper proposes a pipelined non-deterministic finite automaton (NFA)-based string matching scheme using field programmable gate array (FPGA) implementation. The characteristics of the NFA such as shared common prefixes and no failure transitions are considered in the proposed scheme. In the implementation of the automaton-based string matching using an FPGA, each state transition is implemented with a look-up table (LUT) for the combinational logic circuit between registers. In addition, multiple state transitions between stages can be performed in a pipelined fashion. In this paper, it is proposed that multiple one-to-one state transitions, called merged state transitions, can be performed with an LUT. By cutting down the number of used LUTs for implementing state transitions, the hardware overhead of combinational logic circuits is greatly reduced in the proposed pipelined NFA-based string matching scheme.
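
    To make the NFA behavior concrete, the following Python sketch simulates an NFA over a small pattern set with shared prefixes and no failure transitions; each state transition here corresponds to what the paper implements as an FPGA LUT between registers, and the pattern set is a toy example.

      patterns = ["he", "hers", "his"]        # toy pattern set

      def match_positions(text, patterns):
          active = set()                      # active (pattern, index) states
          hits = []
          for pos, ch in enumerate(text):
              nxt = {(p, i + 1) for (p, i) in active if p[i] == ch}
              # New matches can start at every position: no failure transitions.
              nxt |= {(p, 1) for p in patterns if p[0] == ch}
              hits += [(pos, p) for (p, i) in nxt if i == len(p)]
              active = {(p, i) for (p, i) in nxt if i < len(p)}
          return hits

      print(match_positions("ushers", patterns))  # [(3, 'he'), (5, 'hers')]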

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thaule, S.B.; Postvoll, W.

    Installation by den norske stats oljeselskap A.S. (Statoil) of a powerful pipeline-modeling system on Zeepipe has allowed this major North Sea gas pipeline to meet the growing demands and seasonal variations of the European gas market. The Troll gas-sales agreement (TGSA) in 1986 called for large volumes of Norwegian gas to begin arriving from the North Sea Sleipner East field in October 1993. It is important to Statoil to maintain regular gas deliveries from its integrated transport network. In addition, high utilization of transport capacity maximizes profits. In advance of operations, Statoil realized that state-of-the-art supervisory control and data acquisition (SCADA) and pipeline-modeling systems (PMS) would be necessary to meet its goals and to remain the most efficient North Sea operator. The paper describes the linking of Troll and Zeebrugge, contractual issues, the supervisory system, the SCADA module, pipeline modeling, real-time model, look-ahead model, predictive model, and model performance.

  16. U. K. to resume natural gas imports

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1992-02-17

    This paper reports that the U.K. government has opened the way for resuming gas imports into Britain by approving a contract signed by U.K. electric power utility National Power to buy gas from Norway. A new joint marketing venture of BP Exploration, Den norske stats oljeselskap AS (Statoil), and Norsk Hydro AS also will be allowed to import gas for electric power plant fuel once it has a contract. National Power and the BP/Statoil/Norsk Hydro group will use the Frigg pipeline from Norwegian waters into St. Fergus, north of Aberdeen, the only existing link between the British transmission system and foreign supplies of gas. Meantime, progress is under way toward a second pipeline to link the U.K. with foreign natural gas supplies, calling for a pipeline across the English Channel joining the continental European pipeline system to the U.K. network.

  17. Color correction pipeline optimization for digital cameras

    NASA Astrophysics Data System (ADS)

    Bianco, Simone; Bruna, Arcangelo R.; Naccari, Filippo; Schettini, Raimondo

    2013-04-01

    The processing pipeline of a digital camera converts the RAW image acquired by the sensor to a representation of the original scene that should be as faithful as possible. There are mainly two modules responsible for the color-rendering accuracy of a digital camera: the former is the illuminant estimation and correction module, and the latter is the color matrix transformation aimed to adapt the color response of the sensor to a standard color space. These two modules together form what may be called the color correction pipeline. We design and test new color correction pipelines that exploit different illuminant estimation and correction algorithms that are tuned and automatically selected on the basis of the image content. Since the illuminant estimation is an ill-posed problem, illuminant correction is not error-free. An adaptive color matrix transformation module is optimized, taking into account the behavior of the first module in order to alleviate the amplification of color errors. The proposed pipelines are tested on a publicly available dataset of RAW images. Experimental results show that exploiting the cross-talks between the modules of the pipeline can lead to a higher color-rendition accuracy.
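
    A minimal numpy sketch of the two modules named above, assuming a von Kries-style diagonal illuminant correction followed by a 3x3 color matrix; the gain and matrix values are placeholders, not the optimized modules of the paper.

      import numpy as np

      raw = np.random.rand(4, 4, 3)                 # stand-in RAW RGB image
      illuminant_gains = np.array([1.8, 1.0, 1.4])  # estimated per-channel gains
      color_matrix = np.array([[ 1.6, -0.4, -0.2],  # sensor RGB -> output space
                               [-0.3,  1.5, -0.2],
                               [-0.1, -0.5,  1.6]])

      balanced = raw * illuminant_gains             # illuminant correction
      rendered = np.clip(balanced @ color_matrix.T, 0.0, 1.0)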

  18. Regulatory assessment with regulatory flexibility analysis and paperwork reduction act analysis : draft regulatory evaluation : Notice of Proposed Rulemaking -- Pipeline Safety : Polyamide-11 (PA-11) plastic pipe design pressures

    DOT National Transportation Integrated Search

    2007-06-01

    The Pipeline and Hazardous Materials Safety Administration (PHMSA) is proposing changes to the Federal pipeline safety regulations in 49 CFR Part 192, which cover the transportation of natural gas by pipeline. Specifically, PHMSA is proposing to chan...

  19. 76 FR 75894 - Information Collection Activities: Pipelines and Pipeline Rights-of-Way; Submitted for Office of...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-05

    ... pipelines `` * * * for the transportation of oil, natural gas, sulphur, or other minerals, or under such...) Submit repair report 3 1008(f) Submit report of pipeline failure analysis...... 30 1008(g) Submit plan of.... BSEE-2011-0002; OMB Control Number 1010-0050] Information Collection Activities: Pipelines and Pipeline...

  20. 30 CFR 291.102 - May I call the MMS Hotline to informally resolve an allegation that open and nondiscriminatory...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... resolve an allegation that open and nondiscriminatory access was denied? 291.102 Section 291.102 Mineral... OPEN AND NONDISCRIMINATORY ACCESS TO OIL AND GAS PIPELINES UNDER THE OUTER CONTINENTAL SHELF LANDS ACT... allegation concerning open and nondiscriminatory access by calling the toll-free MMS Hotline at 1-888-232...

  1. Germline contamination and leakage in whole genome somatic single nucleotide variant detection.

    PubMed

    Sendorek, Dorota H; Caloian, Cristian; Ellrott, Kyle; Bare, J Christopher; Yamaguchi, Takafumi N; Ewing, Adam D; Houlahan, Kathleen E; Norman, Thea C; Margolin, Adam A; Stuart, Joshua M; Boutros, Paul C

    2018-01-31

    The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
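
    The leakage statistic itself is simple to compute once somatic predictions and true germline variants are reduced to a common representation; the Python sketch below mirrors the idea behind GermlineFilter but is an illustration, not its implementation.

      def leakage(predicted_somatic, germline_truth):
          """Count predicted somatic variants that are germline polymorphisms."""
          leaked = predicted_somatic & germline_truth
          return len(leaked), len(leaked) / max(len(predicted_somatic), 1)

      somatic_preds = {("chr1", 12345, "A", "T"), ("chr2", 9876, "G", "C")}
      germline = {("chr2", 9876, "G", "C")}
      print(leakage(somatic_preds, germline))       # -> (1, 0.5)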

  2. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

    PubMed

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of command line interfaces, massive computational resources and expertise, which is a daunting requirement for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and utilization in developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data, at a fast speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise, enabling them to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge next generation sequencing datasets. It has been developed in the Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software.

  3. Reusable, extensible, and modifiable R scripts and Kepler workflows for comprehensive single set ChIP-seq analysis.

    PubMed

    Cormier, Nathan; Kolisnik, Tyler; Bieda, Mark

    2016-07-05

    There has been an enormous expansion of use of chromatin immunoprecipitation followed by sequencing (ChIP-seq) technologies. Analysis of large-scale ChIP-seq datasets involves a complex series of steps and production of several specialized graphical outputs. A number of systems have emphasized custom development of ChIP-seq pipelines. These systems are primarily based on custom programming of a single, complex pipeline or supply libraries of modules and do not produce the full range of outputs commonly produced for ChIP-seq datasets. It is desirable to have more comprehensive pipelines, in particular ones addressing common metadata tasks, such as pathway analysis, and pipelines producing standard complex graphical outputs. It is advantageous if these are highly modular systems, available as both turnkey pipelines and individual modules, that are easily comprehensible, modifiable and extensible to allow rapid alteration in response to new analysis developments in this growing area. Furthermore, it is advantageous if these pipelines allow data provenance tracking. We present a set of 20 ChIP-seq analysis software modules implemented in the Kepler workflow system; most (18/20) were also implemented as standalone, fully functional R scripts. The set consists of four full turnkey pipelines and 16 component modules. The turnkey pipelines in Kepler allow data provenance tracking. Implementation emphasized use of common R packages and widely-used external tools (e.g., MACS for peak finding), along with custom programming. This software presents comprehensive solutions and easily repurposed code blocks for ChIP-seq analysis and pipeline creation. Tasks include mapping raw reads, peakfinding via MACS, summary statistics, peak location statistics, summary plots centered on the transcription start site (TSS), gene ontology, pathway analysis, and de novo motif finding, among others. These pipelines range from those performing a single task to those performing full analyses of ChIP-seq data. The pipelines are supplied as both Kepler workflows, which allow data provenance tracking, and, in the majority of cases, as standalone R scripts. These pipelines are designed for ease of modification and repurposing.

  4. The hidden genomic landscape of acute myeloid leukemia: subclonal structure revealed by undetected mutations

    PubMed Central

    Bodini, Margherita; Ronchini, Chiara; Giacò, Luciano; Russo, Anna; Melloni, Giorgio E. M.; Luzi, Lucilla; Sardella, Domenico; Volorio, Sara; Hasan, Syed K.; Ottone, Tiziana; Lavorgna, Serena; Lo-Coco, Francesco; Candoni, Anna; Fanin, Renato; Toffoletti, Eleonora; Iacobucci, Ilaria; Martinelli, Giovanni; Cignetti, Alessandro; Tarella, Corrado; Bernard, Loris; Pelicci, Pier Giuseppe

    2015-01-01

    The analyses carried out using 2 different bioinformatics pipelines (SomaticSniper and MuTect) on the same set of genomic data from 133 acute myeloid leukemia (AML) patients, sequenced inside the Cancer Genome Atlas project, gave discrepant results. We subsequently tested these 2 variant-calling pipelines on 20 leukemia samples from our series (19 primary AMLs and 1 secondary AML). By validating many of the predicted somatic variants (variant allele frequencies ranging from 100% to 5%), we observed significantly different calling efficiencies. In particular, despite relatively high specificity, sensitivity was poor in both pipelines resulting in a high rate of false negatives. Our findings raise the possibility that landscapes of AML genomes might be more complex than previously reported and characterized by the presence of hundreds of genes mutated at low variant allele frequency, suggesting that the application of genome sequencing to the clinic requires a careful and critical evaluation. We think that improvements in technology and workflow standardization, through the generation of clear experimental and bioinformatics guidelines, are fundamental to translate the use of next-generation sequencing from research to the clinic and to transform genomic information into better diagnosis and outcomes for the patient. PMID:25499761

  5. A comprehensive probabilistic analysis model of oil pipelines network based on Bayesian network

    NASA Astrophysics Data System (ADS)

    Zhang, C.; Qin, T. X.; Jiang, B.; Huang, C.

    2018-02-01

    An oil pipeline network is one of the most important facilities for energy transportation, but accidents in such networks may result in serious disasters. Analysis models for these accidents have been established mainly based on three methods: event trees, accident simulation and Bayesian networks. Among these methods, the Bayesian network is suitable for probabilistic analysis, but not all the important influencing factors have been considered and no deployment rule for the factors has been established. This paper proposes a probabilistic analysis model of oil pipeline networks based on a Bayesian network. Most of the important influencing factors, including the key environmental conditions and emergency response, are considered in this model. Moreover, the paper also introduces a deployment rule for these factors. The model can be used in probabilistic analysis and sensitivity analysis of oil pipeline network accidents.
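
    A toy Python calculation of the kind of inference such a model supports, with an invented structure (corrosion and third-party damage cause a leak; emergency response moderates escalation to a disaster) and invented probabilities:

      P_corrosion, P_third_party = 0.05, 0.02
      P_leak = {  # P(leak | corrosion, third-party damage)
          (True, True): 0.9, (True, False): 0.3,
          (False, True): 0.5, (False, False): 0.001,
      }
      P_good_response = 0.8
      P_disaster_given_leak = {True: 0.05, False: 0.4}  # keyed by good response

      p_leak = sum(P_leak[(c, t)]
                   * (P_corrosion if c else 1 - P_corrosion)
                   * (P_third_party if t else 1 - P_third_party)
                   for c in (True, False) for t in (True, False))
      p_disaster = p_leak * (P_good_response * P_disaster_given_leak[True]
                             + (1 - P_good_response) * P_disaster_given_leak[False])
      print(p_leak, p_disaster)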

  6. PipelineDog: a simple and flexible graphic pipeline construction and maintenance tool.

    PubMed

    Zhou, Anbo; Zhang, Yeting; Sun, Yazhou; Xing, Jinchuan

    2018-05-01

    Analysis pipelines are an essential part of bioinformatics research, and ad hoc pipelines are frequently created by researchers for prototyping and proof-of-concept purposes. However, most existing pipeline management systems or workflow engines are too complex for rapid prototyping or learning the pipeline concept. A lightweight, user-friendly and flexible solution is thus desirable. In this study, we developed a new pipeline construction and maintenance tool, PipelineDog. This is a web-based integrated development environment with a modern web graphical user interface. It offers cross-platform compatibility, project management capabilities, code formatting and error checking functions and an online repository. It uses an easy-to-read/write script system that encourages code reuse. With the online repository, it also encourages sharing of pipelines, which enhances analysis reproducibility and accountability. For most users, PipelineDog requires no software installation. Overall, this web application provides a way to rapidly create and easily manage pipelines. The PipelineDog web app is freely available at http://web.pipeline.dog. The command line version is available at http://www.npmjs.com/package/pipelinedog and the online repository at http://repo.pipeline.dog. ysun@kean.edu or xing@biology.rutgers.edu or ysun@diagnoa.com. Supplementary data are available at Bioinformatics online.

  7. 75 FR 35366 - Pipeline Safety: Applying Safety Regulation to All Rural Onshore Hazardous Liquid Low-Stress Lines

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-06-22

    ... DEPARTMENT OF TRANSPORTATION Pipeline and Hazardous Materials Safety Administration 49 CFR Part... Onshore Hazardous Liquid Low-Stress Lines AGENCY: Pipeline and Hazardous Materials Safety Administration... pipelines to perform a complete ``could affect'' analysis to determine which rural low-stress pipeline...

  8. Hybrid Semantic Analysis for Mapping Adverse Drug Reaction Mentions in Tweets to Medical Terminology.

    PubMed

    Emadzadeh, Ehsan; Sarker, Abeed; Nikfarjam, Azadeh; Gonzalez, Graciela

    2017-01-01

    Social networks, such as Twitter, have become important sources for active monitoring of user-reported adverse drug reactions (ADRs). Automatic extraction of ADR information can be crucial for healthcare providers, drug manufacturers, and consumers. However, because of the non-standard nature of social media language, automatically extracted ADR mentions need to be mapped to standard forms before they can be used by operational pharmacovigilance systems. We propose a modular natural language processing pipeline for mapping (normalizing) colloquial mentions of ADRs to their corresponding standardized identifiers. We seek to accomplish this task and enable customization of the pipeline so that distinct unlabeled free text resources can be incorporated to use the system for other normalization tasks. Our approach, which we call Hybrid Semantic Analysis (HSA), sequentially employs rule-based and semantic matching algorithms for mapping user-generated mentions to concept IDs in the Unified Medical Language System vocabulary. The semantic matching component of HSA is adaptive in nature and uses a regression model to combine various measures of semantic relatedness and resources to optimize normalization performance on the selected data source. On a publicly available corpus, our normalization method achieves 0.502 recall and 0.823 precision (F-measure: 0.624). Our proposed method outperforms a baseline based on latent semantic analysis and another that uses MetaMap.
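
    The sequential rule-based-then-semantic strategy can be pictured with a small Python sketch: exact lexicon lookup first, then a fallback match. The lexicon, concept identifiers, and the crude string-similarity stand-in for HSA's semantic relatedness measures are all invented for illustration.

      from difflib import SequenceMatcher

      LEXICON = {"headache": "C0018681", "nausea": "C0027497"}  # mention -> CUI

      def normalize(mention):
          if mention in LEXICON:                   # rule-based step
              return LEXICON[mention]
          # Fallback: closest known mention (stand-in for semantic matching).
          best = max(LEXICON,
                     key=lambda k: SequenceMatcher(None, mention, k).ratio())
          return LEXICON[best]

      print(normalize("head ache"))                # -> C0018681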

  9. TEcandidates: Prediction of genomic origin of expressed Transposable Elements using RNA-seq data.

    PubMed

    Valdebenito-Maturana, Braulio; Riadi, Gonzalo

    2018-06-01

    In recent years, Transposable Elements (TEs) have been related to gene regulation. However, estimating the origin of expression of TEs through RNA-seq is complicated by multimapping reads coming from their repetitive sequences. Current approaches that address multimapping reads are focused on expression quantification and not on finding the origin of expression. Addressing the genomic origin of expressed TEs could further aid in understanding the role that TEs might have in the cell. We have developed a new pipeline called TEcandidates, based on de novo transcriptome assembly, to assess the instances of TEs being expressed, along with their location, to include in downstream DE analysis. TEcandidates takes as input the RNA-seq data, the genome sequence and the TE annotation file, and returns a list of coordinates of candidate TEs being expressed, the TEs that have been removed, and the genome sequence with the removed TEs masked. This masked genome is suited to include TEs in downstream expression analysis, as the ambiguity of reads coming from TEs is significantly reduced in the mapping step of the analysis. The script which runs the pipeline can be downloaded at http://www.mobilomics.org/tecandidates/downloads or http://github.com/TEcandidates/TEcandidates. griadi@utalca.cl. Supplementary data are available at Bioinformatics online.
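
    The masking step the abstract describes amounts to overwriting the removed TE intervals with N's so that ambiguous reads cannot align there; a minimal Python sketch with toy coordinates:

      def mask_genome(chrom_seqs, removed_tes):
          """removed_tes: iterable of (chrom, start, end), 0-based half-open."""
          masked = {c: list(s) for c, s in chrom_seqs.items()}
          for chrom, start, end in removed_tes:
              masked[chrom][start:end] = "N" * (end - start)
          return {c: "".join(s) for c, s in masked.items()}

      genome = {"chr1": "ACGTACGTACGT"}
      print(mask_genome(genome, [("chr1", 4, 8)]))  # {'chr1': 'ACGTNNNNACGT'}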

  10. sTools - a data reduction pipeline for the GREGOR Fabry-Pérot Interferometer and the High-resolution Fast Imager at the GREGOR solar telescope

    NASA Astrophysics Data System (ADS)

    Kuckein, C.; Denker, C.; Verma, M.; Balthasar, H.; González Manrique, S. J.; Louis, R. E.; Diercke, A.

    2017-10-01

    A huge amount of data has been acquired with the GREGOR Fabry-Pérot Interferometer (GFPI), large-format facility cameras, and since 2016 with the High-resolution Fast Imager (HiFI). These data are processed in standardized procedures with the aim of providing science-ready data for the solar physics community. For this purpose, we have developed a user-friendly data reduction pipeline called "sTools" based on the Interactive Data Language (IDL) and licensed under a Creative Commons license. The pipeline delivers reduced and image-reconstructed data with a minimum of user interaction. Furthermore, quick-look data are generated as well as a webpage with an overview of the observations and their statistics. All the processed data are stored online at the GREGOR GFPI and HiFI data archive of the Leibniz Institute for Astrophysics Potsdam (AIP). The principles of the pipeline are presented together with selected high-resolution spectral scans and images processed with sTools.

  11. Oil and gas pipeline construction cost analysis and developing regression models for cost estimation

    NASA Astrophysics Data System (ADS)

    Thaduri, Ravi Kiran

    In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, labor, right-of-way (ROW) and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In a pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The compressor stations are analyzed based on capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor stations, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor stations for various capacities and locations.
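
    A hedged Python sketch of fitting one such non-linear component-cost model, of the form cost = a * diameter^b * length^c; the functional form and the synthetic data stand in for the study's actual models and its 180-pipeline dataset.

      import numpy as np
      from scipy.optimize import curve_fit

      def cost_model(X, a, b, c):
          diameter, length = X
          return a * diameter**b * length**c

      rng = np.random.default_rng(0)
      diameter = rng.uniform(8, 42, 50)     # inches
      length = rng.uniform(1, 300, 50)      # miles
      cost = 1e4 * diameter**1.2 * length**0.9 * rng.normal(1, 0.1, 50)

      params, _ = curve_fit(cost_model, (diameter, length), cost, p0=(1e4, 1, 1))
      print(params)                         # fitted (a, b, c)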

  12. Instruction set commutivity

    NASA Technical Reports Server (NTRS)

    Windley, P.

    1992-01-01

    We present a state property called congruence and show how it can be used to demonstrate commutativity of instructions in a modern load-store architecture. Our analysis is particularly important in pipelined microprocessors, where instructions are frequently reordered to avoid costly delays in execution caused by hazards. Our work has significant implications for safety and security critical applications, since reordering can easily change the meaning of an instruction sequence and current techniques are largely ad hoc. Our work is done in a mechanical theorem prover and results in a set of trustworthy rules for instruction reordering. The mechanization makes it practical to analyze the entire instruction set.
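
    The property being checked can be illustrated on a toy load-store machine in Python: two instructions commute on a state when executing them in either order yields the same state. The state model and instruction encoding are invented for illustration and are far simpler than the verified architecture in the report.

      def run(state, instr):
          regs, mem = dict(state[0]), dict(state[1])
          op, *args = instr
          if op == "load":                  # load reg <- mem[addr]
              regs[args[0]] = mem.get(args[1], 0)
          elif op == "store":               # store mem[addr] <- reg
              mem[args[1]] = regs.get(args[0], 0)
          elif op == "add":                 # add dst <- src1 + src2
              regs[args[0]] = regs.get(args[1], 0) + regs.get(args[2], 0)
          return (regs, mem)

      def commute(state, i1, i2):
          return run(run(state, i1), i2) == run(run(state, i2), i1)

      s0 = ({"r1": 1, "r2": 2, "r3": 0}, {"x": 7})
      print(commute(s0, ("load", "r3", "x"), ("add", "r1", "r1", "r2")))  # True
      print(commute(s0, ("load", "r1", "x"), ("add", "r1", "r1", "r2")))  # False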

  13. Human Factors Analysis of Pipeline Monitoring and Control Operations: Final Technical Report

    DOT National Transportation Integrated Search

    2008-11-26

    The purpose of the Human Factors Analysis of Pipeline Monitoring and Control Operations project was to develop procedures that could be used by liquid pipeline operators to assess and manage the human factors risks in their control rooms that may adv...

  14. FMAP: Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies.

    PubMed

    Kim, Jiwoong; Kim, Min Soo; Koh, Andrew Y; Xie, Yang; Zhan, Xiaowei

    2016-10-10

    Given the lack of a complete and comprehensive library of microbial reference genomes, determining the functional profile of diverse microbial communities is challenging. The available functional analysis pipelines lack several key features: (i) an integrated alignment tool, (ii) operon-level analysis, and (iii) the ability to process large datasets. Here we introduce our open-sourced, stand-alone functional analysis pipeline for analyzing whole metagenomic and metatranscriptomic sequencing data, FMAP (Functional Mapping and Analysis Pipeline). FMAP performs alignment, gene family abundance calculations, and statistical analysis (three levels of analyses are provided: differentially-abundant genes, operons and pathways). The resulting output can be easily visualized with heatmaps and functional pathway diagrams. FMAP functional predictions are consistent with currently available functional analysis pipelines. FMAP is a comprehensive tool for providing functional analysis of metagenomic/metatranscriptomic sequencing data. With the added features of integrated alignment, operon-level analysis, and the ability to process large datasets, FMAP will be a valuable addition to the currently available functional analysis toolbox. We believe that this software will be of great value to the wider biology and bioinformatics communities.
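
    As a sketch of the statistics step, the following Python snippet tests gene families for differential abundance between two sample groups; the count table and the choice of the Mann-Whitney U test are placeholders, not necessarily FMAP's exact method.

      from scipy.stats import mannwhitneyu

      # Gene family -> (case sample counts, control sample counts); toy data.
      abundance = {
          "K00001": ([12, 15, 11, 14], [3, 2, 5, 4]),
          "K00002": ([5, 6, 4, 7], [5, 7, 6, 4]),
      }
      for family, (cases, controls) in abundance.items():
          stat, p = mannwhitneyu(cases, controls, alternative="two-sided")
          print(family, p)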

  15. Constructing Flexible, Configurable, ETL Pipelines for the Analysis of "Big Data" with Apache OODT

    NASA Astrophysics Data System (ADS)

    Hart, A. F.; Mattmann, C. A.; Ramirez, P.; Verma, R.; Zimdars, P. A.; Park, S.; Estrada, A.; Sumarlidason, A.; Gil, Y.; Ratnakar, V.; Krum, D.; Phan, T.; Meena, A.

    2013-12-01

    A plethora of open source technologies for manipulating, transforming, querying, and visualizing 'big data' have blossomed and matured in the last few years, driven in large part by recognition of the tremendous value that can be derived by leveraging data mining and visualization techniques on large data sets. One facet of many of these tools is that input data must often be prepared into a particular format (e.g.: JSON, CSV), or loaded into a particular storage technology (e.g.: HDFS) before analysis can take place. This process, commonly known as Extract-Transform-Load, or ETL, often involves multiple well-defined steps that must be executed in a particular order, and the approach taken for a particular data set is generally sensitive to the quantity and quality of the input data, as well as the structure and complexity of the desired output. When working with very large, heterogeneous, unstructured or semi-structured data sets, automating the ETL process and monitoring its progress becomes increasingly important. Apache Object Oriented Data Technology (OODT) provides a suite of complementary data management components called the Process Control System (PCS) that can be connected together to form flexible ETL pipelines as well as browser-based user interfaces for monitoring and control of ongoing operations. The lightweight, metadata driven middleware layer can be wrapped around custom ETL workflow steps, which themselves can be implemented in any language. Once configured, it facilitates communication between workflow steps and supports execution of ETL pipelines across a distributed cluster of compute resources. As participants in a DARPA-funded effort to develop open source tools for large-scale data analysis, we utilized Apache OODT to rapidly construct custom ETL pipelines for a variety of very large data sets to prepare them for analysis and visualization applications. We feel that OODT, which is free and open source software available through the Apache Software Foundation, is particularly well suited to developing and managing arbitrary large-scale ETL processes both for the simplicity and flexibility of its wrapper framework, as well as the detailed provenance information it exposes throughout the process. Our experience using OODT to manage processing of large-scale data sets in domains as diverse as radio astronomy, life sciences, and social network analysis demonstrates the flexibility of the framework, and the range of potential applications to a broad array of big data ETL challenges.
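
    The ETL pattern itself is easy to picture: ordered steps, each wrapped so the framework can record provenance as data flows through. The Python sketch below imitates that pattern only; it does not use the Apache OODT PCS API.

      import json, time

      def step(name, fn):
          """Wrap a step so each invocation is logged with simple provenance."""
          def wrapped(data, provenance):
              out = fn(data)
              provenance.append({"step": name, "time": time.time(), "n": len(out)})
              return out
          return wrapped

      pipeline = [
          step("extract", lambda _: [{"raw": "1"}, {"raw": "2"}, {"raw": "oops"}]),
          step("transform", lambda rows: [{"value": int(r["raw"])}
                                          for r in rows if r["raw"].isdigit()]),
          step("load", lambda rows: [json.dumps(r) for r in rows]),
      ]

      provenance, data = [], None
      for s in pipeline:
          data = s(data, provenance)
      print(data, provenance)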

  16. Improving the result of forecasting using reservoir and surface network simulation

    NASA Astrophysics Data System (ADS)

    Hendri, R. S.; Winarta, J.

    2018-01-01

    This study aimed to obtain more representative production forecasts for the pipeline gathering system of the X field by using integrated simulation. Five main scenarios were considered, covering production forecasts for the existing condition, workover, and infill drilling, from which the best development scenario was determined. The method of this study couples a reservoir simulator with a pipeline simulator, a so-called Integrated Reservoir and Surface Network Simulation. Well data from the reservoir simulator were integrated with the pipeline network simulator to construct a new schedule, which served as input for the whole simulation procedure. Well designs were produced with a well-modeling simulator and then exported into the pipeline simulator. Stand-alone reservoir prediction depends only on a minimum tubing head pressure (THP) for each well, and the pressure drop in the gathering network is not calculated. The same scenarios were also run as single-reservoir simulations. The integrated simulation produces results that approach the actual condition of the reservoir, as confirmed by the THP profiles, which differ between the two methods; the difference between the integrated and single-model simulations is 6-9%. The aim of solving the back-pressure problem in the pipeline gathering system of the X field was thus achieved.

  17. Simulation of systems for shock wave/compression waves damping in technological plants

    NASA Astrophysics Data System (ADS)

    Sumskoi, S. I.; Sverchkov, A. M.; Lisanov, M. V.; Egorov, A. F.

    2016-09-01

    During operation of pipeline systems, the flow velocity in the pipeline can decrease as a result of pump stoppages or valve closures. As a result, compression waves appear in the pipeline system. These waves can propagate through the system and lead to its destruction. This phenomenon is called water hammer (water hammer flow). The most dangerous situations occur when the flow is stopped quickly. Such an urgent flow cutoff often takes place in an emergency situation when liquid hydrocarbons are being loaded into sea tankers: to prevent environmental pollution, the hydrocarbon loading must be stopped urgently, and the flow is then cut off within a few seconds. To prevent an increase in pressure in a pipeline system during water hammer flow, special protective systems (pressure relief systems) are installed. Approaches to modeling such protection against water hammer are described in this paper, and a model of a particular pressure relief system is considered. It is shown that, even when the intensity of hydrocarbon loading into a sea tanker is increased, the presence of the pressure relief system allows a safe loading mode to be organized.
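
    For context on the magnitude of the surge being relieved, classical water hammer theory (standard background, not taken from this abstract) estimates the pressure rise from a rapid velocity change with the Joukowsky relation

      \Delta p = \rho \, a \, \Delta v

    where \rho is the fluid density, a the pressure-wave propagation speed in the pipe, and \Delta v the change in flow velocity; cutting off a flow of a few m/s within seconds can therefore produce surges of tens of bar.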

  18. Simplified Technique for Predicting Offshore Pipeline Expansion

    NASA Astrophysics Data System (ADS)

    Seo, J. H.; Kim, D. K.; Choi, H. S.; Yu, S. Y.; Park, K. S.

    2018-06-01

    In this study, we propose a method for estimating the amount of expansion that occurs in subsea pipelines, which could be applied in the design of robust structures that transport oil and gas from offshore wells. We begin with a literature review and general discussion of existing estimation methods and terminologies with respect to subsea pipelines. Because of the high pressure and high temperature involved, the production of fluid from offshore wells typically causes physical deformation of subsea structures, e.g., expansion and contraction during the transportation process. In severe cases, vertical and lateral buckling occurs, which has a significant negative impact on structural safety and is related to on-bottom stability, free spans, structural collapse, and many other factors. In addition, these factors may affect the production rate with respect to flow assurance, wax, and hydration, to name a few. In this study, we developed a simple and efficient method for generating a reliable pipe expansion design in the early stage, which can lead to savings in both cost and computation time. As such, in this paper, we propose an applicable diagram, which we call the standard dimensionless ratio (SDR) versus virtual anchor length (LA) diagram, that provides an efficient procedure for estimating subsea pipeline expansion under reliable applied scenarios. With this user guideline, offshore pipeline structural designers can reliably determine the amount of subsea pipeline expansion, and the results will also be useful for the installation, design, and maintenance of subsea pipelines.
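
    For orientation, a standard first-order estimate of the strain driving end expansion in a capped pipeline free to expand (textbook pipeline mechanics, not the paper's SDR-versus-LA method) is

      \varepsilon_0 = \alpha \, \Delta T + \frac{p D}{2 t E}\left(\frac{1}{2} - \nu\right)

    where \alpha is the thermal expansion coefficient, \Delta T the temperature rise, p the internal pressure, D the diameter, t the wall thickness, E Young's modulus, and \nu Poisson's ratio; the end expansion follows from integrating this strain over the virtual anchor length.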

  19. Regulatory assessment with regulatory flexibility analysis : draft regulatory evaluation - Notice of Proposed Rulemaking -- Pipeline Safety : safety standards for increasing the maximum allowable operating pressure for natural gas transmission pipelines.

    DOT National Transportation Integrated Search

    2008-02-01

    The Pipeline and Hazardous Materials Safety Administration (PHMSA) is proposing changes to the Federal pipeline safety regulations in 49 CFR Part 192, which cover the transportation of natural gas by pipeline. Specifically, PHMSA proposes allowing na...

  20. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance

    PubMed Central

    Rand, Hugh; Shumway, Martin; Trees, Eija K.; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E.; Defibaugh-Chavez, Stephanie; Carleton, Heather A.; Klimke, William A.; Katz, Lee S.

    2017-01-01

    Background As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. Methods We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and “known” phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Results Our “outbreak” benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the “known tree” can be accurately called the “true tree”. The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. Discussion These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools—we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines. PMID:29372115

  1. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance.

    PubMed

    Timme, Ruth E; Rand, Hugh; Shumway, Martin; Trees, Eija K; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E; Defibaugh-Chavez, Stephanie; Carleton, Heather A; Klimke, William A; Katz, Lee S

    2017-01-01

    As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the "known tree" can be accurately called the "true tree". The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools—we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines.

  2. 49 CFR Appendix C to Part 195 - Guidance for Implementation of an Integrity Management Program

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... (Continued) PIPELINE AND HAZARDOUS MATERIALS SAFETY ADMINISTRATION, DEPARTMENT OF TRANSPORTATION (CONTINUED... understanding and analysis of the failure mechanisms or threats to integrity of each pipeline segment. (2) An... pipeline, information and data used for the information analysis; (13) results of the information analyses...

  3. GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline.

    PubMed

    Thanki, Anil S; Soranzo, Nicola; Haerty, Wilfried; Davey, Robert P

    2018-03-01

    Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of syntenic regions evolution at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow. GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so they can be run interactively within the Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.

  4. Developing a leadership pipeline: the Cleveland Clinic experience.

    PubMed

    Hess, Caryl A; Barss, Christina; Stoller, James K

    2014-11-01

    The complexity of health care requires excellent leadership to address the challenges of access, quality, and cost of care. Because the competencies needed to lead differ from clinical or research skills, there is a compelling need to develop leaders and create a talent pipeline, perhaps especially in physician-led organizations like Cleveland Clinic. In this context, we previously reported on a cohort-based physician leadership development course called Leading in Health Care and, in the current report, detail an expanded health care leadership development programme called the Cleveland Clinic Academy (CCA). CCA consists of a broad suite of offerings, including cohort-based learning and 'a la carte' half- or full-day courses addressing specific competencies to manage and to lead. Academy attendance is optional and is available to all physicians, nurses, and administrators with the requisite experience. Course selection is guided by competency matrices which map leadership competencies to specific courses. As of December 2012, a total of 285 course sessions had been offered to 6,050 attendees, with uniformly high ratings of course quality and impact. During the past 10 years, Cleveland Clinic's leadership and management curriculum has successfully created a pipeline of health care leaders to fill executive positions, search committees, board openings, and various other organizational leadership positions. Health care leadership can be taught and learned.

  5. Construction of a combinatorial pipeline using two somatic variant calling methods for whole exome sequence data of gastric cancer.

    PubMed

    Kohmoto, Tomohiro; Masuda, Kiyoshi; Naruto, Takuya; Tange, Shoichiro; Shoda, Katsutoshi; Hamada, Junichi; Saito, Masako; Ichikawa, Daisuke; Tajima, Atsushi; Otsuji, Eigo; Imoto, Issei

    2017-01-01

    High-throughput next-generation sequencing is a powerful tool to identify the genotypic landscapes of somatic variants and therapeutic targets in various cancers, including gastric cancer, forming the basis for personalized medicine in the clinical setting. Although the advent of many computational algorithms has led to higher accuracy in somatic variant calling, no standard method exists due to the limitations of each method. Here, we constructed a new pipeline that combines two somatic variant callers with different algorithms, Strelka and VarScan 2, and evaluated its performance using whole exome sequencing data obtained from 19 Japanese cases with gastric cancer (GC); we then characterized these tumors based on the identified driver molecular alterations. More single nucleotide variants (SNVs) and small insertions/deletions were detected by Strelka and VarScan 2, respectively. SNVs detected by both tools showed higher accuracy for estimating somatic variants compared with those detected by only one of the two tools, and accurately reproduced the mutation signature and driver gene mutations reported for GC. Our combinatorial pipeline may have an advantage in the detection of somatic mutations in GC and may be useful for further genomic characterization of Japanese patients with GC to improve the efficacy of GC treatments. J. Med. Invest. 64: 233-240, August, 2017.
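
    The consensus step described above reduces, at its core, to intersecting the calls of the two tools. A minimal illustration follows (plain-text VCF parsing; the file names are placeholders, and a real pipeline would also normalize and left-align variants first):

      # Sketch of the consensus idea: keep only SNVs reported by both callers.
      def snv_sites(vcf_path):
          sites = set()
          with open(vcf_path) as fh:
              for line in fh:
                  if line.startswith("#"):
                      continue
                  chrom, pos, _id, ref, alt = line.split("\t")[:5]
                  if len(ref) == 1 and len(alt) == 1:  # SNV, not an indel
                      sites.add((chrom, pos, ref, alt))
          return sites

      consensus = snv_sites("strelka.vcf") & snv_sites("varscan2.vcf")
      print(f"{len(consensus)} SNVs called by both tools")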

  6. Designing integrated computational biology pipelines visually.

    PubMed

    Jamil, Hasan M

    2013-01-01

    The long-term cost of developing and maintaining a computational pipeline that depends upon data integration and sophisticated workflow logic is too high to even contemplate "what if" or ad hoc type queries. In this paper, we introduce a novel application building interface for computational biology research, called VizBuilder, by leveraging a recent query language called BioFlow for life sciences databases. Using VizBuilder, it is now possible to develop ad hoc complex computational biology applications at throw away costs. The underlying query language supports data integration and workflow construction almost transparently and fully automatically, using a best effort approach. Users express their application by drawing it with VizBuilder icons and connecting them in a meaningful way. Completed applications are compiled and translated as BioFlow queries for execution by the data management system LifeDB, for which VizBuilder serves as a front end. We discuss VizBuilder features and functionalities in the context of a real life application after we briefly introduce BioFlow. The architecture and design principles of VizBuilder are also discussed. Finally, we outline future extensions of VizBuilder. To our knowledge, VizBuilder is a unique system that allows visually designing computational biology pipelines involving distributed and heterogeneous resources in an ad hoc manner.

  7. Historical analysis of US pipeline accidents triggered by natural hazards

    NASA Astrophysics Data System (ADS)

    Girgin, Serkan; Krausmann, Elisabeth

    2015-04-01

    Natural hazards, such as earthquakes, floods, landslides, or lightning, can initiate accidents in oil and gas pipelines with potentially major consequences for the population or the environment due to toxic releases, fires and explosions. Accidents of this type are also referred to as Natech events. Many major accidents highlight the risk associated with natural-hazard impact on pipelines transporting dangerous substances. For instance, in the USA in 1994, flooding of the San Jacinto River caused the rupture of 8 pipelines and the undermining of 29 others by the floodwaters. About 5.5 million litres of petroleum and related products were spilled into the river and ignited. As a result, 547 people were injured and significant environmental damage occurred. Post-incident analysis is a valuable tool for better understanding the causes, dynamics and impacts of pipeline Natech accidents in support of future accident prevention and mitigation. Therefore, data on onshore hazardous-liquid pipeline accidents collected by the US Pipeline and Hazardous Materials Safety Administration (PHMSA) was analysed. For this purpose, a database-driven incident data analysis system was developed to aid the rapid review and categorization of PHMSA incident reports. Using an automated data-mining process followed by a peer review of the incident records and supported by natural hazard databases and external information sources, the pipeline Natechs were identified. As a by-product of the data-collection process, the database now includes over 800,000 incidents from all causes in industrial and transportation activities, which are automatically classified in the same way as the PHMSA records. This presentation describes the data collection and reviewing steps conducted during the study, provides information on the developed database and data analysis tools, and reports the findings of a statistical analysis of the identified hazardous liquid pipeline incidents in terms of accident dynamics and consequences.
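
    The automated data-mining step described above can be pictured as a keyword triage over incident narratives, followed by manual review. A small pandas sketch of that idea, assuming a hypothetical CSV export with a "narrative" column (the real PHMSA schema differs):

      # Flag PHMSA-style incident records whose narratives mention natural
      # hazards, as candidates for peer review. Column names are hypothetical.
      import pandas as pd

      NATECH_KEYWORDS = ["flood", "earthquake", "landslide", "lightning", "hurricane"]

      def flag_natech(csv_path):
          df = pd.read_csv(csv_path)
          text = df["narrative"].fillna("").str.lower()  # hypothetical column
          df["natech_candidate"] = text.apply(
              lambda s: any(k in s for k in NATECH_KEYWORDS)
          )
          return df[df["natech_candidate"]]

      candidates = flag_natech("phmsa_hazardous_liquid.csv")  # placeholder path
      print(len(candidates), "records flagged for manual review")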

  8. PG-Metrics: A chemometric-based approach for classifying bacterial peptidoglycan data sets and uncovering their subjacent chemical variability

    PubMed Central

    Kumar, Keshav; Espaillat, Akbar; Cava, Felipe

    2017-01-01

    Bacterial cells are protected from osmotic and environmental stresses by an exoskeleton-like polymeric structure called peptidoglycan (PG) or murein sacculus. This structure is fundamental for bacterial viability, and thus the mechanisms underlying cell wall assembly and its modulation serve as targets for many of our most successful antibiotics. Therefore, it is now more important than ever to understand the genetics and structural chemistry of bacterial cell walls in order to find new and effective methods of blocking them for the treatment of disease. In recent decades, liquid chromatography and mass spectrometry have been demonstrated to provide the resolution and sensitivity required to characterize the fine chemical structure of PG. However, the large volumes of data that these instruments can produce today are difficult to handle without a proper data analysis workflow. Here, we present PG-metrics, a chemometric-based pipeline that allows fast and easy classification of bacteria according to their muropeptide chromatographic profiles and identification of the subjacent PG chemical variability between, e.g., bacterial species, growth conditions and mutant libraries. The pipeline is successfully validated here using PG samples from different bacterial species and mutants in cell wall proteins. The obtained results clearly demonstrate that the PG-metrics pipeline is a valuable bioanalytical tool that can lead us to cell wall classification and biomarker discovery. PMID:29040278
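
    The chemometric core of such a pipeline is dimensionality reduction followed by clustering of the chromatographic profiles. The following is a toy sketch of that idea on synthetic data, not the PG-metrics code itself:

      # Project muropeptide chromatographic profiles with PCA, then cluster.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.cluster import KMeans

      # rows = samples, columns = retention-time bins (synthetic stand-in data)
      profiles = np.random.rand(30, 500)

      scores = PCA(n_components=3).fit_transform(profiles)
      labels = KMeans(n_clusters=3, n_init=10).fit_predict(scores)
      print(labels)  # putative grouping by species/condition/mutant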

  9. An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

    PubMed Central

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of a command line interface, massive computational resources and expertise, which is a daunting prospect for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools with a graphical user interface, called Integrated SNP Mining and Utilization (ISMU), for SNP discovery and utilization through the development of genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of the genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and of errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data, at high speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise, enabling SNP discovery and utilization in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge next generation sequencing datasets. It has been developed in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software. PMID:25003610
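
    Internally, ISMU wires together standard command-line tools. As an illustration of one such tool path (BWA alignment followed by SAMtools/BCFtools calling), assuming the binaries are installed and with placeholder file names:

      # Sketch of one alignment-and-calling path a pipeline like this wraps.
      import subprocess

      def call_snps(ref, reads, bam="aln.bam", vcf="out.vcf"):
          subprocess.run(f"bwa index {ref}", shell=True, check=True)
          subprocess.run(f"bwa mem {ref} {reads} | samtools sort -o {bam} -",
                         shell=True, check=True)
          subprocess.run(f"samtools index {bam}", shell=True, check=True)
          subprocess.run(f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -o {vcf}",
                         shell=True, check=True)

      call_snps("ref.fa", "sample.fq")  # placeholder file names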

  10. 49 CFR 192.921 - How is the baseline assessment to be conducted?

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ...) PIPELINE AND HAZARDOUS MATERIALS SAFETY ADMINISTRATION, DEPARTMENT OF TRANSPORTATION (CONTINUED) PIPELINE SAFETY TRANSPORTATION OF NATURAL AND OTHER GAS BY PIPELINE: MINIMUM FEDERAL SAFETY STANDARDS Gas... the covered pipeline segments for the baseline assessment according to a risk analysis that considers...

  11. SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data.

    PubMed

    Lee, Tae-Ho; Guo, Hui; Wang, Xiyin; Kim, Changsoo; Paterson, Andrew H

    2014-02-26

    Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched the data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data make reliable analysis difficult, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data. We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline enables users to construct a phylogenetic tree from three representative SNP data file formats. In addition, to increase the reliability of a tree, the pipeline includes steps such as removing low quality data and accounting for linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in the generation of a tree in our pipeline. Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. This pipeline can thus help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than on the manipulations necessary to accomplish the analysis.
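
    The linkage-disequilibrium reduction step can be sketched as windowed r-squared pruning of a genotype matrix. The following toy example illustrates the idea only; SNPhylo's actual implementation and thresholds differ:

      # Within a sliding window, drop SNPs highly correlated with a kept SNP.
      import numpy as np

      def ld_prune(genotypes, window=50, r2_max=0.8):
          """genotypes: SNPs x samples matrix of 0/1/2 allele counts."""
          keep = []
          for i in range(genotypes.shape[0]):
              ok = True
              for j in keep[-window:]:
                  r = np.corrcoef(genotypes[i], genotypes[j])[0, 1]
                  if r * r > r2_max:
                      ok = False
                      break
              if ok:
                  keep.append(i)
          return keep

      g = np.random.randint(0, 3, size=(1000, 20))  # synthetic genotypes
      print(len(ld_prune(g)), "SNPs retained")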

  12. Targeting Neuronal-like Metabolism of Metastatic Tumor Cells as a Novel Therapy for Breast Cancer Brain Metastasis

    DTIC Science & Technology

    2017-03-01

    Contribution to Project: Ian primarily focuses on developing the tissue imaging pipeline and performing imaging data analysis. Funding Support: Partially...3D ReconsTruction), a multi-faceted image analysis pipeline, permitting quantitative interrogation of functional implications of heterogeneous... analysis pipeline, to observe and quantify phenotypic metastatic landscape heterogeneity in situ with spatial and molecular resolution. Our implementation

  13. State of art of seismic design and seismic hazard analysis for oil and gas pipeline system

    NASA Astrophysics Data System (ADS)

    Liu, Aiwen; Chen, Kun; Wu, Jian

    2010-06-01

    The purpose of this paper is to adopt the uniform confidence method in both water pipeline design and oil-gas pipeline design. Based on the importance of a pipeline and the consequences of its failure, oil and gas pipelines can be classified into three pipe classes, with 50-year exceedance probabilities of 2%, 5% and 10%, respectively. Performance-based design requires more information about ground motion, which should be obtained by evaluating seismic safety for the pipeline engineering site. Different from a city's water pipeline network, a long-distance oil and gas pipeline system is a spatially, linearly distributed system. For uniform confidence in seismic safety, a long-distance oil and gas pipeline formed of pump stations and different-class pipe segments should be considered as a whole system when analyzing seismic risk. Considering the uncertainty of earthquake magnitude, design-basis fault displacements corresponding to the different pipeline classes are proposed to improve deterministic seismic hazard analysis (DSHA). A new empirical relationship between the maximum fault displacement and the surface-wave magnitude is obtained with supplemented earthquake data from East Asia. The estimation of fault displacement for a refined-oil pipeline in the Wenchuan Ms 8.0 earthquake is introduced as an example in this paper.
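
    Empirical relationships of this kind typically take the log-linear form log10(Dmax) = a + b * Ms. The sketch below fits such a relationship to invented data points; the coefficients are illustrative and are not those derived in the paper:

      # Fit a log-linear magnitude/displacement relationship to toy data.
      import numpy as np

      ms = np.array([6.5, 7.0, 7.2, 7.6, 8.0])      # surface-wave magnitudes
      d_max = np.array([0.9, 1.8, 2.5, 5.0, 9.0])   # max displacements, m (made up)

      b, a = np.polyfit(ms, np.log10(d_max), 1)     # slope first, then intercept
      print(f"log10(Dmax) = {a:.2f} + {b:.2f} * Ms")
      print("predicted Dmax at Ms 8.0:", 10 ** (a + b * 8.0), "m")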

  14. Development and Applications of Pipeline Steel in Long-Distance Gas Pipeline of China

    NASA Astrophysics Data System (ADS)

    Chunyong, Huo; Yang, Li; Lingkang, Ji

    In past decades, with the wide utilization of microalloying and Thermal Mechanical Control Processing (TMCP) technology, a good match of strength, toughness, plasticity and weldability in pipeline steel has been achieved, so that oil and gas pipelines have developed greatly in China to meet strong domestic energy demand. In this paper, the development history of pipeline steel and gas pipelines in China is briefly reviewed. The microstructure characteristics and mechanical performance of pipeline steels used in some representative gas pipelines of China built at different stages are summarized. Through analysis of the evolution of the pipeline service environment, some prospective development trends for the application of pipeline steel in China are also presented.

  15. The initial data products from the EUVE software - A photon's journey through the End-to-End System

    NASA Technical Reports Server (NTRS)

    Antia, Behram

    1993-01-01

    The End-to-End System (EES) is a unique collection of software modules created for use at the Center for EUV Astrophysics. The 'pipeline' is a shell script which executes selected EES modules and creates initial data products: skymaps, data sets for individual sources (called 'pigeonholes') and catalogs of sources. This article emphasizes the data from the all-sky survey, conducted between July 22, 1992 and January 21, 1993. A description of each of the major data products will be given and, as an example of how the pipeline works, the reader will follow a photon's path through the software pipeline into a pigeonhole. These data products are the primary goal of the EUVE all-sky survey mission, and so their relative importance for the follow-up science will also be discussed.

  16. Comprehensive investigation into historical pipeline construction costs and engineering economic analysis of Alaska in-state gas pipeline

    NASA Astrophysics Data System (ADS)

    Rui, Zhenhua

    This study analyzes historical cost data of 412 pipelines and 220 compressor stations. On the basis of this analysis, the study also evaluates the feasibility of an Alaska in-state gas pipeline using Monte Carlo simulation techniques. Analysis of pipeline construction costs shows that component costs, shares of cost components, and learning rates for material and labor costs vary by diameter, length, volume, year, and location. Overall average learning rates for pipeline material and labor costs are 6.1% and 12.4%, respectively. Overall average cost shares for pipeline material, labor, miscellaneous, and right of way (ROW) are 31%, 40%, 23%, and 7%, respectively. Regression models are developed to estimate pipeline component costs for different lengths, cross-sectional areas, and locations. An analysis of inaccuracy in pipeline cost estimation demonstrates that the estimation of pipeline cost components is biased except in the case of total costs. Overall overrun rates for pipeline material, labor, miscellaneous, ROW, and total costs are 4.9%, 22.4%, -0.9%, 9.1%, and 6.5%, respectively, and project size, capacity, diameter, location, and year of completion have differing degrees of impact on cost overruns of pipeline cost components. Analysis of compressor station costs shows that component costs, shares of cost components, and learning rates for material and labor costs vary in terms of capacity, year, and location. Average learning rates for compressor station material and labor costs are 12.1% and 7.48%, respectively. Overall average cost shares of material, labor, miscellaneous, and ROW are 50.6%, 27.2%, 21.5%, and 0.8%, respectively. Regression models are developed to estimate compressor station component costs for different capacities and locations. An investigation into inaccuracies in compressor station cost estimation demonstrates that the cost estimation for compressor stations is biased except in the case of material costs. Overall average overrun rates for compressor station material, labor, miscellaneous, land, and total costs are 3%, 60%, 2%, -14%, and 11%, respectively, and cost overruns for cost components are influenced by location and year of completion to differing degrees. Monte Carlo models are developed and simulated to evaluate the feasibility of an Alaska in-state gas pipeline by assigning triangular distributions to the values of economic parameters. Simulated results show that the construction of an Alaska in-state natural gas pipeline is feasible under three scenarios: 500 million cubic feet per day (mmcfd), 750 mmcfd, and 1000 mmcfd.
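
    The Monte Carlo approach can be illustrated with a small NPV simulation in which cost and price parameters follow triangular distributions. All numbers below are invented placeholders, not the study's inputs:

      # Toy feasibility simulation with triangularly distributed parameters.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 100_000

      capex = rng.triangular(8e9, 10e9, 14e9, n)      # construction cost, $
      gas_price = rng.triangular(6, 9, 14, n)         # $/mcf
      volume = 500e3 * 365                            # mcf/yr at 500 mmcfd
      opex = rng.triangular(1e8, 1.5e8, 2.5e8, n)     # $/yr
      r, years = 0.08, 25

      annuity = (1 - (1 + r) ** -years) / r           # present-value factor
      npv = -capex + (gas_price * volume - opex) * annuity
      print("P(NPV > 0) =", (npv > 0).mean())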

  17. Applications of the pipeline environment for visual informatics and genomics computations

    PubMed Central

    2011-01-01

    Background Contemporary informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the visual graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols. Results This paper reports on the applications of the LONI Pipeline environment to address two informatics challenges: graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures, the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface, and integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable integration of bioinformatics resources which provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls. Conclusions The LONI Pipeline environment http://pipeline.loni.ucla.edu provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators: experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community. PMID:21791102

  18. TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data.

    PubMed

    Clark, Lindsay V; Sacks, Erik J

    2016-01-01

    In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in formats that are both accurate and easy to access. Additionally, although existing pipelines allow previously mined SNPs to be genotyped on new samples, they do not allow the user to manually specify a subset of loci to examine. Pipelines that do not use a reference genome assign arbitrary names to SNPs, making meta-analysis across projects difficult. We created the software TagDigger, which includes three programs for analyzing GBS and RAD-seq data. The first script, tagdigger_interactive.py, rapidly extracts read counts and genotypes from FASTQ files using user-supplied sets of barcodes and tags. Input and output are in CSV format so that files can be opened by spreadsheet software. Tag sequences can also be imported from the Stacks, TASSEL-GBSv2, TASSEL-UNEAK, or pyRAD pipelines, and a separate file can be imported listing the names of markers to retain. A second script, tag_manager.py, consolidates marker names and sequences across multiple projects. A third script, barcode_splitter.py, assists with preparing FASTQ data for deposit in a public archive by splitting FASTQ files by barcode and generating MD5 checksums for the resulting files. TagDigger is open-source and freely available software written in Python 3. It uses a scalable, rapid search algorithm that can process over 100 million FASTQ reads per hour. TagDigger will run on a laptop with any operating system, does not consume hard drive space with intermediate files, and does not require programming skill to use.
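
    The barcode-splitting step performed by barcode_splitter.py can be reduced to routing four-line FASTQ records by their leading barcode. A minimal sketch of the concept (barcodes and file names are placeholders; the real script also generates checksums):

      # Route FASTQ reads to per-sample files by barcode prefix.
      barcodes = {"ACGT": "sampleA", "TGCA": "sampleB"}  # placeholder barcodes
      outputs = {name: open(f"{name}.fastq", "w") for name in barcodes.values()}

      with open("run1.fastq") as fh:                     # placeholder input
          while True:
              record = [fh.readline() for _ in range(4)]  # FASTQ: 4 lines/read
              if not record[0]:
                  break
              for bc, name in barcodes.items():
                  if record[1].startswith(bc):
                      outputs[name].writelines(record)
                      break

      for f in outputs.values():
          f.close()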

  19. Diagnostic Inspection of Pipelines for Estimating the State of Stress in Them

    NASA Astrophysics Data System (ADS)

    Subbotin, V. A.; Kolotilov, Yu. V.; Smirnova, V. Yu.; Ivashko, S. K.

    2017-12-01

    The diagnostic inspection used to estimate the technical state of a pipeline is described. The problems involved in inspection work are listed, and a functional-structural scheme is developed to estimate the state of stress in a pipeline. Final conclusions regarding the actual loading of a pipeline section are drawn from a cross-analysis of all the information obtained during pipeline inspection.

  20. 77 FR 43586 - Southern Star Central Gas Pipeline, Inc.; Notice of Intent To Prepare an Environmental Assessment...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-25

    ..., 888 First Street NE., Washington, DC 20426, or call (202) 502-8371. For instructions on connecting to... historic properties as any prehistoric or historic district, site, building, structure, or object included...

  1. Pipeline monitoring with unmanned aerial vehicles

    NASA Astrophysics Data System (ADS)

    Kochetkova, L. I.

    2018-05-01

    Pipeline leakage during transportation of combustible substances can lead to explosion and fire, causing death and the destruction of production and accommodation facilities. Continuous pipeline monitoring allows leaks to be identified in good time and measures for their elimination to be taken quickly. The paper describes a solution for identifying pipeline leakage using unmanned aerial vehicles. Spectral analysis of the input RGB signal is recommended for identifying pipeline damage. The use of multi-zone digital images makes it possible to detect potential spills of oil hydrocarbons as well as possible soil pollution. The method of multi-temporal digital images within the visible region makes it possible to detect changes in soil morphology for subsequent analysis. The given solution is cost-efficient and reliable, reducing the time and labor required in comparison with other methods of pipeline monitoring.
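
    The multi-temporal comparison amounts to differencing co-registered images and thresholding the per-pixel spectral change. A toy sketch on synthetic arrays, with an invented threshold:

      # Difference two co-registered RGB frames of the same corridor and flag
      # pixels whose spectral change exceeds a threshold (possible spill).
      import numpy as np

      before = np.random.rand(256, 256, 3)   # stand-ins for georeferenced frames
      after = before.copy()
      after[100:120, 50:80] += 0.4           # inject a synthetic anomaly

      delta = np.linalg.norm(after - before, axis=2)   # per-pixel spectral change
      mask = delta > 0.3                                # invented threshold
      print("suspect pixels:", int(mask.sum()))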

  2. Stress and Strain State Analysis of Defective Pipeline Portion

    NASA Astrophysics Data System (ADS)

    Burkov, P. V.; Burkova, S. P.; Knaub, S. A.

    2015-09-01

    The paper presents computer simulation results for a pipeline having defects in a welded joint. Autodesk Inventor software is used to simulate the stress and strain state of the pipeline. Locations of possible failure and stress concentrators are predicted on the defective portion of the pipeline.

  3. 76 FR 25576 - Pipeline Safety: Applying Safety Regulations to All Rural Onshore Hazardous Liquid Low-Stress Lines

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-05

    ... DEPARTMENT OF TRANSPORTATION Pipeline and Hazardous Materials Safety Administration 49 CFR Part... to All Rural Onshore Hazardous Liquid Low-Stress Lines AGENCY: Pipeline and Hazardous Materials... burdensome to require operators of these pipelines to perform a complete ``could affect'' analysis to...

  4. A Java-based fMRI processing pipeline evaluation system for assessment of univariate general linear model and multivariate canonical variate analysis-based pipelines.

    PubMed

    Zhang, Jing; Liang, Lichen; Anderson, Jon R; Gatewood, Lael; Rottenberg, David A; Strother, Stephen C

    2008-01-01

    As functional magnetic resonance imaging (fMRI) becomes widely used, the demands for evaluation of fMRI processing pipelines and validation of fMRI analysis results are increasing rapidly. The current NPAIRS package, an IDL-based fMRI processing pipeline evaluation framework, lacks system interoperability and the ability to evaluate general linear model (GLM)-based pipelines using prediction metrics. Thus, it cannot fully evaluate fMRI analytical software modules such as FSL.FEAT and NPAIRS.GLM. In order to overcome these limitations, a Java-based fMRI processing pipeline evaluation system was developed. It integrated YALE (a machine learning environment) into Fiswidgets (an fMRI software environment) to obtain system interoperability, and applied an algorithm to measure GLM prediction accuracy. The results demonstrated that the system can evaluate fMRI processing pipelines with univariate GLM and multivariate canonical variates analysis (CVA)-based models on real fMRI data, based on prediction accuracy (classification accuracy) and statistical parametric image (SPI) reproducibility. In addition, a preliminary study was performed in which four fMRI processing pipelines with GLM and CVA modules, such as FSL.FEAT and NPAIRS.CVA, were evaluated with the system. The results indicated that (1) the system can compare different fMRI processing pipelines with heterogeneous models (NPAIRS.GLM, NPAIRS.CVA and FSL.FEAT) and rank their performance by automatic performance scoring, and (2) the ranking of pipeline performance is highly dependent on the preprocessing operations. These results suggest that the system will be of value for the comparison, validation, standardization and optimization of functional neuroimaging software packages and fMRI processing pipelines.
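
    The two NPAIRS-style metrics the system computes can be illustrated in miniature: split-half prediction accuracy and activation-map reproducibility. The following sketch uses synthetic data and a simple classifier; it illustrates the metrics, not the system's code:

      # Split trials in half; train on one half and score the other
      # (prediction), and correlate the two halves' maps (reproducibility).
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 500))          # trials x voxels (synthetic)
      y = rng.integers(0, 2, 200)              # two conditions
      X[y == 1, :50] += 0.5                    # planted signal

      half = len(y) // 2
      clf = LogisticRegression(max_iter=1000).fit(X[:half], y[:half])
      accuracy = clf.score(X[half:], y[half:])

      map1 = X[:half][y[:half] == 1].mean(0) - X[:half][y[:half] == 0].mean(0)
      map2 = X[half:][y[half:] == 1].mean(0) - X[half:][y[half:] == 0].mean(0)
      reproducibility = np.corrcoef(map1, map2)[0, 1]
      print(f"prediction={accuracy:.2f}, reproducibility={reproducibility:.2f}")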

  5. Computational analysis of PET by AIBL (CapAIBL): a cloud-based processing pipeline for the quantification of PET images

    NASA Astrophysics Data System (ADS)

    Bourgeat, Pierrick; Dore, Vincent; Fripp, Jurgen; Villemagne, Victor L.; Rowe, Chris C.; Salvado, Olivier

    2015-03-01

    With advances in PET tracers for β-Amyloid (Aβ) detection in neurodegenerative diseases, automated quantification methods are desirable. For clinical use, there is a great need for a PET-only quantification method, as MR images are not always available. In this paper, we validate a previously developed PET-only quantification method against MR-based quantification using 6 tracers: 18F-Florbetaben (N=148), 18F-Florbetapir (N=171), 18F-NAV4694 (N=47), 18F-Flutemetamol (N=180), 11C-PiB (N=381) and 18F-FDG (N=34). The results show an overall mean absolute percentage error of less than 5% for each tracer. The method has been implemented as a remote service called CapAIBL (http://milxcloud.csiro.au/capaibl). PET images are uploaded to a cloud platform where they are spatially normalised to a standard template and quantified. A report containing global as well as local quantification, along with a surface projection of the β-Amyloid deposition, is automatically generated at the end of the pipeline and emailed to the user.
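
    The quantification such pipelines report typically reduces to a standardised uptake value ratio (SUVR) of a target region over a reference region. A toy sketch with synthetic volumes and invented masks (CapAIBL's actual regions and normalisation differ):

      # Compute a global SUVR from a spatially normalised PET volume.
      import numpy as np

      pet = np.random.rand(91, 109, 91)   # normalised PET volume stand-in
      cortex_mask = np.zeros_like(pet, bool)
      cortex_mask[30:60, 40:70, 40:60] = True       # invented target region
      cerebellum_mask = np.zeros_like(pet, bool)
      cerebellum_mask[35:55, 20:35, 15:30] = True   # invented reference region

      suvr = pet[cortex_mask].mean() / pet[cerebellum_mask].mean()
      print(f"global SUVR = {suvr:.2f}")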

  6. Conversion events in gene clusters

    PubMed Central

    2011-01-01

    Background Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments. Results To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available at http://www.bx.psu.edu/miller_lab. Conclusions These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes. PMID:21798034

  7. Common Data Analysis Pipeline | Office of Cancer Clinical Proteomics Research

    Cancer.gov

    CPTAC supports analyses of the mass spectrometry raw data (mapping of spectra to peptide sequences and protein identification) for the public using a Common Data Analysis Pipeline (CDAP). The data types available on the public portal are described below. A general overview of this pipeline can be downloaded here.

  8. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes, since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly, since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package includes a script for automated installation of the necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems once the necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/.

  9. Compact Graphical Representation of Phylogenetic Data and Metadata with GraPhlAn

    DTIC Science & Technology

    2016-09-12

    pipelines. This allows for a higher degree of analysis reproducibility, but the software must correspondingly be available for local installation and callable...these operations are available in the GraPhlAn software repository). Reproducible integration with existing analysis tools and pipelines Graphical...from different analysis pipelines, generating the necessary input files for GraPhlAn. Export2graphlan directly supports MetaPhlAn2, LEfSe, and HUMAnN

  10. MitoFish and MiFish Pipeline: A Mitochondrial Genome Database of Fish with an Analysis Pipeline for Environmental DNA Metabarcoding.

    PubMed

    Sato, Yukuto; Miya, Masaki; Fukunaga, Tsukasa; Sado, Tetsuya; Iwasaki, Wataru

    2018-06-01

    Fish mitochondrial genome (mitogenome) data form a fundamental basis for revealing vertebrate evolution and hydrosphere ecology. Here, we report recent functional updates of MitoFish, a database of fish mitogenomes with a precise annotation pipeline, MitoAnnotator. Most importantly, we describe the implementation of the MiFish pipeline for metabarcoding analysis of fish mitochondrial environmental DNA, a fast-emerging and powerful technology in fish studies. MitoFish, MitoAnnotator, and the MiFish pipeline constitute a key platform for studies of fish evolution, ecology, and conservation, and are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed April 7th, 2018).

  11. 77 FR 66568 - Revisions to Procedural Regulations Governing Transportation by Intrastate Pipelines

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-11-06

    ... filings by those natural gas pipelines that fall under the Commission's jurisdiction pursuant to the Natural Gas Policy Act of 1978 or the Natural Gas Act. An intrastate pipeline may elect to use these... Pipelines C. Withdrawal Procedures 20 III. Information Collection Statement 21 IV. Environmental Analysis 28...

  12. A De Novo-Assembly Based Data Analysis Pipeline for Plant Obligate Parasite Metatranscriptomic Studies.

    PubMed

    Guo, Li; Allen, Kelly S; Deiulio, Greg; Zhang, Yong; Madeiras, Angela M; Wick, Robert L; Ma, Li-Jun

    2016-01-01

    Current and emerging plant diseases caused by obligate parasitic microbes such as rusts, downy mildews, and powdery mildews threaten worldwide crop production and food safety. These obligate parasites are typically unculturable in the laboratory, posing technical challenges to characterize them at the genetic and genomic level. Here we have developed a data analysis pipeline integrating several bioinformatic software programs. This pipeline facilitates rapid gene discovery and expression analysis of a plant host and its obligate parasite simultaneously by next generation sequencing of mixed host and pathogen RNA (i.e., metatranscriptomics). We applied this pipeline to metatranscriptomic sequencing data of sweet basil (Ocimum basilicum) and its obligate downy mildew parasite Peronospora belbahrii, both lacking a sequenced genome. Even with a single data point, we were able to identify both candidate host defense genes and pathogen virulence genes that are highly expressed during infection. This demonstrates the power of this pipeline for identifying genes important in host-pathogen interactions without prior genomic information for either the plant host or the obligate biotrophic pathogen. The simplicity of this pipeline makes it accessible to researchers with limited computational skills and applicable to metatranscriptomic data analysis in a wide range of plant-obligate-parasite systems.

  13. Inverse Transient Analysis for Classification of Wall Thickness Variations in Pipelines

    PubMed Central

    Tuck, Jeffrey; Lee, Pedro

    2013-01-01

    Analysis of transient fluid pressure signals has been investigated as an alternative method of fault detection in pipeline systems and has shown promise in both laboratory and field trials. The advantage of the method is that it can potentially provide a fast and cost effective means of locating faults such as leaks, blockages and pipeline wall degradation within a pipeline while the system remains fully operational. The only requirement is that high speed pressure sensors are placed in contact with the fluid. Further development of the method requires detailed numerical models and enhanced understanding of transient flow within a pipeline where variations in pipeline condition and geometry occur. One such variation commonly encountered is the degradation or thinning of pipe walls, which can increase the susceptible of a pipeline to leak development. This paper aims to improve transient-based fault detection methods by investigating how changes in pipe wall thickness will affect the transient behaviour of a system; this is done through the analysis of laboratory experiments. The laboratory experiments are carried out on a stainless steel pipeline of constant outside diameter, into which a pipe section of variable wall thickness is inserted. In order to detect the location and severity of these changes in wall conditions within the laboratory system an inverse transient analysis procedure is employed which considers independent variations in wavespeed and diameter. Inverse transient analyses are carried out using a genetic algorithm optimisation routine to match the response from a one-dimensional method of characteristics transient model to the experimental time domain pressure responses. The accuracy of the detection technique is evaluated and benefits associated with various simplifying assumptions and simulation run times are investigated. It is found that for the case investigated, changes in the wavespeed and nominal diameter of the pipeline are both important to the accuracy of the inverse analysis procedure and can be used to differentiate the observed transient behaviour caused by changes in wall thickness from that caused by other known faults such as leaks. Further application of the method to real pipelines is discussed.

  14. Mining sequence variations in representative polyploid sugarcane germplasm accessions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiping; Song, Jian; You, Qian

    Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS) was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and genome complexity reduction using restriction enzymes. To use GBS for high-throughput genotyping of highly polyploid sugarcane, GBS analysis pipelines for the 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variant callers, and sequence depths for single nucleotide polymorphism (SNP) filtering. Using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, the non-reference-based Universal Network Enabled Analysis Kit (UNEAK) and Stacks de novo pipelines called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single-dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much higher than the proportions of multiple-dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationships among the 14 accessions. The results showed that the Erianthus and Saccharum genera diverged more than 10 million years ago (MYA), and that the Saccharum species separated from their common ancestors between 0.19 and 1.65 MYA. GBS pipelines, including the reference sequences, alignment methods, sequence variant callers, and sequence depths, are recommended and discussed for the Saccharum complex and other related species. A large number of sequence variations were discovered in the Saccharum complex, including SNPs, InDels, PAVs, and CNVs. Genome-wide SNPs were further used to illustrate sequence features of polyploid species and demonstrated the divergence of different species in the Saccharum complex. The results of this study showed that GBS was an effective NGS-based method to discover genomic sequence variations in highly polyploid and heterozygous species.
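
    Single-dose versus multiple-dose classification in a polyploid can be pictured as matching the observed alternate-allele read fraction to its nearest expected dosage fraction. A toy sketch with placeholder counts and ploidy, not the study's actual filtering rules:

      # Estimate the most likely allele dosage from read counts.
      import numpy as np

      def likely_dosage(alt_reads, total_reads, ploidy=8):
          fractions = np.arange(1, ploidy) / ploidy   # expected fraction per dosage
          obs = alt_reads / total_reads
          return int(np.argmin(np.abs(fractions - obs)) + 1)

      print(likely_dosage(12, 100))   # ~0.12 -> single dose in an octoploid
      print(likely_dosage(48, 100))   # ~0.5  -> dosage 4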

  15. Mining sequence variations in representative polyploid sugarcane germplasm accessions

    DOE PAGES

    Yang, Xiping; Song, Jian; You, Qian; ...

    2017-08-09

    Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS) was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and genome complexity reduction using restriction enzymes. To use GBS for high-throughput genotyping of highly polyploid sugarcane, GBS analysis pipelines for the 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variant callers, and sequence depths for single nucleotide polymorphism (SNP) filtering. Using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, the non-reference-based Universal Network Enabled Analysis Kit (UNEAK) and Stacks de novo pipelines called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single-dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much higher than the proportions of multiple-dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationships among the 14 accessions. The results showed that the Erianthus and Saccharum genera diverged more than 10 million years ago (MYA), and that the Saccharum species separated from their common ancestors between 0.19 and 1.65 MYA. GBS pipelines, including the reference sequences, alignment methods, sequence variant callers, and sequence depths, are recommended and discussed for the Saccharum complex and other related species. A large number of sequence variations were discovered in the Saccharum complex, including SNPs, InDels, PAVs, and CNVs. Genome-wide SNPs were further used to illustrate sequence features of polyploid species and demonstrated the divergence of different species in the Saccharum complex. The results of this study showed that GBS was an effective NGS-based method to discover genomic sequence variations in highly polyploid and heterozygous species.

  16. VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering.

    PubMed

    Verbist, Bie M P; Thys, Kim; Reumers, Joke; Wetzels, Yves; Van der Borght, Koen; Talloen, Willem; Aerssens, Jeroen; Clement, Lieven; Thas, Olivier

    2015-01-01

    In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from real low-frequency mutations. A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is embedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms, as variants are called at the codon level. This enables virologists to make an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in outstanding sensitivity while maintaining good specificity for variants with frequencies down to 0.5%. VirVarSeq is available, together with a user's guide and test data, at SourceForge: http://sourceforge.net/projects/virtools/?source=directory.
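
    The gist of a quality-aware codon-level filter can be sketched as follows: a codon observation contributes to the variant tally only if all three of its base calls clear a Phred threshold. This is an illustration of the concept, with simplified inputs, not Q-cpileup itself:

      # Count codon observations, dropping codons with any low-quality base.
      from collections import Counter

      def codon_counts(reads, quals, q_min=30):
          """reads: aligned 3-mer strings; quals: matching Phred score triples."""
          counts = Counter()
          for codon, q in zip(reads, quals):
              if min(q) >= q_min:        # all three bases must clear the threshold
                  counts[codon] += 1
          return counts

      reads = ["ATG", "ATG", "ATA", "ATG"]
      quals = [(38, 40, 39), (35, 36, 37), (12, 40, 40), (33, 34, 35)]
      print(codon_counts(reads, quals))  # the low-quality ATA call is filtered out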

  17. An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study.

    PubMed

    Wang, Zichen; Ma'ayan, Avi

    2016-01-01

    RNA-seq analysis is becoming a standard method for global gene expression profiling. However, performing RNA-seq analysis through open, standard pipelines remains challenging for non-experts due to the large size of the raw data files and the hardware requirements for running the alignment step. Here we introduce a reproducible open source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configuration overhead. The pipeline enables the extraction of knowledge from typical RNA-seq studies by generating interactive principal component analysis (PCA) and hierarchical clustering (HC) plots, performing enrichment analyses against over 90 gene set libraries, and obtaining lists of small molecules that are predicted to either mimic or reverse the observed changes in mRNA expression. We apply the pipeline to a recently published RNA-seq dataset collected from human neuronal progenitors infected with the Zika virus (ZIKV). In addition to confirming the presence of cell cycle genes among the genes that are downregulated by ZIKV, our analysis uncovers significant overlap with upregulated genes that, when knocked out in mice, induce defects in brain morphology. This result potentially points to the molecular processes associated with the microcephaly phenotype observed in newborns from pregnant mothers infected with the virus. In addition, our analysis predicts small molecules that can either mimic or reverse the expression changes induced by ZIKV. The IPython notebook and Docker image are freely available at http://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb and https://hub.docker.com/r/maayanlab/zika/.
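
    The notebook's first analysis steps, PCA and hierarchical clustering of the expression matrix, can be reproduced in a few lines. A sketch on synthetic data (the real pipeline operates on aligned read counts):

      # PCA and hierarchical clustering of a samples x genes matrix.
      import numpy as np
      from sklearn.decomposition import PCA
      from scipy.cluster.hierarchy import linkage, fcluster

      expr = np.random.rand(20, 5000)        # samples x genes stand-in
      expr[:10, :100] += 2.0                 # e.g. a mock vs infected shift

      pcs = PCA(n_components=2).fit_transform(expr)
      tree = linkage(expr, method="average", metric="correlation")
      groups = fcluster(tree, t=2, criterion="maxclust")
      print(pcs[:, 0].round(2), groups)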

  18. Transforming Education with Talent Management

    ERIC Educational Resources Information Center

    Brandt, Julie

    2011-01-01

    Attracting, developing, and retaining employees, ensuring a pipeline of qualified people, and building a culture of engagement and productivity are important to the success of any organization. It is called "talent management." With the right technology support, talent management's real value is that it allows organizations to identify high…

  19. Exome sequencing reveals novel genetic loci influencing obesity-related traits in Hispanic children

    USDA-ARS?s Scientific Manuscript database

    To perform whole exome sequencing in 928 Hispanic children and identify variants and genes associated with childhood obesity.Single-nucleotide variants (SNVs) were identified from Illumina whole exome sequencing data using integrated read mapping, variant calling, and an annotation pipeline (Mercury...

  20. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Sexton, David

    2018-01-22

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  1. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sexton, David

    2012-06-01

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  2. Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

    PubMed

    Williams, N J; Nasuto, S J; Saddy, J D

    2015-07-30

    The validity of ensemble averaging of event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments: a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters is determined in a principled manner, and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging.
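
    The bootstrap-based choice of cluster number can be illustrated by scoring, for each candidate k, how consistently resampled clusterings agree with a reference clustering. The sketch below uses the adjusted Rand index as a stand-in for the paper's Stability Index:

      # Score clustering stability across bootstrap resamples for each k.
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.metrics import adjusted_rand_score

      rng = np.random.default_rng(2)
      X = np.vstack([rng.normal(m, 0.3, (40, 10)) for m in (0, 2, 4)])

      for k in range(2, 6):
          base = KMeans(n_clusters=k, n_init=10).fit_predict(X)
          scores = []
          for _ in range(20):
              idx = rng.integers(0, len(X), len(X))        # bootstrap resample
              boot = KMeans(n_clusters=k, n_init=10).fit(X[idx]).predict(X)
              scores.append(adjusted_rand_score(base, boot))
          print(k, round(float(np.mean(scores)), 2))       # higher = more stable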

  3. United States petroleum pipelines: An empirical analysis of pipeline sizing

    NASA Astrophysics Data System (ADS)

    Coburn, L. L.

    1980-12-01

    The undersizing theory hypothesizes that integrated oil companies have a strong economic incentive to size the petroleum pipelines they own and ship over in a way that forces some of the demand onto higher-cost alternatives. The DOJ theory posits that excess or monopoly profits are earned due to the natural monopoly characteristics of petroleum pipelines and the existence of market power for some pipelines in either the upstream or downstream market. The theory holds that independent petroleum pipelines owned by companies not otherwise affiliated with the petroleum industry (independent pipelines) do not have these incentives, and that all the efficiencies of pipeline transportation are passed on to the ultimate consumer. Integrated oil companies, on the other hand, keep these cost efficiencies for themselves in the form of excess profits.

  4. A De Novo-Assembly Based Data Analysis Pipeline for Plant Obligate Parasite Metatranscriptomic Studies

    PubMed Central

    Guo, Li; Allen, Kelly S.; Deiulio, Greg; Zhang, Yong; Madeiras, Angela M.; Wick, Robert L.; Ma, Li-Jun

    2016-01-01

    Current and emerging plant diseases caused by obligate parasitic microbes such as rusts, downy mildews, and powdery mildews threaten worldwide crop production and food safety. These obligate parasites are typically unculturable in the laboratory, posing technical challenges to characterize them at the genetic and genomic level. Here we have developed a data analysis pipeline integrating several bioinformatic software programs. This pipeline facilitates rapid gene discovery and expression analysis of a plant host and its obligate parasite simultaneously by next generation sequencing of mixed host and pathogen RNA (i.e., metatranscriptomics). We applied this pipeline to metatranscriptomic sequencing data of sweet basil (Ocimum basilicum) and its obligate downy mildew parasite Peronospora belbahrii, both lacking a sequenced genome. Even with a single data point, we were able to identify both candidate host defense genes and pathogen virulence genes that are highly expressed during infection. This demonstrates the power of this pipeline for identifying genes important in host–pathogen interactions without prior genomic information for either the plant host or the obligate biotrophic pathogen. The simplicity of this pipeline makes it accessible to researchers with limited computational skills and applicable to metatranscriptomic data analysis in a wide range of plant-obligate-parasite systems. PMID:27462318

  5. A novel pipeline based FPGA implementation of a genetic algorithm

    NASA Astrophysics Data System (ADS)

    Thirer, Nonel

    2014-05-01

    To solve problems for which an analytical solution is not available, more and more bio-inspired computation techniques have been applied in recent years. One efficient algorithm is the Genetic Algorithm (GA), which imitates the biological evolution process, finding the solution through the mechanism of "natural selection", in which the strong have higher chances to survive. A genetic algorithm is an iterative procedure which operates on a population of individuals called "chromosomes" or "possible solutions" (usually represented by a binary code). A GA performs several processes on the population individuals to produce a new population, as in biological evolution. To provide a high-speed solution, pipeline-based FPGA hardware implementations are used, with an n-stage pipeline for an n-phase genetic algorithm. The FPGA pipeline implementations are constrained by the different execution times of each stage and by the FPGA chip resources. To minimize these difficulties, we propose a bio-inspired technique that modifies the crossover step by using non-identical twins: two chosen chromosomes (parents) build up two new chromosomes (children), not only one as in the classical GA. We analyze the contribution of this method to reducing the execution time in asynchronous and synchronous pipelines, and also the possibility of a cheaper FPGA implementation by using smaller populations. The full hardware architecture of an FPGA implementation for our target ALTERA development card is presented and analyzed.
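
    A minimal software sketch of the non-identical-twins crossover described above may clarify the idea; the chromosome encoding below is an arbitrary stand-in, not the paper's FPGA implementation:

    ```python
    # Sketch: single-point crossover producing two complementary "twin" children,
    # versus a classical variant that keeps only one child.
    import random

    random.seed(1)

    def crossover_twins(parent_a, parent_b):
        """Return two non-identical children cut at the same crossover point."""
        cut = random.randrange(1, len(parent_a))
        child1 = parent_a[:cut] + parent_b[cut:]
        child2 = parent_b[:cut] + parent_a[cut:]
        return child1, child2

    a = [1, 1, 1, 1, 1, 1, 1, 1]
    b = [0, 0, 0, 0, 0, 0, 0, 0]
    print(crossover_twins(a, b))  # e.g. ([1, 1, 0, 0, ...], [0, 0, 1, 1, ...])
    ```

    Keeping both complementary children doubles the output of each crossover stage, which is consistent with the paper's claim that smaller populations can then be used.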

  6. 49 CFR 192.941 - What is a low stress reassessment?

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... MATERIALS SAFETY ADMINISTRATION, DEPARTMENT OF TRANSPORTATION (CONTINUED) PIPELINE SAFETY TRANSPORTATION OF NATURAL AND OTHER GAS BY PIPELINE: MINIMUM FEDERAL SAFETY STANDARDS Gas Transmission Pipeline Integrity... gas analysis for corrosive agents at least once each calendar year; (2) Conduct periodic testing of...

  7. 76 FR 24473 - Transwestern Pipeline Company, LLC; Notice of Request Under Blanket Authorization

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-02

    ... DEPARTMENT OF ENERGY Federal Energy Regulatory Commission [Docket No. CP11-191-000] Transwestern...,000 HP reciprocating gas engines, compressors, and ancillary facilities (Project Facilities) at its... to access the document. For assistance, contact FERC at [email protected] or call toll-free...

  8. Rows=Wildlife Corridors: An Urban Resource.

    ERIC Educational Resources Information Center

    Young, Darrell D.

    1983-01-01

    Linear strips of land associated with highways, electrical transmission lines, and gas/oil pipelines (called rights-of-way or ROWs) are inhabited by a variety of wildlife and offer a unique opportunity to study wildlife in the urban setting. The types of wildlife found in ROWs and the importance of ROWs are discussed. (JN)

  9. Onsite Systems - Wastewater

    Science.gov Websites

    NESC can help: call NESC toll-free at (304) 293-4191 or e-mail info@mail.nesc.wvu.edu. The Onsite Technologies for Small Communities poster is available as a free download from NESC; the poster can also be ordered, with charges applying to hard copies only. NESC also publishes the Pipeline newsletter.

  10. 33 CFR 161.18 - Reporting requirements.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... call. H HOTEL Date, time and point of entry system Entry time expressed as in (B) and into the entry... KILO Date, time and point of exit from system Exit time expressed as in (B) and exit position expressed....; for a dredge or floating plant: configuration of pipeline, mooring configuration, number of assist...

  11. The HEASARC Swift Gamma-Ray Burst Archive: The Pipeline and the Catalog

    NASA Technical Reports Server (NTRS)

    Donato, Davide; Angelini, Lorella; Padgett, C.A.; Reichard, T.; Gehrels, Neil; Marshall, Francis E.; Sakamoto, Takanori

    2012-01-01

    Since its launch in late 2004, the Swift satellite triggered or observed an average of one gamma-ray burst (GRB) every 3 days, for a total of 771 GRBs by 2012 January. Here, we report the development of a pipeline that semi-automatically performs the data-reduction and data-analysis processes for the three instruments on board Swift (BAT, XRT, UVOT). The pipeline is written in Perl, and it uses only HEAsoft tools and can be used to perform the analysis of a majority of the point-like objects (e.g., GRBs, active galactic nuclei, pulsars) observed by Swift. We run the pipeline on the GRBs, and we present a database containing the screened data, the output products, and the results of our ongoing analysis. Furthermore, we created a catalog summarizing some GRB information, collected either by running the pipeline or from the literature. The Perl script, the database, and the catalog are available for downloading and querying at the HEASARC Web site.

  12. The HEASARC Swift Gamma-Ray Burst Archive: The Pipeline and the Catalog

    NASA Astrophysics Data System (ADS)

    Donato, D.; Angelini, L.; Padgett, C. A.; Reichard, T.; Gehrels, N.; Marshall, F. E.; Sakamoto, T.

    2012-11-01

    Since its launch in late 2004, the Swift satellite triggered or observed an average of one gamma-ray burst (GRB) every 3 days, for a total of 771 GRBs by 2012 January. Here, we report the development of a pipeline that semi-automatically performs the data-reduction and data-analysis processes for the three instruments on board Swift (BAT, XRT, UVOT). The pipeline is written in Perl, and it uses only HEAsoft tools and can be used to perform the analysis of a majority of the point-like objects (e.g., GRBs, active galactic nuclei, pulsars) observed by Swift. We run the pipeline on the GRBs, and we present a database containing the screened data, the output products, and the results of our ongoing analysis. Furthermore, we created a catalog summarizing some GRB information, collected either by running the pipeline or from the literature. The Perl script, the database, and the catalog are available for downloading and querying at the HEASARC Web site.

  13. The Brackets Design and Stress Analysis of a Refinery's Hot Water Pipeline

    NASA Astrophysics Data System (ADS)

    Zhou, San-Ping; He, Yan-Lin

    2016-05-01

    The reconstruction project, which reroutes the hot water pipeline from a power station to a heat-exchange station, requires the new hot water pipeline to share the existing pipe racks. Taking into account the allowable span calculated per GB50316 and the design philosophy of pipeline supports, the types and locations of the brackets are determined. The pipeline stresses are then analyzed in AutoPIPE, the supports at dangerous segments are adjusted, and the analysis is repeated in AutoPIPE until the types, locations and numbers of supports are finalized. The overall pipeline system then satisfies the requirements of ASME B31.3.

  14. SG-ADVISER mtDNA: a web server for mitochondrial DNA annotation with data from 200 samples of a healthy aging cohort.

    PubMed

    Rueda, Manuel; Torkamani, Ali

    2017-08-18

    Whole genome and exome sequencing usually include reads containing mitochondrial DNA (mtDNA). Yet, state-of-the-art pipelines and services for human nuclear genome variant calling and annotation do not handle mitochondrial genome data appropriately. As a consequence, any researcher desiring to add mtDNA variant analysis to their investigations is forced to explore the literature for mtDNA pipelines, evaluate them, and implement their own instance of the desired tool. This task is far from trivial, and can be prohibitive for non-bioinformaticians. We have developed SG-ADVISER mtDNA, a web server to facilitate the analysis and interpretation of mtDNA genomic data coming from next generation sequencing (NGS) experiments. The server was built in the context of our SG-ADVISER framework and on top of the MToolBox platform (Calabrese et al., Bioinformatics 30(21):3115-3117, 2014), and includes most of its functionalities (i.e., assembly of mitochondrial genomes, heteroplasmic fractions, haplogroup assignment, functional and prioritization analysis of mitochondrial variants) as well as a back-end and a front-end interface. The server has been tested with unpublished data from 200 individuals of a healthy aging cohort (Erikson et al., Cell 165(4):1002-1011, 2016) and their data is made publicly available here along with a preliminary analysis of the variants. We observed that individuals over ~90 years old carried low levels of heteroplasmic variants in their genomes. SG-ADVISER mtDNA is a fast and functional tool that allows for variant calling and annotation of human mtDNA data coming from NGS experiments. The server was built with simplicity in mind, and builds on our own experience in interpreting mtDNA variants in the context of sudden death and rare diseases. Our objective is to provide an interface for non-bioinformaticians aiming to acquire (or contrast) mtDNA annotations via MToolBox. SG-ADVISER web server is freely available to all users at https://genomics.scripps.edu/mtdna.

  15. TESS Data Processing and Quick-look Pipeline

    NASA Astrophysics Data System (ADS)

    Fausnaugh, Michael; Huang, Xu; Glidden, Ana; Guerrero, Natalia; TESS Science Office

    2018-01-01

    We describe the data analysis procedures and pipelines for the Transiting Exoplanet Survey Satellite (TESS). We briefly review the processing pipeline developed and implemented by the Science Processing Operations Center (SPOC) at NASA Ames, including pixel/full-frame image calibration, photometric analysis, pre-search data conditioning, transiting planet search, and data validation. We also describe data-quality diagnostic analyses and photometric performance assessment tests. Finally, we detail a "quick-look pipeline" (QLP) that has been developed by the MIT branch of the TESS Science Office (TSO) to provide a fast and adaptable routine to search for planet candidates in the 30 minute full-frame images.

  16. Structural reliability assessment of the Oman India Pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Al-Sharif, A.M.; Preston, R.

    1996-12-31

    Reliability techniques are increasingly finding application in design. The special design conditions for the deep water sections of the Oman India Pipeline dictate their use, since the experience basis for application of standard deterministic techniques is inadequate. The paper discusses the reliability analysis as applied to the Oman India Pipeline, including selection of a collapse model, characterization of the variability in the parameters that affect pipe resistance to collapse, and implementation of first- and second-order reliability analyses to assess the probability of pipe failure. The reliability analysis results are used as the basis for establishing the pipe wall thickness requirements for the pipeline.
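
    The abstract does not give the collapse model or parameter distributions; as a generic sketch of a first-order reliability calculation of the kind described, assuming lognormal resistance and demand with invented values:

    ```python
    # First-order reliability sketch: Pf = Phi(-beta) for a lognormal
    # resistance R (collapse pressure) vs. lognormal demand S (external pressure).
    # All numbers are illustrative, not Oman India Pipeline values.
    import math
    from scipy.stats import norm

    mu_R, cov_R = 60.0, 0.10   # mean collapse pressure (MPa) and its COV
    mu_S, cov_S = 35.0, 0.05   # mean external pressure (MPa) and its COV

    # For lognormal R and S, ln(R/S) is normal, so beta has a closed form.
    s_R = math.sqrt(math.log(1 + cov_R**2))
    s_S = math.sqrt(math.log(1 + cov_S**2))
    m_R = math.log(mu_R) - 0.5 * s_R**2
    m_S = math.log(mu_S) - 0.5 * s_S**2

    beta = (m_R - m_S) / math.sqrt(s_R**2 + s_S**2)
    print("reliability index beta =", round(beta, 2))
    print("probability of collapse Pf =", norm.cdf(-beta))
    ```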

  17. A Study on Optimal Sizing of Pipeline Transporting Equi-sized Particulate Solid-Liquid Mixture

    NASA Astrophysics Data System (ADS)

    Asim, Taimoor; Mishra, Rakesh; Pradhan, Suman; Ubbi, Kuldip

    2012-05-01

    Pipelines transporting solid-liquid mixtures are of practical interest to the oil and pipeline industries throughout the world. Such pipelines are known as slurry pipelines; the solid-liquid mixture they carry is commonly known as slurry. The optimal design of such pipelines is of commercial interest for their widespread acceptance. A methodology has been evolved for the optimal sizing of a pipeline transporting a solid-liquid mixture. The least-cost principle has been used in sizing such pipelines, which involves determining the pipe diameter corresponding to the minimum cost for a given solid throughput. A detailed analysis of the transportation of slurry containing solids of uniformly graded particle size is included. The proposed methodology can be used for designing a pipeline transporting any solid material at different solid throughputs.
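
    As an illustration only of the least-cost principle described above (not the authors' cost model), the following sketch minimizes a toy total cost over candidate diameters; all coefficients are hypothetical:

    ```python
    # Least-cost sizing sketch: pick the diameter minimizing capital + pumping cost
    # for a fixed solids throughput. Coefficients below are invented placeholders.
    import numpy as np

    diameters = np.linspace(0.10, 0.60, 51)          # candidate diameters, m
    capital = 1.8e5 * diameters**1.5                 # pipe + installation cost
    # Friction head loss scales roughly with D^-5 at fixed volumetric throughput,
    # so annualized pumping energy cost falls steeply with diameter.
    pumping = 4.0e1 / diameters**5

    total = capital + pumping
    best = diameters[np.argmin(total)]
    print(f"least-cost diameter ~ {best:.2f} m, total cost ~ {total.min():.0f}")
    ```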

  18. COINSTAC: Decentralizing the future of brain imaging analysis

    PubMed Central

    Ming, Jing; Verner, Eric; Sarwate, Anand; Kelly, Ross; Reed, Cory; Kahleck, Torran; Silva, Rogers; Panta, Sandeep; Turner, Jessica; Plis, Sergey; Calhoun, Vince

    2017-01-01

    In the era of Big Data, sharing neuroimaging data across multiple sites has become increasingly important. However, researchers who want to engage in centralized, large-scale data sharing and analysis must often contend with problems such as high database cost, long data transfer time, extensive manual effort, and privacy issues for sensitive data. To remove these barriers to enable easier data sharing and analysis, we introduced a new, decentralized, privacy-enabled infrastructure model for brain imaging data called COINSTAC in 2016. We have continued development of COINSTAC since this model was first introduced. One of the challenges with such a model is adapting the required algorithms to function within a decentralized framework. In this paper, we report on how we are solving this problem, along with our progress on several fronts, including additional decentralized algorithms implementation, user interface enhancement, decentralized regression statistic calculation, and complete pipeline specifications. PMID:29123643
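
    As a simplified sketch of decentralized regression of the kind COINSTAC targets (not its actual protocol), each site below shares only aggregate gradients with a coordinator, never raw data:

    ```python
    # Toy decentralized linear regression: sites exchange only aggregate gradients.
    # This is an illustrative sketch, not the COINSTAC implementation.
    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    sites = []
    for _ in range(3):  # three sites with private data
        X = rng.normal(size=(100, 2))
        y = X @ true_w + rng.normal(scale=0.1, size=100)
        sites.append((X, y))

    w = np.zeros(2)
    for _ in range(200):  # coordinator iterations
        grads = [2 * X.T @ (X @ w - y) / len(y) for X, y in sites]  # local only
        w -= 0.05 * np.mean(grads, axis=0)  # coordinator averages gradients
    print("recovered coefficients:", np.round(w, 3))
    ```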

  19. Failure Analysis of PRDS Pipe in a Thermal Power Plant Boiler

    NASA Astrophysics Data System (ADS)

    Ghosh, Debashis; Ray, Subrata; Mandal, Jiten; Mandal, Nilrudra; Shukla, Awdhesh Kumar

    2018-04-01

    The pressure reducer desuperheater (PRDS) pipeline is used for reducing the pressure and desuperheating the steam in different auxiliary pipelines. When the PRDS pipeline fails, the reliability of the boiler is affected. This paper investigates the probable cause or causes of failure of the PRDS tapping line. In that context, visual inspection, outside-diameter and wall-thickness measurement, chemical analysis, metallographic examination and hardness measurement were conducted as part of the investigative studies. Apart from these tests, mechanical testing and fractographic analysis were also conducted as supplements. Finally, it is concluded that the PRDS pipeline failed mainly due to graphitization caused by prolonged exposure of the pipe to elevated temperature. The use of improper material was mainly responsible for the premature failure of the pipe.

  20. Power law of distribution of emergency situations on main gas pipeline

    NASA Astrophysics Data System (ADS)

    Voronin, K. S.; Akulov, K. A.

    2018-05-01

    The article presents the results of the analysis of emergency situations on a main gas pipeline. A power law of distribution of emergency situations is revealed. The possibility of conducting further scientific research to ensure the predictability of emergency situations on pipelines is justified.
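
    No data or exponents are given in the abstract; as a generic sketch of testing for a power law in incident data, a log-log fit on synthetic severities might look like:

    ```python
    # Sketch: check power-law behavior by a linear fit in log-log space.
    # The incident data below are synthetic, not the paper's gas-pipeline records.
    import numpy as np

    rng = np.random.default_rng(0)
    sizes = (1.0 / rng.random(2000)) ** (1 / 1.5)  # Pareto-like synthetic severities

    counts, edges = np.histogram(sizes, bins=np.logspace(0, 2, 15))
    centers = np.sqrt(edges[:-1] * edges[1:])      # geometric bin centers
    mask = counts > 0
    slope, intercept = np.polyfit(np.log(centers[mask]), np.log(counts[mask]), 1)
    # A straight log-log trend indicates a power law; the slope magnitude
    # estimates the decay exponent for these log-spaced bins.
    print("log-log slope:", round(slope, 2))
    ```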

  1. MeRIP-PF: An Easy-to-use Pipeline for High-resolution Peak-finding in MeRIP-Seq Data

    PubMed Central

    Li, Yuli; Song, Shuhui; Li, Cuiping; Yu, Jun

    2013-01-01

    RNA modifications, especially methylation of the N6 position of adenosine (A)—m6A, represent an emerging research frontier in RNA biology. With the rapid development of high-throughput sequencing technology, in-depth study of m6A distribution and functional relevance becomes feasible. However, a robust method to effectively identify m6A-modified regions has not been available yet. Here, we present a novel high-efficiency and user-friendly analysis pipeline called MeRIP-PF for the signal identification of MeRIP-Seq data in reference to controls. MeRIP-PF provides a statistical P-value for each identified m6A region based on the difference of read distribution when compared to the controls and also calculates the false discovery rate (FDR) as a cutoff to differentiate reliable m6A regions from the background. Furthermore, MeRIP-PF also achieves gene annotation of m6A signals or peaks and produces outputs in both XLS and graphical format, which are useful for further study. MeRIP-PF is implemented in Perl and is freely available at http://software.big.ac.cn/MeRIP-PF.html. PMID:23434047
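
    The exact statistics of MeRIP-PF are not reproducible from the abstract; as a generic sketch of the two ingredients it names (a per-region P-value against the control, then an FDR cutoff), with invented counts:

    ```python
    # Sketch: per-region enrichment P-values (Fisher's exact test on IP vs control
    # read counts) followed by Benjamini-Hochberg FDR. All counts are invented.
    import numpy as np
    from scipy.stats import fisher_exact

    # (IP reads in region, control reads in region); totals are library sizes.
    regions = [(120, 30), (45, 40), (300, 35), (12, 11)]
    ip_total, ctrl_total = 1_000_000, 1_000_000

    pvals = []
    for ip, ctrl in regions:
        table = [[ip, ctrl], [ip_total - ip, ctrl_total - ctrl]]
        pvals.append(fisher_exact(table, alternative="greater")[1])

    # Benjamini-Hochberg adjusted P-values.
    p = np.asarray(pvals)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
    fdr = np.minimum.accumulate(ranked[::-1])[::-1]
    print("FDR per region (input order):", np.round(fdr[np.argsort(order)], 4))
    ```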

  2. A Design Verification of the Parallel Pipelined Image Processings

    NASA Astrophysics Data System (ADS)

    Wasaki, Katsumi; Harai, Toshiaki

    2008-11-01

    This paper presents a case study of the design and verification of a parallel and pipelined image processing unit based on an extended Petri net called a Logical Colored Petri Net (LCPN). This net is suitable for Flexible Manufacturing System (FMS) modeling and discussion of structural properties. The LCPN is another family of colored place/transition nets (CPNs) with the addition of the following features: integer value assignment of marks, representation of firing conditions as formulae based on mark values, and coupling of output procedures with transition firing. To study the behavior of a system modeled with this net, we provide a means of searching the reachability tree for markings.

  3. Proteomics Quality Control: Quality Control Software for MaxQuant Results.

    PubMed

    Bielow, Chris; Mastrobuoni, Guido; Kempa, Stefan

    2016-03-04

    Mass spectrometry-based proteomics coupled to liquid chromatography has matured into an automated, high-throughput technology, producing data on the scale of multiple gigabytes per instrument per day. Consequently, an automated quality control (QC) and quality analysis (QA) capable of detecting measurement bias, verifying consistency, and avoiding propagation of error is paramount for instrument operators and scientists in charge of downstream analysis. We have developed an R-based QC pipeline called Proteomics Quality Control (PTXQC) for bottom-up LC-MS data generated by the MaxQuant software pipeline. PTXQC creates a QC report containing a comprehensive and powerful set of QC metrics, augmented with automated scoring functions. The automated scores are collated to create an overview heatmap at the beginning of the report, giving valuable guidance also to nonspecialists. Our software supports a wide range of experimental designs, including stable isotope labeling by amino acids in cell culture (SILAC), tandem mass tags (TMT), and label-free data. Furthermore, we introduce new metrics to score MaxQuant's Match-between-runs (MBR) functionality by which peptide identifications can be transferred across Raw files based on accurate retention time and m/z. Last but not least, PTXQC is easy to install and use and represents the first QC software capable of processing MaxQuant result tables. PTXQC is freely available at https://github.com/cbielow/PTXQC.

  4. A versatile pipeline for the multi-scale digital reconstruction and quantitative analysis of 3D tissue architecture

    PubMed Central

    Morales-Navarrete, Hernán; Segovia-Miranda, Fabián; Klukowski, Piotr; Meyer, Kirstin; Nonaka, Hidenori; Marsico, Giovanni; Chernykh, Mikhail; Kalaidzidis, Alexander; Zerial, Marino; Kalaidzidis, Yannis

    2015-01-01

    A prerequisite for the systems biology analysis of tissues is an accurate digital three-dimensional reconstruction of tissue structure based on images of markers covering multiple scales. Here, we designed a flexible pipeline for the multi-scale reconstruction and quantitative morphological analysis of tissue architecture from microscopy images. Our pipeline includes newly developed algorithms that address specific challenges of thick dense tissue reconstruction. Our implementation allows for a flexible workflow, scalable to high-throughput analysis and applicable to various mammalian tissues. We applied it to the analysis of liver tissue and extracted quantitative parameters of sinusoids, bile canaliculi and cell shapes, recognizing different liver cell types with high accuracy. Using our platform, we uncovered an unexpected zonation pattern of hepatocytes with different size, nuclei and DNA content, thus revealing new features of liver tissue organization. The pipeline also proved effective to analyse lung and kidney tissue, demonstrating its generality and robustness. DOI: http://dx.doi.org/10.7554/eLife.11214.001 PMID:26673893

  5. Live HDR video streaming on commodity hardware

    NASA Astrophysics Data System (ADS)

    McNamee, Joshua; Hatchett, Jonathan; Debattista, Kurt; Chalmers, Alan

    2015-09-01

    High Dynamic Range (HDR) video provides a step change in viewing experience, for example the ability to clearly see the soccer ball when it is kicked from the shadow of the stadium into sunshine. To achieve the full potential of HDR video, so-called true HDR, it is crucial that all the dynamic range that was captured is delivered to the display device and tone mapping is confined only to the display. Furthermore, to ensure widespread uptake of HDR imaging, it should be low cost and available on commodity hardware. This paper describes an end-to-end HDR pipeline for capturing, encoding and streaming high-definition HDR video in real-time using off-the-shelf components. All the lighting that is captured by HDR-enabled consumer cameras is delivered via the pipeline to any display, including HDR displays and even mobile devices with minimum latency. The system thus provides an integrated HDR video pipeline that includes everything from capture to post-production, archival and storage, compression, transmission, and display.

  6. Changes in the Pipeline Transportation Market

    EIA Publications

    1999-01-01

    This analysis assesses the amount of capacity that may be turned back to pipeline companies, based on shippers' actions over the past several years and the profile of contracts in place as of July 1, 1998. It also examines changes in the characteristics of contracts between shippers and pipeline companies.

  7. Validation of Coevolving Residue Algorithms via Pipeline Sensitivity Analysis: ELSC and OMES and ZNMI, Oh My!

    PubMed Central

    Brown, Christopher A.; Brown, Kevin S.

    2010-01-01

    Correlated amino acid substitution algorithms attempt to discover groups of residues that co-fluctuate due to either structural or functional constraints. Although these algorithms could inform both ab initio protein folding calculations and evolutionary studies, their utility for these purposes has been hindered by a lack of confidence in their predictions due to hard to control sources of error. To complicate matters further, naive users are confronted with a multitude of methods to choose from, in addition to the mechanics of assembling and pruning a dataset. We first introduce a new pair scoring method, called ZNMI (Z-scored-product Normalized Mutual Information), which drastically improves the performance of mutual information for co-fluctuating residue prediction. Second and more important, we recast the process of finding coevolving residues in proteins as a data-processing pipeline inspired by the medical imaging literature. We construct an ensemble of alignment partitions that can be used in a cross-validation scheme to assess the effects of choices made during the procedure on the resulting predictions. This pipeline sensitivity study gives a measure of reproducibility (how similar are the predictions given perturbations to the pipeline?) and accuracy (are residue pairs with large couplings on average close in tertiary structure?). We choose a handful of published methods, along with ZNMI, and compare their reproducibility and accuracy on three diverse protein families. We find that (i) of the algorithms tested, while none appear to be both highly reproducible and accurate, ZNMI is one of the most accurate by far and (ii) while users should be wary of predictions drawn from a single alignment, considering an ensemble of sub-alignments can help to determine both highly accurate and reproducible couplings. Our cross-validation approach should be of interest both to developers and end users of algorithms that try to detect correlated amino acid substitutions. PMID:20531955
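
    The published ZNMI normalization differs in detail; as a toy sketch of the underlying idea (mutual information between alignment columns, Z-scored across column pairs):

    ```python
    # Toy sketch of a mutual-information coevolution score for alignment columns,
    # Z-scored across all column pairs. This approximates the spirit of ZNMI;
    # the published product-normalization differs in detail.
    import numpy as np
    from collections import Counter
    from itertools import combinations

    aln = ["ACDEF", "ACDFF", "GCDEF", "ACHEF", "GCHFF", "ACDEF"]  # toy alignment
    cols = list(zip(*aln))

    def mi(x, y):
        """Mutual information (bits) between two alignment columns."""
        n = len(x)
        pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
        return sum(c / n * np.log2((c / n) / ((px[a] / n) * (py[b] / n)))
                   for (a, b), c in pxy.items())

    pairs = list(combinations(range(len(cols)), 2))
    scores = np.array([mi(cols[i], cols[j]) for i, j in pairs])
    z = (scores - scores.mean()) / (scores.std() + 1e-12)
    best = pairs[int(np.argmax(z))]
    print("highest-scoring column pair:", best, "z =", round(float(z.max()), 2))
    ```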

  8. Renewing the Principal Pipeline

    ERIC Educational Resources Information Center

    Turnbull, Brenda J.

    2015-01-01

    The work principals do has always mattered, but as the demands of the job increase, it matters even more. Perhaps once they could maintain safety and order and call it a day, but no longer. Successful principals today must also lead instruction and nurture a productive learning community for students, teachers, and staff. They set the tone for the…

  9. Program for At-Risk Students Helps College, Too

    ERIC Educational Resources Information Center

    Carlson, Scott

    2012-01-01

    The author introduces a new program that brings city kids who really need college to a private rural campus that really needs kids. Under the program, called Pipelines Into Partnership, a handful of urban high schools and community organizations--the groups that know their kids beyond the black and white of their transcripts--determine which…

  10. SIMPLEX: Cloud-Enabled Pipeline for the Comprehensive Analysis of Exome Sequencing Data

    PubMed Central

    Fischer, Maria; Snajder, Rene; Pabinger, Stephan; Dander, Andreas; Schossig, Anna; Zschocke, Johannes; Trajanoski, Zlatko; Stocker, Gernot

    2012-01-01

    In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the complete exome analysis workflow. The pipeline combines several in-house developed and published applications to perform the following steps: (a) initial quality control, (b) intelligent data filtering and pre-processing, (c) sequence alignment to a reference genome, (d) SNP and DIP detection, (e) functional annotation of variants using different approaches, and (f) detailed report generation during various stages of the workflow. The pipeline connects the selected analysis steps, exposes all available parameters for customized usage, performs required data handling, and distributes computationally expensive tasks either on a dedicated high-performance computing infrastructure or on the Amazon cloud environment (EC2). The presented application has already been used in several research projects including studies to elucidate the role of rare genetic diseases. The pipeline is continuously tested and is publicly available under the GPL as a VirtualBox or Cloud image at http://simplex.i-med.ac.at; additional supplementary data is provided at http://www.icbi.at/exome. PMID:22870267

  11. affy2sv: an R package to pre-process Affymetrix CytoScan HD and 750K arrays for SNP, CNV, inversion and mosaicism calling.

    PubMed

    Hernandez-Ferrer, Carles; Quintela Garcia, Ines; Danielski, Katharina; Carracedo, Ángel; Pérez-Jurado, Luis A; González, Juan R

    2015-05-20

    The well-known Genome-Wide Association Studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they have not been able to explain the full heritability of complex diseases. Now, other structural variants, such as copy number variants or DNA inversions, either germ-line or in mosaicism events, are being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750K arrays (and also Genome-Wide SNP 5.0/6.0 and Axiom) in structural variant studies. We illustrate the capabilities of affy2sv using two different complete pipelines on real data: the first performs a GWAS and a mosaic-alteration detection study, and the other detects CNVs and performs inversion calling. Both examples presented in the article show how affy2sv can be used as part of more complex pipelines aimed at analyzing Affymetrix SNP array data in genetic association studies where different types of structural variants are considered.

  12. Understanding gene functions and disease mechanisms: Phenotyping pipelines in the German Mouse Clinic.

    PubMed

    Fuchs, Helmut; Aguilar-Pimentel, Juan Antonio; Amarie, Oana V; Becker, Lore; Calzada-Wack, Julia; Cho, Yi-Li; Garrett, Lillian; Hölter, Sabine M; Irmler, Martin; Kistler, Martin; Kraiger, Markus; Mayer-Kuckuk, Philipp; Moreth, Kristin; Rathkolb, Birgit; Rozman, Jan; da Silva Buttkus, Patricia; Treise, Irina; Zimprich, Annemarie; Gampe, Kristine; Hutterer, Christine; Stöger, Claudia; Leuchtenberger, Stefanie; Maier, Holger; Miller, Manuel; Scheideler, Angelika; Wu, Moya; Beckers, Johannes; Bekeredjian, Raffi; Brielmeier, Markus; Busch, Dirk H; Klingenspor, Martin; Klopstock, Thomas; Ollert, Markus; Schmidt-Weber, Carsten; Stöger, Tobias; Wolf, Eckhard; Wurst, Wolfgang; Yildirim, Ali Önder; Zimmer, Andreas; Gailus-Durner, Valérie; Hrabě de Angelis, Martin

    2017-09-29

    For decades, model organisms have provided an important approach to understanding the mechanistic basis of human diseases. The German Mouse Clinic (GMC) was the first phenotyping facility to establish a collaboration-based platform for phenotype characterization of mouse lines. In order to address individual projects with a tailor-made phenotyping strategy, the GMC advanced in developing a series of pipelines with tests for the analysis of specific disease areas. For a general broad analysis, there is a screening pipeline that covers the key parameters for the most relevant disease areas. For hypothesis-driven phenotypic analyses, there are thirteen additional pipelines with a focus on neurological and behavioral disorders, metabolic dysfunction, respiratory system malfunctions, immune-system disorders and imaging techniques. In this article, we give an overview of the pipelines and describe the scientific rationale behind the different test combinations. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. The Kepler Science Data Processing Pipeline Source Code Road Map

    NASA Technical Reports Server (NTRS)

    Wohler, Bill; Jenkins, Jon M.; Twicken, Joseph D.; Bryson, Stephen T.; Clarke, Bruce Donald; Middour, Christopher K.; Quintana, Elisa Victoria; Sanderfer, Jesse Thomas; Uddin, Akm Kamal; Sabale, Anima; hide

    2016-01-01

    We give an overview of the operational concepts and architecture of the Kepler Science Processing Pipeline. Designed, developed, operated, and maintained by the Kepler Science Operations Center (SOC) at NASA Ames Research Center, the Science Processing Pipeline is a central element of the Kepler Ground Data System. The SOC consists of an office at Ames Research Center, software development and operations departments, and a data center which hosts the computers required to perform data analysis. The SOC's charter is to analyze stellar photometric data from the Kepler spacecraft and report results to the Kepler Science Office for further analysis. We describe how this is accomplished via the Kepler Science Processing Pipeline, including the software algorithms. We present the high-performance, parallel computing software modules of the pipeline that perform transit photometry, pixel-level calibration, systematic error correction, attitude determination, stellar target management, and instrument characterization.

  14. Amateur Image Pipeline Processing using Python plus PyRAF

    NASA Astrophysics Data System (ADS)

    Green, Wayne

    2012-05-01

    A template pipeline spanning observation planning to publishing is offered as a basis for establishing a long-term observing program. The data reduction pipeline encapsulates all policy and procedures, providing an accountable framework for data analysis and a teaching framework for IRAF. This paper introduces the technical details of a complete pipeline processing environment using Python, PyRAF and a few other languages. The pipeline encapsulates all processing decisions within an auditable framework. The framework quickly handles the heavy lifting of image processing. It also serves as an excellent teaching environment for astronomical data management and IRAF reduction decisions.
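
    As a hedged illustration of the kind of reduction step such a pipeline automates (not the author's scripts; the file names below are hypothetical), a minimal bias/flat calibration in Python looks like:

    ```python
    # Minimal CCD frame calibration sketch: science = (raw - bias) / normalized flat.
    # File names are hypothetical; in the paper this kind of step is driven
    # through PyRAF within an auditable pipeline.
    import numpy as np
    from astropy.io import fits

    bias = fits.getdata("bias_master.fits").astype(float)
    flat = fits.getdata("flat_master.fits").astype(float)
    raw = fits.getdata("object_001.fits").astype(float)

    flat_norm = flat / np.median(flat)          # normalize flat to unit median
    calibrated = (raw - bias) / flat_norm       # classic bias/flat correction
    fits.writeto("object_001_cal.fits", calibrated, overwrite=True)
    ```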

  15. Demonstrating the Effects of Shop Flow Process Variability on the Air Force Depot Level Reparable Item Pipeline

    DTIC Science & Technology

    1992-09-01

    Crawford found that pipeline contents are extremely variable about their mean (10:24), and Kettner and Wheatley noted the value of a statistical analysis of pipeline data. The simulation writes the results from each replication to ANOVA files for later analysis; the first output set gives points for overall pipeline contents.

  16. Bicycle: a bioinformatics pipeline to analyze bisulfite sequencing data.

    PubMed

    Graña, Osvaldo; López-Fernández, Hugo; Fdez-Riverola, Florentino; González Pisano, David; Glez-Peña, Daniel

    2018-04-15

    High-throughput sequencing of bisulfite-converted DNA is a technique used to measure DNA methylation levels. Although a considerable number of computational pipelines have been developed to analyze such data, none of them tackles all the peculiarities of the analysis together, revealing limitations that can force the user to manually perform additional steps needed for a complete processing of the data. This article presents bicycle, an integrated, flexible analysis pipeline for bisulfite sequencing data. Bicycle analyzes whole-genome bisulfite sequencing data, targeted bisulfite sequencing data and hydroxymethylation data. To show how bicycle improves on other available pipelines, we compared them on a defined set of features, summarized in a table. We also tested bicycle with both simulated and real datasets to show its level of performance, and compared it to different state-of-the-art methylation analysis pipelines. Bicycle is publicly available under the GNU LGPL v3.0 license at http://www.sing-group.org/bicycle. Users can also download a customized Ubuntu LiveCD including bicycle and the other bisulfite sequencing data pipelines compared here. In addition, a docker image with bicycle and its dependencies, which allows a straightforward use of bicycle on any platform (e.g. Linux, OS X or Windows), is also available. ograna@cnio.es or dgpena@uvigo.es. Supplementary data are available at Bioinformatics online.

  17. Maser: one-stop platform for NGS big data from analysis to visualization

    PubMed Central

    Kinjo, Sonoko; Monma, Norikazu; Misu, Sadahiko; Kitamura, Norikazu; Imoto, Junichi; Yoshitake, Kazutoshi; Gojobori, Takashi; Ikeo, Kazuho

    2018-01-01

    Abstract A major challenge in analyzing the data from high-throughput next-generation sequencing (NGS) is how to handle the huge amounts of data and variety of NGS tools and visualize the resultant outputs. To address these issues, we developed a cloud-based data analysis platform, Maser (Management and Analysis System for Enormous Reads), and an original genome browser, Genome Explorer (GE). Maser enables users to manage up to 2 terabytes of data to conduct analyses with easy graphical user interface operations and offers analysis pipelines in which several individual tools are combined as a single pipeline for very common and standard analyses. GE automatically visualizes genome assembly and mapping results output from Maser pipelines, without requiring additional data upload. With this function, the Maser pipelines can graphically display the results output from all the embedded tools and mapping results in a web browser. Therefore, Maser provides a more user-friendly analysis platform, especially for beginners, by improving the graphical display and offering selected standard pipelines that work with the built-in genome browser. In addition, all the analyses executed on Maser are recorded in the analysis history, helping users to trace and repeat the analyses. The entire process of analysis and its histories can be shared with collaborators or opened to the public. In conclusion, our system is useful for managing, analyzing, and visualizing NGS data and achieves traceability, reproducibility, and transparency of NGS analysis. Database URL: http://cell-innovation.nig.ac.jp/maser/ PMID:29688385

  18. FUCHS-towards full circular RNA characterization using RNAseq.

    PubMed

    Metge, Franziska; Czaja-Hasse, Lisa F; Reinhardt, Richard; Dieterich, Chistoph

    2017-01-01

    Circular RNAs (circRNAs) belong to a recently re-discovered species of RNA that emerges during RNA maturation through a process called back-splicing. A downstream 5' splice site is linked to an upstream 3' splice site to form a circular transcript instead of a canonical linear transcript. Recent advances in next-generation sequencing (NGS) have brought circRNAs back into the focus of many scientists. Since then, several studies have reported that circRNAs are differentially expressed across tissue types and developmental stages, implying that they are actively regulated and not merely a by-product of splicing. Though functional studies have shown that some circRNAs could act as miRNA sponges, the function of most circRNAs remains unknown. To expand our understanding of possible roles of circular RNAs, we propose a new pipeline that fully characterizes candidate circRNA structures from RNA-seq data - FUCHS: FUll CHaracterization of circular RNAs using RNA-Sequencing. Currently, most computational prediction pipelines use back-spliced reads to identify circular RNAs. FUCHS extends this concept by considering all RNA-seq information from long reads (typically >150 bp) to learn more about the exon coverage, the number of double-break-point fragments, the different circular isoforms arising from one host gene, and the alternatively spliced exons within the same circRNA boundaries. This new knowledge enables the user to carry out differential motif enrichment and miRNA seed analysis to determine potential regulators during circRNA biogenesis. FUCHS is an easy-to-use Python-based pipeline that contributes a new aspect to circRNA research.

  19. Seismic fragility formulations for segmented buried pipeline systems including the impact of differential ground subsidence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pineda Porras, Omar Andrey; Ordaz, Mario

    2009-01-01

    Though Differential Ground Subsidence (DGS) impacts the seismic response of segmented buried pipelines, augmenting their vulnerability, fragility formulations to estimate repair rates under such conditions are not available in the literature. Physical models to estimate pipeline seismic damage considering other cases of permanent ground subsidence (e.g. faulting, tectonic uplift, liquefaction, and landslides) have been extensively reported, but this is not the case for DGS. The refinement of the study of two important phenomena in Mexico City - the 1985 Michoacan earthquake scenario and the sinking of the city due to ground subsidence - has contributed to the analysis of the interrelation of pipeline damage, ground motion intensity, and DGS; from the analysis of the 48-inch pipeline network of Mexico City's Water System, fragility formulations for segmented buried pipeline systems for two DGS levels are proposed. The novel parameter PGV²/PGA, where PGV is peak ground velocity and PGA peak ground acceleration, has been used as the seismic parameter in these formulations, since it has shown better correlation with pipeline damage than PGV alone according to previous studies. By comparing the proposed fragilities, it is concluded that a change in the DGS level (from Low-Medium to High) could increase the pipeline repair rates (number of repairs per kilometer) by factors ranging from 1.3 to 2.0, with higher seismic intensities corresponding to lower factors.
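
    The calibrated coefficients appear in the paper, not the abstract; with hypothetical values, applying a fragility relation of the form RR = a(PGV²/PGA)^b together with the quoted 1.3-2.0 DGS amplification can be sketched as:

    ```python
    # Sketch: repair-rate estimate from the PGV^2/PGA seismic parameter.
    # Coefficients a, b and the DGS factor below are hypothetical placeholders;
    # the calibrated values are given in the paper, not in this abstract.
    pgv = 40.0   # peak ground velocity, cm/s
    pga = 200.0  # peak ground acceleration, cm/s^2
    a, b = 0.002, 1.2          # hypothetical fragility coefficients
    dgs_factor = 1.6           # within the 1.3-2.0 range quoted for high DGS

    param = pgv**2 / pga                    # composite intensity measure
    rr_low = a * param**b                   # repairs/km, low-medium DGS
    rr_high = dgs_factor * rr_low           # repairs/km, high DGS
    print(f"PGV^2/PGA = {param:.1f}, RR = {rr_low:.3f} -> {rr_high:.3f} repairs/km")
    ```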

  20. INTERSPIA: a web application for exploring the dynamics of protein-protein interactions among multiple species.

    PubMed

    Kwon, Daehong; Lee, Daehwan; Kim, Juyeon; Lee, Jongin; Sim, Mikang; Kim, Jaebum

    2018-05-09

    Proteins perform biological functions through cascading interactions with each other by forming protein complexes. As a result, interactions among proteins, called protein-protein interactions (PPIs), are not completely free from selection constraints during evolution. Therefore, the identification and analysis of PPI changes during evolution can give us new insight into the evolution of functions. Although many algorithms, databases and websites have been developed to help the study of PPIs, most of them are limited to visualizing the structure and features of PPIs in a chosen single species, with limited visualization functions. This leads to difficulties in identifying different patterns of PPIs in different species and their functional consequences. To resolve these issues, we developed a web application called INTER-Species Protein Interaction Analysis (INTERSPIA). Given a set of proteins of the user's interest, INTERSPIA first discovers additional proteins that are functionally associated with the input proteins and searches for different patterns of PPIs in multiple species through a server-side pipeline, and second visualizes the dynamics of PPIs in multiple species using an easy-to-use web interface. INTERSPIA is freely available at http://bioinfo.konkuk.ac.kr/INTERSPIA/.

  1. Oman-India pipeline route survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mullee, J.E.

    1995-12-01

    Paper describes the geological setting in the Arabian Sea for a proposed 28-inch gas pipeline from Oman to India reaching 3,500-m water depths. Covers planning, execution, quality control and results of geophysical, geotechnical and oceanographic surveys. Outlines theory and application of pipeline stress analysis on board survey vessel for feasibility assessment, and specifies equipment used.

  2. Risk analysis of urban gas pipeline network based on improved bow-tie model

    NASA Astrophysics Data System (ADS)

    Hao, M. J.; You, Q. J.; Yue, Z.

    2017-11-01

    Gas pipeline networks are a major hazard source in urban areas. In the event of an accident, there could be grave consequences. In order to understand more clearly the causes and consequences of gas pipeline network accidents, and to develop prevention and mitigation measures, the authors propose applying an improved bow-tie model to analyze the risks of urban gas pipeline networks. The improved bow-tie model analyzes accident causes from four aspects: human, materials, environment and management; it also analyzes the consequences from four aspects: casualty, property loss, environment and society. It then quantifies the causes and consequences. Risk identification, risk analysis, risk assessment, risk control, and risk management are clearly shown in the model figures. The model can then suggest prevention and mitigation measures to help reduce the accident rate of gas pipeline networks. The results show that the whole process of an accident can be visually investigated using the bow-tie model. It can also provide reasons for, and predict the consequences of, an unfortunate event. This is of great significance for analyzing leakage failures of gas pipeline networks.
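
    As a simplified numerical illustration of quantifying a bow-tie (OR-gated causes on the left of the top event, barrier probabilities on the right), with invented probabilities:

    ```python
    # Toy bow-tie quantification: OR-gate causes lead to the top event (leak);
    # barrier failure probabilities split the outcome severity. All numbers are
    # invented; the paper's model uses four cause and four consequence aspects.
    causes = {"human error": 0.002, "material defect": 0.001,
              "environment": 0.0005, "management": 0.001}

    # OR gate for independent basic causes: P(top) = 1 - prod(1 - p_i)
    p_top = 1.0
    for p in causes.values():
        p_top *= (1 - p)
    p_top = 1 - p_top

    p_ignition = 0.1           # barrier fails: leak ignites
    p_escalation = 0.3         # barrier fails: ignited leak escalates
    print("P(leak)           =", round(p_top, 5))
    print("P(fire)           =", round(p_top * p_ignition, 6))
    print("P(major accident) =", round(p_top * p_ignition * p_escalation, 7))
    ```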

  3. Time-Distance Helioseismology Data-Analysis Pipeline for Helioseismic and Magnetic Imager Onboard Solar Dynamics Observatory (SDO-HMI) and Its Initial Results

    NASA Technical Reports Server (NTRS)

    Zhao, J.; Couvidat, S.; Bogart, R. S.; Parchevsky, K. V.; Birch, A. C.; Duvall, Thomas L., Jr.; Beck, J. G.; Kosovichev, A. G.; Scherrer, P. H.

    2011-01-01

    The Helioseismic and Magnetic Imager onboard the Solar Dynamics Observatory (SDO/HMI) provides continuous full-disk observations of solar oscillations. We develop a data-analysis pipeline based on the time-distance helioseismology method to measure acoustic travel times using HMI Doppler-shift observations, and infer solar interior properties by inverting these measurements. The pipeline is used for routine production of near-real-time full-disk maps of subsurface wave-speed perturbations and horizontal flow velocities for depths ranging from 0 to 20 Mm, every eight hours. In addition, Carrington synoptic maps for the subsurface properties are made from these full-disk maps. The pipeline can also be used for selected target areas and time periods. We explain details of the pipeline organization and procedures, including processing of the HMI Doppler observations, measurements of the travel times, inversions, and constructions of the full-disk and synoptic maps. Some initial results from the pipeline, including full-disk flow maps, sunspot subsurface flow fields, and the interior rotation and meridional flow speeds, are presented.

  4. The PREP pipeline: standardized preprocessing for large-scale EEG analysis.

    PubMed

    Bigdely-Shamlo, Nima; Mullen, Tim; Kothe, Christian; Su, Kyung-Min; Robbins, Kay A

    2015-01-01

    The technology to collect brain imaging and physiological measures has become portable and ubiquitous, opening the possibility of large-scale analysis of real-world human imaging. By its nature, such data is large and complex, making automated processing essential. This paper shows how lack of attention to the very early stages of an EEG preprocessing pipeline can reduce the signal-to-noise ratio and introduce unwanted artifacts into the data, particularly for computations done in single precision. We demonstrate that ordinary average referencing improves the signal-to-noise ratio, but that noisy channels can contaminate the results. We also show that identification of noisy channels depends on the reference and examine the complex interaction of filtering, noisy channel identification, and referencing. We introduce a multi-stage robust referencing scheme to deal with the noisy channel-reference interaction. We propose a standardized early-stage EEG processing pipeline (PREP) and discuss the application of the pipeline to more than 600 EEG datasets. The pipeline includes an automatically generated report for each dataset processed. Users can download the PREP pipeline as a freely available MATLAB library from http://eegstudy.org/prepcode.
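
    As a simplified sketch of the noisy-channel/reference interaction addressed by PREP (not the PREP code, whose detection criteria are more elaborate), one alternating robust-referencing loop could look like:

    ```python
    # Simplified robust average referencing: iteratively estimate the reference
    # from channels not flagged as noisy. A sketch only; PREP's actual criteria
    # (correlation, deviation, RANSAC predictability) are more elaborate.
    import numpy as np

    rng = np.random.default_rng(0)
    eeg = rng.normal(size=(32, 5000))          # channels x samples, synthetic
    eeg[7] += rng.normal(scale=20, size=5000)  # one very noisy channel

    good = np.ones(32, dtype=bool)
    for _ in range(4):                    # alternate referencing and detection
        ref = eeg[good].mean(axis=0)      # average reference from good channels
        rereferenced = eeg - ref
        # flag channels whose robust amplitude is an outlier (median + 5*MAD)
        amp = np.median(np.abs(rereferenced), axis=1)
        mad = np.median(np.abs(amp - np.median(amp)))
        good = amp < np.median(amp) + 5 * mad
    print("channels flagged as noisy:", np.where(~good)[0])
    ```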

  5. Social cost impact assessment of pipeline infrastructure projects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matthews, John C., E-mail: matthewsj@battelle.org; Allouche, Erez N., E-mail: allouche@latech.edu; Sterling, Raymond L., E-mail: sterling@latech.edu

    A key advantage of trenchless construction methods compared with traditional open-cut methods is their ability to install or rehabilitate underground utility systems with limited disruption to the surrounding built and natural environments. The equivalent monetary values of these disruptions are commonly called social costs. Social costs are often ignored by engineers or project managers during project planning and design phases, partially because they cannot be calculated using standard estimating methods. In recent years some approaches for estimating social costs were presented. Nevertheless, the cost data needed for validation of these estimating methods is lacking. Development of such social cost databases can be accomplished by compiling relevant information reported in various case histories. This paper identifies the eight most important social cost categories, presents mathematical methods for calculating them, and summarizes the social cost impacts for two pipeline construction projects. The case histories are analyzed in order to identify trends for the various social cost categories. The effectiveness of the methods used to estimate these values is also discussed. These findings are valuable for pipeline infrastructure engineers making renewal technology selection decisions by providing a more accurate process for the assessment of social costs and impacts. - Highlights: • Identified the eight most important social cost factors for pipeline construction • Presented mathematical methods for calculating those social cost factors • Summarized social cost impacts for two pipeline construction projects • Analyzed those projects to identify trends for the social cost factors.

  6. MRI-compatible pipeline for three-dimensional MALDI imaging mass spectrometry using PAXgene fixation.

    PubMed

    Oetjen, Janina; Aichler, Michaela; Trede, Dennis; Strehlow, Jan; Berger, Judith; Heldmann, Stefan; Becker, Michael; Gottschalk, Michael; Kobarg, Jan Hendrik; Wirtz, Stefan; Schiffler, Stefan; Thiele, Herbert; Walch, Axel; Maass, Peter; Alexandrov, Theodore

    2013-09-02

    MALDI imaging mass spectrometry (MALDI-imaging) has emerged as a spatially-resolved label-free bioanalytical technique for direct analysis of biological samples and was recently introduced for analysis of 3D tissue specimens. We present a new experimental and computational pipeline for molecular analysis of tissue specimens which integrates 3D MALDI-imaging, magnetic resonance imaging (MRI), and histological staining and microscopy, and evaluate the pipeline by applying it to analysis of a mouse kidney. To ensure sample integrity and reproducible sectioning, we utilized PAXgene fixation and paraffin embedding and proved its compatibility with MRI. Altogether, 122 serial sections of the kidney were analyzed using MALDI-imaging, resulting in a 3D dataset of 200 GB comprising 2 million spectra. We show that elastic image registration better compensates for local distortions of tissue sections. The computational analysis of 3D MALDI-imaging data was performed using our spatial segmentation pipeline which determines regions of distinct molecular composition and finds m/z-values co-localized with these regions. For facilitated interpretation of the 3D distribution of ions, we evaluated isosurfaces providing simplified visualization. We present the data in a multimodal fashion combining 3D MALDI-imaging with the MRI volume rendering and with light microscopic images of histologically stained sections. Our novel experimental and computational pipeline for 3D MALDI-imaging can be applied to address clinical questions such as proteomic analysis of tumor morphologic heterogeneity. Examining the protein distribution as well as the drug distribution throughout an entire tumor using our pipeline will facilitate understanding of the molecular mechanisms of carcinogenesis. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms

    PubMed Central

    Teodoro, George; Pan, Tony; Kurc, Tahsin M.; Kong, Jun; Cooper, Lee A. D.; Podhorszki, Norbert; Klasky, Scott; Saltz, Joel H.

    2014-01-01

    Analysis of large pathology image datasets offers significant opportunities for the investigation of disease morphology, but the resource requirements of analysis pipelines limit the scale of such studies. Motivated by a brain cancer study, we propose and evaluate a parallel image analysis application pipeline for high throughput computation of large datasets of high resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we have built runtime support that allows us to express the cancer image analysis application as a hierarchical data processing pipeline. The application is implemented as a coarse-grain pipeline of stages, where each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance aware scheduling techniques along with several optimizations, including architecture aware process placement, data locality conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize the utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. Our experimental evaluation shows that the cooperative use of CPUs and GPUs achieves significant improvements on top of GPU-only versions (up to 1.6×) and that the execution of the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than coarser-grain, monolithic implementations used in other works. An implementation of the cancer image analysis pipeline using the runtime support was able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system. PMID:25419546

  8. PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq.

    PubMed

    Gao, Yubang; Wang, Huiyuan; Zhang, Hangxiao; Wang, Yongsheng; Chen, Jinfeng; Gu, Lianfeng

    2018-05-01

    The single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) based on Pacific Bioscience (PacBio) platform has received increasing attention for its ability to explore full-length isoforms. Thus, comprehensive tools for Iso-Seq bioinformatics analysis are extremely useful. Here, we present a one-stop solution for Iso-Seq analysis, called PRAPI to analyze alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs) comprehensively. PRAPI is capable of combining Iso-Seq full-length isoforms with short read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq) for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results. The Dockerfile of PRAPI is available at http://www.bioinfor.org/tool/PRAPI. lfgu@fafu.edu.cn.

  9. Quantitative analysis of factors that affect oil pipeline network accident based on Bayesian networks: A case study in China

    NASA Astrophysics Data System (ADS)

    Zhang, Chao; Qin, Ting Xin; Huang, Shuai; Wu, Jian Song; Meng, Xin Yan

    2018-06-01

    Some factors can affect the consequences of an oil pipeline accident, and their effects should be analyzed to improve emergency preparation and emergency response. Although there are some qualitative models of risk factors' effects, quantitative models still need further research. In this study, we introduce a Bayesian network (BN) model for analyzing risk factors' effects in an oil pipeline accident case that happened in China. The incident evolution diagram is built to identify the risk factors, and the BN model is built based on the deployment rule for factor nodes in the BN and on expert knowledge combined through Dempster-Shafer evidence theory. The probabilities of incident consequences and risk factors' effects can then be calculated. The most likely consequences given by this model are consistent with the case. Meanwhile, the quantitative estimates of risk factors' effects may provide a theoretical basis for taking optimal risk treatment measures in oil pipeline management, which can be used in emergency preparation and emergency response.
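
    The network structure and Dempster-Shafer weights are in the paper, not the abstract; the inference step itself (marginalizing consequence probabilities over factor states through conditional probability tables) reduces to calculations like:

    ```python
    # Minimal Bayesian-network-style calculation: P(consequence) marginalized over
    # one risk factor. The structure and all probabilities are invented for
    # illustration; the paper builds its CPTs from expert knowledge combined
    # with Dempster-Shafer evidence theory.
    p_delay = {"short": 0.7, "long": 0.3}          # emergency-response delay

    # P(severe consequence | response delay)
    p_severe_given = {"short": 0.05, "long": 0.40}

    p_severe = sum(p_delay[d] * p_severe_given[d] for d in p_delay)
    print("P(severe consequence) =", p_severe)     # 0.7*0.05 + 0.3*0.40 = 0.155
    ```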

  10. Semi-Automatic Segmentation Software for Quantitative Clinical Brain Glioblastoma Evaluation

    PubMed Central

    Zhu, Y; Young, G; Xue, Z; Huang, R; You, H; Setayesh, K; Hatabu, H; Cao, F; Wong, S.T.

    2012-01-01

    Rationale and Objectives Quantitative measurement provides essential information about disease progression and treatment response in patients with Glioblastoma multiforme (GBM). The goal of this paper is to present and validate a software pipeline for semi-automatic GBM segmentation, called AFINITI (Assisted Follow-up in NeuroImaging of Therapeutic Intervention), using clinical data from GBM patients. Materials and Methods Our software adopts the current state-of-the-art tumor segmentation algorithms and combines them into one clinically usable pipeline. Both the advantages of the traditional voxel-based and the deformable shape-based segmentation are embedded into the software pipeline. The former provides an automatic tumor segmentation scheme based on T1- and T2-weighted MR brain data, and the latter refines the segmentation results with minimal manual input. Results Twenty six clinical MR brain images of GBM patients were processed and compared with manual results. The results can be visualized using the embedded graphic user interface (GUI). Conclusion Validation results using clinical GBM data showed high correlation between the AFINITI results and manual annotation. Compared to the voxel-wise segmentation, AFINITI yielded more accurate results in segmenting the enhanced GBM from multimodality MRI data. The proposed pipeline could be used as additional information to interpret MR brain images in neuroradiology. PMID:22591720

  11. Sub-soil contamination due to oil spills in zones surrounding oil pipeline-pump stations and oil pipeline right-of-ways in Southwest-Mexico.

    PubMed

    Iturbe, Rosario; Flores, Carlos; Castro, Alejandrina; Torres, Luis G

    2007-10-01

    Oil spills from oil pipelines are a frequent problem in Mexico. Petroleos Mexicanos (PEMEX), very concerned with the environmental agenda, has been developing inspection and correction plans for zones around oil pipeline pumping stations and pipeline rights-of-way. These stations are located at regular intervals of kilometres along the pipelines. In this study, two sections of an oil pipeline and two pipeline pumping-station zones are characterized in terms of the presence of Total Petroleum Hydrocarbons (TPHs) and Polycyclic Aromatic Hydrocarbons (PAHs). The study comprises sampling of the areas, vertical and horizontal delimitation of the contamination, analysis of the sampled soils for TPH content and, in some cases, for the 16 PAHs considered priority pollutants by USEPA, calculation of contaminated areas and volumes (according to Mexican legislation, specifically NOM-EM-138-ECOL-2002) and, finally, a proposal for the remediation techniques best suited to the contamination levels and the localization of contaminants.

  12. Nebula--a web-server for advanced ChIP-seq data analysis.

    PubMed

    Boeva, Valentina; Lermine, Alban; Barette, Camille; Guillouf, Christel; Barillot, Emmanuel

    2012-10-01

    ChIP-seq consists of chromatin immunoprecipitation and deep sequencing of the extracted DNA fragments. It is the technique of choice for accurate characterization of the binding sites of transcription factors and other DNA-associated proteins. We present a web service, Nebula, which allows inexperienced users to perform a complete bioinformatics analysis of ChIP-seq data. Nebula was designed for both bioinformaticians and biologists. It is based on the Galaxy open source framework. Galaxy already includes a large number of functionalities for mapping reads and peak calling. We added the following to Galaxy: (i) peak calling with FindPeaks and a module for immunoprecipitation quality control, (ii) de novo motif discovery with ChIPMunk, (iii) calculation of the density and the cumulative distribution of peak locations relative to gene transcription start sites, (iv) annotation of peaks with genomic features and (v) annotation of genes with peak information. Nebula generates the graphs and the enrichment statistics at each step of the process. During Steps 3-5, Nebula optionally repeats the analysis on a control dataset and compares these results with those from the main dataset. Nebula can also incorporate gene expression (or gene modulation) data during these steps. In summary, Nebula is an innovative web service that provides an advanced ChIP-seq analysis pipeline with ready-to-publish results. Nebula is available at http://nebula.curie.fr/. Supplementary data are available at Bioinformatics online.
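
    As an illustration of step (iii), the sketch below computes signed peak-to-nearest-TSS distances and their density and cumulative distribution with NumPy; the coordinates are hypothetical single-chromosome positions, not Nebula's implementation.

      import numpy as np

      peaks = np.array([12050, 33400, 87920, 120300])  # hypothetical peak summits
      tss = np.array([12000, 90000, 125000])           # hypothetical TSS positions

      # Signed distance from each peak to its nearest TSS.
      dist = np.array([p - tss[np.argmin(np.abs(tss - p))] for p in peaks])

      # Density (histogram) and the empirical cumulative distribution.
      density, edges = np.histogram(dist, bins=50, range=(-50000, 50000), density=True)
      xs = np.sort(dist)
      ecdf = np.arange(1, xs.size + 1) / xs.size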

  13. Mathematical simulation for compensation capacities area of pipeline routes in ship systems

    NASA Astrophysics Data System (ADS)

    Ngo, G. V.; Sakhno, K. N.

    2018-05-01

    In this paper, the authors consider enhancing the manufacturability of ship-system pipelines at the design stage. The arrangements and possibilities for compensating deviations of pipeline routes have been analyzed. The task was set to produce the “fit pipe” together with the rest of the pipes in the route. It was proposed to compensate for deviations by movement of the pipeline route during pipe installation and to calculate the maximum values of these displacements along the analyzed path. Theoretical bases of deviation compensation for pipeline routes using rotations of parallel pairs of pipe sections are developed. Mathematical and graphical simulations of the compensation capacities area for pipeline routes with various configurations are completed. Prerequisites have been created for an automated program that will allow one to determine values of the compensatory capacities area for pipeline routes and to assign the quantities of necessary allowances.

  14. Influence of Anchoring on Burial Depth of Submarine Pipelines

    PubMed Central

    Zhuang, Yuan; Li, Yang; Su, Wei

    2016-01-01

    Since the beginning of the twenty-first century, there has been widespread construction of submarine oil-gas transmission pipelines due to an increase in offshore oil exploration. As shipping traffic has also increased, vessel anchoring operations are causing more damage to submarine pipelines. Therefore, it is essential that the influence of anchoring on the required burial depth of submarine pipelines is determined. In this paper, mathematical models for ordinary anchoring and emergency anchoring have been established to derive an anchor impact energy equation for each condition. The required effective burial depth for submarine pipelines has then been calculated via an energy absorption equation for the protection layer covering the submarine pipelines. Finally, the results of the model calculation have been verified by accident case analysis, and the impact of the anchoring height, anchoring water depth and the anchor weight on the required burial depth of submarine pipelines has been further analyzed. PMID:27166952
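
    The sketch below illustrates the kind of energy balance such models rest on: an anchor falling through water approaches a terminal velocity at which its submerged weight equals hydrodynamic drag, and the resulting kinetic energy at the seabed must be absorbed by the pipeline's protection layer. All parameter values are hypothetical, and the paper's models include further terms.

      import math

      rho_w = 1025.0       # seawater density, kg/m^3
      g = 9.81             # gravitational acceleration, m/s^2
      m = 5000.0           # anchor mass, kg (hypothetical)
      vol = 0.65           # anchor volume, m^3 (hypothetical)
      cd, area = 1.0, 1.2  # drag coefficient and frontal area (hypothetical)

      # Terminal velocity: submerged weight balanced by drag, W' = 0.5*rho*Cd*A*v^2.
      w_sub = (m - rho_w * vol) * g
      v_t = math.sqrt(2.0 * w_sub / (rho_w * cd * area))

      # Kinetic energy at the seabed; the required burial depth follows from
      # equating this to the energy the protection layer can absorb.
      e_impact = 0.5 * m * v_t ** 2
      print(f"terminal velocity {v_t:.2f} m/s, impact energy {e_impact / 1e3:.1f} kJ")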

  15. 77 FR 66454 - Gulf LNG Liquefaction Company, LLC; Application for Long-Term Authorization To Export Liquefied...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-11-05

    ... integrated U.S. natural gas pipeline system. GLLC notes that due to the Gulf LNG Terminal's direct access to multiple major interstate pipelines and indirect access to the national gas pipeline grid, the Project's... possible impacts that the Export Project might have on natural gas supply and pricing. Navigant's analysis...

  16. Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data

    Treesearch

    Jonathan M. Palmer; Michelle A. Jusino; Mark T. Banik; Daniel L. Lindner

    2018-01-01

    High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure accuracy of sequencing platforms and data analysis pipelines. To assess the ability of sequencing platforms and data processing pipelines using fungal internal transcribed spacer...

  17. 77 FR 26760 - Kinder Morgan, Inc.; Analysis of Proposed Agreement Containing Consent Orders To Aid Public Comment

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-05-07

    ... to as natural gas liquids or NGLs. Interstate pipelines have a limit on how much NGLs natural gas can... gas processing plant to remove those liquids before it can be transported on interstate pipelines... Gas Transmission, and Trailblazer pipelines, as well as associated processing and storage capacity. On...

  18. Gender Equality in the Academy: The Pipeline Problem

    ERIC Educational Resources Information Center

    Monroe, Kristen Renwick; Chiu, William F.

    2010-01-01

    As part of the ongoing work by the Committee on the Status of Women in the Profession (CSWP), we offer an empirical analysis of the pipeline problem in academia. The image of a pipeline is a commonly advanced explanation for persistent discrimination that suggests that gender inequality will decline once there are sufficient numbers of qualified…

  19. Automatic welding detection by an intelligent tool pipe inspection

    NASA Astrophysics Data System (ADS)

    Arizmendi, C. J.; Garcia, W. L.; Quintero, M. A.

    2015-07-01

    This work provides a model based on machine learning techniques for weld recognition, using signals obtained with an in-line inspection tool called a “smart pig” in oil and gas pipelines. The model uses a signal noise-reduction phase by means of pre-processing algorithms and attribute-selection techniques. The noise-reduction techniques were selected after a literature review and testing with survey data. Subsequently, the model was trained using recognition and classification algorithms, specifically artificial neural networks and support vector machines. Finally, the trained model was validated with different data sets and its performance was measured with cross-validation and ROC analysis. The results show that it is possible to identify welds automatically with an efficiency between 90 and 98 percent.
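
    A minimal sketch of that train-and-validate loop with scikit-learn is shown below: an RBF-kernel support vector machine scored by cross-validation and ROC AUC. The feature matrix and labels are synthetic stand-ins, not the “smart pig” survey signals used in the paper.

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score, train_test_split
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(500, 20))           # synthetic pre-processed signal features
      y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic weld / no-weld labels

      clf = SVC(kernel="rbf", probability=True)
      print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      clf.fit(X_tr, y_tr)
      print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))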

  20. CRLH-TL Sensors for Flow Inhomogeneities Detection of Pneumatic Conveyed Pulverized Solids

    NASA Astrophysics Data System (ADS)

    Angelovski, Aleksandar; Penirschke, Andreas; Jakoby, Rolf

    2011-08-01

    This paper presents an application of a Composite Right/Left-Handed (CRLH) Transmission Line resonator for a compact mass flow detector which is able to detect inhomogeneous flows. In this concept, series capacitors and shunt inductors are used to synthesize a medium with simultaneously negative permeability and permittivity - the so-called metamaterial. The helix shape of the cylindrical CRLH-TL sensor offers the possibility to detect flow inhomogeneities within the pipeline, which can be used to correct the detected mass flow rate. A combination of two CRLH-TL structures within the same cross-section of the pipeline can improve the angular sensitivity of the sensor. A prototype was realized and tested in a dedicated measurement setup to prove the concept.

  1. Heterozygous Mapping Strategy (HetMappS) for High Resolution Genotyping-By-Sequencing Markers: A Case Study in Grapevine

    PubMed Central

    Wang, Minghui; Londo, Jason P.; Acharya, Charlotte B.; Mitchell, Sharon E.; Sun, Qi; Reisch, Bruce; Cadle-Davidson, Lance

    2015-01-01

    Genotyping by sequencing (GBS) provides opportunities to generate high-resolution genetic maps at a low genotyping cost, but for highly heterozygous species, missing data and heterozygote undercalling complicate the creation of GBS genetic maps. To overcome these issues, we developed a publicly available, modular approach called HetMappS, which functions independently of parental genotypes and corrects for genotyping errors associated with heterozygosity. For linkage group formation, HetMappS includes both a reference-guided synteny pipeline and a reference-independent de novo pipeline. The de novo pipeline can be utilized for under-characterized or high diversity families that lack an appropriate reference. We applied both HetMappS pipelines in five half-sib F1 families involving genetically diverse Vitis spp. Starting with at least 116,466 putative SNPs per family, the HetMappS pipelines identified 10,440 to 17,267 phased pseudo-testcross (Pt) markers and generated high-confidence maps. Pt marker density exceeded crossover resolution in all cases; up to 5,560 non-redundant markers were used to generate parental maps ranging from 1,047 cM to 1,696 cM. The number of markers used was strongly correlated with family size in both de novo and synteny maps (r = 0.92 and 0.91, respectively). Comparisons between allele and tag frequencies suggested that many markers were in tandem repeats and mapped as single loci, while markers in regions of more than two repeats were removed during map curation. Both pipelines generated similar genetic maps, and genetic order was strongly correlated with the reference genome physical order in all cases. Independently created genetic maps from shared parents exhibited nearly identical results. Flower sex was mapped in three families and correctly localized to the known sex locus in all cases. The HetMappS pipeline could have wide application for genetic mapping in highly heterozygous species, and its modularity provides opportunities to adapt portions of the pipeline to other family types, genotyping technologies or applications. PMID:26244767

  2. Beyond Standing Rock: Seeking Solutions and Building Awareness at Tribal Colleges

    ERIC Educational Resources Information Center

    Paskus, Laura

    2017-01-01

    People around the world watched scenes unfold at Standing Rock as Indigenous people and their allies protested against the Dakota Access Pipeline (DAPL). One of the men at the center of all of this has been Standing Rock tribal chairman Dave Archambault II. Interviewed time and again on radio and television, Archambault called for prayer and…

  3. [Comparison of gut microbiotal compositional analysis of patients with irritable bowel syndrome through different bioinformatics pipelines].

    PubMed

    Zhu, S W; Liu, Z J; Li, M; Zhu, H Q; Duan, L P

    2018-04-18

    To assess whether the same biological conclusions and diagnostic or curative findings regarding the microbial composition of irritable bowel syndrome (IBS) patients could be reached through different bioinformatics pipelines, we used two common pipelines (Uparse V2.0 and Mothur V1.39.5) to analyze the same fecal microbial 16S rRNA high-throughput sequencing data. The two pipelines were used to analyze the diversity and richness of fecal microbial 16S rRNA high-throughput sequencing data from 27 samples, including 9 healthy controls (HC group) and 9 diarrhea IBS patients sampled before (IBS group) and after Rifaximin treatment (IBS-treatment, IBSt group). Analyses such as microbial diversity, principal co-ordinates analysis (PCoA), nonmetric multidimensional scaling (NMDS) and linear discriminant analysis effect size (LEfSe) were used to identify the microbial differences between the HC and IBS groups and between the IBS and IBSt groups. (1) Comparison of the microbial composition of the 27 samples in the two pipelines showed significant variations at the family and genus levels but no significant variation at the phylum level; (2) There was no significant difference in the comparison of HC vs. IBS or IBS vs. IBSt (Uparse: HC vs. IBS, F=0.98, P=0.445; IBS vs. IBSt, F=0.47, P=0.926; Mothur: HC vs. IBS, F=0.82, P=0.646; IBS vs. IBSt, F=0.37, P=0.961). The Shannon index was significantly decreased in IBSt; (3) Both pipelines distinguished the significantly enriched genera between the HC and IBS groups. For example, Nitrosomonas and Paraprevotella increased while Pseudoalteromonadaceae and Anaerotruncus decreased in the HC group through the Uparse pipeline, whereas Roseburia 62 increased while Butyricicoccus and Moraxellaceae decreased in the HC group through the Mothur pipeline. Only the Uparse pipeline picked out significant genera between IBS and IBSt, such as Pseudobutyricibrio, Clostridiaceae 1 and Clostridium sensu stricto 1. There were taxonomic and phylogenetic diversity differences between the two pipelines; Mothur recovered more taxonomic detail because the count number at each taxonomic level is higher. Both pipelines could distinguish the significantly enriched genera between the HC and IBS groups, but Uparse was more capable of identifying the differences between the IBS and IBSt groups. To increase reproducibility and reliability and to retain consistency among similar studies, it is very important to consider the impact of different pipelines.
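
    For readers unfamiliar with the ordination step, the sketch below runs a classical PCoA on Bray-Curtis distances, the kind of analysis both pipelines feed once OTU tables are built; the OTU table here is a synthetic stand-in for the study's 27 samples.

      import numpy as np
      from scipy.spatial.distance import pdist, squareform

      rng = np.random.default_rng(1)
      otu = rng.poisson(5.0, size=(27, 100))         # samples x OTUs (synthetic)
      d = squareform(pdist(otu, metric="braycurtis"))

      # Classical MDS / PCoA: eigendecompose the double-centered distance matrix.
      n = d.shape[0]
      j = np.eye(n) - np.ones((n, n)) / n
      b = -0.5 * j @ (d ** 2) @ j
      vals, vecs = np.linalg.eigh(b)
      order = np.argsort(vals)[::-1]
      coords = vecs[:, order[:2]] * np.sqrt(np.maximum(vals[order[:2]], 0.0))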

  4. CFD analysis of onshore oil pipelines in permafrost

    NASA Astrophysics Data System (ADS)

    Nardecchia, Fabio; Gugliermetti, Luca; Gugliermetti, Franco

    2017-07-01

    Underground pipelines are built all over the world, and knowledge of their thermal interaction with the soil is crucial for their design. This paper studies the "thermal influenced zone" produced by a buried pipeline and the parameters that can influence its extension through 2D steady-state CFD simulations, with the aim of improving the design of new pipelines in permafrost. In order to represent a real case, the study refers to the Eastern Siberia-Pacific Ocean Oil Pipeline at the three stations of Mo'he, Jiagedaqi and Qiqi'har. Different burial depths and pipe diameters are analyzed; the simulation results show that the effect of the oil pipeline diameter on the thermal field increases with distance from the starting station.

  5. Oman India Pipeline: An operational repair strategy based on a rational assessment of risk

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    German, P.

    1996-12-31

    This paper describes the development of a repair strategy for the operational phase of the Oman India Pipeline based upon the probability and consequences of a pipeline failure. The risk analyses and cost-benefit analyses performed provide guidance on the level of deepwater repair development effort appropriate for the Oman India Pipeline project and identify critical areas toward which more intense development effort should be directed. The risk analysis results indicate that the likelihood of a failure of the Oman India Pipeline during its 40-year life is low. Furthermore, the probability of operational failure of the pipeline in deepwater regions is extremely low, the major proportion of operational failure risk being associated with the shallow water regions.

  6. Assessing the hodgepodge of non-mapped reads in bacterial transcriptomes: real or artifactual RNA chimeras?

    PubMed

    Lloréns-Rico, Verónica; Serrano, Luis; Lluch-Senar, Maria

    2014-07-29

    RNA sequencing methods have already altered our view of the extent and complexity of bacterial and eukaryotic transcriptomes, revealing rare transcript isoforms (circular RNAs, RNA chimeras) that could play an important role in their biology. We performed an analysis of chimera formation by four different computational approaches, including a custom-designed pipeline, to study the transcriptomes of M. pneumoniae and P. aeruginosa, as well as mixtures of both. We found that rare transcript isoforms detected by conventional analysis pipelines could be artifacts of the experimental procedure used in library preparation, and that they are protocol-dependent. Using a customized pipeline, we show that both an optimal library preparation protocol and the pipeline used to analyze the results are crucial for identifying real chimeric RNAs.

  7. Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory.

    PubMed

    Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

    2014-05-23

    The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain that limit the widespread adoption of NGS testing in clinical practice. One such difficulty is the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive for most clinical diagnostics laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides the additional flexibility needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis, and it does not require EBS disks to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic, and all identified pathogenic variants were confirmed using Sanger sequencing, further validating the software.

  8. Using steady-state equations for transient flow calculation in natural gas pipelines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maddox, R.N.; Zhou, P.

    1984-04-02

    Maddox and Zhou have extended their technique for calculating the unsteady-state behavior of straight gas pipelines to complex pipeline systems and networks. After developing the steady-state flow rate and pressure profile for each pipe in the network, analysts can perform the transient-state analysis in the real-time step-wise manner described for this technique.

  9. A Critique of the STEM Pipeline: Young People's Identities in Sweden and Science Education Policy

    ERIC Educational Resources Information Center

    Mendick, Heather; Berge, Maria; Danielsson, Anna

    2017-01-01

    In this article, we develop critiques of the pipeline model which dominates Western science education policy, using discourse analysis of interviews with two Swedish young women focused on "identity work". We argue that it is important to unpack the ways that the pipeline model fails to engage with intersections of gender, ethnicity,…

  10. A Mitigation Process for Impacts of the All American Pipeline on Oak Woodlands in Santa Barbara County

    Treesearch

    Germaine Reyes-French; Timothy J. Cohen

    1991-01-01

    This paper outlines a mitigation program for pipeline construction impacts to oak tree habitat by describing the requirements for the Offsite Oak Mitigation Program for the All American Pipeline (AAPL) in Santa Barbara County, California. After describing the initial environmental analysis, the County regulatory structure is described under which the plan was required...

  11. Comparative assessment of water use and environmental implications of coal slurry pipelines

    USGS Publications Warehouse

    Palmer, Richard N.; James II, I. C.; Hirsch, R.M.

    1977-01-01

    With other studies conducted by the U.S. Geological Survey of water use in the conversion and transportation of the West's coal, an analysis of the water use and environmental implications of coal-slurry pipeline transport is presented. Simulations of a hypothetical slurry pipeline of 1000-mile length transporting 12.5 million tons per year indicate that pipeline costs and energy requirements are quite sensitive to the coal-to-water ratio. For realistic water prices, the optimal ratio will not vary far from a 50/50 ratio by weight. In comparison to other methods of energy conversion and transport, coal-slurry pipelines utilize about one-third the amount of water required for coal gasification, and about one-fifth the amount required for on-site electrical generation. An analysis of the net energy output of alternative energy transportation systems under the assumed conditions indicates that both slurry pipelines and rail shipment require approximately 4.5 percent of the potential electrical energy output of the coal transported, while high-voltage direct-current transmission requires approximately 6.5 percent. The environmental impacts of the different transport options are so substantially different that a common basis for comparison does not exist. (Woodard-USGS)

  12. Bioinformatic pipelines in Python with Leaf

    PubMed Central

    2013-01-01

    Background: An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce pipeline formality on top of a dynamic development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results: Leaf includes a formal language for the definition of pipelines whose code can be transparently inserted into the user's Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions: Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315
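
    The general idea of declaring an analysis as an explicit dependency graph inside ordinary Python code can be sketched with a plain decorator and a recursive, cached resolver, as below. This is not Leaf's actual syntax, only an illustration of the concept it formalizes.

      registry = {}

      def step(*deps):
          """Register a pipeline step together with the steps it depends on."""
          def wrap(fn):
              registry[fn.__name__] = (fn, deps)
              return fn
          return wrap

      @step()
      def load():
          return [1.0, 2.0, 3.0]

      @step("load")
      def normalize(data):
          return [x / max(data) for x in data]

      def run(name, cache=None):
          """Resolve dependencies recursively, caching results (cf. session persistence)."""
          cache = {} if cache is None else cache
          if name not in cache:
              fn, deps = registry[name]
              cache[name] = fn(*(run(d, cache) for d in deps))
          return cache[name]

      print(run("normalize"))  # [0.333..., 0.666..., 1.0]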

  13. The PREP pipeline: standardized preprocessing for large-scale EEG analysis

    PubMed Central

    Bigdely-Shamlo, Nima; Mullen, Tim; Kothe, Christian; Su, Kyung-Min; Robbins, Kay A.

    2015-01-01

    The technology to collect brain imaging and physiological measures has become portable and ubiquitous, opening the possibility of large-scale analysis of real-world human imaging. By its nature, such data is large and complex, making automated processing essential. This paper shows how lack of attention to the very early stages of an EEG preprocessing pipeline can reduce the signal-to-noise ratio and introduce unwanted artifacts into the data, particularly for computations done in single precision. We demonstrate that ordinary average referencing improves the signal-to-noise ratio, but that noisy channels can contaminate the results. We also show that identification of noisy channels depends on the reference and examine the complex interaction of filtering, noisy channel identification, and referencing. We introduce a multi-stage robust referencing scheme to deal with the noisy channel-reference interaction. We propose a standardized early-stage EEG processing pipeline (PREP) and discuss the application of the pipeline to more than 600 EEG datasets. The pipeline includes an automatically generated report for each dataset processed. Users can download the PREP pipeline as a freely available MATLAB library from http://eegstudy.org/prepcode. PMID:26150785
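
    The noisy-channel/reference interaction can be sketched in a few lines of NumPy: a naive average reference lets one bad channel contaminate every channel, whereas excluding channels with outlying robust amplitude before averaging avoids this. The data are synthetic and the detection rule is a simplified stand-in for PREP's multi-stage scheme.

      import numpy as np

      rng = np.random.default_rng(2)
      eeg = rng.normal(size=(32, 1000))            # channels x samples (synthetic)
      eeg[5] += rng.normal(scale=50.0, size=1000)  # one very noisy channel

      # Naive average reference: channel 5 leaks into every other channel.
      naive = eeg - eeg.mean(axis=0)

      # Robust variant: flag channels whose robust amplitude is an outlier,
      # then reference everything to the mean of the remaining good channels.
      amp = np.median(np.abs(eeg - np.median(eeg, axis=1, keepdims=True)), axis=1)
      mad = np.median(np.abs(amp - np.median(amp)))
      good = amp < np.median(amp) + 5.0 * mad
      robust = eeg - eeg[good].mean(axis=0)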

  14. CloVR-ITS: Automated internal transcribed spacer amplicon sequence analysis pipeline for the characterization of fungal microbiota

    PubMed Central

    2013-01-01

    Background Besides the development of comprehensive tools for high-throughput 16S ribosomal RNA amplicon sequence analysis, there exists a growing need for protocols emphasizing alternative phylogenetic markers such as those representing eukaryotic organisms. Results Here we introduce CloVR-ITS, an automated pipeline for comparative analysis of internal transcribed spacer (ITS) pyrosequences amplified from metagenomic DNA isolates and representing fungal species. This pipeline performs a variety of steps similar to those commonly used for 16S rRNA amplicon sequence analysis, including preprocessing for quality, chimera detection, clustering of sequences into operational taxonomic units (OTUs), taxonomic assignment (at class, order, family, genus, and species levels) and statistical analysis of sample groups of interest based on user-provided information. Using ITS amplicon pyrosequencing data from a previous human gastric fluid study, we demonstrate the utility of CloVR-ITS for fungal microbiota analysis and provide runtime and cost examples, including analysis of extremely large datasets on the cloud. We show that the largest fractions of reads from the stomach fluid samples were assigned to Dothideomycetes, Saccharomycetes, Agaricomycetes and Sordariomycetes but that all samples were dominated by sequences that could not be taxonomically classified. Representatives of the Candida genus were identified in all samples, most notably C. quercitrusa, while sequence reads assigned to the Aspergillus genus were only identified in a subset of samples. CloVR-ITS is made available as a pre-installed, automated, and portable software pipeline for cloud-friendly execution as part of the CloVR virtual machine package (http://clovr.org). Conclusion The CloVR-ITS pipeline provides fungal microbiota analysis that can be complementary to bacterial 16S rRNA and total metagenome sequence analysis allowing for more comprehensive studies of environmental and host-associated microbial communities. PMID:24451270

  15. Magnetic Flux Leakage and Principal Component Analysis for metal loss approximation in a pipeline

    NASA Astrophysics Data System (ADS)

    Ruiz, M.; Mujica, L. E.; Quintero, M.; Florez, J.; Quintero, S.

    2015-07-01

    Safety and reliability of hydrocarbon transportation pipelines represent a critical aspect for the oil and gas industry. Pipeline failures caused by corrosion, external agents, and other factors can develop into leaks or even rupture, which can negatively impact the population, natural environment, infrastructure and economy. It is imperative to have accurate inspection tools traveling through the pipeline to diagnose its integrity. Accordingly, over the last few years, different techniques under the concept of structural health monitoring (SHM) have continuously been in development. This work is based on a hybrid methodology that combines the Magnetic Flux Leakage (MFL) and Principal Component Analysis (PCA) approaches. The MFL technique induces a magnetic field in the pipeline's walls, and sensors record the leakage magnetic field in segments with loss of metal caused by cracking, corrosion, and similar damage. The data come from a pipeline with approximately 15 years of operation, which transports gas, has a diameter of 20 inches and a total length of 110 km (with several changes in topography). PCA, in turn, is a well-known technique that compresses the data and extracts the most relevant information, facilitating the detection of damage in a variety of structures. The goal of this work is to detect and localize critical loss of metal in a pipeline currently in operation.
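
    The sketch below illustrates the PCA side of such a hybrid scheme: a model is fitted on MFL-style segments assumed healthy, and test segments whose reconstruction residual (the Q statistic) exceeds a baseline percentile are flagged as possible metal loss. The data are synthetic; the authors' actual processing chain is more involved.

      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(3)
      healthy = rng.normal(size=(200, 64))  # baseline segments x samples per segment
      test = rng.normal(size=(20, 64))
      test[3, 20:30] += 4.0                 # simulated metal-loss signature

      pca = PCA(n_components=10).fit(healthy)

      def q_stat(x):
          """Squared reconstruction residual per segment."""
          resid = x - pca.inverse_transform(pca.transform(x))
          return (resid ** 2).sum(axis=1)

      threshold = np.percentile(q_stat(healthy), 99)
      print("flagged segments:", np.where(q_stat(test) > threshold)[0])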

  16. Immersion probe arrays for rapid pipeline weld inspection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lebsack, S.; Heckhauser, H.

    In 1992, F.H. Gottfeld, Herne, Germany, a member of the SGS Group (Societe Generale de Surveillance), and Krautkramer Branson, Köln, undertook production of a rapid automated ultrasonic testing (UT) system to inspect manually and machine welded pipeline girth welds. The result of the project is a system called MIPA, or multiple immersion probe array. The advantages of using UT to detect certain weld defects have been recognized for many years; however, for some applications the time required for UT has been a limiting factor. Where time has not been a factor, automated ultrasonic technology has advanced a reliable solution to many inspection problems across a broad industrial base. The recent past has seen the entrance of automated ultrasonic technology into the harsh and demanding environment of pipelay operations. However, the use of these systems has been focused on automated welding processes, and their effectiveness for manual pipeline welding inspection is contested. This is due to the infinite variability of the joint alignment and shape that is unavoidable even when highly skilled welders are used.

  17. Abnormal plasma DNA profiles in early ovarian cancer using a non-invasive prenatal testing platform: implications for cancer screening.

    PubMed

    Cohen, Paul A; Flowers, Nicola; Tong, Stephen; Hannan, Natalie; Pertile, Mark D; Hui, Lisa

    2016-08-24

    Non-invasive prenatal testing (NIPT) identifies fetal aneuploidy by sequencing cell-free DNA in the maternal plasma. Pre-symptomatic maternal malignancies have been incidentally detected during NIPT based on abnormal genomic profiles. This low coverage sequencing approach could have potential for ovarian cancer screening in the non-pregnant population. Our objective was to investigate whether plasma DNA sequencing with a clinical whole genome NIPT platform can detect early- and late-stage high-grade serous ovarian carcinomas (HGSOC). This is a case control study of prospectively-collected biobank samples comprising preoperative plasma from 32 women with HGSOC (16 'early cancer' (FIGO I-II) and 16 'advanced cancer' (FIGO III-IV)) and 32 benign controls. Plasma DNA from cases and controls were sequenced using a commercial NIPT platform and chromosome dosage measured. Sequencing data were blindly analyzed with two methods: (1) Subchromosomal changes were called using an open source algorithm WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR). Genomic gains or losses ≥ 15 Mb were prespecified as "screen positive" calls, and mapped to recurrent copy number variations reported in an ovarian cancer genome atlas. (2) Selected whole chromosome gains or losses were reported using the routine NIPT pipeline for fetal aneuploidy. We detected 13/32 cancer cases using the subchromosomal analysis (sensitivity 40.6 %, 95 % CI, 23.7-59.4 %), including 6/16 early and 7/16 advanced HGSOC cases. Two of 32 benign controls had subchromosomal gains ≥ 15 Mb (specificity 93.8 %, 95 % CI, 79.2-99.2 %). Twelve of the 13 true positive cancer cases exhibited specific recurrent changes reported in HGSOC tumors. The NIPT pipeline resulted in one "monosomy 18" call from the cancer group, and two "monosomy X" calls in the controls. Low coverage plasma DNA sequencing used for prenatal testing detected 40.6 % of all HGSOC, including 38 % of early stage cases. Our findings demonstrate the potential of a high throughput sequencing platform to screen for early HGSOC in plasma based on characteristic multiple segmental chromosome gains and losses. The performance of this approach may be further improved by refining bioinformatics algorithms and targeting selected cancer copy number variations.

  18. BigDataScript: a scripting language for data pipelines.

    PubMed

    Cingolani, Pablo; Sladek, Rob; Blanchette, Mathieu

    2015-01-01

    The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and provide robustness to various types of software and hardware failures as well as portability. We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code. BigDataScript is available under open-source license at http://pcingola.github.io/BigDataScript. © The Author 2014. Published by Oxford University Press.

  19. BigDataScript: a scripting language for data pipelines

    PubMed Central

    Cingolani, Pablo; Sladek, Rob; Blanchette, Mathieu

    2015-01-01

    Motivation: The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and provide robustness to various types of software and hardware failures as well as portability. Results: We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code. Availability and implementation: BigDataScript is available under open-source license at http://pcingola.github.io/BigDataScript. Contact: pablo.e.cingolani@gmail.com PMID:25189778

  20. Computational Identification of Tissue-Specific Splicing Regulatory Elements in Human Genes from RNA-Seq Data.

    PubMed

    Badr, Eman; ElHefnawi, Mahmoud; Heath, Lenwood S

    2016-01-01

    Alternative splicing is a vital process for regulating gene expression and promoting proteomic diversity. It plays a key role in genes expressed in a tissue-specific manner. This specificity is mainly regulated by splicing factors that bind to specific sequences called splicing regulatory elements (SREs). Here, we report a genome-wide analysis to study alternative splicing across multiple tissues, including brain, heart, liver, and muscle. We propose a pipeline to identify differentially used exons across tissues and hence tissue-specific SREs. In our pipeline, we utilize the DEXSeq package along with our previously reported algorithms. Utilizing the publicly available RNA-Seq data set from the Human BodyMap project, we identified 28,100 differentially used exons across the four tissues. We identified tissue-specific exonic splicing enhancers that overlap with various previously published experimental and computational databases. A complicated exonic enhancer regulatory network was revealed, where multiple exonic enhancers were found across multiple tissues while some were found only in specific tissues. Putative combinatorial exonic enhancers and silencers were discovered as well, which may be responsible for exon inclusion or exclusion across tissues. Some of the exonic enhancers were found to co-occur with multiple exonic silencers and vice versa, demonstrating a complicated relationship between tissue-specific exonic enhancers and silencers.

  1. MeRIP-PF: an easy-to-use pipeline for high-resolution peak-finding in MeRIP-Seq data.

    PubMed

    Li, Yuli; Song, Shuhui; Li, Cuiping; Yu, Jun

    2013-02-01

    RNA modifications, especially methylation of the N(6) position of adenosine (A)-m(6)A, represent an emerging research frontier in RNA biology. With the rapid development of high-throughput sequencing technology, in-depth study of m(6)A distribution and functional relevance becomes feasible. However, a robust method to effectively identify m(6)A-modified regions has not been available yet. Here, we present a novel high-efficiency and user-friendly analysis pipeline called MeRIP-PF for the signal identification of MeRIP-Seq data in reference to controls. MeRIP-PF provides a statistical P-value for each identified m(6)A region based on the difference in read distribution when compared to the controls and also calculates the false discovery rate (FDR) as a cutoff to differentiate reliable m(6)A regions from the background. Furthermore, MeRIP-PF also achieves gene annotation of m(6)A signals or peaks and produces outputs in both XLS and graphical formats, which are useful for further study. MeRIP-PF is implemented in Perl and is freely available at http://software.big.ac.cn/MeRIP-PF.html. Copyright © 2013. Production and hosting by Elsevier Ltd.
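
    As an illustration of the FDR cutoff step, the sketch below implements the standard Benjamini-Hochberg adjustment over per-region P-values; the P-values are made up, and MeRIP-PF's internal implementation may differ.

      import numpy as np

      def bh_fdr(pvals):
          """Return Benjamini-Hochberg adjusted q-values for an array of P-values."""
          p = np.asarray(pvals, dtype=float)
          order = np.argsort(p)
          ranked = p[order] * p.size / (np.arange(p.size) + 1)
          # Enforce monotonicity from the largest P-value downward.
          q = np.minimum.accumulate(ranked[::-1])[::-1].clip(max=1.0)
          out = np.empty_like(q)
          out[order] = q
          return out

      pvals = [0.001, 0.009, 0.04, 0.20, 0.76]
      print(bh_fdr(pvals))  # regions with q below the cutoff pass the FDR filter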

  2. sfDM: Open-Source Software for Temporal Analysis and Visualization of Brain Tumor Diffusion MR Using Serial Functional Diffusion Mapping.

    PubMed

    Ceschin, Rafael; Panigrahy, Ashok; Gopalakrishnan, Vanathi

    2015-01-01

    A major challenge in the diagnosis and treatment of brain tumors is tissue heterogeneity, which leads to mixed treatment response. Additionally, these tumors are often difficult or very high risk to biopsy, further hindering clinical management. To overcome this, novel advanced imaging methods are increasingly being adapted clinically to identify useful noninvasive biomarkers capable of characterizing disease stage and predicting treatment response. One promising technique is called functional diffusion mapping (fDM), which uses diffusion-weighted imaging (DWI) to generate parametric maps between two imaging time points in order to identify significant voxel-wise changes in water diffusion within the tumor tissue. Here we introduce serial functional diffusion mapping (sfDM), an extension of existing fDM methods, to analyze the entire tumor diffusion profile along the temporal course of the disease. sfDM provides the tools necessary to analyze a tumor data set in the context of spatiotemporal parametric mapping: the image registration pipeline, biomarker extraction, and visualization tools. We present the general workflow of the pipeline, along with a typical use case for the software. sfDM is written in Python and is freely available as an open-source package under the Berkeley Software Distribution (BSD) license to promote transparency and reproducibility.
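
    The core fDM operation, a voxel-wise comparison of co-registered diffusion maps from two time points, can be sketched as below; the arrays, tumor mask, and threshold are synthetic placeholders rather than sfDM's actual parameters.

      import numpy as np

      rng = np.random.default_rng(4)
      adc_t0 = rng.normal(1.2, 0.1, size=(64, 64, 32))  # baseline ADC map (synthetic)
      adc_t1 = adc_t0 + rng.normal(0.0, 0.05, size=adc_t0.shape)
      tumor = np.zeros(adc_t0.shape, dtype=bool)        # hypothetical tumor ROI
      tumor[20:40, 20:40, 10:20] = True

      # Classify tumor voxels by the change in diffusion between time points.
      delta = adc_t1[tumor] - adc_t0[tumor]
      thresh = 0.1  # hypothetical significance threshold
      print(f"increased: {(delta > thresh).mean():.1%}, "
            f"decreased: {(delta < -thresh).mean():.1%}")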

  3. Strain-Based Design Methodology of Large Diameter Grade X80 Linepipe

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lower, Mark D.

    2014-04-01

    Continuous growth in energy demand is driving oil and natural gas production to areas that are often located far from major markets where the terrain is prone to earthquakes, landslides, and other types of ground motion. Transmission pipelines that cross this type of terrain can experience large longitudinal strains and plastic circumferential elongation as the pipeline experiences alignment changes resulting from differential ground movement. Such displacements can potentially impact pipeline safety by adversely affecting structural capacity and leak tight integrity of the linepipe steel. Planning for new long-distance transmission pipelines usually involves consideration of higher strength linepipe steels because their use allows pipeline operators to reduce the overall cost of pipeline construction and increase pipeline throughput by increasing the operating pressure. The design trend for new pipelines in areas prone to ground movement has evolved over the last 10 years from a stress-based design approach to a strain-based design (SBD) approach to further realize the cost benefits from using higher strength linepipe steels. This report presents an overview of SBD for pipelines subjected to large longitudinal strain and high internal pressure with emphasis on the tensile strain capacity of high-strength microalloyed linepipe steel. The technical basis for this report involved engineering analysis and examination of the mechanical behavior of Grade X80 linepipe steel in both the longitudinal and circumferential directions. Testing was conducted to assess effects on material processing including as-rolled, expanded, and heat treatment processing intended to simulate coating application. Elastic-plastic and low-cycle fatigue analyses were also performed with varying internal pressures. Proposed SBD models discussed in this report are based on classical plasticity theory and account for material anisotropy, triaxial strain, and microstructural damage effects developed from test data. The results are intended to enhance SBD and analysis methods for producing safe and cost effective pipelines capable of accommodating large plastic strains in seismically active arctic areas.

  4. Analysis pipelines and packages for Infinium HumanMethylation450 BeadChip (450k) data

    PubMed Central

    Morris, Tiffany J.; Beck, Stephan

    2015-01-01

    The Illumina HumanMethylation450 BeadChip has become a popular platform for interrogating DNA methylation in epigenome-wide association studies (EWAS) and related projects as well as resource efforts such as the International Cancer Genome Consortium (ICGC) and the International Human Epigenome Consortium (IHEC). This has resulted in an exponential increase of 450k data in recent years and triggered the development of numerous integrated analysis pipelines and stand-alone packages. This review will introduce and discuss the currently most popular pipelines and packages and is particularly aimed at new 450k users. PMID:25233806

  5. 75 FR 23710 - Order Finding That the ICE PG&E Citygate Financial Basis Contract Traded on the...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-04

    ... Pipe Line, LLC, serves as a juncture for 13 different pipelines. These pipelines bring in natural gas... ``hub'' refers to a juncture where two or more natural gas pipelines are connected. Hubs also serve as... analysis of PG&E Citygate natural gas prices showed that 55 percent of the observations were more than 2.5...

  6. Metatranscriptomic analysis of diverse microbial communities reveals core metabolic pathways and microbiome-specific functionality.

    PubMed

    Jiang, Yue; Xiong, Xuejian; Danska, Jayne; Parkinson, John

    2016-01-12

    Metatranscriptomics is emerging as a powerful technology for the functional characterization of complex microbial communities (microbiomes). Use of unbiased RNA-sequencing can reveal both the taxonomic composition and active biochemical functions of a complex microbial community. However, the lack of established reference genomes, computational tools and pipelines makes analysis and interpretation of these datasets challenging. Systematic studies that compare data across microbiomes are needed to demonstrate the ability of such pipelines to deliver biologically meaningful insights on microbiome function. Here, we apply a standardized analytical pipeline to perform a comparative analysis of metatranscriptomic data from diverse microbial communities derived from mouse large intestine, cow rumen, kimchi culture, deep-sea thermal vent and permafrost. Sequence similarity searches allowed annotation of 19 to 76% of putative messenger RNA (mRNA) reads, with the highest frequency in the kimchi dataset due to its relatively low complexity and availability of closely related reference genomes. Metatranscriptomic datasets exhibited distinct taxonomic and functional signatures. From a metabolic perspective, we identified a common core of enzymes involved in amino acid, energy and nucleotide metabolism and also identified microbiome-specific pathways such as phosphonate metabolism (deep sea) and glycan degradation pathways (cow rumen). Integrating taxonomic and functional annotations within a novel visualization framework revealed the contribution of different taxa to metabolic pathways, allowing the identification of taxa that contribute unique functions. The application of a single, standard pipeline confirms that the rich taxonomic and functional diversity observed across microbiomes is not simply an artefact of different analysis pipelines but instead reflects distinct environmental influences. At the same time, our findings show how microbiome complexity and availability of reference genomes can impact comprehensive annotation of metatranscriptomes. Consequently, beyond the application of standardized pipelines, additional caution must be taken when interpreting their output and performing downstream, microbiome-specific, analyses. The pipeline used in these analyses along with a tutorial has been made freely available for download from our project website: http://www.compsysbio.org/microbiome.

  7. Automated flow cytometric analysis across large numbers of samples and cell types.

    PubMed

    Chen, Xiaoyi; Hasan, Milena; Libri, Valentina; Urrutia, Alejandra; Beitz, Benoît; Rouilly, Vincent; Duffy, Darragh; Patin, Étienne; Chalmond, Bernard; Rogge, Lars; Quintana-Murci, Lluis; Albert, Matthew L; Schwikowski, Benno

    2015-04-01

    Multi-parametric flow cytometry is a key technology for characterization of immune cell phenotypes. However, robust high-dimensional post-analytic strategies for automated data analysis in large numbers of donors are still lacking. Here, we report a computational pipeline, called FlowGM, which minimizes operator input, is insensitive to compensation settings, and can be adapted to different analytic panels. A Gaussian Mixture Model (GMM)-based approach was utilized for initial clustering, with the number of clusters determined using Bayesian Information Criterion. Meta-clustering in a reference donor permitted automated identification of 24 cell types across four panels. Cluster labels were integrated into FCS files, thus permitting comparisons to manual gating. Cell numbers and coefficient of variation (CV) were similar between FlowGM and conventional gating for lymphocyte populations, but notably FlowGM provided improved discrimination of "hard-to-gate" monocyte and dendritic cell (DC) subsets. FlowGM thus provides rapid high-dimensional analysis of cell phenotypes and is amenable to cohort studies. Copyright © 2015. Published by Elsevier Inc.
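
    The initial clustering step, fitting Gaussian mixtures and choosing the number of components by BIC, can be sketched with scikit-learn as below. The two-dimensional synthetic points stand in for compensated fluorescence measurements, and FlowGM's cross-donor meta-clustering is not shown.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(5)
      cells = np.vstack([
          rng.normal([0.0, 0.0], 0.5, size=(300, 2)),
          rng.normal([4.0, 1.0], 0.7, size=(300, 2)),
          rng.normal([1.0, 5.0], 0.6, size=(300, 2)),
      ])

      # Fit GMMs over a range of component counts and keep the lowest BIC.
      models = [GaussianMixture(k, random_state=0).fit(cells) for k in range(1, 8)]
      best = min(models, key=lambda m: m.bic(cells))
      labels = best.predict(cells)
      print("chosen number of clusters:", best.n_components)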

  8. Real-Time Visualization of Network Behaviors for Situational Awareness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Best, Daniel M.; Bohn, Shawn J.; Love, Douglas V.

    Plentiful, complex, and dynamic data make understanding the state of an enterprise network difficult. Although visualization can help analysts understand baseline behaviors in network traffic and identify off-normal events, visual analysis systems often do not scale well to operational data volumes (in the hundreds of millions to billions of transactions per day) nor to analysis of emergent trends in real-time data. We present a system that combines multiple, complementary visualization techniques coupled with in-stream analytics, behavioral modeling of network actors, and a high-throughput processing platform called MeDICi. This system provides situational understanding of real-time network activity to help analysts take proactive response steps. We have developed these techniques using requirements gathered from the government users for which the tools are being developed. By linking multiple visualization tools to a streaming analytic pipeline, and designing each tool to support a particular kind of analysis (from high-level awareness to detailed investigation), analysts can understand the behavior of a network across multiple levels of abstraction.

  9. Experimental and Numerical Investigation of Local Scour Around Submarine Piggyback Pipeline Under Steady Current

    NASA Astrophysics Data System (ADS)

    Zhao, Enjin; Shi, Bing; Qu, Ke; Dong, Wenbin; Zhang, Jing

    2018-04-01

    As a new type of submarine pipeline, the piggyback pipeline has been gradually adopted in engineering practice to enhance the performance and safety of submarine pipelines. However, limited simulation work and few experimental studies have been published on the scour around piggyback pipelines under steady current. This study numerically and experimentally investigates the local scour around the piggyback pipeline under steady current. The influence of prominent factors, such as pipe diameter, inflow Reynolds number, and the gap between the main and small pipes, on the maximum scour depth has been examined and discussed in detail. Furthermore, a formula to predict the maximum scour depth under the piggyback pipeline has been derived based on theoretical analysis of scour equilibrium. The feasibility of the proposed formula has been calibrated with both experimental data and numerical results. The findings drawn from this study are instructive for the future design and application of the piggyback pipeline.

  10. Leakage detection in galvanized iron pipelines using ensemble empirical mode decomposition analysis

    NASA Astrophysics Data System (ADS)

    Amin, Makeen; Ghazali, M. Fairusham

    2015-05-01

    There are a number of possible approaches to detecting leaks. Some leaks are simply noticeable when the liquid or water appears on the surface. However, many leaks do not find their way to the surface, and their existence has to be checked by analysis of fluid flow in the pipeline. The first step is to determine the approximate position of the leak. This can be done by isolating sections of the mains in turn and noting which section causes a drop in the flow. The next approach is to use sensors to locate leaks; this involves strain-gauge pressure transducers and piezoelectric sensors. Specific methods, namely acoustic leak detection and transient-based methods, can establish the occurrence of a leak and its exact location in the pipeline. The objective of this work is to utilize signal processing techniques to analyse leaking in the pipeline. For this, an ensemble empirical mode decomposition (EEMD) method is applied as the analysis method to collect and analyse the data.
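
    A minimal sketch of the decomposition step, assuming the PyEMD package (distributed on PyPI as EMD-signal), is shown below on a synthetic pressure-like signal; leak signatures would then be sought in the resulting intrinsic mode functions.

      import numpy as np
      from PyEMD import EEMD  # assumes the EMD-signal package is installed

      t = np.linspace(0.0, 1.0, 1000)
      signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
      signal += 0.2 * np.random.default_rng(6).normal(size=t.size)  # sensor noise

      eemd = EEMD(trials=100)      # ensemble of noise-assisted EMD runs
      imfs = eemd.eemd(signal, t)  # intrinsic mode functions, one per row
      print("IMFs extracted:", imfs.shape[0])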

  11. Mothering and Professing in the Ivory Tower: A Review of the Literature and a Call for a Research Agenda

    ERIC Educational Resources Information Center

    Eversole, Barbara A. W.; Harvey, Ashley M.; Zimmerman, Toni S.

    2007-01-01

    Although women outnumber men in receiving PhDs, the pipeline to tenure-track positions leaks, particularly for mothers. In fact, the leak continues into the granting of tenure and the achievement of promotion. While having children increases a man's chances of attaining tenure and advancement, mothering decreases a woman's chances. Implications…

  12. Tapping the Principal Pipeline: Identifying Talent for Future School Leadership in the Absence of Formal Succession Management Programs

    ERIC Educational Resources Information Center

    Myung, Jeannie; Loeb, Susanna; Horng, Eileen

    2011-01-01

    Purpose: In light of the difficulty many districts face finding quality principal candidates, this article explores an informal recruitment mechanism of teachers to become principals, which the authors call tapping. The authors assess the extent to which current teachers are being approached by school leaders to consider leadership and whether…

  13. Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hura, Greg L.; Menon, Angeli L.; Hammel, Michal

    2009-07-20

    We present an efficient pipeline enabling high-throughput analysis of protein structure in solution with small angle X-ray scattering (SAXS). Our SAXS pipeline combines automated sample handling of microliter volumes, temperature and anaerobic control, rapid data collection and data analysis, and couples structural analysis with automated archiving. We subjected 50 representative proteins, mostly from Pyrococcus furiosus, to this pipeline and found that 30 were multimeric structures in solution. SAXS analysis allowed us to distinguish aggregated and unfolded proteins, define global structural parameters and oligomeric states for most samples, identify shapes and similar structures for 25 unknown structures, and determine envelopes for 41 proteins. We believe that high-throughput SAXS is an enabling technology that may change the way that structural genomics research is done.

  14. An image processing pipeline to detect and segment nuclei in muscle fiber microscopic images.

    PubMed

    Guo, Yanen; Xu, Xiaoyin; Wang, Yuanyuan; Wang, Yaming; Xia, Shunren; Yang, Zhong

    2014-08-01

    Muscle fiber images play an important role in the medical diagnosis and treatment of many muscular diseases. The number of nuclei in skeletal muscle fiber images is a key biomarker in the diagnosis of muscular dystrophy. One primary challenge in nuclei segmentation is to correctly separate clustered nuclei. In this article, we developed an image processing pipeline to automatically detect, segment, and analyze nuclei in microscopic images of muscle fibers. The pipeline consists of image pre-processing, identification of isolated nuclei, identification and segmentation of clustered nuclei, and quantitative analysis. Nuclei are initially extracted from the background using a local Otsu threshold. Based on analysis of morphological features of the isolated nuclei, including their areas, compactness, and major axis lengths, a Bayesian network is trained and applied to distinguish isolated nuclei from clustered nuclei and artifacts in all the images. Then a two-step refined watershed algorithm is applied to segment the clustered nuclei. After segmentation, the nuclei can be quantified for statistical analysis. Comparing the segmented results with those of manual analysis and an existing technique, we find that our proposed image processing pipeline achieves good performance with high accuracy and precision. The presented image processing pipeline can therefore help biologists increase their throughput and objectivity in analyzing large numbers of nuclei in muscle fiber images. © 2014 Wiley Periodicals, Inc.
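
    The threshold-then-watershed idea behind the clustered-nuclei step can be sketched with scikit-image as below: Otsu thresholding, a distance transform, marker detection, and marker-controlled watershed. The image is synthetic, and the Bayesian-network classification stage is omitted.

      import numpy as np
      from scipy import ndimage as ndi
      from skimage.filters import threshold_otsu
      from skimage.feature import peak_local_max
      from skimage.segmentation import watershed

      # Synthetic image: three blurred "nuclei", two of them touching.
      yy, xx = np.mgrid[:128, :128]
      img = np.random.default_rng(7).normal(0.1, 0.02, size=(128, 128))
      for cy, cx in [(40, 40), (45, 60), (90, 90)]:
          img += 0.8 * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 120.0)

      mask = img > threshold_otsu(img)
      distance = ndi.distance_transform_edt(mask)
      coords = peak_local_max(distance, labels=mask.astype(int), min_distance=10)
      markers = np.zeros(mask.shape, dtype=int)
      markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
      labels = watershed(-distance, markers, mask=mask)
      print("nuclei found:", labels.max())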

  15. Education biographies from the science pipeline: An analysis of Latino/a student perspectives on ethnic and gender identity in higher education

    NASA Astrophysics Data System (ADS)

    Lujan, Vanessa Beth

    This study is a qualitative narrative analysis of the importance and relevance of the ethnic and gender identities of 17 Latino/a (Hispanic) college students in the biological sciences. This research study asks how one's higher education experience within the science pipeline shapes an individual's direction of study, attitudes toward science, and cultural/ethnic and gender identity development. By understanding the ideologies of these students, we are better able to comprehend the world-makings that these students bring with them to the learning process in the sciences. Informed by life history narrative analysis, this study examines Latino/as and their persisting involvement within the science pipeline in higher education and is based on qualitative observations and interviews of student perspectives on the importance of the college science experience for their ethnic and gender identity. The findings in this study show the multiple interrelationships from both Latino male and Latina female narratives, separate and intersecting, revealing the complexities of the Latino/a group experience in college science. By understanding from a student perspective how the science pipeline affects one's cultural, ethnic, or gender identity, we can create a thought-provoking discussion on why and how underrepresented student populations persist in the science pipeline in higher education. The conditions created in the science pipeline and how they affect Latino/a undergraduate pathways may further be used to understand and improve the quality of the undergraduate learning experience.

  16. Towards a Fuzzy Bayesian Network Based Approach for Safety Risk Analysis of Tunnel-Induced Pipeline Damage.

    PubMed

    Zhang, Limao; Wu, Xianguo; Qin, Yawei; Skibniewski, Miroslaw J; Liu, Wenli

    2016-02-01

    Tunneling excavation is bound to produce significant disturbances to surrounding environments, and the tunnel-induced damage to adjacent underground buried pipelines is of considerable importance for geotechnical practice. A fuzzy Bayesian networks (FBNs) based approach for safety risk analysis is developed in this article with detailed step-by-step procedures, consisting of risk mechanism analysis, the FBN model establishment, fuzzification, FBN-based inference, defuzzification, and decision making. In accordance with the failure mechanism analysis, a tunnel-induced pipeline damage model is proposed to reveal the cause-effect relationships between the pipeline damage and its influential variables. In terms of the fuzzification process, an expert confidence indicator is proposed to reveal the reliability of the data when determining the fuzzy probability of occurrence of basic events, with both the judgment ability level and the subjectivity reliability level taken into account. By means of the fuzzy Bayesian inference, the approach proposed in this article is capable of calculating the probability distribution of potential safety risks and identifying the most likely potential causes of accidents under both prior knowledge and given evidence circumstances. A case concerning the safety analysis of underground buried pipelines adjacent to the construction of the Wuhan Yangtze River Tunnel is presented. The results demonstrate the feasibility of the proposed FBN approach and its application potential. The proposed approach can be used as a decision tool to provide support for safety assurance and management in tunnel construction, and thus increase the likelihood of a successful project in a complex project environment. © 2015 Society for Risk Analysis.
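
    The fuzzification and defuzzification steps such an approach relies on can be sketched in a few lines: linguistic expert ratings are encoded as triangular membership functions, weighted by a confidence indicator, and collapsed to a crisp probability by a centroid rule. All membership supports, weights, and ratings below are invented for illustration; this is not the authors' model:

    ```python
    import numpy as np

    def tri(x, a, b, c):
        """Triangular membership function with support [a, c] and peak at b."""
        return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

    x = np.linspace(0.0, 1.0, 501)   # universe of discourse: occurrence probability
    low = tri(x, 0.0, 0.1, 0.3)      # hypothetical linguistic terms
    medium = tri(x, 0.2, 0.5, 0.8)
    high = tri(x, 0.7, 0.9, 1.0)

    # Fuzzification: three experts rate a basic event, each weighted by an
    # assumed confidence indicator (judgment ability x subjectivity reliability).
    weights = np.array([0.9, 0.6, 0.8])
    ratings = [low, medium, medium]
    aggregate = sum(w * r for w, r in zip(weights, ratings)) / weights.sum()

    # Defuzzification (discrete centroid) yields a crisp prior for the network.
    crisp = (aggregate * x).sum() / aggregate.sum()
    print(f"crisp fuzzy probability: {crisp:.3f}")
    ```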

  17. Pipeline issues

    NASA Technical Reports Server (NTRS)

    Eisley, Joe T.

    1990-01-01

    The declining pool of graduates, the lack of rigorous preparation in science and mathematics, and the declining interest in science and engineering careers at the precollege level promise a shortage of technically educated personnel at the college level for industry, government, and the universities in the next several decades. The educational process, which starts out with a large number of students at the elementary level but retains an ever smaller number preparing for science and engineering at each more advanced educational level, is in a state of crisis. These pipeline issues, so called because the educational process is likened to a series of ever smaller constrictions in a pipe, were examined in a workshop at the Space Grant Conference; a summary of the presentations, the results of the discussion, and the conclusions of the workshop participants are reported.

  18. VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis.

    PubMed

    Cornwell, MacIntosh; Vangala, Mahesh; Taing, Len; Herbert, Zachary; Köster, Johannes; Li, Bo; Sun, Hanfei; Li, Taiwen; Zhang, Jian; Qiu, Xintao; Pun, Matthew; Jeselsohn, Rinath; Brown, Myles; Liu, X Shirley; Long, Henry W

    2018-04-12

    RNA sequencing has become a ubiquitous technology used throughout life sciences as an effective method of measuring RNA abundance quantitatively in tissues and cells. The increase in use of RNA-seq technology has led to the continuous development of new tools for every step of analysis, from alignment to downstream pathway analysis. However, effectively using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts. Using the workflow management system Snakemake, we have developed a user-friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER has been created in a modular fashion to allow for the rapid incorporation of new tools to expand its capabilities. This capacity has already been exploited to include very recently developed tools that explore immune infiltrate and T-cell CDR (Complementarity-Determining Region) reconstruction abilities. The pipeline has been conveniently packaged such that minimal computational skills are required to download and install the dozens of software packages that VIPER uses. VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively with a built-in capacity for customization and expansion.

  19. A node-wise analysis of the uterine muscle networks for pregnancy monitoring.

    PubMed

    Nader, N; Hassan, M; Falou, W; Marque, C; Khalil, M

    2016-08-01

    Recent years have seen a noticeable increase of interest in the correlation analysis of electrohysterographic (EHG) signals, with a view to improving pregnancy monitoring. Here we propose a new approach based on the functional connectivity between multichannel (4×4 matrix) EHG signals recorded from the woman's abdomen. The proposed pipeline includes i) the computation of the statistical couplings between the multichannel EHG signals, ii) the characterization of the connectivity matrices, computed by using the imaginary part of the coherence, based on graph-theory analysis, and iii) the use of these measures for pregnancy monitoring. The method was evaluated on a dataset of EHGs in order to track the correlation between EHGs collected by each electrode of the matrix (called a `node-wise' analysis) and follow their evolution along the weeks before labor. Results showed that the strength of each node significantly increases from pregnancy to labor. Electrodes located on the median vertical axis of the uterus seemed to be the most discriminant. We speculate that the network-based analysis can be a very promising tool to improve pregnancy monitoring.
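
    The connectivity measure named here, the imaginary part of coherency, suppresses spurious zero-lag coupling and feeds directly into the node-strength statistic. Below is a minimal sketch on synthetic multichannel data; the sampling rate, frequency band, and coupling model are assumptions, not the study's recording setup:

    ```python
    import numpy as np
    from scipy.signal import csd, welch

    rng = np.random.default_rng(0)
    fs, n_ch, n_samp = 200.0, 16, 4000           # 4x4 grid, synthetic signals
    common = rng.standard_normal(n_samp)          # shared source induces coupling
    x = 0.5 * common + rng.standard_normal((n_ch, n_samp))

    def imag_coherency(a, b):
        """Imaginary part of coherency between two channels."""
        f, sab = csd(a, b, fs=fs, nperseg=512)
        _, saa = welch(a, fs=fs, nperseg=512)
        _, sbb = welch(b, fs=fs, nperseg=512)
        return f, np.imag(sab / np.sqrt(saa * sbb))

    # Connectivity matrix: mean |imaginary coherency| in an assumed 0.1-3 Hz band.
    conn = np.zeros((n_ch, n_ch))
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            f, ic = imag_coherency(x[i], x[j])
            band = (f >= 0.1) & (f <= 3.0)
            conn[i, j] = conn[j, i] = np.abs(ic[band]).mean()

    strength = conn.sum(axis=1)                   # graph-theoretic node strength
    print("node strengths:", np.round(strength, 3))
    ```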

  20. ICO amplicon NGS data analysis: a Web tool for variant detection in common high-risk hereditary cancer genes analyzed by amplicon GS Junior next-generation sequencing.

    PubMed

    Lopez-Doriga, Adriana; Feliubadaló, Lídia; Menéndez, Mireia; Lopez-Doriga, Sergio; Morón-Duran, Francisco D; del Valle, Jesús; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Campos, Olga; Gómez, Carolina; Pineda, Marta; González, Sara; Moreno, Victor; Capellá, Gabriel; Lázaro, Conxi

    2014-03-01

    Next-generation sequencing (NGS) has revolutionized genomic research and is set to have a major impact on genetic diagnostics thanks to the advent of benchtop sequencers and flexible kits for targeted libraries. Among the main hurdles in NGS are the difficulty of performing bioinformatic analysis of the huge volume of data generated and the high number of false positive calls that could be obtained, depending on the NGS technology and the analysis pipeline. Here, we present the development of a free and user-friendly Web data analysis tool that detects and filters sequence variants, provides coverage information, and allows the user to customize some basic parameters. The tool has been developed to provide accurate genetic analysis of targeted sequencing of common high-risk hereditary cancer genes using amplicon libraries run in a GS Junior System. The Web resource is linked to our own mutation database, to assist in the clinical classification of identified variants. We believe that this tool will greatly facilitate the use of the NGS approach in routine laboratories.

  1. Analysis of ChIP-seq Data in R/Bioconductor.

    PubMed

    de Santiago, Ines; Carroll, Thomas

    2018-01-01

    The development of novel high-throughput sequencing methods for ChIP (chromatin immunoprecipitation) has provided a very powerful tool to study gene regulation in multiple conditions at unprecedented resolution and scale. Proactive quality control and appropriate data analysis techniques are of critical importance to extract the most meaningful results from the data. Over the last few years, an array of R/Bioconductor tools has been developed, allowing researchers to process and analyze ChIP-seq data. This chapter provides an overview of the methods available to analyze ChIP-seq data based primarily on software packages from the open-source Bioconductor project. Protocols described in this chapter cover basic steps including data alignment, peak calling, quality control and data visualization, as well as more complex methods such as the identification of differentially bound regions and functional analyses to annotate regulatory regions. The steps in the data analysis process are demonstrated on publicly available data sets and serve as a demonstration of the computational procedures routinely used for the analysis of ChIP-seq data in R/Bioconductor, from which readers can construct their own analysis pipelines.

  2. Designing Image Analysis Pipelines in Light Microscopy: A Rational Approach.

    PubMed

    Arganda-Carreras, Ignacio; Andrey, Philippe

    2017-01-01

    With the progress of microscopy techniques and the rapidly growing amounts of acquired imaging data, there is an increased need for automated image processing and analysis solutions in biological studies. Each new application requires the design of a specific image analysis pipeline, by assembling a series of image processing operations. Many commercial and free bioimage analysis software packages are now available, and several textbooks and reviews have presented the mathematical and computational fundamentals of image processing and analysis. Tens, if not hundreds, of algorithms and methods have been developed and integrated into image analysis software, resulting in a combinatorial explosion of possible image processing sequences. This paper presents a general guideline methodology to rationally address the design of image processing and analysis pipelines. The originality of the proposed approach is to follow an iterative, backwards procedure from the target objectives of analysis. The proposed goal-oriented strategy should help biologists better apprehend image analysis in the context of their research and should allow them to interact efficiently with image processing specialists.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dawn Lenz; Raymond T. Lines; Darryl Murdock

    ITT Industries Space Systems Division (Space Systems) has developed an airborne natural gas leak detection system designed to detect, image, quantify, and precisely locate leaks from natural gas transmission pipelines. This system is called the Airborne Natural Gas Emission Lidar (ANGEL) system. The ANGEL system uses a highly sensitive differential absorption Lidar technology to remotely detect pipeline leaks. The ANGEL System is operated from a fixed wing aircraft and includes automatic scanning, pointing, and pilot guidance systems. During a pipeline inspection, the ANGEL system aircraft flies at an elevation of 1000 feet above the ground at speeds of between 100 and 150 mph. Under this contract with DOE/NETL, Space Systems was funded to integrate the ANGEL sensor into a test aircraft and conduct a series of flight tests over a variety of test targets including simulated natural gas pipeline leaks. Following early tests in upstate New York in the summer of 2004, the ANGEL system was deployed to Casper, Wyoming to participate in a set of DOE-sponsored field tests at the Rocky Mountain Oilfield Testing Center (RMOTC). At RMOTC the Space Systems team completed integration of the system and flew an operational system for the first time. The ANGEL system flew 2 missions/day for the duration of the 5-day test. Over the course of the week the ANGEL System detected leaks ranging from 100 to 5,000 scfh.

  4. Dynamic Black-Level Correction and Artifact Flagging in the Kepler Data Pipeline

    NASA Technical Reports Server (NTRS)

    Clarke, B. D.; Kolodziejczak, J. J.; Caldwell, D. A.

    2013-01-01

    Instrument-induced artifacts in the raw Kepler pixel data include time-varying crosstalk from the fine guidance sensor (FGS) clock signals and manifestations of drifting moiré pattern as locally correlated nonstationary noise and rolling bands in the images, which find their way into the calibrated pixel time series and ultimately into the calibrated target flux time series. Using a combination of raw science pixel data, full frame images, reverse-clocked pixel data, and ancillary temperature data, the Kepler pipeline models and removes the FGS crosstalk artifacts by dynamically adjusting the black-level correction. By examining the residuals to the model fits, the pipeline detects and flags spatial regions and time intervals of strong time-varying black level (rolling bands) on a per-row, per-cadence basis. These flags are made available to downstream users of the data, since the uncorrected rolling band artifacts could complicate processing or lead to misinterpretation of instrument behavior as stellar in origin. This model fitting and artifact flagging is performed within a new stand-alone pipeline module called Dynablack. We discuss the implementation of Dynablack in the Kepler data pipeline and present results regarding the improvement in calibrated pixels and the expected improvement in cotrending performance as a result of including FGS corrections in the calibration. We also discuss the effectiveness of the rolling band flagging for downstream users and illustrate with some affected light curves.

  5. Modeling flows of heterogeneous media in pipelines when substantiating operating conditions of hydrocarbon field transportation systems

    NASA Astrophysics Data System (ADS)

    Dudin, S. M.; Novitskiy, D. V.

    2018-05-01

    Modeling of heterogeneous medium flows in pipelines under laboratory conditions has been addressed in the works of researchers at VNIIgaz, Giprovostokneft, Kuibyshev NIINP, the Grozny Petroleum Institute, and others. Viewed objectively, the empirical relationships and calculation procedures obtained for pipelines transporting multiphase products constitute a bank of experimental data on the problem of pipeline transportation of multiphase systems. Based on an analysis of the published works, the main design requirements for experimental installations intended to study the flow regimes of gas-liquid flows in pipelines were formulated and taken into account by the authors when creating the experimental stand. The article describes the results of experimental studies of the flow regimes of a gas-liquid mixture in a pipeline and gives a methodological description of the experimental installation. The article also describes the software of the experimental scientific and educational stand developed with the participation of the authors.

  6. Literature Review: Theory and Application of In-Line Inspection Technologies for Oil and Gas Pipeline Girth Weld Detection

    PubMed Central

    Feng, Qingshan; Li, Rui; Nie, Baohua; Liu, Shucong; Zhao, Lianyu; Zhang, Hong

    2016-01-01

    Girth weld cracking is one of the main failure modes in oil and gas pipelines; girth weld cracking inspection has great economic and social significance for the intrinsic safety of pipelines. This paper introduces the typical girth weld defects of oil and gas pipelines and the common nondestructive testing methods, and systematically summarizes progress in studies on the technical principles, signal analysis, defect sizing methods, and inspection reliability of magnetic flux leakage (MFL) inspection, liquid ultrasonic inspection, electromagnetic acoustic transducer (EMAT) inspection, and remote field eddy current (RFEC) inspection for oil and gas pipeline girth weld defects. Additionally, it introduces the new technologies of composite ultrasonic, laser ultrasonic, and magnetostriction inspection, and provides a reference for the development and application of in-line inspection technology for oil and gas pipeline girth weld defects. PMID:28036016

  7. Mathematical modeling of non-stationary gas flow in gas pipeline

    NASA Astrophysics Data System (ADS)

    Fetisov, V. G.; Nikolaev, A. K.; Lykov, Y. V.; Duchnevich, L. N.

    2018-03-01

    An analysis of the operation of the gas transportation system shows that, for a considerable part of the time, pipelines operate in an unsteady regime of gas movement. Pressure and flow rate vary along the length of the pipeline and over time as a result of uneven consumption and offtake, switching compressor units on and off, shutting of stop valves, and the emergence of emergency leaks. The operational management of such regimes is complicated by the difficulty of reconciling the operating modes of individual sections of the gas pipeline with each other, as well as with compressor stations. Determining the factors that cause changes in the operating mode of the pipeline system and revealing the patterns of these changes guide the choice of its parameters. Therefore, knowledge of the laws governing the main technological parameters of gas pumping through pipelines under non-stationary motion is of great importance for practice.

  8. DAX - The Next Generation: Towards One Million Processes on Commodity Hardware.

    PubMed

    Damon, Stephen M; Boyd, Brian D; Plassard, Andrew J; Taylor, Warren; Landman, Bennett A

    2017-01-01

    Large scale image processing demands a standardized way of not only storage but also a method for job distribution and scheduling. The eXtensible Neuroimaging Archive Toolkit (XNAT) is one of several platforms that seeks to solve the storage issues. Distributed Automation for XNAT (DAX) is a job control and distribution manager. Recent massive data projects have revealed several bottlenecks for projects with >100,000 assessors (i.e., data processing pipelines in XNAT). In order to address these concerns, we have developed a new API, which exposes a direct connection to the database rather than REST API calls to accomplish the generation of assessors. This method, consistent with XNAT, keeps a full history for auditing purposes. Additionally, we have optimized DAX to keep track of processing status on disk (called DISKQ) rather than on XNAT, which greatly reduces load on XNAT by vastly dropping the number of API calls. Finally, we have integrated DAX into a Docker container with the idea of using it as a Docker controller to launch Docker containers of image processing pipelines. Using our new API, we reduced the time to create 1,000 assessors (a sub-cohort of our case project) from 65040 seconds to 229 seconds (a decrease of over 270 fold). DISKQ, using pyXnat, allows launching of 400 jobs in under 10 seconds which previously took 2,000 seconds. Together these updates position DAX to support projects with hundreds of thousands of scans and to run them in a time-efficient manner.
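
    The design point behind DISKQ, polling job state from the filesystem instead of issuing one REST call per assessor, can be illustrated with a toy sketch. Everything below (the queue directory, file naming, and status strings) is invented for illustration and does not reflect DAX's actual DISKQ layout or API:

    ```python
    from pathlib import Path

    # Toy disk-backed status store: one status file per job, so polling the
    # whole queue is a single local directory scan, not one API call per job.
    QUEUE = Path("/tmp/diskq_demo")               # hypothetical queue root
    QUEUE.mkdir(parents=True, exist_ok=True)

    def set_status(job_id: str, status: str) -> None:
        (QUEUE / f"{job_id}.status").write_text(status)

    def poll() -> dict:
        return {p.stem: p.read_text() for p in QUEUE.glob("*.status")}

    for i in range(5):
        set_status(f"assessor_{i:04d}", "JOB_RUNNING")
    set_status("assessor_0002", "COMPLETE")
    print(poll())                                  # zero remote calls
    ```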

  9. DAX - the next generation: towards one million processes on commodity hardware

    NASA Astrophysics Data System (ADS)

    Damon, Stephen M.; Boyd, Brian D.; Plassard, Andrew J.; Taylor, Warren; Landman, Bennett A.

    2017-03-01

    Large scale image processing demands a standardized way of not only storage but also a method for job distribution and scheduling. The eXtensible Neuroimaging Archive Toolkit (XNAT) is one of several platforms that seeks to solve the storage issues. Distributed Automation for XNAT (DAX) is a job control and distribution manager. Recent massive data projects have revealed several bottlenecks for projects with >100,000 assessors (i.e., data processing pipelines in XNAT). In order to address these concerns, we have developed a new API, which exposes a direct connection to the database rather than REST API calls to accomplish the generation of assessors. This method, consistent with XNAT, keeps a full history for auditing purposes. Additionally, we have optimized DAX to keep track of processing status on disk (called DISKQ) rather than on XNAT, which greatly reduces load on XNAT by vastly dropping the number of API calls. Finally, we have integrated DAX into a Docker container with the idea of using it as a Docker controller to launch Docker containers of image processing pipelines. Using our new API, we reduced the time to create 1,000 assessors (a sub-cohort of our case project) from 65040 seconds to 229 seconds (a decrease of over 270 fold). DISKQ, using pyXnat, allows launching of 400 jobs in under 10 seconds which previously took 2,000 seconds. Together these updates position DAX to support projects with hundreds of thousands of scans and to run them in a time-efficient manner.

  10. DAX - The Next Generation: Towards One Million Processes on Commodity Hardware

    PubMed Central

    Boyd, Brian D.; Plassard, Andrew J.; Taylor, Warren; Landman, Bennett A.

    2017-01-01

    Large scale image processing demands a standardized way of not only storage but also a method for job distribution and scheduling. The eXtensible Neuroimaging Archive Toolkit (XNAT) is one of several platforms that seeks to solve the storage issues. Distributed Automation for XNAT (DAX) is a job control and distribution manager. Recent massive data projects have revealed several bottlenecks for projects with >100,000 assessors (i.e., data processing pipelines in XNAT). In order to address these concerns, we have developed a new API, which exposes a direct connection to the database rather than REST API calls to accomplish the generation of assessors. This method, consistent with XNAT, keeps a full history for auditing purposes. Additionally, we have optimized DAX to keep track of processing status on disk (called DISKQ) rather than on XNAT, which greatly reduces load on XNAT by vastly dropping the number of API calls. Finally, we have integrated DAX into a Docker container with the idea of using it as a Docker controller to launch Docker containers of image processing pipelines. Using our new API, we reduced the time to create 1,000 assessors (a sub-cohort of our case project) from 65040 seconds to 229 seconds (a decrease of over 270 fold). DISKQ, using pyXnat, allows launching of 400 jobs in under 10 seconds which previously took 2,000 seconds. Together these updates position DAX to support projects with hundreds of thousands of scans and to run them in a time-efficient manner. PMID:28919661

  11. 76 FR 26793 - Pipeline Safety: Request for Special Permit

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-09

    ... certain pipeline safety regulations. The request includes a technical analysis provided by the operator... at 202-366-0113, or e-mail at [email protected] . Technical: Steve Nanney by telephone at 713-628...

  12. Analysis of the strength of sea gas pipelines of positive buoyancy conditioned by glaciation

    NASA Astrophysics Data System (ADS)

    Malkov, Venyamin; Kurbatova, Galina; Ermolaeva, Nadezhda; Malkova, Yulia; Petrukhin, Ruslan

    2018-05-01

    A technique for estimating the stress state of a gas pipeline laid along the seabed in northern latitudes in the presence of glaciation is proposed. It is assumed that the pipeline lies on the seabed, but under certain conditions glaciation forms on some part of the pipeline, and the pipeline section at the place of glaciation can lift off the ground due to the positive buoyancy of the ice. Calculation of the additional stresses caused by bending of the pipeline is of practical interest for strength evaluation. The gas pipeline is a two-layer cylindrical shell of circular cross section: the inner layer is made of high-strength steel, the outer layer of reinforced ferroconcrete. The proposed methodology for calculating the strength of the gas pipeline is based on the equations of the theory of shells. The procedure takes into account the effect of internal gas pressure, the external pressure of sea water, the weight of the two-layer gas pipeline, and the weight of the ice layer. The lifting force created by the displaced fluid and the positive buoyancy of the ice is also taken into account. It is significant that the listed loads cause only two types of deformation of the gas pipeline: axisymmetric and antisymmetric. The interaction of the pipeline with the ground as an elastic foundation is not considered. The main objective of the research is to establish whether part of the pipeline separates from the ground. A method for calculating the stresses and deformations occurring in a model sea gas pipeline is presented.
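
    The separation question reduces to a per-unit-length force balance: the section lifts off the seabed when the buoyancy of the water displaced by pipe plus ice exceeds the combined weight of pipe, gas, and accreted ice. Below is a back-of-the-envelope sketch of that balance; all dimensions and weights are assumptions for illustration, not values from the paper:

    ```python
    import math

    RHO_SEA, RHO_ICE, G = 1025.0, 917.0, 9.81     # kg/m^3, kg/m^3, m/s^2

    # Assumed geometry: steel pipe + ferroconcrete coat + external ice layer.
    d_outer = 1.2      # outer diameter of coated pipe, m (assumption)
    t_ice = 0.25       # ice layer thickness, m (assumption)
    w_pipe = 9000.0    # weight of pipe + gas per metre, N/m (assumption)

    d_ice = d_outer + 2 * t_ice
    a_total = math.pi * d_ice**2 / 4              # displaced cross-section, m^2
    a_ice = a_total - math.pi * d_outer**2 / 4    # ice annulus area, m^2

    buoyancy = RHO_SEA * G * a_total              # upward force per metre, N/m
    weight = w_pipe + RHO_ICE * G * a_ice         # downward force per metre, N/m
    net = buoyancy - weight
    print(f"net lift: {net:.0f} N/m -> {'separates' if net > 0 else 'stays on seabed'}")
    ```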

  13. Novel approaches for bioinformatic analysis of salivary RNA sequencing data for development.

    PubMed

    Kaczor-Urbanowicz, Karolina Elzbieta; Kim, Yong; Li, Feng; Galeev, Timur; Kitchen, Rob R; Gerstein, Mark; Koyano, Kikuye; Jeong, Sung-Hee; Wang, Xiaoyan; Elashoff, David; Kang, So Young; Kim, Su Mi; Kim, Kyoung; Kim, Sung; Chia, David; Xiao, Xinshu; Rozowsky, Joel; Wong, David T W

    2018-01-01

    Analysis of RNA sequencing (RNA-Seq) data in human saliva is challenging, and the lack of standardization and unification of bioinformatic procedures undermines saliva's diagnostic potential; this motivated the present study. We applied principal pipelines for bioinformatic analysis of small RNA-Seq data from the saliva of 98 healthy Korean volunteers, including either direct or indirect mapping of the reads to the human genome using Bowtie1. Analysis of alignments to exogenous genomes by another pipeline revealed that almost all of the reads map to bacterial genomes. Thus, salivary exRNA has fundamental properties that warrant the design of unique additional steps when performing the bioinformatic analysis. Our pipelines can serve as potential guidelines for the processing of RNA-Seq data from human saliva. Processing and analysis results of the experimental data generated by the exceRpt (v4.6.3) small RNA-seq pipeline (github.gersteinlab.org/exceRpt) are available from the exRNA atlas (exrna-atlas.org). Alignment to exogenous genomes and their quantification results were used in this paper for the analyses of small RNAs of exogenous origin. dtww@ucla.edu. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  14. Kepler Science Operations Center Architecture

    NASA Technical Reports Server (NTRS)

    Middour, Christopher; Klaus, Todd; Jenkins, Jon; Pletcher, David; Cote, Miles; Chandrasekaran, Hema; Wohler, Bill; Girouard, Forrest; Gunter, Jay P.; Uddin, Kamal

    2010-01-01

    We give an overview of the operational concepts and architecture of the Kepler Science Data Pipeline. Designed, developed, operated, and maintained by the Science Operations Center (SOC) at NASA Ames Research Center, the Kepler Science Data Pipeline is a central element of the Kepler Ground Data System. The SOC charter is to analyze stellar photometric data from the Kepler spacecraft and report results to the Kepler Science Office for further analysis. We describe how this is accomplished via the Kepler Science Data Pipeline, including the hardware infrastructure, scientific algorithms, and operational procedures. The SOC consists of an office at Ames Research Center, software development and operations departments, and a data center that hosts the computers required to perform data analysis. We discuss the high-performance, parallel computing software modules of the Kepler Science Data Pipeline that perform transit photometry, pixel-level calibration, systematic error-correction, attitude determination, stellar target management, and instrument characterization. We explain how data processing environments are divided to support operational processing and test needs. We explain the operational timelines for data processing and the data constructs that flow into the Kepler Science Data Pipeline.

  15. Query-Driven Visualization and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruebel, Oliver; Bethel, E. Wes; Prabhat, Mr.

    2012-11-01

    This report focuses on an approach to high performance visualization and analysis, termed query-driven visualization and analysis (QDV). QDV aims to reduce the amount of data that needs to be processed by the visualization, analysis, and rendering pipelines. The goal of the data reduction process is to separate out data that is "scientifically interesting" and to focus visualization, analysis, and rendering on that interesting subset. The premise is that for any given visualization or analysis task, the data subset of interest is much smaller than the larger, complete data set. This strategy, extracting smaller data subsets of interest and focusing the visualization processing on these subsets, is complementary to the approach of increasing the capacity of the visualization, analysis, and rendering pipelines through parallelism. This report discusses the fundamental concepts in QDV, their relationship to different stages in the visualization and analysis pipelines, and presents QDV's application to problems in diverse areas, ranging from forensic cybersecurity to high energy physics.

  16. Using Next Generation Sequencing for Multiplexed Trait-Linked Markers in Wheat

    PubMed Central

    Bernardo, Amy; Wang, Shan; St. Amand, Paul; Bai, Guihua

    2015-01-01

    With the advent of next generation sequencing (NGS) technologies, single nucleotide polymorphisms (SNPs) have become the major type of marker for genotyping in many crops. However, the availability of SNP markers for important traits of bread wheat (Triticum aestivum L.) that can be effectively used in marker-assisted selection (MAS) is still limited and SNP assays for MAS are usually uniplex. A shift from uniplex to multiplex assays will allow the simultaneous analysis of multiple markers and increase MAS efficiency. We designed 33 locus-specific markers from SNP or indel-based marker sequences that linked to 20 different quantitative trait loci (QTL) or genes of agronomic importance in wheat and analyzed the amplicon sequences using an Ion Torrent Proton Sequencer and a custom allele detection pipeline to determine the genotypes of 24 selected germplasm accessions. Among the 33 markers, 27 were successfully multiplexed and 23 had 100% SNP call rates. Results from analysis of "kompetitive allele-specific PCR" (KASP) and sequence tagged site (STS) markers developed from the same loci fully verified the genotype calls of 23 markers. The NGS-based multiplexed assay developed in this study is suitable for rapid and high-throughput screening of SNPs and some indel-based markers in wheat. PMID:26625271

  17. Effect of Cooling Mode on Microstructure and Mechanical Properties of Pipeline Steel for Strain Based Design and Research on its Deformation Mechanism

    NASA Astrophysics Data System (ADS)

    Hesong, Zhang; Yonglin, Kang

    With the rapid development of the oil and gas industry, long-distance pipelines inevitably pass through regions with complex geological activity. In order to avoid large deformation, the pipelines must be designed based on strain criteria. In this paper, the alloy system of X80 high deformability pipeline steel was designed as 0.25%Mo-0.05%C-1.75%Mn. The effects of the controlled cooling process on the microstructure and mechanical properties of X80 high deformability pipeline steel were systematically investigated. Through the two-stage controlled cooling process, the microstructure of the X80 high deformability pipeline steel consisted of ferrite, bainite, and M/A islands. There were two kinds of ferrite, polygonal ferrite (PF) and quasi-polygonal ferrite (QF), and the bainite was granular bainitic ferrite (GF). As the start cooling temperature decreased, the volume fractions of ferrite and M/A both increased and the yield ratio (Y/T) decreased; the uniform elongation (uEl) first increased as the ferrite content increased, but then decreased as the content and size of M/A increased. As the finish cooling temperature decreased, the M/A islands became finer. When the start cooling temperature was 690 °C and the finish cooling temperature was 450 °C, the volume fraction of ferrite was 23%, the ferrite grain size was 5 μm, the M/A island size was below 1 μm, and the structural uniformity was the best. The deformation mechanism of X80 high deformability pipeline steel was analyzed. The best way to improve the work hardening rate was to reduce the size of the M/A islands while maintaining a given volume fraction. The decrease of the instantaneous strain hardening index (n*-value) showed three stages in the deformation process. The n*-value remained stable in the second stage because retained austenite transformed into martensite, and this phase transformation improved the strain hardening ability of the microstructure. This phenomenon is called the transformation induced plasticity (TRIP) effect.
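
    The instantaneous strain hardening index referred to here is n* = d(ln σ)/d(ln ε) evaluated along the true stress-strain curve. Below is a generic numerical sketch of that definition on a synthetic Hollomon-type curve; the constants are assumptions, not the measured X80 data:

    ```python
    import numpy as np

    # Synthetic true stress-strain data following sigma = K * eps^n (Hollomon).
    K, n = 1000.0, 0.12                  # MPa and hardening exponent (assumed)
    eps = np.linspace(0.005, 0.08, 200)  # true plastic strain
    sigma = K * eps**n

    # Instantaneous strain hardening index: n* = d(ln sigma) / d(ln eps).
    n_star = np.gradient(np.log(sigma), np.log(eps))
    print(f"n* range: {n_star.min():.3f} to {n_star.max():.3f} (constant ~{n} here)")
    ```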

  18. Validation and optimization of the Ion Torrent S5 XL sequencer and Oncomine workflow for BRCA1 and BRCA2 genetic testing.

    PubMed

    Shin, Saeam; Kim, Yoonjung; Chul Oh, Seoung; Yu, Nae; Lee, Seung-Tae; Rak Choi, Jong; Lee, Kyung-A

    2017-05-23

    In this study, we validated the analytical performance of BRCA1/2 sequencing using Ion Torrent's new bench-top sequencer and an amplicon panel with optimized bioinformatics pipelines. Using 43 samples that were previously validated on Illumina's MiSeq platform and/or by Sanger sequencing/multiplex ligation-dependent probe amplification, we amplified the target with the Oncomine™ BRCA Research Assay and sequenced on the Ion Torrent S5 XL (Thermo Fisher Scientific, Waltham, MA, USA). We compared two bioinformatics pipelines for optimal processing of S5 XL sequence data: the Torrent Suite with the plug-in Torrent Variant Caller (Thermo Fisher Scientific), and the commercial NextGENe software (Softgenetics, State College, PA, USA). All expected 681 single nucleotide variants, 15 small indels, and three copy number variants were correctly called, except one common variant adjacent to a rare variant at a primer-binding site. The sensitivity, specificity, false positive rate, and accuracy for detection of single nucleotide variants and small indels by S5 XL sequencing were 99.85%, 100%, 0%, and 99.99% for the Torrent Variant Caller and 99.85%, 99.99%, 0.14%, and 99.99% for NextGENe, respectively. The reproducibility of variant calling was 100%, and the precision of variant frequency also showed good performance, with coefficients of variation between 0.32% and 5.29%. We obtained highly accurate data through uniform and sufficient coverage depth over all target regions and through optimization of the bioinformatics pipeline. We confirmed that our platform is accurate and practical for diagnostic BRCA1/2 testing in a clinical laboratory.
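
    The reported performance figures are standard confusion-matrix quantities. As a quick reference, here is a minimal sketch with invented tallies (not the study's actual counts):

    ```python
    def variant_metrics(tp, fp, tn, fn):
        """Sensitivity, specificity, false positive rate, and accuracy."""
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "fpr": fp / (fp + tn),
            "accuracy": (tp + tn) / (tp + fp + tn + fn),
        }

    # Hypothetical tallies over all callable positions in the BRCA1/2 targets.
    print(variant_metrics(tp=698, fp=1, tn=68000, fn=1))
    ```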

  19. DataMed - an open source discovery index for finding biomedical datasets.

    PubMed

    Chen, Xiaoling; Gururaj, Anupama E; Ozyurt, Burak; Liu, Ruiling; Soysal, Ergin; Cohen, Trevor; Tiryaki, Firat; Li, Yueling; Zong, Nansu; Jiang, Min; Rogith, Deevakar; Salimi, Mandana; Kim, Hyeon-Eui; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Farcas, Claudiu; Johnson, Todd; Margolis, Ron; Alter, George; Sansone, Susanna-Assunta; Fore, Ian M; Ohno-Machado, Lucila; Grethe, Jeffrey S; Xu, Hua

    2018-01-13

    Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community. © The Author 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
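
    Precision at 10 is simply the fraction of relevant hits among the top ten ranked results. A minimal sketch, with an invented relevance list rather than the benchmark's judgments:

    ```python
    def precision_at_k(relevance, k=10):
        """Fraction of relevant results among the top-k retrieved items."""
        return sum(relevance[:k]) / k

    # 1 = relevant, 0 = not relevant, in ranked order (hypothetical query).
    ranked = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
    print(precision_at_k(ranked))   # 6/10 = 0.6, comparable to the reported 0.6022
    ```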

  20. Developing a Comprehensive Risk Assessment Framework for Geological Storage of CO₂

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duncan, Ian

    2014-08-31

    The operational risks for CCS projects include: risks of capturing, compressing, transporting and injecting CO₂; risks of well blowouts; the risk that CO₂ will leak into shallow aquifers and contaminate potable water; and the risk that sequestered CO₂ will leak into the atmosphere. This report examines these risks by using information on the risks associated with analogue activities such as CO₂-based enhanced oil recovery (CO₂-EOR), natural gas storage, and acid gas disposal. We have developed a new analysis of pipeline risk based on Bayesian statistical analysis. In Bayesian theory, probabilities may describe states of partial knowledge, perhaps even those related to non-repeatable events. The Bayesian approach enables utilizing existing data while retaining the capability to absorb new information, thus lowering uncertainty in our understanding of complex systems. Incident rates for both natural gas and CO₂ pipelines have been widely used in papers and reports on the risk of CO₂ pipelines as proxies for the individual risk created by such pipelines. Published risk studies of CO₂ pipelines suggest that the individual risk associated with CO₂ pipelines is between 10⁻³ and 10⁻⁴, which reflects risk levels approaching those of mountain climbing, which many would find unacceptably high. Based on a careful analysis of natural gas pipeline failures, this report concludes that the individual risk of CO₂ pipelines is likely in the range of 10⁻⁶ to 10⁻⁷, a risk range considered acceptable to negligible in most countries. If, as is commonly thought, pipelines represent the highest risk component of CCS outside of the capture plant, then this conclusion suggests that most (if not all) previous quantitative risk assessments of components of CCS may be orders of magnitude too high. The potential lethality of unexpected CO₂ releases from pipelines or wells is arguably the highest risk aspect of CO₂ enhanced oil recovery (CO₂-EOR) and carbon capture and storage (CCS). Assertions in the CCS literature that CO₂ levels of 10% for ten minutes, or 20 to 30% for a few minutes, are lethal to humans are not supported by the available evidence. The results of published experiments with animals exposed to CO₂, from mice to monkeys, at both normal and depleted oxygen levels, suggest that lethal levels of CO₂ toxicity are in the range of 50 to 60%. These experiments demonstrate that CO₂ does not kill by asphyxia, but rather is toxic at high concentrations. It is concluded that quantitative risk assessments of CCS have overestimated the risk of fatalities by using values of lethality a factor of two to six lower than the values estimated in this paper. In many dispersion models of CO₂ releases from pipelines, no fatalities would be predicted if appropriate levels of lethality for CO₂ had been used in the analysis.

  1. Practical Approach for Hyperspectral Image Processing in Python

    NASA Astrophysics Data System (ADS)

    Annala, L.; Eskelinen, M. A.; Hämäläinen, J.; Riihinen, A.; Pölönen, I.

    2018-04-01

    Python is a very popular programming language among data scientists around the world, and it can also be used in hyperspectral data analysis. Some toolboxes are designed for spectral imaging, such as Spectral Python and HyperSpy, but there is a need for an analysis pipeline that is easy to use and agile enough for different solutions. We propose a Python pipeline built on the packages xarray, Holoviews, and scikit-learn. We have also developed some tools of our own, MaskAccessor, VisualisorAccessor, and a spectral index library, which likewise fulfill our goal of easy and agile data processing. In this paper we present our processing pipeline and demonstrate it in practice.
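
    A minimal sketch of the kind of xarray/scikit-learn composition described, here flattening a hyperspectral cube for a PCA; the authors' MaskAccessor and VisualisorAccessor tools are not reproduced, and the cube dimensions and wavelength range are invented:

    ```python
    import numpy as np
    import xarray as xr
    from sklearn.decomposition import PCA

    # Synthetic hyperspectral cube: 64x64 pixels, 100 spectral bands.
    rng = np.random.default_rng(1)
    cube = xr.DataArray(
        rng.random((64, 64, 100)),
        dims=("y", "x", "band"),
        coords={"band": np.linspace(400, 1000, 100)},  # nm, assumed range
    )

    # Flatten pixels into a (samples, bands) table for scikit-learn.
    flat = cube.stack(pixel=("y", "x")).transpose("pixel", "band")
    scores = PCA(n_components=3).fit_transform(flat.values)

    # Fold the first principal component back onto the image grid.
    pc1 = scores[:, 0].reshape(cube.sizes["y"], cube.sizes["x"])
    print(pc1.shape)   # (64, 64)
    ```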

  2. Solvepol: A Reduction Pipeline for Imaging Polarimetry Data

    NASA Astrophysics Data System (ADS)

    Ramírez, Edgar A.; Magalhães, Antônio M.; Davidson, James W., Jr.; Pereyra, Antonio; Rubinho, Marcelo

    2017-05-01

    We present a new, fully automated data pipeline, Solvepol, designed to reduce and analyze polarimetric data. It has been optimized for imaging data from the calcite Savart prism plate-based IAGPOL polarimeter of the Instituto de Astronomía, Geofísica e Ciências Atmosféricas (IAG) of the University of São Paulo (USP). Solvepol is also the basis of a reduction pipeline for the wide-field optical polarimeter that will execute SOUTH POL, a survey of the polarized southern sky. Solvepol was written in the Interactive Data Language (IDL) and is based on the Image Reduction and Analysis Facility (IRAF) task PCCDPACK, developed by our polarimetry group. We present and discuss reduced data from standard stars and other fields and compare these results with those obtained in the IRAF environment. Our analysis shows that Solvepol, in addition to being a fully automated pipeline, produces results consistent with those reduced by PCCDPACK and reported in the literature.

  3. Evaluation and integration of functional annotation pipelines for newly sequenced organisms: the potato genome as a test case.

    PubMed

    Amar, David; Frades, Itziar; Danek, Agnieszka; Goldberg, Tatyana; Sharma, Sanjeev K; Hedley, Pete E; Proux-Wera, Estelle; Andreasson, Erik; Shamir, Ron; Tzfadia, Oren; Alexandersson, Erik

    2014-12-05

    For most organisms, even if their genome sequence is available, little functional information about individual genes or proteins exists. Several annotation pipelines have been developed for functional analysis based on sequence, 'omics', and literature data. However, researchers receive little guidance on how well these pipelines perform. Here, we used the recently sequenced potato genome as a case study. The potato genome was selected because it is newly sequenced and potato is a non-model plant, yet relatively ample information on individual potato genes and multiple gene expression profiles are available. We show that the automatic gene annotations of potato have low accuracy when compared to a "gold standard" based on experimentally validated potato genes. Furthermore, we evaluate six state-of-the-art annotation pipelines and show that their predictions are markedly dissimilar (Jaccard similarity coefficient of 0.27 between pipelines on average). To overcome this discrepancy, we introduce a simple GO structure-based algorithm that reconciles the predictions of the different pipelines. We show that the integrated annotation covers more genes, increases by over 50% the number of highly co-expressed GO processes, and obtains much higher agreement with the gold standard. We find that different annotation pipelines produce different results, and show how to integrate them into a unified annotation that is of higher quality than each single pipeline. We offer an improved functional annotation of both PGSC and ITAG potato gene models, as well as tools that can be applied to additional pipelines and improve annotation in other organisms. This will greatly aid future functional analysis of '-omics' datasets from potato and other organisms with newly sequenced genomes. The new potato annotations are available with this paper.
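
    The dissimilarity figure quoted is a Jaccard coefficient over the pipelines' predicted annotation sets. A minimal sketch of that overlap measure (the GO terms below are invented):

    ```python
    def jaccard(a: set, b: set) -> float:
        """|A intersection B| / |A union B|; 1.0 means identical annotation sets."""
        return len(a & b) / len(a | b) if (a | b) else 1.0

    # Hypothetical GO term assignments for one gene from two pipelines.
    pipeline_a = {"GO:0006952", "GO:0009607", "GO:0050832"}
    pipeline_b = {"GO:0006952", "GO:0050832", "GO:0009617", "GO:0042742"}
    print(f"Jaccard similarity: {jaccard(pipeline_a, pipeline_b):.2f}")
    ```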

  4. Comparison of three microarray probe annotation pipelines: differences in strategies and their effect on downstream analysis

    PubMed Central

    Neerincx, Pieter BT; Casel, Pierrot; Prickett, Dennis; Nie, Haisheng; Watson, Michael; Leunissen, Jack AM; Groenen, Martien AM; Klopp, Christophe

    2009-01-01

    Background Reliable annotation linking oligonucleotide probes to target genes is essential for functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20 K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria infected chickens and finally we propose guidelines for optimal annotation strategies. Results IMAD, OligoRAP and sigReannot update both annotation and estimated target specificity. The 3 pipelines can assign oligos to target specificity categories although with varying degrees of resolution. Target specificity is judged based on the amount and type of oligo versus target-gene alignments (hits), which are determined by filter thresholds that users can adjust based on their experimental conditions. Linking oligos to annotation on the other hand is based on rigid rules, which differ between pipelines. For 52.7% of the oligos from a subset selected for in depth comparison all pipelines linked to one or more Ensembl genes with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation potential filtering of oligo versus target-gene alignments and different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis with consensus on only 67.2% of the enriched terms. Conclusion In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them. These relationships can then be used to judge the varying degrees of reliability allowing users to fine-tune the balance between reliability and coverage. This is important as it can have a significant effect on functional microarray analysis as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation. PMID:19615109

  5. Comparison of three microarray probe annotation pipelines: differences in strategies and their effect on downstream analysis.

    PubMed

    Neerincx, Pieter Bt; Casel, Pierrot; Prickett, Dennis; Nie, Haisheng; Watson, Michael; Leunissen, Jack Am; Groenen, Martien Am; Klopp, Christophe

    2009-07-16

    Reliable annotation linking oligonucleotide probes to target genes is essential for functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20 K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria infected chickens and finally we propose guidelines for optimal annotation strategies. IMAD, OligoRAP and sigReannot update both annotation and estimated target specificity. The 3 pipelines can assign oligos to target specificity categories although with varying degrees of resolution. Target specificity is judged based on the amount and type of oligo versus target-gene alignments (hits), which are determined by filter thresholds that users can adjust based on their experimental conditions. Linking oligos to annotation on the other hand is based on rigid rules, which differ between pipelines. For 52.7% of the oligos from a subset selected for in depth comparison all pipelines linked to one or more Ensembl genes with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation potential filtering of oligo versus target-gene alignments and different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis with consensus on only 67.2% of the enriched terms. In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them. These relationships can then be used to judge the varying degrees of reliability allowing users to fine-tune the balance between reliability and coverage. This is important as it can have a significant effect on functional microarray analysis as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation.

  6. Decoding of finger trajectory from ECoG using deep learning.

    PubMed

    Xie, Ziqian; Schwartz, Odelia; Prasad, Abhishek

    2018-06-01

    Conventional decoding pipeline for brain-machine interfaces (BMIs) consists of chained different stages of feature extraction, time-frequency analysis and statistical learning models. Each of these stages uses a different algorithm trained in a sequential manner, which makes it difficult to make the whole system adaptive. The goal was to create an adaptive online system with a single objective function and a single learning algorithm so that the whole system can be trained in parallel to increase the decoding performance. Here, we used deep neural networks consisting of convolutional neural networks (CNN) and a special kind of recurrent neural network (RNN) called long short term memory (LSTM) to address these needs. We used electrocorticography (ECoG) data collected by Kubanek et al. The task consisted of individual finger flexions upon a visual cue. Our model combined a hierarchical feature extractor CNN and a RNN that was able to process sequential data and recognize temporal dynamics in the neural data. CNN was used as the feature extractor and LSTM was used as the regression algorithm to capture the temporal dynamics of the signal. We predicted the finger trajectory using ECoG signals and compared results for the least angle regression (LARS), CNN-LSTM, random forest, LSTM model (LSTM_HC, for using hard-coded features) and a decoding pipeline consisting of band-pass filtering, energy extraction, feature selection and linear regression. The results showed that the deep learning models performed better than the commonly used linear model. The deep learning models not only gave smoother and more realistic trajectories but also learned the transition between movement and rest state. This study demonstrated a decoding network for BMI that involved a convolutional and recurrent neural network model. It integrated the feature extraction pipeline into the convolution and pooling layer and used LSTM layer to capture the state transitions. The discussed network eliminated the need to separately train the model at each step in the decoding pipeline. The whole system can be jointly optimized using stochastic gradient descent and is capable of online learning.
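
    A compact PyTorch sketch of the CNN-LSTM arrangement described: a Conv1d feature extractor stands in for the hand-crafted feature stage and feeds an LSTM whose per-timestep outputs regress finger position. Layer sizes and channel counts are assumptions, not the paper's exact architecture:

    ```python
    import torch
    import torch.nn as nn

    class CnnLstmDecoder(nn.Module):
        def __init__(self, n_channels=64, n_fingers=5, hidden=128):
            super().__init__()
            # Conv1d plays the role of the feature extraction / band-power stage.
            self.features = nn.Sequential(
                nn.Conv1d(n_channels, 64, kernel_size=9, padding=4), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(64, 64, kernel_size=9, padding=4), nn.ReLU(),
            )
            # LSTM captures temporal dynamics and movement/rest state transitions.
            self.lstm = nn.LSTM(64, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_fingers)   # per-timestep regression

        def forward(self, ecog):                       # ecog: (batch, channels, time)
            z = self.features(ecog)                    # (batch, 64, time/2)
            z = z.permute(0, 2, 1)                     # (batch, time/2, 64)
            out, _ = self.lstm(z)
            return self.head(out)                      # (batch, time/2, n_fingers)

    model = CnnLstmDecoder()
    traj = model(torch.randn(8, 64, 1000))             # 8 trials, 64 ECoG channels
    print(traj.shape)                                  # torch.Size([8, 500, 5])
    ```

    Because the extractor and regressor form one module with a single loss, the whole decoder can be trained end to end with stochastic gradient descent, which is the adaptivity argument made above.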

  7. Decoding of finger trajectory from ECoG using deep learning

    NASA Astrophysics Data System (ADS)

    Xie, Ziqian; Schwartz, Odelia; Prasad, Abhishek

    2018-06-01

    Objective. Conventional decoding pipeline for brain-machine interfaces (BMIs) consists of chained different stages of feature extraction, time-frequency analysis and statistical learning models. Each of these stages uses a different algorithm trained in a sequential manner, which makes it difficult to make the whole system adaptive. The goal was to create an adaptive online system with a single objective function and a single learning algorithm so that the whole system can be trained in parallel to increase the decoding performance. Here, we used deep neural networks consisting of convolutional neural networks (CNN) and a special kind of recurrent neural network (RNN) called long short term memory (LSTM) to address these needs. Approach. We used electrocorticography (ECoG) data collected by Kubanek et al. The task consisted of individual finger flexions upon a visual cue. Our model combined a hierarchical feature extractor CNN and a RNN that was able to process sequential data and recognize temporal dynamics in the neural data. CNN was used as the feature extractor and LSTM was used as the regression algorithm to capture the temporal dynamics of the signal. Main results. We predicted the finger trajectory using ECoG signals and compared results for the least angle regression (LARS), CNN-LSTM, random forest, LSTM model (LSTM_HC, for using hard-coded features) and a decoding pipeline consisting of band-pass filtering, energy extraction, feature selection and linear regression. The results showed that the deep learning models performed better than the commonly used linear model. The deep learning models not only gave smoother and more realistic trajectories but also learned the transition between movement and rest state. Significance. This study demonstrated a decoding network for BMI that involved a convolutional and recurrent neural network model. It integrated the feature extraction pipeline into the convolution and pooling layer and used LSTM layer to capture the state transitions. The discussed network eliminated the need to separately train the model at each step in the decoding pipeline. The whole system can be jointly optimized using stochastic gradient descent and is capable of online learning.

  8. Apprenticeship of Immersion: College Access for High School Students Interested in Teaching Mathematics or Science

    ERIC Educational Resources Information Center

    Harkness, Shelly Sheats; Johnson, Iris DeLoach; Hensley, Billy; Stallworth, James A.

    2011-01-01

    Issues related to college access and the need for a pipeline of STEM teachers, provided the impetus for the Ohio Board of Regents (OBR) to issue a call for Ohio universities to design pre-college experiences for high school students with three major goals in mind: (a) improvement in mathematics, science, or foreign language learning; (b) increased…

  9. Answering the Call for Equitable Access to Effective Teachers: Lessons Learned from State-Based Teacher Preparation Efforts in Georgia, Indiana, Michigan, New Jersey, and Ohio

    ERIC Educational Resources Information Center

    Woodrow Wilson National Fellowship Foundation, 2015

    2015-01-01

    The nation's teacher education programs are not producing the quantity or quality of teachers needed, particularly in needed subjects. The only way to ensure a strong enough pipeline of effective teachers to ensure equitable access is to dramatically increase how states are preparing prospective educators. The Woodrow Wilson National Fellowship…

  10. Characterization and Expression of Drug Resistance Genes in MDROs Originating from Combat Wound Infections

    DTIC Science & Technology

    2016-09-01

    assigned a classification. MLST analysis MLST was determined using an in-house automated pipeline that first searches for homologs of each gene of...and virulence mechanism contributing to their success as pathogens in the wound environment. A novel bioinformatics pipeline was used to incorporate...monitored in two ways: read-based genome QC and assembly based metrics. The JCVI Genome QC pipeline samples sequence reads and performs BLAST

  11. A Spatial Risk Analysis of Oil Refineries within the United States

    DTIC Science & Technology

    2012-03-01

    regulator and consumer. This is especially true within the energy sector, which is composed of electrical power, oil, and gas infrastructure [10...Naphtali, "Analysis of Electrical Power and Oil and Gas Pipeline Failures," in International Federation for Information Processing, E. Goetz and S...61-67, September 1999. [5] J. Simonoff, C. Restrepo, R. Zimmerman, and Z. Naphtali, "Analysis of Electrical Power and Oil and Gas Pipeline Failures

  12. Research on numerical simulation and protection of transient process in long-distance slurry transportation pipelines

    NASA Astrophysics Data System (ADS)

    Lan, G.; Jiang, J.; Li, D. D.; Yi, W. S.; Zhao, Z.; Nie, L. N.

    2013-12-01

    The calculation of water-hammer pressure for single-phase liquids in pipelines of uniform characteristics is already well established, but less research has addressed the calculation of water-hammer pressure in complex pipelines carrying slurries with solid particles. In this paper, based on developments in slurry pipelines at home and abroad, the fundamental principles and methods for numerical simulation of transient processes are presented, and several boundary conditions are given. A model for calculating the water-hammer interaction of the solid and fluid phases is established for a practical long-distance slurry pipeline transportation system. Through numerical simulation and analysis of the transient processes, and comparison of the results, effective protection measures and operating suggestions are presented, which have guiding significance for the design and operational management of practical long-distance slurry pipeline transportation systems.
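
    For orientation, the core of such a transient simulation is the method of characteristics; the following textbook-style single-phase sketch (all pipe parameters are invented, and the paper's solid-liquid coupling and protection devices are not modelled) marches the C+/C- characteristic equations after an instantaneous downstream valve closure.

      import numpy as np

      a, g = 1000.0, 9.81            # wave speed [m/s], gravity [m/s^2]
      L, D, f = 2000.0, 0.5, 0.02    # length [m], diameter [m], Darcy friction
      N = 20                         # computational reaches
      A = np.pi * D ** 2 / 4
      dx = L / N
      dt = dx / a                    # Courant number of exactly 1
      B = a / (g * A)                # characteristic impedance term
      R = f * dx / (2 * g * D * A ** 2)  # friction term per reach

      H0, Q0 = 100.0, 0.2            # reservoir head [m], initial flow [m^3/s]
      H = H0 - np.arange(N + 1) * R * Q0 ** 2   # steady-state head line
      Q = np.full(N + 1, Q0)

      peak = H.max()
      for _ in range(400):           # march in time after the valve slams shut
          CP = H[:-1] + B * Q[:-1] - R * Q[:-1] * np.abs(Q[:-1])  # C+ (node i-1)
          CM = H[1:] - B * Q[1:] + R * Q[1:] * np.abs(Q[1:])      # C- (node i+1)
          Hn, Qn = H.copy(), Q.copy()
          Hn[1:-1] = 0.5 * (CP[:-1] + CM[1:])
          Qn[1:-1] = (CP[:-1] - CM[1:]) / (2 * B)
          Hn[0], Qn[0] = H0, (H0 - CM[0]) / B     # upstream reservoir
          Qn[-1], Hn[-1] = 0.0, CP[-1]            # instantaneous closure
          H, Q = Hn, Qn
          peak = max(peak, H.max())

      print(f"peak head {peak:.1f} m vs Joukowsky estimate "
            f"{H0 + a * (Q0 / A) / g:.1f} m")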

  13. Risk Analysis using Corrosion Rate Parameter on Gas Transmission Pipeline

    NASA Astrophysics Data System (ADS)

    Sasikirono, B.; Kim, S. J.; Haryadi, G. D.; Huda, A.

    2017-05-01

    In the oil and gas industry, the pipeline is a major component of the oil and gas transmission and distribution process. Pipelines often run across various types of environmental conditions, so they must operate safely without harming the surrounding environment. Corrosion is still a major cause of failure in production-facility equipment; in pipeline systems it can cause wall failures and damage to the pipeline, so care and periodic inspections of the pipeline system are required. Every production facility carries a level of risk determined by the likelihood and consequences of damage. The purpose of this research is to analyze the risk level of a 20-inch natural gas transmission pipeline using semi-quantitative risk-based inspection per API 581, which combines the likelihood of failure with the consequences of failure for each equipment component; the result is then used to determine the next inspection plans. Nine pipeline components were observed, including straight-pipe inlets, connection tees, and straight-pipe outlets. The risk assessment of the nine pipeline components is presented in a risk matrix; the components were found to lie at medium risk levels. The failure mechanism considered in this research is thinning. From the corrosion rate calculation, the remaining age of the pipeline components can be obtained, so their remaining lifetimes are known; the results vary for each component. The next step is planning the inspection of pipeline components by external NDT methods.
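
    The thinning arithmetic behind the remaining-life step reduces to two divisions; a minimal sketch with invented wall thicknesses (API 581's damage factors and probability/consequence categories are not reproduced here):

      # All numbers below are invented for illustration.
      def corrosion_rate(t_nominal_mm, t_measured_mm, years_in_service):
          return (t_nominal_mm - t_measured_mm) / years_in_service  # mm/year

      def remaining_life(t_measured_mm, t_required_mm, rate_mm_per_year):
          return (t_measured_mm - t_required_mm) / rate_mm_per_year  # years

      rate = corrosion_rate(12.7, 11.9, 8.0)   # 0.1 mm/year in this example
      print(remaining_life(11.9, 9.5, rate))   # -> 24.0 years to minimum wall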

  14. Supply Support of Air Force 463L Equipment: An Analysis of the 463L equipment Spare Parts Pipeline

    DTIC Science & Technology

    1989-09-01

    service; and 4) the order processing system created inherent delays in the pipeline because of outdated and indirect information systems and technology. Keywords: Materials handling equipment, Theses. (AW)

  15. Early-type galaxies: Automated reduction and analysis of ROSAT PSPC data

    NASA Technical Reports Server (NTRS)

    Mackie, G.; Fabbiano, G.; Harnden, F. R., Jr.; Kim, D.-W.; Maggio, A.; Micela, G.; Sciortino, S.; Ciliegi, P.

    1996-01-01

    Preliminary results for early-type galaxies that will be part of a galaxy catalog derived from the complete ROSAT database are presented. The stored data were reduced and analyzed by an automatic pipeline based on a command-language script. Important features of the pipeline include data time-screening to maximize the signal-to-noise ratio of faint point-like sources, source detection via a wavelet algorithm, and the identification of sources with objects from existing catalogs. The pipeline outputs include reduced images, contour maps, surface brightness profiles, spectra, and color and hardness ratios.

  16. PMAnalyzer: a new web interface for bacterial growth curve analysis.

    PubMed

    Cuevas, Daniel A; Edwards, Robert A

    2017-06-15

    Bacterial growth curves are essential representations for characterizing bacterial metabolism within a variety of media compositions. Using high-throughput spectrophotometers capable of processing tens of 96-well plates, quantitative phenotypic information can be easily integrated into the current data structures that describe a bacterial organism. The PMAnalyzer pipeline performs a growth curve analysis to parameterize the unique features occurring within microtiter wells containing specific growth media sources. We have expanded the pipeline capabilities and provide a user-friendly, online implementation of this automated pipeline. PMAnalyzer version 2.0 provides fast, automatic growth-curve parameter analysis, growth identification, high-resolution figures of sample-replicate growth curves, and several statistical analyses. PMAnalyzer v2.0 can be found at https://edwards.sdsu.edu/pmanalyzer/ . Source code for the pipeline can be found on GitHub at https://github.com/dacuevas/PMAnalyzer . Source code for the online implementation can be found on GitHub at https://github.com/dacuevas/PMAnalyzerWeb . dcuevas08@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
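
    The growth-curve parameterization step can be illustrated with a logistic fit in SciPy; the Zwietering-style model and the readings below are stand-ins, not PMAnalyzer's actual model set.

      import numpy as np
      from scipy.optimize import curve_fit

      def logistic(t, A, mu, lam):
          # A: asymptote, mu: max growth rate, lam: lag time (Zwietering form)
          return A / (1.0 + np.exp(4 * mu / A * (lam - t) + 2))

      t = np.linspace(0, 24, 49)                 # hours
      od = logistic(t, 1.2, 0.25, 3.0)           # synthetic "true" well
      od += np.random.default_rng(0).normal(0, 0.01, t.size)  # sensor noise

      (A, mu, lam), _ = curve_fit(logistic, t, od, p0=[1.0, 0.2, 2.0])
      print(f"asymptote={A:.2f} OD, max rate={mu:.3f} OD/h, lag={lam:.1f} h")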

  17. Text-based Analytics for Biosurveillance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Charles, Lauren E.; Smith, William P.; Rounds, Jeremiah

    The ability to prevent, mitigate, or control a biological threat depends on how quickly the threat is identified and characterized. Ensuring the timely delivery of data and analytics is an essential aspect of providing adequate situational awareness in the face of a disease outbreak. This chapter outlines an analytic pipeline for supporting an advanced early warning system that can integrate multiple data sources and provide situational awareness of potential and occurring disease situations. The pipeline includes real-time automated data analysis founded on natural language processing (NLP), semantic concept matching, and machine learning techniques, to enrich content with metadata related to biosurveillance. Online news articles are presented as an example use case for the pipeline, but the processes can be generalized to any textual data. In this chapter, the mechanics of a streaming pipeline are briefly discussed as well as the major steps required to provide targeted situational awareness. The text-based analytic pipeline includes various processing steps as well as identifying article relevance to biosurveillance (e.g., relevance algorithm) and article feature extraction (who, what, where, why, how, and when).
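
    The relevance-algorithm step might look roughly like the following supervised text classifier (scikit-learn; the toy articles and labels are invented, and the chapter's pipeline adds semantic concept matching and richer features on top of this basic shape):

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      # Invented training snippets; 1 = relevant to biosurveillance.
      train_texts = [
          "Officials report a cluster of avian influenza cases near the farm",
          "Hospital sees spike in respiratory illness after the festival",
          "The city council debated a new parking ordinance on Tuesday",
          "Local team wins the regional championship in overtime",
      ]
      labels = [1, 1, 0, 0]

      clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression())
      clf.fit(train_texts, labels)
      print(clf.predict(["Unexplained fever outbreak closes two schools"]))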

  18. [Character accentuations as a criterion for psychological risks in the professional activity of the builders of main gas pipelines in the conditions of arctic].

    PubMed

    Korneeva, Ia A; Simonova, N N

    2015-01-01

    The article is devoted to the study of character accentuations as a criterion of psychological risk in the professional activity of builders of main gas pipelines in the Arctic. The aim was to study the severity of character accentuations in rotation-employed builders of main gas pipelines, as conditioned by their professional activities, as well as the personal resources available to overcome these destructive tendencies. The study involved 70 rotation-employed builders of trunk pipelines working in the Tyumen Region (shift duration: 52 days), aged 23 to 59 years (mean age 34.9 ± 8.1), with work experience ranging from 0.5 to 14 years (mean 4.42 ± 3.1). Study methods: questionnaires, psychological testing, and participant observation; one-sample Student's t-test, multiple regression analysis, and incremental analysis. The work revealed differences in the expression of character accentuations between builders of trunk pipelines with less and with more than five years of rotation work experience. It was determined that builders of main gas pipelines working on rotation in the Arctic with more pronounced character accentuations mainly use the psychological defenses of compensation, substitution, and denial, and show an average level of flexibility as a regulatory process.

  19. Processing Shotgun Proteomics Data on the Amazon Cloud with the Trans-Proteomic Pipeline*

    PubMed Central

    Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W.; Moritz, Robert L.

    2015-01-01

    Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large-scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of the cloud-enabled Trans-Proteomic Pipeline by processing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. PMID:25418363

  20. Processing shotgun proteomics data on the Amazon cloud with the trans-proteomic pipeline.

    PubMed

    Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W; Moritz, Robert L

    2015-02-01

    Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large-scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of the cloud-enabled Trans-Proteomic Pipeline by processing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
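
    The resource/cost planning described above amounts to simple arithmetic; a sketch with placeholder throughput and pricing values, which are not actual Amazon EC2 rates or Trans-Proteomic Pipeline benchmarks (they are merely chosen to land near the reported 9 h scale):

      # Hypothetical rates for illustration only.
      def estimate_cost(n_files, files_per_instance_hour, n_instances,
                        usd_per_instance_hour):
          hours = n_files / (files_per_instance_hour * n_instances)
          return hours, hours * n_instances * usd_per_instance_hour

      hours, usd = estimate_cost(n_files=1100, files_per_instance_hour=4.0,
                                 n_instances=30, usd_per_instance_hour=0.10)
      print(f"wall time ~{hours:.1f} h, cost ~${usd:.2f}")  # ~9.2 h, ~$27.50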

  1. Generation of ethylene tracer by noncatalytic pyrolysis of natural gas at elevated pressure

    USGS Publications Warehouse

    Lu, Y.; Chen, S.; Rostam-Abadi, M.; Ruch, R.; Coleman, D.; Benson, L.J.

    2005-01-01

    There is a critical need within the pipeline gas industry for an inexpensive and reliable technology to generate an identification tag or tracer that can be added to pipeline gas to identify gas that may escape and improve the deliverability and management of gas in underground storage fields. Ethylene is an ideal tracer, because it does not exist naturally in the pipeline gas, and because its physical properties are similar to the pipeline gas components. A pyrolysis process, known as the Tragen process, has been developed to continuously convert the ~2%-4% ethane component present in pipeline gas into ethylene at common pipeline pressures of 800 psi. In our studies of the Tragen process, pyrolysis without steam addition achieved a maximum ethylene yield of 28%-35% at a temperature range of 700-775 °C, corresponding to an ethylene concentration of 4600-5800 ppm in the product gas. Coke deposition was determined to occur at a significant rate in the pyrolysis reactor without steam addition. The δ13C isotopic analysis of gas components showed a δ13C value of ethylene similar to ethane in the pipeline gas, indicating that most of the ethylene was generated from decomposition of the ethane in the raw gas. However, δ13C isotopic analysis of the deposited coke showed that coke was primarily produced from methane, rather than from ethane or other heavier hydrocarbons. No coke deposition was observed with the addition of steam at concentrations of > 20% by volume. The dilution with steam also improved the ethylene yield. © 2005 American Chemical Society.

  2. Bad Actors Criticality Assessment for Pipeline system

    NASA Astrophysics Data System (ADS)

    Nasir, Meseret; Chong, Kit wee; Osman, Sabtuni; Siaw Khur, Wee

    2015-04-01

    Failure of a pipeline system could bring huge economic loss. In order to mitigate such catastrophic loss, it is required to evaluate and rank the impact of each bad actor of the pipeline system. In this study, bad actors are known as the root causes or any potential factor leading to system downtime. Fault Tree Analysis (FTA) is used to analyze the probability of occurrence of each bad actor. Birnbaum's importance and criticality measure (BICM) is also employed to rank the impact of each bad actor on the pipeline system failure. The results demonstrate that internal corrosion, external corrosion and construction damage are critical and contribute most to pipeline system failure, at 48.0%, 12.4% and 6.0%, respectively. Thus, a minor improvement in internal corrosion, external corrosion and construction damage would bring significant changes in the pipeline system performance and reliability. These results could also be useful for developing an efficient maintenance strategy by identifying the critical bad actors.
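
    For an OR-gate fault tree (the top event occurs if any basic event occurs), Birnbaum and criticality importance have simple closed forms; a sketch with invented probabilities, not the study's data:

      import numpy as np

      q = {"internal corrosion": 0.05,      # invented basic-event probabilities
           "external corrosion": 0.02,
           "construction damage": 0.01}

      names = list(q)
      p = np.array([q[n] for n in names])
      Q_sys = 1 - np.prod(1 - p)            # top-event probability (OR gate)

      for i, n in enumerate(names):
          birnbaum = np.prod(np.delete(1 - p, i))   # dQ/dq_i for an OR gate
          criticality = birnbaum * p[i] / Q_sys     # fraction of system risk
          print(f"{n:20s} I_B={birnbaum:.4f}  I_C={criticality:.3f}")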

  3. Design and analysis of FBG based sensor for detection of damage in oil and gas pipelines for safety of marine life

    NASA Astrophysics Data System (ADS)

    Bedi, Amna; Kothari, Vaishali; Kumar, Santosh

    2018-02-01

    Gas and oil pipelines laid on the seafloor are prone to various disturbances such as seismic movements of the seabed, oceanic currents, and tsunamis. These factors tend to damage pipelines connecting different locations of the world that depend on them for day-to-day supplies of oil and natural gas. If a pipeline is damaged, oil spills into the water cause grave loss to marine life along with serious economic consequences. Manual monitoring of undersea pipelines is not feasible because of the great seafloor depth. For timely detection of such damage, a new technique using optical fiber Bragg grating (FBG) sensors, and its installation, is presented in this work. The concept of an FBG sensor detecting damage in the pipeline structure from acoustic emission has been worked out. Numerical calculations were performed based on the fundamentals of strain measurement, and the output was simulated using MATLAB.
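
    The underlying sensing relation is the standard Bragg-wavelength shift under strain; a worked example with textbook values (effective photo-elastic coefficient of roughly 0.22 for silica fibre; these are not the paper's simulation parameters):

      # Fractional Bragg shift: d(lambda)/lambda = (1 - p_e) * strain.
      lambda_bragg_nm = 1550.0     # typical telecom-band grating
      p_e = 0.22                   # effective photo-elastic coefficient (silica)
      strain = 100e-6              # 100 microstrain, e.g. from pipe deformation

      shift_nm = lambda_bragg_nm * (1 - p_e) * strain
      print(f"{shift_nm * 1000:.1f} pm shift")  # ~120.9 pm, ~1.2 pm/microstrain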

  4. Automated processing pipeline for neonatal diffusion MRI in the developing Human Connectome Project.

    PubMed

    Bastiani, Matteo; Andersson, Jesper L R; Cordero-Grande, Lucilio; Murgasova, Maria; Hutter, Jana; Price, Anthony N; Makropoulos, Antonios; Fitzgibbon, Sean P; Hughes, Emer; Rueckert, Daniel; Victor, Suresh; Rutherford, Mary; Edwards, A David; Smith, Stephen M; Tournier, Jacques-Donald; Hajnal, Joseph V; Jbabdi, Saad; Sotiropoulos, Stamatios N

    2018-05-28

    The developing Human Connectome Project is set to create and make available to the scientific community a 4-dimensional map of functional and structural cerebral connectivity from 20 to 44 weeks post-menstrual age, to allow exploration of the genetic and environmental influences on brain development, and the relation between connectivity and neurocognitive function. A large set of multi-modal MRI data from fetuses and newborn infants is currently being acquired, along with genetic, clinical and developmental information. In this overview, we describe the neonatal diffusion MRI (dMRI) image processing pipeline and the structural connectivity aspect of the project. Neonatal dMRI data poses specific challenges, and standard analysis techniques used for adult data are not directly applicable. We have developed a processing pipeline that deals directly with neonatal-specific issues, such as severe motion and motion-related artefacts, small brain sizes, high brain water content and reduced anisotropy. This pipeline allows automated analysis of in-vivo dMRI data, probes tissue microstructure, reconstructs a number of major white matter tracts, and includes an automated quality control framework that identifies processing issues or inconsistencies. We here describe the pipeline and present an exemplar analysis of data from 140 infants imaged at 38-44 weeks post-menstrual age. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  5. Jet-mixing of initially-stratified liquid-liquid pipe flows: experiments and numerical simulations

    NASA Astrophysics Data System (ADS)

    Wright, Stuart; Ibarra-Hernandes, Roberto; Xie, Zhihua; Markides, Christos; Matar, Omar

    2016-11-01

    Low pipeline velocities lead to stratification and so-called 'phase slip' in horizontal liquid-liquid flows due to differences in liquid densities and viscosities. Stratified flows have no suitable single point for sampling from which average phase properties (e.g. fractions) can be established. Inline mixing, achieved by static mixers or jets in cross-flow (JICF), is often used to overcome liquid-liquid stratification by establishing unstable two-phase dispersions for sampling. Achieving dispersions in liquid-liquid pipeline flows using JICF is the subject of this experimental and modelling work. The experimental facility involves a matched refractive-index liquid-liquid-solid system featuring an ETFE test section; the experimental liquids are silicone oil and a 51-wt% glycerol solution. The matching allows the dispersed-phase fractions and velocity fields to be established through advanced optical techniques, namely PLIF (for phase) and PTV or PIV (for velocity fields). CFD codes using the volume of fluid (VOF) method are then used to demonstrate JICF breakup and dispersion in stratified pipeline flows. A number of simple jet configurations are described and their dispersion effectiveness is compared with the experimental results. Funding from Cameron for Ph.D. studentship (SW) gratefully acknowledged.

  6. An architecture of entropy decoder, inverse quantiser and predictor for multi-standard video decoding

    NASA Astrophysics Data System (ADS)

    Liu, Leibo; Chen, Yingjie; Yin, Shouyi; Lei, Hao; He, Guanghui; Wei, Shaojun

    2014-07-01

    A VLSI architecture for an entropy decoder, inverse quantiser and predictor is proposed in this article. This architecture is used for decoding video streams of three standards on a single chip, i.e. H.264/AVC, AVS (China National Audio Video coding Standard) and MPEG2. The proposed scheme is called MPMP (Macro-block-Parallel based Multilevel Pipeline), which is intended to improve the decoding performance to satisfy real-time requirements while maintaining a reasonable area and power consumption. Several techniques, such as slice-level pipelining, MB (Macro-Block) level pipelining and MB-level parallelism, are adopted. Input and output buffers for the inverse quantiser and predictor are shared by the decoding engines for H.264, AVS and MPEG2, thereby effectively reducing the implementation overhead. Simulation shows that the decoding process consumes 512, 435 and 438 clock cycles per MB in H.264, AVS and MPEG2, respectively. Owing to the proposed techniques, the video decoder can support H.264 HP (High Profile) 1920 × 1088@30fps (frames per second) streams, AVS JP (Jizhun Profile) 1920 × 1088@41fps streams and MPEG2 MP (Main Profile) 1920 × 1088@39fps streams at a 200 MHz working frequency.
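
    The cycle counts can be checked against the claimed frame rates directly: cycles per macroblock, times macroblocks per 1920 × 1088 frame, times frames per second, must fit within the 200 MHz budget.

      # Throughput sanity check using the figures quoted in the abstract.
      mbs = (1920 // 16) * (1088 // 16)          # 8160 macroblocks per frame
      for codec, cycles_per_mb, fps in [("H.264", 512, 30),
                                        ("AVS", 435, 41),
                                        ("MPEG2", 438, 39)]:
          mhz = cycles_per_mb * mbs * fps / 1e6
          print(f"{codec}: {mhz:.1f} MHz required (budget 200 MHz)")
      # -> roughly 125, 146 and 139 MHz, all under the 200 MHz clock.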

  7. Location of coating defects and assessment of level of cathodic protection on underground pipelines using AC impedance, deterministic and non-deterministic models

    NASA Astrophysics Data System (ADS)

    Castaneda-Lopez, Homero

    A methodology for detecting and locating defects or discontinuities in the outer coating of cathodically protected underground pipelines is addressed. On the basis of wide-range AC impedance signals at various frequencies applied to a steel-coated pipeline system, and by measuring its corresponding transfer function under several laboratory simulation scenarios, a physical laboratory setup of an underground cathodically protected, coated pipeline was built. This model included different variables and elements that exist under real conditions, such as soil resistivity, soil chemical composition, defect (holiday) location in the pipeline covering, defect area and geometry, and level of cathodic protection. The AC impedance data obtained under different working conditions were used to fit an electrical transmission line model. This model was then used as a tool to fit the impedance signal for different experimental conditions and to establish trends in the impedance behavior without the necessity of further experimental work. However, due to the chaotic nature of the transfer function response of this system under several conditions, it is believed that non-deterministic models based on pattern recognition algorithms are suitable for field condition analysis. A non-deterministic approach was used for experimental analysis by applying an artificial neural network (ANN) algorithm based on classification analysis, capable of studying the pipeline system and differentiating the variables that can change impedance conditions. These variables include the level of cathodic protection, location of discontinuities (holidays), and severity of corrosion. This work demonstrated a proof of concept for a well-known technique together with a novel algorithm capable of classifying experimental impedance data to predict the location of active holidays and defects on buried pipelines. Laboratory findings from this procedure are promising, and efforts to develop it for field conditions should continue.
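
    The pattern-recognition stage could be sketched as follows with scikit-learn; the feature layout, class labels and data are invented stand-ins for the thesis's actual ANN and impedance features, so the score printed here is chance level.

      import numpy as np
      from sklearn.neural_network import MLPClassifier
      from sklearn.preprocessing import StandardScaler
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 40))    # e.g. |Z| and phase at 20 frequencies
      y = rng.integers(0, 3, size=200)  # 0/1/2 = under/adequate/over-protected

      clf = make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(32,),
                                        max_iter=2000, random_state=0))
      clf.fit(X[:150], y[:150])
      # Random placeholder data, so expect ~0.33 accuracy here.
      print("held-out accuracy:", clf.score(X[150:], y[150:]))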

  8. Analysis pipelines and packages for Infinium HumanMethylation450 BeadChip (450k) data.

    PubMed

    Morris, Tiffany J; Beck, Stephan

    2015-01-15

    The Illumina HumanMethylation450 BeadChip has become a popular platform for interrogating DNA methylation in epigenome-wide association studies (EWAS) and related projects as well as resource efforts such as the International Cancer Genome Consortium (ICGC) and the International Human Epigenome Consortium (IHEC). This has resulted in an exponential increase of 450k data in recent years and triggered the development of numerous integrated analysis pipelines and stand-alone packages. This review will introduce and discuss the currently most popular pipelines and packages and is particularly aimed at new 450k users. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  9. Design and Execution of make-like, distributed Analyses based on Spotify’s Pipelining Package Luigi

    NASA Astrophysics Data System (ADS)

    Erdmann, M.; Fischer, B.; Fischer, R.; Rieger, M.

    2017-10-01

    In high-energy particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually, which is time-consuming and often leads to undocumented relations between particular workloads. We present a generic analysis design pattern that copes with the sophisticated demands of end-to-end HEP analyses and provides a make-like execution system. It is based on the open-source pipelining package Luigi, which was developed at Spotify and enables the definition of arbitrary workloads, so-called Tasks, and the dependencies between them in a lightweight and scalable structure. Further features are multi-user support, automated dependency resolution and error handling, central scheduling, and status visualization in the web. In addition to already built-in features for remote jobs and file systems like Hadoop and HDFS, we added support for WLCG infrastructure such as LSF and CREAM job submission, as well as remote file access through the Grid File Access Library. Furthermore, we implemented automated resubmission functionality, software sandboxing, and a command line interface with auto-completion for a convenient working environment. For the implementation of a $t\bar{t}H$ cross-section measurement, we created a generic Python interface that provides programmatic access to all external information such as datasets, physics processes, statistical models, and additional files and values. In summary, the setup enables the execution of the entire analysis in a parallelized and distributed fashion with a single command.
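
    A minimal Luigi example in the spirit of this setup, with two Tasks and an explicit dependency (the task names and file targets are invented; the WLCG submission, sandboxing and analysis-specific interfaces described above are not shown). Luigi resolves the dependency graph and re-runs only tasks whose targets are missing.

      import luigi

      class Select(luigi.Task):
          dataset = luigi.Parameter()

          def output(self):
              return luigi.LocalTarget(f"{self.dataset}.selected.txt")

          def run(self):
              with self.output().open("w") as f:
                  f.write(f"selected events for {self.dataset}\n")

      class Histogram(luigi.Task):
          dataset = luigi.Parameter()

          def requires(self):
              return Select(dataset=self.dataset)   # explicit dependency

          def output(self):
              return luigi.LocalTarget(f"{self.dataset}.hist.txt")

          def run(self):
              with self.input().open() as fin, self.output().open("w") as fout:
                  fout.write("histogram of: " + fin.read())

      if __name__ == "__main__":
          luigi.build([Histogram(dataset="ttH_signal")], local_scheduler=True)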

  10. An Automated, Adaptive Framework for Optimizing Preprocessing Pipelines in Task-Based Functional MRI

    PubMed Central

    Churchill, Nathan W.; Spring, Robyn; Afshin-Pour, Babak; Dong, Fan; Strother, Stephen C.

    2015-01-01

    BOLD fMRI is sensitive to blood-oxygenation changes correlated with brain function; however, it is limited by relatively weak signal and significant noise confounds. Many preprocessing algorithms have been developed to control noise and improve signal detection in fMRI. Although the chosen set of preprocessing and analysis steps (the “pipeline”) significantly affects signal detection, pipelines are rarely quantitatively validated in the neuroimaging literature, due to complex preprocessing interactions. This paper outlines and validates an adaptive resampling framework for evaluating and optimizing preprocessing choices by optimizing data-driven metrics of task prediction and spatial reproducibility. Compared to standard “fixed” preprocessing pipelines, this optimization approach significantly improves independent validation measures of within-subject test-retest reliability, between-subject activation overlap, and behavioural prediction accuracy. We demonstrate that preprocessing choices function as implicit model regularizers, and that improvements due to pipeline optimization generalize across a range of simple to complex experimental tasks and analysis models. Results are shown for brief scanning sessions (<3 minutes each), demonstrating that with pipeline optimization, it is possible to obtain reliable results and brain-behaviour correlations in relatively small datasets. PMID:26161667
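
    Schematically, the optimization loop scores each candidate pipeline on prediction (P) and split-half reproducibility (R) and keeps the best; the sketch below uses invented data and generic metrics, not the authors' NPAIRS-style implementation.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X_raw = rng.normal(size=(80, 500))      # fake scans x voxels
      y = np.tile([0, 1], 40)                 # alternating task conditions
      X_raw[y == 1, :25] += 0.6               # weak simulated "activation"

      pipelines = {                           # candidate preprocessing choices
          "no smoothing": lambda X: X,
          "smoothed": lambda X: (X + np.roll(X, 1, 1) + np.roll(X, -1, 1)) / 3,
      }

      def reproducibility(X, y):
          # Correlate discriminant maps fit independently on two half-splits.
          half = len(y) // 2
          w1 = LogisticRegression(max_iter=1000).fit(X[:half], y[:half]).coef_
          w2 = LogisticRegression(max_iter=1000).fit(X[half:], y[half:]).coef_
          return np.corrcoef(w1.ravel(), w2.ravel())[0, 1]

      for name, pre in pipelines.items():
          X = pre(X_raw)
          P = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
          R = reproducibility(X, y)
          print(f"{name:12s} prediction={P:.2f} reproducibility={R:.2f}")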

  11. A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species

    PubMed Central

    Redelings, Benjamin D.

    2017-01-01

    We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which groupings in the input trees support and conflict with each group in the supertree. We compare our supertree construction method to a previously published supertree construction method by assessing their performance on input trees used to construct the Open Tree of Life version 4, and find that our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new supertree method also improves on the previous supertree construction method in that it produces no unsupported branches and avoids unnecessary polytomies. This pipeline is currently used by the Open Tree of Life project to produce all versions of the project's “synthetic tree” starting at version 5. This software pipeline is called “propinquity”. It relies heavily on “otcetera”—a set of C++ tools to perform most of the steps of the pipeline. All of the components are free software and are available on GitHub. PMID:28265520

  12. Mineral Resource of the Month: Niobium

    USGS Publications Warehouse

    Papp, John F.

    2014-01-01

    Niobium, also called columbium, is a transition metal with a very high melting point. It is in greatest demand in industrialized countries, like the United States, because of its defense-related uses in the aerospace, energy and transportation industries. Niobium is used mostly to make high-strength, low-alloy (HSLA) steel and stainless steel. HSLA steels are used in large-diameter pipes for oil and natural gas pipelines and automobile wheels.

  13. Development of an Automated Imaging Pipeline for the Analysis of the Zebrafish Larval Kidney

    PubMed Central

    Westhoff, Jens H.; Giselbrecht, Stefan; Schmidts, Miriam; Schindler, Sebastian; Beales, Philip L.; Tönshoff, Burkhard; Liebel, Urban; Gehrig, Jochen

    2013-01-01

    The analysis of kidney malformation caused by environmental influences during nephrogenesis or by hereditary nephropathies requires animal models allowing the in vivo observation of developmental processes. The zebrafish has emerged as a useful model system for the analysis of vertebrate organ development and function, and it is suitable for the identification of organotoxic or disease-modulating compounds on a larger scale. However, to fully exploit its potential in high content screening applications, dedicated protocols are required allowing the consistent visualization of inner organs such as the embryonic kidney. To this end, we developed a high content screening compatible pipeline for the automated imaging of standardized views of the developing pronephros in zebrafish larvae. Using a custom designed tool, cavities were generated in agarose coated microtiter plates allowing for accurate positioning and orientation of zebrafish larvae. This enabled the subsequent automated acquisition of stable and consistent dorsal views of pronephric kidneys. The established pipeline was applied in a pilot screen for the analysis of the impact of potentially nephrotoxic drugs on zebrafish pronephros development in the Tg(wt1b:EGFP) transgenic line in which the developing pronephros is highlighted by GFP expression. The consistent image data that was acquired allowed for quantification of gross morphological pronephric phenotypes, revealing concentration dependent effects of several compounds on nephrogenesis. In addition, applicability of the imaging pipeline was further confirmed in a morpholino based model for cilia-associated human genetic disorders associated with different intraflagellar transport genes. The developed tools and pipeline can be used to study various aspects in zebrafish kidney research, and can be readily adapted for the analysis of other organ systems. PMID:24324758

  14. Development of an automated imaging pipeline for the analysis of the zebrafish larval kidney.

    PubMed

    Westhoff, Jens H; Giselbrecht, Stefan; Schmidts, Miriam; Schindler, Sebastian; Beales, Philip L; Tönshoff, Burkhard; Liebel, Urban; Gehrig, Jochen

    2013-01-01

    The analysis of kidney malformation caused by environmental influences during nephrogenesis or by hereditary nephropathies requires animal models allowing the in vivo observation of developmental processes. The zebrafish has emerged as a useful model system for the analysis of vertebrate organ development and function, and it is suitable for the identification of organotoxic or disease-modulating compounds on a larger scale. However, to fully exploit its potential in high content screening applications, dedicated protocols are required allowing the consistent visualization of inner organs such as the embryonic kidney. To this end, we developed a high content screening compatible pipeline for the automated imaging of standardized views of the developing pronephros in zebrafish larvae. Using a custom designed tool, cavities were generated in agarose coated microtiter plates allowing for accurate positioning and orientation of zebrafish larvae. This enabled the subsequent automated acquisition of stable and consistent dorsal views of pronephric kidneys. The established pipeline was applied in a pilot screen for the analysis of the impact of potentially nephrotoxic drugs on zebrafish pronephros development in the Tg(wt1b:EGFP) transgenic line in which the developing pronephros is highlighted by GFP expression. The consistent image data that was acquired allowed for quantification of gross morphological pronephric phenotypes, revealing concentration dependent effects of several compounds on nephrogenesis. In addition, applicability of the imaging pipeline was further confirmed in a morpholino based model for cilia-associated human genetic disorders associated with different intraflagellar transport genes. The developed tools and pipeline can be used to study various aspects in zebrafish kidney research, and can be readily adapted for the analysis of other organ systems.

  15. Variant Review with the Integrative Genomics Viewer.

    PubMed

    Robinson, James T; Thorvaldsdóttir, Helga; Wenger, Aaron M; Zehir, Ahmet; Mesirov, Jill P

    2017-11-01

    Manual review of aligned reads for confirmation and interpretation of variant calls is an important step in many variant calling pipelines for next-generation sequencing (NGS) data. Visual inspection can greatly increase the confidence in calls, reduce the risk of false positives, and help characterize complex events. The Integrative Genomics Viewer (IGV) was one of the first tools to provide NGS data visualization, and it currently provides a rich set of tools for inspection, validation, and interpretation of NGS datasets, as well as other types of genomic data. Here, we present a short overview of IGV's variant review features for both single-nucleotide variants and structural variants, with examples from both cancer and germline datasets. IGV is freely available at https://www.igv.org. Cancer Res; 77(21); e31-34. ©2017 American Association for Cancer Research.

  16. STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud.

    PubMed

    Karczewski, Konrad J; Fernald, Guy Haskin; Martin, Alicia R; Snyder, Michael; Tatonetti, Nicholas P; Dudley, Joel T

    2014-01-01

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving, and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical-interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and takes 5-10 hours to process a full exome sequence, and costs approximately $30 and takes 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface on Amazon EC2.

  17. PEA: an integrated R toolkit for plant epitranscriptome analysis.

    PubMed

    Zhai, Jingjing; Song, Jie; Cheng, Qian; Tang, Yunjia; Ma, Chuang

    2018-05-29

    The epitranscriptome, also known as chemical modifications of RNA (CMRs), is a newly discovered layer of gene regulation, the biological importance of which emerged through analysis of only a small fraction of CMRs detected by high-throughput sequencing technologies. Understanding of the epitranscriptome is hampered by the absence of computational tools for the systematic analysis of epitranscriptome sequencing data. In addition, no tools have yet been designed for accurate prediction of CMRs in plants, or to extend epitranscriptome analysis from a fraction of the transcriptome to its entirety. Here, we introduce PEA, an integrated R toolkit to facilitate the analysis of plant epitranscriptome data. The PEA toolkit contains a comprehensive collection of functions required for read mapping, CMR calling, motif scanning and discovery, and gene functional enrichment analysis. PEA also takes advantage of machine learning technologies for transcriptome-scale CMR prediction, with high prediction accuracy, using the Positive Samples Only Learning algorithm, which addresses the two-class classification problem by using only positive samples (CMRs), in the absence of negative samples (non-CMRs). Hence PEA is a versatile epitranscriptome analysis pipeline covering CMR calling, prediction, and annotation, and we describe its application to predict N6-methyladenosine (m6A) modifications in Arabidopsis thaliana. Experimental results demonstrate that the toolkit achieved 71.6% sensitivity and 73.7% specificity, which is superior to existing m6A predictors. PEA is potentially broadly applicable to the in-depth study of epitranscriptomics. PEA Docker image is available at https://hub.docker.com/r/malab/pea, source codes and user manual are available at https://github.com/cma2015/PEA. chuangma2006@gmail.com. Supplementary data are available at Bioinformatics online.
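
    The positive-samples-only idea can be illustrated with the classic Elkan-Noto calibration for learning from positive and unlabeled data, which may differ from PEA's actual PSOL algorithm; the features and data below are synthetic.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      pos = rng.normal(1.0, 1.0, size=(200, 5))    # true modification sites
      neg = rng.normal(-1.0, 1.0, size=(800, 5))   # true non-sites

      labeled_pos = pos[:100]                      # the only labels we have
      unlabeled = np.vstack([pos[100:], neg])      # mixture, labels unknown

      X = np.vstack([labeled_pos, unlabeled])
      s = np.r_[np.ones(len(labeled_pos)), np.zeros(len(unlabeled))]

      g = LogisticRegression(max_iter=1000).fit(X, s)  # models P(labeled | x)
      c = g.predict_proba(labeled_pos)[:, 1].mean()    # estimates P(labeled | positive)
      p_pos = g.predict_proba(unlabeled)[:, 1] / c     # calibrated P(positive | x)
      print("predicted positives among unlabeled:", int((p_pos > 0.5).sum()))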

  18. Theory and Application of Magnetic Flux Leakage Pipeline Detection.

    PubMed

    Shi, Yan; Zhang, Chao; Li, Rui; Cai, Maolin; Jia, Guanwei

    2015-12-10

    Magnetic flux leakage (MFL) detection is one of the most popular methods of pipeline inspection. It is a nondestructive testing technique which uses magnetically sensitive sensors to detect the magnetic leakage field of defects on both the internal and external surfaces of pipelines. This paper introduces the main principles, measurement and processing of MFL data. As the key point of a quantitative analysis of MFL detection, the identification of the leakage magnetic signal is also discussed. In addition, the advantages and disadvantages of different identification methods are analyzed. Then the paper briefly introduces the expert systems used. At the end of this paper, future developments in pipeline MFL detection are predicted.

  19. Theory and Application of Magnetic Flux Leakage Pipeline Detection

    PubMed Central

    Shi, Yan; Zhang, Chao; Li, Rui; Cai, Maolin; Jia, Guanwei

    2015-01-01

    Magnetic flux leakage (MFL) detection is one of the most popular methods of pipeline inspection. It is a nondestructive testing technique which uses magnetically sensitive sensors to detect the magnetic leakage field of defects on both the internal and external surfaces of pipelines. This paper introduces the main principles, measurement and processing of MFL data. As the key point of a quantitative analysis of MFL detection, the identification of the leakage magnetic signal is also discussed. In addition, the advantages and disadvantages of different identification methods are analyzed. Then the paper briefly introduces the expert systems used. At the end of this paper, future developments in pipeline MFL detection are predicted. PMID:26690435

  20. ESAP plus: a web-based server for EST-SSR marker development.

    PubMed

    Ponyared, Piyarat; Ponsawat, Jiradej; Tongsima, Sissades; Seresangtakul, Pusadee; Akkasaeng, Chutipong; Tantisuwichwong, Nathpapat

    2016-12-22

    Simple sequence repeats (SSRs) have become widely used as molecular markers in plant genetic studies due to their abundance, high allelic variation at each locus and simplicity to analyze using conventional PCR amplification. To study plants with unknown genome sequences, SSR markers from Expressed Sequence Tags (ESTs), which can be obtained from the plant mRNA (converted to cDNA), must be utilized. With the advent of high-throughput sequencing technology, huge EST sequence data have been generated and are now accessible from many public databases. However, SSR marker identification from a large in-house or public EST collection requires a computational pipeline that makes use of several standard bioinformatic tools to design high quality EST-SSR primers. Some of these computational tools are not user friendly and must be tightly integrated with reference genomic databases. A web-based bioinformatic pipeline, called EST Analysis Pipeline Plus (ESAP Plus), was constructed for assisting researchers to develop SSR markers from a large EST collection. ESAP Plus incorporates several bioinformatic scripts and some useful standard software tools necessary for the four main procedures of EST-SSR marker development, namely 1) pre-processing, 2) clustering and assembly, 3) SSR mining and 4) SSR primer design. The proposed pipeline also provides two alternative steps for reducing EST redundancy and identifying SSR loci. Using public sugarcane ESTs, ESAP Plus automatically executed the aforementioned computational pipeline via a simple web user interface, which was implemented using standard PHP, HTML, CSS and Java scripts. With ESAP Plus, users can upload raw EST data and choose various filtering options and parameters to analyze each of the four main procedures through this web interface. All input EST data and their predicted SSR results will be stored in the ESAP Plus MySQL database. Users will be notified via e-mail when the automatic process is completed and they can download all the results through the web interface. ESAP Plus is a comprehensive and convenient web-based bioinformatic tool for SSR marker development. ESAP Plus offers all necessary EST-SSR development processes with various adjustable options that users can easily use to identify SSR markers from a large EST collection. With a familiar web interface, users can upload raw ESTs using the data submission page and visualize/download the corresponding EST-SSR information from within ESAP Plus. ESAP Plus can handle considerably large EST datasets. This EST-SSR discovery tool can be accessed directly from: http://gbp.kku.ac.th/esap_plus/.
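
    The SSR-mining step, at its simplest, is a back-referencing regular-expression scan for 2-6 bp motifs tandemly repeated several times; a toy version is shown below (ESAP Plus wraps dedicated tools and primer design around this idea, and the thresholds here are arbitrary).

      import re

      def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=5):
          # Group 2 captures the motif; the backreference demands at least
          # (min_repeats - 1) further tandem copies of it.
          pattern = re.compile(
              r"(([ACGT]{%d,%d}?)\2{%d,})"
              % (min_motif, max_motif, min_repeats - 1))
          for m in pattern.finditer(seq.upper()):
              yield m.start(), m.group(2), len(m.group(1)) // len(m.group(2))

      est = "GATTACA" + "AG" * 8 + "CCGT" + "TTA" * 6 + "ACGT"
      for start, motif, n in find_ssrs(est):
          print(f"pos {start}: ({motif})x{n}")   # (AG)x8 and (TTA)x6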

  1. It's DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR.

    PubMed

    Lun, Aaron T L; Chen, Yunshun; Smyth, Gordon K

    2016-01-01

    RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.

  2. Gap-free segmentation of vascular networks with automatic image processing pipeline.

    PubMed

    Hsu, Chih-Yang; Ghaffari, Mahsa; Alaraj, Ali; Flannery, Michael; Zhou, Xiaohong Joe; Linninger, Andreas

    2017-03-01

    Current image processing techniques capture large vessels reliably but often fail to preserve connectivity in bifurcations and small vessels. Imaging artifacts and noise can create gaps and discontinuities of intensity that hinder segmentation of vascular trees. However, topological analysis of vascular trees requires proper connectivity without gaps, loops or dangling segments. Proper tree connectivity is also important for high quality rendering of surface meshes for scientific visualization or 3D printing. We present a fully automated vessel enhancement pipeline with automated parameter settings for vessel enhancement of tree-like structures from customary imaging sources, including 3D rotational angiography, magnetic resonance angiography, magnetic resonance venography, and computed tomography angiography. The output of the filter pipeline is a vessel-enhanced image which is ideal for generating anatomically consistent network representations of the cerebral angioarchitecture for further topological or statistical analysis. The filter pipeline combined with computational modeling can potentially improve computer-aided diagnosis of cerebrovascular diseases by delivering biometrics and anatomy of the vasculature. It may serve as the first step in fully automatic epidemiological analysis of large clinical datasets. The automatic analysis would enable rigorous statistical comparison of biometrics in subject-specific vascular trees. The robust and accurate image segmentation using a validated filter pipeline would also eliminate the operator dependency that has been observed in manual segmentation. Moreover, manual segmentation is time prohibitive given that vascular trees have thousands of segments and bifurcations, so interactive segmentation consumes excessive human resources. Subject-specific trees are a first step toward patient-specific hemodynamic simulations for assessing treatment outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.
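
    One plausible shape for the enhancement stage uses scikit-image's Frangi vesselness filter plus Otsu thresholding, here on a synthetic 2D image; the published pipeline chains several filters in 3D with automated parameter selection, which is not reproduced.

      import numpy as np
      from skimage.filters import frangi, threshold_otsu

      # Synthetic image: a bright sinuous "vessel" on a noisy background.
      img = np.zeros((128, 128))
      rows = np.arange(128)
      cols = np.clip(64 + (8 * np.sin(rows / 12)).astype(int), 0, 127)
      img[rows, cols] = 1.0
      img += np.random.default_rng(0).normal(0, 0.05, img.shape)

      vesselness = frangi(img, black_ridges=False)   # enhance bright tubes
      mask = vesselness > threshold_otsu(vesselness) # crude segmentation
      print("segmented pixels:", int(mask.sum()))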

  3. High-Precision Phenotyping of Grape Bunch Architecture Using Fast 3D Sensor and Automation.

    PubMed

    Rist, Florian; Herzog, Katja; Mack, Jenny; Richter, Robert; Steinhage, Volker; Töpfer, Reinhard

    2018-03-02

    Wine growers prefer cultivars with looser bunch architecture because of the decreased risk of bunch rot. As a consequence, grapevine breeders have to select seedlings and new cultivars with regard to appropriate bunch traits. Bunch architecture is a mosaic of different single traits, which makes phenotyping labor-intensive and time-consuming. In the present study, a fast and high-precision phenotyping pipeline was developed. The optical sensor Artec Spider 3D scanner (Artec 3D, L-1466, Luxembourg) was used to generate dense 3D point clouds of grapevine bunches under lab conditions, and an automated analysis software called 3D-Bunch-Tool was developed to extract different single 3D bunch traits, i.e., the number of berries, berry diameter, single berry volume, total volume of berries, convex hull volume of grapes, bunch width and bunch length. The method was validated on whole bunches of different grapevine cultivars and phenotypically variable breeding material. Reliable phenotypic data were obtained, showing highly significant correlations (up to r² = 0.95 for berry number) with ground truth data. Moreover, it was shown that the Artec Spider can be used directly in the field, where the acquired data show precision comparable to the lab application. This non-invasive and non-contact field application facilitates the first high-precision phenotyping pipeline based on 3D bunch traits in large plant sets.
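
    Two of the listed traits are straightforward geometry once the 3D point cloud exists; a sketch with SciPy on synthetic points rather than scanner data:

      import numpy as np
      from scipy.spatial import ConvexHull

      rng = np.random.default_rng(0)
      points = rng.normal(size=(5000, 3))       # stand-in for a bunch scan

      hull = ConvexHull(points)
      print("convex hull volume:", round(hull.volume, 2))
      print("bunch length (z-extent):", round(float(np.ptp(points[:, 2])), 2))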

  4. An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper).

    PubMed

    Valdez, Joshua; Rueschman, Michael; Kim, Matthew; Redline, Susan; Sahoo, Satya S

    2016-10-01

    Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High quality domain ontologies model both data and metadata information at a fine level of granularity, which can be effectively used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement. This is highlighted by the recent initiative by the US National Institutes of Health called "Principles of Rigor and Reproducibility". In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform as part of the Provenance for Clinical and Healthcare Research (ProvCaRe) project. The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of the ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata than existing NLP pipelines such as MetaMap.

  5. A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells.

    PubMed

    Wolff, Alexander; Bayerlová, Michaela; Gaedcke, Jochen; Kube, Dieter; Beißbarth, Tim

    2018-01-01

    Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis, or preprocessing, normalization and differential gene expression in the case of microarray analysis, in order to give a global insight into pipeline performances. Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data. The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. TopHat2's overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish had an overall mapping rate of only 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67-0.69) than for the cell line dataset (ρ = 0.87-0.88). The exception was Cufflinks, whose correlation results were substantially lower (ρ = 0.21-0.29 and 0.34-0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results. In conclusion, the combination of the STAR aligner with HTSeq-Count, followed by STAR with RSEM and by Sailfish, generated differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.
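
    The cross-platform agreement figures above are rank correlations; a minimal reproduction of that comparison step on synthetic expression values:

      import numpy as np
      from scipy.stats import spearmanr

      rng = np.random.default_rng(0)
      rnaseq = rng.lognormal(mean=2.0, sigma=1.5, size=10000)    # counts-like
      # Fake matched microarray intensities: log expression plus noise.
      microarray = np.log2(rnaseq + 1) + rng.normal(0, 1.0, 10000)

      rho, pval = spearmanr(np.log2(rnaseq + 1), microarray)
      print(f"Spearman rho = {rho:.2f}")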

  6. obitools: a unix-inspired software package for DNA metabarcoding.

    PubMed

    Boyer, Frédéric; Mercier, Céline; Bonin, Aurélie; Le Bras, Yvan; Taberlet, Pierre; Coissac, Eric

    2016-01-01

    DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next-generation sequencing (NGS) and thus calls upon the ability to deal with huge sequence data sets. The obitools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The obitools package is distributed as an open source software available on the following website: http://metabarcoding.org/obitools. A Galaxy wrapper is available on the GenOuest core facility toolshed: http://toolshed.genouest.org. © 2015 John Wiley & Sons Ltd.
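
    As a unix-inspired toolkit, obitools is driven from the command line, so pipelines are typically chains of its programs. A hedged sketch of a common read-preparation chain, with program names taken from the obitools documentation and tutorial; the file names and specific options (such as --score-min=40 and the count>=10 predicate) are assumptions to verify against the manual:

    ```python
    import subprocess

    # Minimal metabarcoding preprocessing chain built from obitools programs.
    steps = [
        # Assemble paired-end Illumina reads.
        "illuminapairedend --score-min=40 -r reverse.fastq forward.fastq > merged.fastq",
        # Assign reads to samples using the tag/primer description file.
        "ngsfilter -t samples.txt -u unassigned.fastq merged.fastq > assigned.fastq",
        # Dereplicate into unique sequences, keeping per-sample counts.
        "obiuniq -m sample assigned.fastq > unique.fasta",
        # Keep sequences observed at least 10 times.
        "obigrep -p 'count>=10' unique.fasta > filtered.fasta",
    ]
    for cmd in steps:
        subprocess.run(cmd, shell=True, check=True)
    ```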

  7. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline

    PubMed Central

    2013-01-01

    We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS. PMID:23320958

  8. Gene expression profiling of human breast tissue samples using SAGE-Seq.

    PubMed

    Wu, Zhenhua Jeremy; Meyer, Clifford A; Choudhury, Sibgat; Shipitsin, Michail; Maruyama, Reo; Bessarabova, Marina; Nikolskaya, Tatiana; Sukumar, Saraswati; Schwartzman, Armin; Liu, Jun S; Polyak, Kornelia; Liu, X Shirley

    2010-12-01

    We present a powerful application of ultra high-throughput sequencing, SAGE-Seq, for the accurate quantification of normal and neoplastic mammary epithelial cell transcriptomes. We develop data analysis pipelines that allow the mapping of sense and antisense strands of mitochondrial and RefSeq genes, the normalization between libraries, and the identification of differentially expressed genes. We find that the diversity of cancer transcriptomes is significantly higher than that of normal cells. Our analysis indicates that transcript discovery plateaus at 10 million reads/sample, and suggests a minimum desired sequencing depth around five million reads. Comparison of SAGE-Seq and traditional SAGE on normal and cancerous breast tissues reveals higher sensitivity of SAGE-Seq to detect less-abundant genes, including those encoding for known breast cancer-related transcription factors and G protein-coupled receptors (GPCRs). SAGE-Seq is able to identify genes and pathways abnormally activated in breast cancer that traditional SAGE failed to call. SAGE-Seq is a powerful method for the identification of biomarkers and therapeutic targets in human disease.

  9. Statistical Inference for Porous Materials using Persistent Homology.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moon, Chul; Heath, Jason E.; Mitchell, Scott A.

    2017-12-01

    We propose a porous materials analysis pipeline using persistent homology. We first compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We fit a statistical model using the loadings of principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistent homology. Thus we provide a capability for making a statistical inference of the fluid flow and transport properties of porous materials based on their geometry and connectivity.
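
    A minimal sketch of the diagram-to-vector idea using widely available Python libraries (gudhi for cubical persistent homology, persim for persistence images). The distance-transform filtration, the focus on H1, and all parameter values are illustrative assumptions, not the authors' exact pipeline:

    ```python
    import numpy as np
    import gudhi
    from persim import PersistenceImager
    from scipy.ndimage import distance_transform_edt

    # Synthetic binarized 3D subvolume (1 = pore, 0 = solid).
    rng = np.random.default_rng(1)
    img = (rng.random((32, 32, 32)) > 0.6).astype(int)

    # Use a Euclidean distance transform as the filtration function.
    filtration = -distance_transform_edt(img)
    cc = gudhi.CubicalComplex(dimensions=list(img.shape),
                              top_dimensional_cells=filtration.flatten())
    diag = cc.persistence()  # list of (homology dim, (birth, death)) pairs

    # Keep finite H1 intervals (loops) and vectorize them as a persistence image.
    h1 = np.array([pair for dim, pair in diag
                   if dim == 1 and pair[1] != float("inf")])
    pimgr = PersistenceImager(pixel_size=0.5)
    pimgr.fit([h1])
    vector = pimgr.transform([h1])[0].flatten()  # feature vector for PCA etc.
    print(vector.shape)
    ```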

  10. 30 CFR 250.1016 - Granting pipeline rights-of-way.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Regional Supervisor shall consider the potential effect of the associated pipeline on the human, marine... area during construction and operational phases. The Regional Supervisor shall prepare an environmental analysis in accordance with applicable policies and guidelines. To aid in the evaluation and determinations...

  11. Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists.

    PubMed

    Zhu, Xun; Wolfgruber, Thomas K; Tasato, Austin; Arisdakessian, Cédric; Garmire, David G; Garmire, Lana X

    2017-12-05

    Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.

  12. Quantitative Risk Mapping of Urban Gas Pipeline Networks Using GIS

    NASA Astrophysics Data System (ADS)

    Azari, P.; Karimi, M.

    2017-09-01

    Natural gas is an important source of energy worldwide. With increasing urbanization, the urban gas pipelines that carry natural gas from transmission pipelines to consumers are becoming a dense network, and this growing density increases the probability of serious accidents in urban areas. Such accidents can have catastrophic effects on people and their property. In the coming years, risk mapping will become an important component of urban planning and management in large cities, both to reduce the probability of accidents and to control their consequences. It is therefore important to assess risk values and locate them on an urban map using an appropriate method. Previous risk analyses of urban natural gas pipeline networks have considered pipelines one by one, ignoring their density in the urban area. The aim of this study is to determine the effect of several pipelines on the risk value at a specific grid point. This paper outlines a quantitative risk assessment method for analysing the risk of urban natural gas pipeline networks. It consists of two main parts: failure rate calculation, using EGIG historical data, and fatal length calculation, which involves modelling the gas release and the fatality rates of the consequences. We consider jet fire, fireball and explosion as consequences of gas pipeline failure. The outcome of this method is an individual risk value, presented as a risk map.
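
    The paper's central point is that the risk at a grid point sums contributions from every nearby pipeline. A minimal sketch of that aggregation, under the stated decomposition into failure rate and fatal length; all numbers are illustrative, not EGIG figures:

    ```python
    # Grid-point individual risk as the sum over pipelines of
    # (failure rate per km-year) x (fatal length in km).
    pipelines = [
        {"name": "P1", "failure_rate_per_km_yr": 1.5e-4, "fatal_length_km": 0.12},
        {"name": "P2", "failure_rate_per_km_yr": 2.0e-4, "fatal_length_km": 0.08},
    ]

    def individual_risk(pipes):
        # Contributions from overlapping pipelines add at a shared grid point.
        return sum(p["failure_rate_per_km_yr"] * p["fatal_length_km"] for p in pipes)

    print(f"IR = {individual_risk(pipelines):.2e} per year")
    ```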

  13. Basic overview towards the assessment of landslide and subsidence risks along a geothermal pipeline network

    NASA Astrophysics Data System (ADS)

    Astisiasari; Van Westen, Cees; Jetten, Victor; van der Meer, Freek; Rahmawati Hizbaron, Dyah

    2017-12-01

    An operating geothermal power plant consists of installation units that work together as a network. The pipeline network connects various engineering structures, e.g. well pads, separators, scrubbers and the power station, transferring geothermal fluids to generate electricity; the pipeline infrastructure also returns the brine to the earth through injection well pads. Despite these important functions, a geothermal pipeline may pose a threat to its vicinity through pipeline failure. The pipeline can be impacted by perilous events such as landslides, earthquakes and subsidence, while failure may also stem from physical deterioration over time, e.g. corrosion and fatigue. Geothermal reservoirs are usually located in mountainous areas associated with steep slopes, complex geology and weathered soil, and geothermal areas record a noteworthy number of disasters, especially landslides and subsidence. A proper multi-risk assessment along geothermal pipelines is therefore required, particularly for these two types of hazard; impacts in terms of human fatality and injury are not discussed here. This paper aims to give a basic overview of existing approaches for multi-risk assessment along geothermal pipelines. It presents basic principles for the analysis of risks and their contributing variables in order to model the loss consequences. By considering the loss consequences, as well as alternative mitigation measures, environmental safety in geothermal working areas can be enforced.

  14. A computational genomics pipeline for prokaryotic sequencing projects.

    PubMed

    Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King

    2010-08-01

    New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.

  15. Numerical Investigation of the Thermal Regime of Underground Channel Heat Pipelines Under Flooding Conditions with the Use of a Conductive-Convective Heat Transfer Model

    NASA Astrophysics Data System (ADS)

    Polovnikov, V. Yu.

    2018-05-01

    This paper presents the results of a numerical analysis of the thermal regimes and heat losses of underground channel heating systems under flooding conditions, using a convective-conductive heat transfer model and the example of a heat pipeline configuration widely used in the Russian Federation: a nonpassage ferroconcrete channel (crawlway) with pipelines insulated with mineral wool and a protective covering layer. It has been shown that convective motion of water in the channel cavity of the heat pipeline under flooding conditions has no marked effect on the intensification of heat losses. It has been established that, for the case under consideration, heat losses of the heat pipeline under flooding conditions increase by 0.75-52.39% owing to the sharp increase in the effective thermal characteristics of the covering layer and the heat insulator caused by their moistening.

  16. SAND: an automated VLBI imaging and analysing pipeline - I. Stripping component trajectories

    NASA Astrophysics Data System (ADS)

    Zhang, M.; Collioud, A.; Charlot, P.

    2018-02-01

    We present our implementation of an automated very long baseline interferometry (VLBI) data-reduction pipeline dedicated to interferometric data imaging and analysis. The pipeline can handle massive VLBI data efficiently, which makes it an appropriate tool for investigating multi-epoch, multiband VLBI data. Compared to traditional manual data reduction, our pipeline provides more objective results as less human intervention is involved. Source extraction is carried out in the image plane, while deconvolution and model fitting are performed in both the image plane and the uv plane for parallel comparison. The output from the pipeline includes catalogues of CLEANed images and reconstructed models, polarization maps, proper motion estimates, core light curves and multiband spectra. We have developed a regression STRIP algorithm to automatically detect linear or non-linear patterns in the jet component trajectories. This algorithm offers an objective method to match jet components at different epochs and to determine their proper motions.

  17. Finite Element Analysis and Experimental Study on Elbow Vibration Transmission Characteristics

    NASA Astrophysics Data System (ADS)

    Qing-shan, Dai; Zhen-hai, Zhang; Shi-jian, Zhu

    2017-11-01

    Pipeline system vibration is one of the significant factors contributing to vessel vibration and noise, and elbows are widely used in pipeline systems. However, research on elbow vibration is scarce and unsystematic. In this study, we first analysed the relationship between elbow vibration transmission characteristics and bending radius using ABAQUS finite element simulation. We then conducted vibration tests to observe the transmission characteristics of elbows with the same diameter but different bending radii under different flow velocities. Both the simulations and the experiments showed that the vibration acceleration levels of the pipeline system decreased as the bending radius of the elbow increased, which is beneficial for reducing vibration transmission in the pipeline system. The results can serve as a reference for further studies and for low-noise installation designs of pipeline systems.

  18. Chemical laser exhaust pipe design research

    NASA Astrophysics Data System (ADS)

    Sun, Yunqiang; Huang, Zhilong; Chen, Zhiqiang; Ren, Zebin; Guo, Longde

    2016-10-01

    To weaken the influence of chemical laser exhaust gas on optical transmission, a vent pipe is proposed to discharge the gas outside the optical transmission area. The flow fields of several exhaust pipe designs were analysed in detail by numerical simulation. For a uniformly vented exhaust pipe, although the periodic pipeline structure is convenient for engineering implementation, the numerical results reveal air backflow at the pipeline entrance slit, so this type of structure does not guarantee a seal. For designs that place the pipe contraction at the end of the exhaust pipe, or that contract only a local region or the tail, the simulations show that backflow still occurs at the entrance slit. Preliminary analysis indicates that the contraction raises the static pressure near the wall in the low-speed flow field, producing an adverse pressure gradient at the entrance slit. To eliminate this backflow, a pipe whose radial size increases gradually along the flow direction was analysed in detail by numerical simulation. The results indicate that no backflow occurs at the entrance slit of the dilated duct; moreover, the cold air drawn in through the slit keeps the channel wall cooler than the core flow. This kind of pipeline structure can therefore not only prevent gas leakage but also reduce the wall temperature. In addition, like the straight-pipe connection, the dilated pipe retains a periodic structure, which facilitates system integration and installation.

  19. Spectral analysis of pipe-to-soil potentials with variations of the Earth's magnetic field in the Australian region

    NASA Astrophysics Data System (ADS)

    Marshall, R. A.; Waters, C. L.; Sciffer, M. D.

    2010-05-01

    Long, steel pipelines used to transport essential resources such as gas and oil are potentially vulnerable to space weather. In order to inhibit corrosion, the pipelines are usually coated in an insulating material and maintained at a negative electric potential with respect to Earth using cathodic protection units. During periods of enhanced geomagnetic activity, potential differences between the pipeline and surrounding soil (referred to as pipe-to-soil potentials (PSPs)) may exhibit large voltage swings which place the pipeline outside the recommended "safe range" and at an increased risk of corrosion. The PSP variations result from the "geoelectric" field at the Earth's surface and associated geomagnetic field variations. Previous research investigating the relationship between the surface geoelectric field and geomagnetic source fields has focused on the high-latitude regions where line currents in the ionosphere E region are often the assumed source of the geomagnetic field variations. For the Australian region Sq currents also contribute to the geomagnetic field variations and provide the major contribution during geomagnetic quiet times. This paper presents the results of a spectral analysis of PSP measurements from four pipeline networks from the Australian region with geomagnetic field variations from nearby magnetometers. The pipeline networks extend from Queensland in the north of Australia to Tasmania in the south and provide PSP variations during both active and quiet geomagnetic conditions. The spectral analyses show both consistent phase and amplitude relationships across all pipelines, even for large separations between magnetometer and PSP sites and for small-amplitude signals. Comparison between the observational relationships and model predictions suggests a method for deriving a geoelectric field proxy suitable for indicating PSP-related space weather conditions.
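
    The core analysis step, comparing amplitude and phase between a magnetometer component and a PSP record per frequency, can be sketched with a cross-spectral density estimate. Synthetic one-hour series at 1 Hz stand in for real data; the 10 mHz tone and noise levels are illustrative assumptions:

    ```python
    import numpy as np
    from scipy.signal import csd

    fs = 1.0  # sample rate, Hz
    t = np.arange(3600) / fs
    rng = np.random.default_rng(2)
    # Synthetic geomagnetic component and PSP record sharing a 10 mHz signal.
    b_field = np.sin(2 * np.pi * 0.01 * t) + 0.3 * rng.standard_normal(t.size)
    psp = 0.5 * np.sin(2 * np.pi * 0.01 * t + np.pi / 4) \
          + 0.3 * rng.standard_normal(t.size)

    # Cross-spectral density gives relative amplitude and phase per frequency.
    f, pxy = csd(b_field, psp, fs=fs, nperseg=1024)
    phase_deg = np.degrees(np.angle(pxy))
    idx = np.argmin(np.abs(f - 0.01))
    print(f"phase at 10 mHz: {phase_deg[idx]:.0f} deg, |Pxy| = {abs(pxy[idx]):.3g}")
    ```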

  1. Analysis of oil-pipeline distribution of multiple products subject to delivery time-windows

    NASA Astrophysics Data System (ADS)

    Jittamai, Phongchai

    This dissertation defines the operational problems of, and develops solution methodologies for, the distribution of multiple products through an oil pipeline subject to delivery time-window constraints. A multiple-product oil pipeline is a system of pipes, pumps, valves and storage facilities used to transport different types of liquids. Typically, products delivered by pipelines are petroleum grades moving either from production facilities to refineries or from refineries to distributors. Time-windows, widely used in logistics and scheduling, are incorporated in this study. The distribution of multiple products through an oil pipeline subject to delivery time-windows is modeled as a multicommodity network flow structure and formulated mathematically. The main focus of this dissertation is the investigation of operating issues and problem complexity of single-source pipeline problems, together with a solution methodology that computes input schedules minimizing total violation of due delivery time-windows. The problem is proved to be NP-complete. A heuristic, the reversed-flow algorithm, is developed based on pipeline flow reversibility to compute input schedules; it runs in O(T·E) time. The dissertation also extends the study to operating attributes and problem complexity of multiple-source pipelines, which are likewise NP-complete, and introduces a heuristic modified from the single-source algorithm that also runs in O(T·E) time. Computational results are presented for both methodologies on randomly generated problem sets. The computational experience indicates that the reversed-flow algorithms provide good solutions relative to the optimal ones: only 25% of the tested problems exceeded the optimal values by more than 30%, and approximately 40% were solved optimally.

  2. Seeking unique and common biological themes in multiple gene lists or datasets: pathway pattern extraction pipeline for pathway-level comparative analysis.

    PubMed

    Yi, Ming; Mudunuri, Uma; Che, Anney; Stephens, Robert M

    2009-06-29

    One of the challenges in the analysis of microarray data is to integrate and compare the selected (e.g., differential) gene lists from multiple experiments for common or unique underlying biological themes. A common way to approach this problem is to extract common genes from these gene lists and then subject these genes to enrichment analysis to reveal the underlying biology. However, the capacity of this approach is largely restricted by the limited number of common genes shared by datasets from multiple experiments, which could be caused by the complexity of the biological system itself. We now introduce a new Pathway Pattern Extraction Pipeline (PPEP), which extends the existing WPS application by providing a new pathway-level comparative analysis scheme. To facilitate comparing and correlating results from different studies and sources, PPEP contains new interfaces that allow evaluation of the pathway-level enrichment patterns across multiple gene lists. As an exploratory tool, this analysis pipeline may help reveal the underlying biological themes at both the pathway and gene levels. The analysis scheme provided by PPEP begins with multiple gene lists, which may be derived from different studies in terms of the biological contexts, applied technologies, or methodologies. These lists are then subjected to pathway-level comparative analysis for extraction of pathway-level patterns. This analysis pipeline helps to explore the commonality or uniqueness of these lists at the level of pathways or biological processes from different but relevant biological systems using a combination of statistical enrichment measurements, pathway-level pattern extraction, and graphical display of the relationships of genes and their associated pathways as Gene-Term Association Networks (GTANs) within the WPS platform. As a proof of concept, we have used the new method to analyze many datasets from our collaborators as well as some public microarray datasets. This tool provides a new pathway-level analysis scheme for integrative and comparative analysis of data derived from different but relevant systems. The tool is freely available as a Pathway Pattern Extraction Pipeline implemented in our existing software package WPS, which can be obtained at http://www.abcc.ncifcrf.gov/wps/wps_index.php.

  3. Phylogenetic analysis of a biofilm bacterial population in a water pipeline in the Gulf of Mexico.

    PubMed

    López, Miguel A; Zavala-Díaz de la Serna, F Javier; Jan-Roblero, Janet; Romero, Juan M; Hernández-Rodríguez, César

    2006-10-01

    The aim of this study was to assess the bacterial diversity associated with a corrosive biofilm in a steel pipeline from the Gulf of Mexico used to inject marine water into the oil reservoir. Several aerobic and heterotrophic bacteria were isolated and identified by 16S rRNA gene sequence analysis. Metagenomic DNA was also extracted to perform a denaturing gradient gel electrophoresis analysis of ribosomal genes and to construct a 16S rRNA gene metagenomic library. Denaturing gradient gel electrophoresis profiles and ribosomal libraries exhibited a limited bacterial diversity. Most of the species detected in the ribosomal library or isolated from the pipeline were assigned to Proteobacteria (Halomonas spp., Idiomarina spp., Marinobacter aquaeolei, Thalassospira sp., Silicibacter sp. and Chromohalobacter sp.) and Bacilli (Bacillus spp. and Exiguobacterium spp.). This is the first report that associates some of these bacteria with a corrosive biofilm. It is relevant that no sulfate-reducing bacteria were isolated or detected by a PCR-based method. The diversity and relative abundance of bacteria from water pipeline biofilms may contribute to an understanding of the complexity and mechanisms of metal corrosion during marine water injection in oil secondary recovery.

  4. 78 FR 27169 - Regulatory Flexibility Act Review

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-05-09

    ... DEPARTMENT OF TRANSPORTATION Pipeline and Hazardous Materials Safety Administration 49 CFR Chapter... parts 174, 177, 191, and 192... 2013 2014 Transportation of Natural and Other Gas by Pipeline; Annual... review of some of 49 CFR parts 106, 107, 171. The full analysis document for the hazardous materials...

  5. NGSPanPipe: A Pipeline for Pan-genome Identification in Microbial Strains from Experimental Reads.

    PubMed

    Kulsum, Umay; Kapil, Arti; Singh, Harpreet; Kaur, Punit

    2018-01-01

    Recent advancements in sequencing technologies have decreased both the time and the cost of sequencing whole bacterial genomes. High-throughput Next-Generation Sequencing (NGS) technology has generated enormous amounts of data on microbial populations, publicly available across various repositories. As a consequence, it has become possible to study and compare the genomes of different bacterial strains within a species or genus in terms of evolution, ecology and diversity. Studying the pan-genome provides insights into microevolution, global composition, and diversity in virulence and pathogenesis of a species; it can also assist in identifying drug targets and proposing vaccine candidates. Effective analysis of these large genome datasets necessitates robust tools. Current methods for constructing a pan-genome do not support direct input of raw reads from the sequencer but require the reads to be preprocessed into an assembled protein/gene sequence file or a binary matrix of orthologous genes/proteins. We have designed an easy-to-use integrated pipeline, NGSPanPipe, which can identify the pan-genome directly from short reads. The output from the pipeline is compatible with other pan-genome analysis tools. We evaluated our pipeline against other methods for developing pan-genomes, i.e. reference-based assembly and de novo assembly, using simulated reads of Mycobacterium tuberculosis. The single-script pipeline (pipeline.pl) is applicable to all bacterial strains. It integrates multiple in-house Perl scripts and is freely accessible from https://github.com/Biomedinformatics/NGSPanPipe.
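
    Once each strain has an assigned gene set, the pan-genome bookkeeping itself reduces to set operations: core genes occur in every strain, accessory genes in only some. A minimal sketch with hypothetical gene names (not NGSPanPipe's internal representation):

    ```python
    # Per-strain gene sets, e.g. derived upstream from read mapping/assembly.
    strains = {
        "strain_A": {"geneA", "geneB", "geneC"},
        "strain_B": {"geneA", "geneB", "geneD"},
        "strain_C": {"geneA", "geneC", "geneD"},
    }

    pan_genome = set.union(*strains.values())          # all genes seen anywhere
    core_genome = set.intersection(*strains.values())  # genes in every strain
    accessory = pan_genome - core_genome               # genes in only some strains
    print(f"pan: {len(pan_genome)}, core: {len(core_genome)}, "
          f"accessory: {len(accessory)}")
    ```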

  6. Quarry blasts assessment and their environmental impacts on the nearby oil pipelines, southeast of Helwan City, Egypt

    NASA Astrophysics Data System (ADS)

    Mohamed, Adel M. E.; Mohamed, Abuo El-Ela A.

    2013-06-01

    Ground vibrations induced by blasting in cement quarries are one of the fundamental problems in the quarrying industry and may cause severe damage to nearby utilities and pipelines. A vibration control study therefore plays an important role in minimizing the environmental effects of blasting in quarries. The current paper presents the influence of the quarry blasts at the National Cement Company (NCC) on the two oil pipelines of the SUMED Company southeast of Helwan City, by measuring the ground vibrations in terms of Peak Particle Velocity (PPV). The compressional wave velocities deduced from the shallow seismic refraction survey and the shear wave velocities obtained from the Multichannel Analysis of Surface Waves (MASW) technique are used to evaluate the site of the two pipelines closest to the quarry blasts. The results demonstrate that the closest site of the two pipelines is of class B, according to the National Earthquake Hazard Reduction Program (NEHRP) classification, and that the safe distance to avoid any environmental effects is 650 m, following the deduced relationship between PPV and scaled distance (SD), PPV = 700.08 × SD^(-1.225) in mm/s, and the air overpressure (air blast) formula, air blast = 170.23 × SD^(-0.071) in dB. In the light of the prediction analysis, the maximum allowable charge weight per delay was found to be 591 kg with a damage criterion of 12.5 mm/s at the closest site of the SUMED pipelines.
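
    The attenuation law can be checked directly. Assuming square-root charge scaling, SD = R/sqrt(W) (the usual convention for quarry blasts, and the one that reproduces the paper's 591 kg figure), inverting the PPV law at the 650 m safe distance gives the maximum charge per delay:

    ```python
    R = 650.0          # distance to the closest SUMED pipeline site, m
    ppv_limit = 12.5   # damage criterion, mm/s

    # PPV = 700.08 * SD**-1.225  =>  SD = (700.08 / PPV)**(1 / 1.225)
    sd = (700.08 / ppv_limit) ** (1 / 1.225)
    # SD = R / sqrt(W)  =>  W = (R / SD)**2
    w_max = (R / sd) ** 2
    print(f"SD = {sd:.1f} m/kg^0.5, max charge per delay = {w_max:.0f} kg")
    # Prints roughly 591 kg, matching the reported value.
    ```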

  7. The Leadership Efficacy of Graduates of North Carolina School of Science and Mathematics: A Mixed-Methods Analysis

    NASA Astrophysics Data System (ADS)

    Mason, Letita Renee

    This study examines leadership efficacy amongst graduates of NCSSM from the classes of 2000-07 as the unit of analysis. How do NCSSM graduates' perceptions of their leadership efficacy align with research on non-cognitive variables as indicators of academic performance, using the unit of analysis as a performance outcome? This study is based on the theoretical construct that non-cognitive psychological (also called motivational) factors are core components of leadership self-efficacy, indicative of NCSSM graduates (who had high academic performance and attained STEM degrees). It holds promise for increasing both student interest and diversity in the race to strengthen the STEM pipeline. In this study the Hannah and Avolio (2013) Mind Garden Leadership Efficacy Questionnaire (LEQ) is used. The LEQ is a battery of three instruments designed to assess individual perceptions of personal leadership efficacy across three constructs, via one survey tool. In this mixed-methods analysis, a quantitative phase was conducted to collect the data captured by the Mind Garden Leadership Efficacy Questionnaire. A post hoc qualitative analysis was conducted in the second phase of the data analysis, using the Trichotomous-Square Test methodology (with an associated qualitative researcher-designed Inventive Investigative Instrument). The results of the study validated the alternative hypothesis [H1], which proposed that there are no significant differences in the perception of leadership efficacy by North Carolina School of Science and Mathematics alumni from the classes of 2000-07 in terms of their overall "Leadership Efficacy" with regard to: Execution or "Leadership Action Efficacy"; Capacity or "Leader Means Efficacy"; and Environment or "Leader Self-Regulation Efficacy". The results also led to the development of a new assessment tool called the Mason Leadership Efficacy Model.

  8. FROGS: Find, Rapidly, OTUs with Galaxy Solution.

    PubMed

    Escudié, Frédéric; Auer, Lucas; Bernard, Maria; Mariadassou, Mahendra; Cauquil, Laurent; Vidal, Katia; Maman, Sarah; Hernandez-Raquet, Guillermina; Combes, Sylvie; Pascal, Géraldine

    2018-04-15

    Metagenomics leads to major advances in microbial ecology, and biologists need user-friendly tools to analyze their data on their own. This Galaxy-supported pipeline, called FROGS, is designed to analyze large sets of amplicon sequences and produce abundance tables of Operational Taxonomic Units (OTUs) and their taxonomic affiliation. The clustering uses Swarm. The chimera removal uses VSEARCH, combined with original cross-sample validation. The taxonomic affiliation returns an innovative multi-affiliation output to highlight database conflicts and uncertainties. Statistical results and numerous graphical illustrations are produced along the way to monitor the pipeline. FROGS was tested for the detection and quantification of OTUs on real and in silico datasets and proved to be rapid, robust and highly sensitive. It compares favorably with the widespread mothur, UPARSE and QIIME. Source code and instructions for installation: https://github.com/geraldinepascal/FROGS.git. A companion website: http://frogs.toulouse.inra.fr. geraldine.pascal@inra.fr. Supplementary data are available at Bioinformatics online.

  9. What Works for Women in Undergraduate Physics and What We Can Learn from Women's Colleges

    NASA Astrophysics Data System (ADS)

    Whitten, Barbara L.; Dorato, Shannon R.; Duncombe, Margaret L.; Allen, Patricia E.; Blaha, Cynthia A.; Butler, Heather Z.; Shaw, Kimberly A.; Taylor, Beverley A. P.; Williams, Barbara A.

    We are studying the recruitment and retention of women in undergraduate physics by conducting site visits to physics departments. In this second phase of the project, we visited six physics departments in women's colleges. We compared these departments to each other and to the nine departments in coeducational schools that we visited in phase 1 of the project (Whitten, Foster, & Duncombe, 2003a; Whitten et al., 2003b; Whitten et al., 2004). We learned that women's colleges, much more than coed schools, try to recruit students into the physics major. This has led us to criticize the "leaky pipeline" metaphor often used to describe women in physics and to call attention to women dropping in to the physics pipeline. We discuss our results for students and pedagogy and for faculty and institutions, and we offer some advice on how to make a physics department more female friendly.

  10. SUPRA: open-source software-defined ultrasound processing for real-time applications : A 2D and 3D pipeline from beamforming to B-mode.

    PubMed

    Göbl, Rüdiger; Navab, Nassir; Hennersperger, Christoph

    2018-06-01

    Research in ultrasound imaging is limited in reproducibility by two factors: first, many existing ultrasound pipelines are protected by intellectual property, rendering exchange of code difficult; second, most pipelines are implemented in special hardware, resulting in limited flexibility of the implemented processing steps on such platforms. With SUPRA, we propose an open-source pipeline for fully software-defined ultrasound processing for real-time applications to alleviate these problems. Covering all steps from beamforming to output of B-mode images, SUPRA can help improve the reproducibility of results and make modifications to the image acquisition mode accessible to the research community. We evaluate the pipeline qualitatively, quantitatively, and regarding its run time. The pipeline shows image quality comparable to a clinical system and, backed by point spread function measurements, comparable resolution. Including all processing stages of a usual ultrasound pipeline, the run-time analysis shows that it can be executed in 2D and 3D on consumer GPUs in real time. Our software ultrasound pipeline opens up research in image acquisition. Given access to ultrasound data from early stages (raw channel data, radiofrequency data), it simplifies development in imaging. Furthermore, it tackles the reproducibility of research results, as code can be shared easily and even be executed without dedicated ultrasound hardware.
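
    The final stages such a pipeline covers, envelope detection and log compression from beamformed radiofrequency (RF) data to a B-mode image, follow a standard recipe. This generic sketch (not SUPRA's own code) illustrates it on a hypothetical RF array:

    ```python
    import numpy as np
    from scipy.signal import hilbert

    # Hypothetical beamformed RF data: (samples x scanlines).
    rng = np.random.default_rng(3)
    rf = rng.standard_normal((2048, 128)) * np.hanning(2048)[:, None]

    # Envelope detection via the analytic signal, then log compression.
    envelope = np.abs(hilbert(rf, axis=0))
    bmode = 20 * np.log10(envelope / envelope.max() + 1e-12)
    bmode = np.clip(bmode, -60, 0)  # display with a 60 dB dynamic range
    print(bmode.shape, bmode.min(), bmode.max())
    ```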

  11. Corral framework: Trustworthy and fully functional data intensive parallel astronomical pipelines

    NASA Astrophysics Data System (ADS)

    Cabral, J. B.; Sánchez, B.; Beroiz, M.; Domínguez, M.; Lares, M.; Gurovich, S.; Granitto, P.

    2017-07-01

    Data processing pipelines represent an important slice of the astronomical software library, comprising chains of processes that transform raw data into valuable information via data reduction and analysis. In this work we present Corral, a Python framework for astronomical pipeline generation. Corral features a Model-View-Controller design pattern on top of an SQL relational database capable of handling custom data models, processing stages, and communication alerts, and also provides automatic quality and structural metrics based on unit testing. The Model-View-Controller pattern provides separation between the user logic and the data models while delivering multiprocessing and distributed computing capabilities. Corral represents an improvement over commonly found data processing pipelines in astronomy, since the design pattern relieves the programmer of processing-flow and parallelization issues, allowing them to focus on the specific algorithms needed for the successive data transformations, and at the same time provides a broad measure of quality over the created pipeline. Corral and working examples of pipelines that use it are available to the community at https://github.com/toros-astro.

  12. Optimal Energy Consumption Analysis of Natural Gas Pipeline

    PubMed Central

    Liu, Enbin; Li, Changjun; Yang, Yi

    2014-01-01

    There are many compressor stations along long-distance natural gas pipelines. Natural gas can be transported using different boot programs and import pressures, combined with temperature control parameters, and different transport methods have correspondingly different energy consumptions. At present, the operating parameters of many pipelines are determined empirically by dispatchers, resulting in high energy consumption that is at odds with energy-reduction policies. Therefore, based on a full understanding of the actual needs of pipeline companies, we introduce production unit consumption indicators to establish an objective function for lowering energy consumption. By solving the model with a dynamic programming method and preparing calculation software, we ensure that the solution process is quick and efficient. Using the established optimization methods, we analyzed the energy savings for the XQ gas pipeline. By optimizing the boot program, the import station pressure, and the temperature parameters, we achieved the optimal energy consumption. Comparison with the measured energy consumption shows that the pipeline has the potential to reduce energy consumption by 11 to 16 percent.
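
    The dynamic-programming formulation can be pictured as a stage-wise search over discretized discharge pressures at each compressor station. The cost surrogate, pressure grid, and line pressure-drop factor below are toy assumptions, not the paper's unit-consumption model:

    ```python
    # Toy stage-wise DP: choose a discharge pressure at each station to
    # minimize cumulative compression cost.
    pressures = [6.0, 7.0, 8.0]   # candidate discharge pressures, MPa
    n_stations = 4

    def stage_cost(p_suction, p_discharge):
        # Toy compression-energy surrogate: grows with the pressure rise.
        return max(p_discharge - p_suction, 0.0) ** 1.5 + 0.1

    # best[p] = minimum cost of reaching the current station discharging at p.
    best = {p: 0.0 for p in pressures}
    for _ in range(n_stations):
        best = {
            p_out: min(
                # 0.8: toy pressure-drop factor along the preceding line segment.
                best[p_in] + stage_cost(p_in * 0.8, p_out)
                for p_in in pressures
            )
            for p_out in pressures
        }
    print(f"minimum total cost: {min(best.values()):.2f}")
    ```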

  13. Push Force Analysis of Anchor Block of the Oil and Gas Pipeline in a Single-Slope Tunnel Based on the Energy Balance Method

    PubMed Central

    Yan, Yifei; Zhang, Lisong; Yan, Xiangzhen

    2016-01-01

    In this paper, a single-slope tunnel pipeline was analysed considering the effects of vertical earth pressure, horizontal soil pressure, inner pressure, thermal expansion force and pipeline-soil friction. The concept of a stagnation point for the pipeline was proposed. Considering the deformation compatibility condition of the pipeline elbow, the push force on anchor blocks of a single-slope tunnel pipeline was derived based on an energy method, yielding a theoretical formula for this force. Using the analytical equation, the push force of the anchor block of an X80 large-diameter pipeline from the West-East Gas Transmission Project was determined. Meanwhile, to verify the analytical results, four finite element codes were used to calculate the push force: CAESARII, ANSYS, AutoPIPE and ALGOR. The results show that the analytical results agree well with the numerical results, with a maximum relative error of only 4.1%. The results obtained with the analytical method can therefore satisfy engineering requirements.

  14. Still Endangered: Perspectives of Black Male Teachers Answering the Call to Teach in NC Public Schools and a Glance at College and LEA Recruitment Strategies

    ERIC Educational Resources Information Center

    Moore, Shekina Michelle

    2016-01-01

    Based on the school to prison pipeline that has garnered a great amount of attention in the past decade, many studies have underscored the need for Black male teacher presence in schools. However, not much beyond rhetoric has taken place to change educational policy or practices. While the student body in American K-12 education has become…

  15. CoVaCS: a consensus variant calling system.

    PubMed

    Chiara, Matteo; Gioiosa, Silvia; Chillemi, Giovanni; D'Antonio, Mattia; Flati, Tiziano; Picardi, Ernesto; Zambelli, Federico; Horner, David Stephen; Pesole, Graziano; Castrignanò, Tiziana

    2018-02-05

    The advent and ongoing development of next generation sequencing technologies (NGS) has led to a rapid increase in the rate of human genome re-sequencing data, paving the way for personalized genomics and precision medicine. The growing body of genome resequencing data underlines the need for accurate and time-effective bioinformatics systems for genotyping, a crucial prerequisite for the identification of candidate causal mutations in diagnostic screens. Here we present CoVaCS, a fully automated, highly accurate system with a web-based graphical interface for genotyping and variant annotation. Extensive tests on a gold standard benchmark dataset (the NA12878 Illumina platinum genome) confirm that call-sets based on our consensus strategy are completely in line with those attained by similar command-line approaches, and far more accurate than call-sets from any individual tool. Importantly, our system exhibits better sensitivity and higher specificity than equivalent commercial software. CoVaCS offers optimized pipelines integrating state-of-the-art tools for variant calling and annotation for whole genome sequencing (WGS), whole-exome sequencing (WES) and target-gene sequencing (TGS) data. The system is currently hosted at Cineca and offers the speed of an HPC computing facility, a crucial consideration when large numbers of samples must be analysed. Importantly, all analyses are performed automatically, ensuring high reproducibility of the results. As such, we believe that CoVaCS can be a valuable tool for the analysis of human genome resequencing studies. CoVaCS is available at: https://bioinformatics.cineca.it/covacs.
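
    The consensus strategy can be pictured as majority voting over per-caller variant sets. A minimal sketch with hypothetical (chrom, pos, ref, alt) call sets; the caller names are illustrative, not a statement of CoVaCS's exact tool set:

    ```python
    from collections import Counter

    # Hypothetical call sets, e.g. parsed from each caller's VCF output.
    caller_a = {("chr1", 101, "A", "T"), ("chr1", 250, "G", "C")}
    caller_b = {("chr1", 101, "A", "T"), ("chr2", 500, "T", "G")}
    caller_c = {("chr1", 101, "A", "T"), ("chr1", 250, "G", "C"),
                ("chr2", 500, "T", "G")}

    # Count how many callers support each variant.
    votes = Counter()
    for callset in (caller_a, caller_b, caller_c):
        votes.update(callset)

    # Keep variants reported by at least two of the three callers.
    consensus = {variant for variant, n in votes.items() if n >= 2}
    print(sorted(consensus))
    ```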

  16. Development of a design methodology for pipelines in ice scoured seabeds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clark, J.I.; Paulin, M.J.; Lach, P.R.

    1994-12-31

    Large areas of the continental shelf of northern oceans are frequently scoured or gouged by moving bodies of ice such as icebergs and sea ice keels associated with pressure ridges. This phenomenon presents a formidable challenge when the route of a submarine pipeline is intersected by the scouring ice. It is generally acknowledged that if a pipeline, laid on the seabed, were hit by an iceberg or a pressure ridge keel, the forces imposed on the pipeline would be much greater than it could practically withstand. The pipeline must therefore be buried to avoid direct contact with ice, but it is very important to determine with some assurance the minimum depth required for safety, for both economic and environmental reasons. The safe burial depth of a pipeline, however, cannot be determined directly from the relatively straightforward measurement of maximum scour depth. The major design consideration is the determination of the potential sub-scour deformation of the ice-scoured soil. Forces transmitted through the soil and soil displacement around the pipeline could load the pipeline to failure if not taken into account in the design. If the designer can predict the forces transmitted through the soil, the pipeline can be designed to withstand these external forces using conventional design practice. In this paper, the authors outline a design methodology based on phenomenological studies of ice-scoured terrain, both modern and relict, laboratory tests, centrifuge modeling, and numerical analysis. The implications of these studies, which could assist in the safe and economical design of pipelines in ice-scoured terrain, are also discussed.

  17. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline.

    PubMed

    Reid, Jeffrey G; Carroll, Andrew; Veeraraghavan, Narayanan; Dahdouli, Mahmoud; Sundquist, Andreas; English, Adam; Bainbridge, Matthew; White, Simon; Salerno, William; Buhay, Christian; Yu, Fuli; Muzny, Donna; Daly, Richard; Duyk, Geoff; Gibbs, Richard A; Boerwinkle, Eric

    2014-01-29

    Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.

  18. Significantly reducing the processing times of high-speed photometry data sets using a distributed computing model

    NASA Astrophysics Data System (ADS)

    Doyle, Paul; Mtenzi, Fred; Smith, Niall; Collins, Adrian; O'Shea, Brendan

    2012-09-01

    The scientific community is in the midst of a data analysis crisis. The increasing capacity of scientific CCD instrumentation and their falling costs are contributing to an explosive generation of raw photometric data. These data must go through a process of cleaning and reduction before they can be used for high-precision photometric analysis. Many existing data processing pipelines either assume a relatively small dataset or are batch processed by a High Performance Computing centre. A radical overhaul of these processing pipelines is required to allow reduction and cleaning rates to process terabyte-sized datasets at near capture rates using an elastic processing architecture. The ability to access computing resources and to allow them to grow and shrink as demand fluctuates is essential, as is exploiting the parallel nature of the datasets. A distributed data processing pipeline is required. It should incorporate lossless data compression, allow for data segmentation, and support processing of data segments in parallel. Academic institutes can collaborate to provide an elastic computing model without the requirement for large centralized high-performance computing data centers. This paper demonstrates how an order-of-magnitude improvement in overall processing time has been achieved using the "ACN pipeline", a distributed pipeline spanning multiple academic institutes.

  19. Remote control spill reduction technology : a survey and analysis of applications for liquid pipeline systems

    DOT National Transportation Integrated Search

    1995-01-01

    Given the 1988 directive, the OPS conducted a study on the potential for EFRDs : to minimize the volume of pipeline spills. They concluded that Remote Controlled Valves : (RCVs) and check valves are the only EFRDs that are effective on hazardous liqu...

  20. Benefits of utilizing CellProfiler as a characterization tool for U–10Mo nuclear fuel

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Collette, R.; Douglas, J.; Patterson, L.

    2015-07-15

    Automated image processing techniques have the potential to aid in the performance evaluation of nuclear fuels by eliminating judgment calls that may vary from person to person or sample to sample. Analysis of in-core fuel performance is required for design and safety evaluations related to almost every aspect of the nuclear fuel cycle. This study presents a methodology for assessing the quality of uranium-molybdenum fuel images and describes image analysis routines designed for the characterization of several important microstructural properties. The analyses are performed in CellProfiler, an open-source program designed to enable biologists without training in computer vision or programming to automatically extract cellular measurements from large image sets. The quality metric scores an image based on three parameters: the illumination gradient across the image, the overall focus of the image, and the fraction of the image that contains scratches. The metric presents the user with the ability to 'pass' or 'fail' an image based on a reproducible quality score. Passable images may then be characterized through a separate CellProfiler pipeline, which enlists a variety of common image analysis techniques. The results demonstrate the ability to reliably pass or fail images based on the illumination, focus, and scratch fraction of the image, followed by automatic extraction of morphological data with respect to fission gas voids, interaction layers, and grain boundaries. Highlights: • A technique is developed to score U–10Mo FIB-SEM image quality using CellProfiler. • The pass/fail metric is based on image illumination, focus, and area scratched. • Automated image analysis is performed in pipeline fashion to characterize images. • Fission gas void, interaction layer, and grain boundary coverage data is extracted. • Preliminary characterization results demonstrate consistency of the algorithm.
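
    A hedged sketch of such a three-parameter pass/fail gate in Python (not CellProfiler's implementation; the thresholds and the scratch-fraction stand-in are illustrative assumptions):

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter, laplace

    def quality_check(img, grad_tol=0.2, focus_tol=1e-3, scratch_tol=0.05):
        img = img.astype(float) / img.max()
        # Illumination gradient: range of a heavily smoothed background estimate.
        background = gaussian_filter(img, sigma=img.shape[0] // 8)
        illum_gradient = background.max() - background.min()
        # Focus: variance of the Laplacian (low for blurred images).
        focus = laplace(img).var()
        # Scratch fraction: stand-in value; a real pipeline would segment
        # scratch artifacts and measure their area fraction.
        scratch_fraction = 0.01
        ok = (illum_gradient < grad_tol and focus > focus_tol
              and scratch_fraction < scratch_tol)
        return ok, dict(gradient=illum_gradient, focus=focus,
                        scratched=scratch_fraction)

    rng = np.random.default_rng(4)
    print(quality_check(rng.random((256, 256))))
    ```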

  1. Data reduction and calibration for LAMOST survey

    NASA Astrophysics Data System (ADS)

    Luo, Ali; Zhang, Jiannan; Chen, Jianjun; Song, Yihan; Wu, Yue; Bai, Zhongrui; Wang, Fengfei; Du, Bing; Zhang, Haotong

    2014-01-01

    There are three data pipelines for the LAMOST survey: the raw data are reduced to one-dimensional spectra by the data reduction pipeline (2D pipeline); the extracted spectra are classified and measured by the spectral analysis pipeline (1D pipeline); and stellar parameters are measured by the LASP pipeline. (a) The data reduction pipeline. Its main tasks include bias calibration, flat fielding, spectra extraction, sky subtraction, wavelength calibration, exposure merging and wavelength band connection. (b) The spectral analysis pipeline. This pipeline is designed to classify and identify objects from the extracted spectra and to measure their redshift (or radial velocity). The PCAZ method (Glazebrook et al. 1998) is applied for classification and redshift measurement. (c) The stellar parameters pipeline (LASP). LASP estimates stellar atmospheric parameters, e.g. effective temperature Teff, surface gravity log g, and metallicity [Fe/H], for F, G and K type stars. To determine these fundamental stellar measurements effectively, three steps with different methods are employed. The first step utilizes line indices to approximately define the effective temperature range of the analyzed star. Secondly, a set of initial approximate values of the three parameters is obtained by template fitting. Finally, ULySS (Koleva et al. 2009) gives the final parameter values by minimizing the χ² value between the observed spectrum and a multidimensional grid of model spectra generated by interpolating the ELODIE library. Two further classifications exist for A and M type stars: A type stars are assigned a temperature class and luminosity type on the standard MK system (Gray et al. 2009), while M type stars are classified into subclasses by an improved Hammer method, with a metallicity estimate given for each object. During the pilot survey, the algorithms were improved and the pipelines tested. The products of the LAMOST survey will include extracted and calibrated spectra in FITS format, a catalog of FGK stars with stellar parameters, a catalog of M dwarfs with subclass and metallicity, and a catalog of A type stars with MK classification. Part of the pilot survey data, including about 319 000 high quality spectra with SNR > 10, a catalog of stellar parameters of FGK stars and a catalog of subclasses of M type stars, was released to the public in August 2012 (Luo et al. 2012). The general survey started in October 2012 and has completed its first year. The formal first data release (DR1), covering both the pilot survey and the first year of the general survey, is being prepared and is planned for release under the LAMOST data policy.
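
    The template-fitting step at the heart of LASP can be sketched as a χ² grid search: find the model-grid spectrum that minimizes χ² against the observation. The toy model grid below stands in for interpolated ELODIE spectra; all shapes and values are illustrative:

    ```python
    import numpy as np

    wavelengths = np.linspace(4000, 7000, 500)  # angstroms, toy sampling

    def model_spectrum(teff):
        # Toy continuum shape that varies smoothly with Teff (stand-in model).
        return np.exp(-((wavelengths - teff) / 2000.0) ** 2)

    # Model grid over effective temperature.
    grid_teff = np.arange(4500, 7501, 250)
    templates = np.array([model_spectrum(t) for t in grid_teff])

    # Synthetic "observed" spectrum with noise of known sigma.
    rng = np.random.default_rng(5)
    sigma = 0.05
    observed = model_spectrum(5750) + rng.normal(0, sigma, wavelengths.size)

    # chi^2 against every template; the minimum picks the best-fit parameters.
    chi2 = (((observed - templates) / sigma) ** 2).sum(axis=1)
    print(f"best-fit Teff = {grid_teff[np.argmin(chi2)]} K")
    ```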

  2. Automatically visualise and analyse data on pathways using PathVisioRPC from any programming environment.

    PubMed

    Bohler, Anwesha; Eijssen, Lars M T; van Iersel, Martijn P; Leemans, Christ; Willighagen, Egon L; Kutmon, Martina; Jaillard, Magali; Evelo, Chris T

    2015-08-23

    Biological pathways are descriptive diagrams of biological processes widely used for functional analysis of differentially expressed genes or proteins. Primary data analysis, such as quality control, normalisation, and statistical analysis, is often performed in scripting languages like R, Perl, and Python. Subsequent pathway analysis is usually performed using dedicated external applications. Workflows involving manual use of multiple environments are time-consuming and error-prone. Therefore, tools are needed that enable pathway analysis directly within the same scripting languages used for primary data analyses. Existing tools have limited capability in terms of available pathway content, pathway editing and visualisation options, and export file formats. Consequently, making the full-fledged pathway analysis tool PathVisio available from various scripting languages will benefit researchers. We developed PathVisioRPC, an XMLRPC interface for the pathway analysis software PathVisio. PathVisioRPC enables creating and editing biological pathways, visualising data on pathways, performing pathway statistics, and exporting results in several image formats in multiple programming environments. We demonstrate PathVisioRPC functionalities using examples in Python. Subsequently, we analyse a publicly available NCBI GEO gene expression dataset studying tumour-bearing mice treated with cyclophosphamide in R. The R scripts demonstrate how calls to existing R packages for data processing and calls to PathVisioRPC can work together directly. To further support R users, we have created RPathVisio, simplifying the use of PathVisioRPC in this environment. We have also created a pathway module for the microarray data analysis portal ArrayAnalysis.org that calls the PathVisioRPC interface to perform pathway analysis. This module allows users to use PathVisio functionality online without having to download and install the software, and exemplifies how the PathVisioRPC interface can be used by data analysis pipelines for functional analysis of processed genomics data. PathVisioRPC enables data visualisation and pathway analysis directly from within various analytical environments used for preliminary analyses. It supports the use of existing pathways from WikiPathways or pathways created using the RPC itself. It also enables automation of tasks performed using PathVisio, making it useful to PathVisio users performing repeated visualisation and analysis tasks. PathVisioRPC is freely available for academic and commercial use at http://projects.bigcat.unimaas.nl/pathvisiorpc.
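
    Because PathVisioRPC is exposed over XML-RPC, any language with an XML-RPC client can drive it. A Python sketch using only the standard library; the port and the method name/signature are hypothetical placeholders, so consult the PathVisioRPC documentation for the actual API:

    ```python
    import xmlrpc.client

    # A running PathVisioRPC server is assumed; the port is a placeholder.
    server = xmlrpc.client.ServerProxy("http://localhost:7777")

    # Hypothetical call: visualize a data file on a pathway and export an image.
    # Replace with the actual method name and arguments from the documentation.
    result = server.visualizeData("pathway.gpml", "expression.txt", "output.png")
    print(result)
    ```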

  3. Antigen Receptor Galaxy: A User-Friendly, Web-Based Tool for Analysis and Visualization of T and B Cell Receptor Repertoire Data

    PubMed Central

    IJspeert, Hanna; van Schouwenburg, Pauline A.; van Zessen, David; Pico-Knijnenburg, Ingrid

    2017-01-01

    Antigen Receptor Galaxy (ARGalaxy) is a Web-based tool for analyses and visualization of TCR and BCR sequencing data of 13 species. ARGalaxy consists of four parts: the demultiplex tool, the international ImMunoGeneTics information system (IMGT) concatenate tool, the immune repertoire pipeline, and the somatic hypermutation (SHM) and class switch recombination (CSR) pipeline. Together they allow the analysis of all different aspects of the immune repertoire. All pipelines can be run independently or combined, depending on the available data and the question of interest. The demultiplex tool allows data trimming and demultiplexing, whereas with the concatenate tool multiple IMGT/HighV-QUEST output files can be merged into a single file. The immune repertoire pipeline is an extended version of our previously published ImmunoGlobulin Galaxy (IGGalaxy) virtual machine that was developed to visualize V(D)J gene usage. It allows analysis of both BCR and TCR rearrangements, visualizes CDR3 characteristics (length and amino acid usage) and junction characteristics, and calculates the diversity of the immune repertoire. Finally, ARGalaxy includes the newly developed SHM and CSR pipeline to analyze SHM and/or CSR in BCR rearrangements. It analyzes the frequency and patterns of SHM, Ag selection (including BASELINe), clonality (Change-O), and CSR. The functionality of the ARGalaxy tool is illustrated in several clinical examples of patients with primary immunodeficiencies. In conclusion, ARGalaxy is a novel tool for the analysis of the complete immune repertoire, which is applicable to many patient groups with disturbances in the immune repertoire such as autoimmune diseases, allergy, and leukemia, but it can also be used to address basic research questions in repertoire formation and selection. PMID:28416602

  4. STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud

    PubMed Central

    Karczewski, Konrad J.; Fernald, Guy Haskin; Martin, Alicia R.; Snyder, Michael; Tatonetti, Nicholas P.; Dudley, Joel T.

    2014-01-01

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5–10 hours to process a full exome sequence and $30 and 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2. PMID:24454756

  5. Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

    PubMed

    Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

    2013-03-15

    The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then the remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants, called Promzea, has been generated. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
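
    The filter-then-combine step described above can be sketched generically: keep each program's motifs that pass a statistical filter, then merge and rank the survivors. The scoring and threshold below are placeholders, not Promzea's actual statistics:

        def combine_motifs(results_by_program, min_score):
            """Merge motif predictions from several discovery tools, keeping
            only those passing a per-program statistical filter.

            results_by_program: dict of program name -> list of (motif, score).
            """
            combined = {}
            for program, motifs in results_by_program.items():
                for motif, score in motifs:
                    if score < min_score:
                        continue  # drop likely false discoveries
                    entry = combined.setdefault(motif, {"score": 0.0, "programs": set()})
                    entry["score"] = max(entry["score"], score)
                    entry["programs"].add(program)
            # Rank: motifs found by more programs first, then by score.
            return sorted(combined.items(),
                          key=lambda kv: (len(kv[1]["programs"]), kv[1]["score"]),
                          reverse=True)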

  6. A computational genomics pipeline for prokaryotic sequencing projects

    PubMed Central

    Kislyuk, Andrey O.; Katz, Lee S.; Agrawal, Sonia; Hagen, Matthew S.; Conley, Andrew B.; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C.; Sammons, Scott A.; Govil, Dhwani; Mair, Raydel D.; Tatti, Kathleen M.; Tondella, Maria L.; Harcourt, Brian H.; Mayer, Leonard W.; Jordan, I. King

    2010-01-01

    Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. Results: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. Availability and implementation: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems. Contact: king.jordan@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20519285

  7. Comparative coal transportation costs: an economic and engineering analysis of truck, belt, rail, barge and coal slurry and pneumatic pipelines. Volume 3. Coal slurry pipelines. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rieber, M.; Soo, S.L.

    1977-08-01

    A coal slurry pipeline system requires that the coal go through a number of processing stages before it is used by the power plant. Once mined, the coal is delivered to a preparation plant where it is pulverized to sizes between 18 and 325 mesh and then suspended in about an equal weight of water. This 50-50 slurry mixture has a consistency approximating toothpaste. It is pushed through the pipeline via electric pumping stations 70 to 100 miles apart. Flow velocity through the line must be maintained within a narrow range. For example, if a 3.5 mph design is used at 5 mph, the system must be able to withstand double the horsepower, peak pressure, and wear. Minimum flowrate must be maintained to avoid particle settling and plugging. However, in general, once a pipeline system has been designed, because of economic considerations on the one hand and design limits on the other, flowrate is rather inflexible. Pipelines that have a slowly moving throughput and a water carrier may be subject to freezing in northern areas during periods of severe cold. One of the problems associated with slurry pipeline analyses is the lack of operating experience.
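
    The "double the peak pressure" figure is consistent with the usual turbulent-flow scaling, in which friction pressure drop grows roughly with the square of velocity. As a back-of-the-envelope check (a simplification that ignores slurry rheology):

        \frac{\Delta p_{5.0\ \mathrm{mph}}}{\Delta p_{3.5\ \mathrm{mph}}} \approx \left(\frac{5.0}{3.5}\right)^{2} \approx 2.04

    so running a 3.5 mph design at 5 mph roughly doubles the pressure the line and pumps must withstand.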

  8. Ultrasonic wave based pressure measurement in small diameter pipeline.

    PubMed

    Wang, Dan; Song, Zhengxiang; Wu, Yuan; Jiang, Yuan

    2015-12-01

    An effective, non-intrusive ultrasound-based method for monitoring liquid pressure in small-diameter pipelines (less than 10 mm) is presented in this paper. Ultrasonic waves penetrate the medium, and properties of the medium can be inferred from representative information carried by the echoes. Such pressure measurement is difficult because echo information is hard to obtain in a small-diameter pipeline. The proposed method, studied on a pipeline carrying a Kneser liquid, is based on the principle that the transmission speed of ultrasonic waves in the pipeline liquid correlates with liquid pressure, and that this speed is reflected in the ultrasonic propagation time provided the acoustic path length is fixed. Variation in ultrasonic propagation time can therefore reflect variation in pipeline pressure. The propagation time is obtained by an electronic processing approach and is measured to nanosecond accuracy with a high-resolution time measurement module. We use the ultrasonic propagation time difference to represent the actual pressure, which reduces environmental influences. The corresponding pressure values are finally obtained from the relationship between the variation of the ultrasonic propagation time difference and pressure using a neural network analysis method; the results show that this method is accurate and can be used in practice. Copyright © 2015 Elsevier B.V. All rights reserved.
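
    The measurement principle — at a fixed acoustic path, propagation time varies with pressure, so a calibrated time-difference curve yields pressure — can be sketched with a simple fitted calibration. The paper uses a neural network; an ordinary polynomial fit on made-up calibration data is shown here purely for illustration:

        import numpy as np

        # Hypothetical calibration data: measured propagation-time differences
        # (nanoseconds, relative to a reference) at known pressures (MPa).
        dt_ns = np.array([0.0, 12.5, 24.8, 37.6, 49.9])
        p_mpa = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

        # Fit a low-order polynomial mapping time difference -> pressure.
        coeffs = np.polyfit(dt_ns, p_mpa, 2)

        def pressure_from_dt(dt):
            """Estimate pipeline pressure from an ultrasonic time difference."""
            return np.polyval(coeffs, dt)

        print(pressure_from_dt(30.0))  # pressure estimate for a new reading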

  9. An Optimization-Driven Analysis Pipeline to Uncover Biomarkers and Signaling Paths: Cervix Cancer.

    PubMed

    Lorenzo, Enery; Camacho-Caceres, Katia; Ropelewski, Alexander J; Rosas, Juan; Ortiz-Mojer, Michael; Perez-Marty, Lynn; Irizarry, Juan; Gonzalez, Valerie; Rodríguez, Jesús A; Cabrera-Rios, Mauricio; Isaza, Clara

    2015-06-01

    Establishing how a series of potentially important genes might relate to each other is relevant to understand the origin and evolution of illnesses, such as cancer. High-throughput biological experiments have played a critical role in providing information in this regard. A special challenge, however, is that of trying to conciliate information from separate microarray experiments to build a potential genetic signaling path. This work proposes a two-step analysis pipeline, based on optimization, to approach meta-analysis aiming to build a proxy for a genetic signaling path.

  10. SNP Data Quality Control in a National Beef and Dairy Cattle System and Highly Accurate SNP Based Parentage Verification and Identification

    PubMed Central

    McClure, Matthew C.; McCarthy, John; Flynn, Paul; McClure, Jennifer C.; Dair, Emma; O'Connell, D. K.; Kearney, John F.

    2018-01-01

    A major use of genetic data is parentage verification and identification, as inaccurate pedigrees negatively affect genetic gain. Since 2012 the international standard for single nucleotide polymorphism (SNP) verification in Bos taurus cattle has been the ISAG SNP panels. While these ISAG panels provide an increased level of parentage accuracy over microsatellite markers (MS), they can validate the wrong parent at ≤1% misconcordance rate levels, indicating that more SNP are needed if a more accurate pedigree is required. With rapidly increasing numbers of cattle being genotyped in Ireland, representing 61 B. taurus breeds from a wide range of farm types (beef/dairy, AI/pedigree/commercial, purebred/crossbred, and large to small herd sizes), the Irish Cattle Breeding Federation (ICBF) analyzed different SNP densities and determined that a minimum of ≥500 SNP is needed to consistently predict only one set of parents at a ≤1% misconcordance rate. For parentage validation and prediction ICBF uses 800 SNP (ICBF800) selected based on SNP clustering quality, ISAG200 inclusion, call rate (CR), and minor allele frequency (MAF) in the Irish cattle population. Large datasets require sample and SNP quality control (QC). Most publications only deal with SNP QC via CR, MAF, parent-progeny conflicts, and Hardy-Weinberg deviation, but not sample QC. We report here parentage, SNP QC, and genomic sample QC pipelines to deal with the unique challenges of >1 million genotypes from a national herd, such as SNP genotype errors from mis-tagging of animals, lab errors, farm errors, and multiple other issues that can arise. We divide the pipeline into two parts: a Genotype QC and an Animal QC pipeline. The Genotype QC identifies samples with low call rate, missing or mixed genotype classes (no BB genotype or ABTG alleles present), and low genotype frequencies. The Animal QC handles situations where the genotype might not belong to the listed individual by identifying: >1 non-matching genotypes per animal, SNP duplicates, sex and breed prediction mismatches, parentage and progeny validation results, and other situations. The Animal QC pipeline makes use of the ICBF800 SNP set where appropriate to identify errors in a computationally efficient yet still highly accurate manner. PMID:29599798
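
    The Genotype QC criteria above (call rate, minor allele frequency, and missing genotype classes) translate naturally into a vectorized filter. A minimal sketch with made-up thresholds, not ICBF's production rules:

        import numpy as np

        def genotype_qc(genotypes, min_call_rate=0.90, min_maf=0.05):
            """Flag SNPs passing basic QC.

            genotypes: (n_animals, n_snps) array coded 0/1/2 for AA/AB/BB,
            with -1 marking a missing call. Thresholds are illustrative.
            Returns a boolean mask over SNPs.
            """
            called = genotypes >= 0
            call_rate = called.mean(axis=0)
            n_called = called.sum(axis=0)
            # Frequency of the B allele among called genotypes.
            b_freq = np.where(called, genotypes, 0).sum(axis=0) / np.maximum(2 * n_called, 1)
            maf = np.minimum(b_freq, 1.0 - b_freq)
            # Require all three genotype classes to be observed, catching the
            # "no BB genotype" failure mode described above.
            has_all_classes = np.array(
                [np.unique(col[col >= 0]).size == 3 for col in genotypes.T])
            return (call_rate >= min_call_rate) & (maf >= min_maf) & has_all_classes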

  11. A graph-based approach for designing extensible pipelines

    PubMed Central

    2012-01-01

    Background In bioinformatics, it is important to build extensible and low-maintenance systems that are able to deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines involves hardcoding the execution steps into programs or scripts. This approach can lead to problems when a pipeline is expanding because the incorporation of new tools is often error prone and time consuming. Current approaches to pipeline development such as workflow management systems focus on analysis tasks that are systematically repeated without significant changes in their course of execution, such as genome annotation. However, more dynamism in pipeline composition is necessary when each execution requires a different combination of steps. Results We propose a graph-based approach to implement extensible and low-maintenance pipelines that is suitable for pipeline applications with multiple functionalities that require different combinations of steps in each execution. Here pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components with a directed graph in which components are the graph edges, their inputs and outputs are the graph nodes, and the paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of our approach by implementing a format conversion pipeline for the fields of population genetics and genetic epidemiology, but our approach is also helpful in other fields where multiple software tools are needed to perform comprehensive analyses, such as gene expression and proteomics analyses. The project code, documentation and the Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. Conclusions Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible and low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited for implementing applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of possible conversions available to the user and the extensibility of the system to allow for future updates with new file formats. PMID:22788675
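
    The core idea — tools as edges between data-format nodes, pipelines as paths — reduces pipeline assembly to path finding. A minimal Python sketch using breadth-first search (the project itself is implemented in Java; the tool and format names here are invented for illustration):

        from collections import deque

        def build_pipeline(edges, source_format, target_format):
            """Find a sequence of tools converting source_format to target_format.

            edges: list of (input_format, output_format, tool_name) triples,
            i.e. tools are graph edges and formats are graph nodes.
            """
            queue = deque([(source_format, [])])
            seen = {source_format}
            while queue:
                fmt, tools = queue.popleft()
                if fmt == target_format:
                    return tools  # ordered list of tools = the pipeline
                for src, dst, tool in edges:
                    if src == fmt and dst not in seen:
                        seen.add(dst)
                        queue.append((dst, tools + [tool]))
            return None  # no conversion path exists

        # Example: convert PED -> VCF via an intermediate format.
        edges = [("ped", "bed", "plink_make_bed"), ("bed", "vcf", "plink_to_vcf")]
        print(build_pipeline(edges, "ped", "vcf"))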

  12. Single nucleotide polymorphism discovery via genotyping by sequencing to assess population genetic structure and recurrent polyploidization in Andropogon gerardii.

    PubMed

    McAllister, Christine A; Miller, Allison J

    2016-07-01

    Autopolyploidy, genome duplication within a single lineage, can result in multiple cytotypes within a species. Geographic distributions of cytotypes may reflect the evolutionary history of autopolyploid formation and subsequent population dynamics including stochastic (drift) and deterministic (differential selection among cytotypes) processes. Here, we used a population genomic approach to investigate whether autopolyploidy occurred once or multiple times in Andropogon gerardii, a widespread, North American grass with two predominant cytotypes. Genotyping by sequencing was used to identify single nucleotide polymorphisms (SNPs) in individuals collected from across the geographic range of A. gerardii. Two independent approaches to SNP calling were used: the reference-free UNEAK pipeline and a reference-guided approach based on the sequenced Sorghum bicolor genome. SNPs generated using these pipelines were analyzed independently with genetic distance and clustering. Analyses of the two SNP data sets showed very similar patterns of population-level clustering of A. gerardii individuals: a cluster of A. gerardii individuals from the southern Plains, a northern Plains cluster, and a western cluster. Groupings of individuals corresponded to geographic localities regardless of cytotype: 6x and 9x individuals from the same geographic area clustered together. SNPs generated using reference-guided and reference-free pipelines in A. gerardii yielded unique subsets of genomic data. Both data sets suggest that the 9x cytotype in A. gerardii likely evolved multiple times from 6x progenitors across the range of the species. Genomic approaches like GBS and diverse bioinformatics pipelines used here facilitate evolutionary analyses of complex systems with multiple ploidy levels. © 2016 Botanical Society of America.

  13. Magnetic pipeline for coal and oil

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Knolle, E.

    1998-07-01

    A 1994 analysis of the recorded costs of the Alaska oil pipeline, in a paper entitled Maglev Crude Oil Pipeline (NASA CP-3247, pp. 671--684), concluded that, had the Knolle Magnetrans pipeline technology been available and used, some $10 million per day in transportation costs could have been saved over the 20 years of the Alaska oil pipeline's existence. This over 800 mile long pipeline requires about 500 horsepower per mile in pumping power, which together with the cost of the pipeline's capital investment consumes about one-third of the energy value of the pumped oil. This does not include the cost of getting the oil out of the ground. The reason maglev technology performs better than conventional pipelines is that magnetically levitating the oil into contact-free suspension eliminates drag-causing adhesion. In addition, by using permanent magnets in repulsion, suspension is achieved without using energy. The pumped oil's adhesion to the inside of pipes also limits its speed. In the case of the Alaska pipeline the speed is limited to about 7 miles per hour, which, with its 48-inch pipe diameter and 1200 psi pressure, pumps about 2 million barrels per day. The maglev system, as developed by Knolle Magnetrans, would transport oil in magnetically suspended sealed containers and thus, free of adhesion, at speeds 10 to 20 times faster. Furthermore, the diameter of the levitated containers can be made smaller for the same capacity, which makes the construction of the maglev system light and inexpensive. There are similar advantages when using maglev technology to transport coal, and a maglev system has advantages over railroads in the mountainous regions where coal is primarily mined: a maglev pipeline can travel, all year and in all weather, in a straight line to the end-user and can climb over steep hills without much difficulty, whereas railroads must follow difficult circuitous routes.

  14. Development of a Dmt Monitor for Statistical Tracking of Gravitational-Wave Burst Triggers Generated from the Omega Pipeline

    NASA Astrophysics Data System (ADS)

    Li, Jun-Wei; Cao, Jun-Wei

    2010-04-01

    One challenge in large-scale scientific data analysis is to monitor data in real time in a distributed environment. For the LIGO (Laser Interferometer Gravitational-wave Observatory) project, a dedicated suite of data monitoring tools (DMT) has been developed, yielding good extensibility to new data types and high flexibility in a distributed environment. Several services are provided, including visualization of data information in various forms and file output of monitoring results. In this work, a DMT monitor, OmegaMon, is developed for tracking statistics of gravitational-wave (GW) burst triggers that are generated from a specific GW burst data analysis pipeline, the Omega Pipeline. Such results can provide diagnostic information as a reference for trigger post-processing and interferometer maintenance.

  15. The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)

    DOE PAGES

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...

    2016-02-24

    The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases.

  16. The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos

    The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases.

  17. Development of Time-Distance Helioseismology Data Analysis Pipeline for SDO/HMI

    NASA Technical Reports Server (NTRS)

    DuVall, T. L., Jr.; Zhao, J.; Couvidat, S.; Parchevsky, K. V.; Beck, J.; Kosovichev, A. G.; Scherrer, P. H.

    2008-01-01

    The Helioseismic and Magnetic Imager of SDO will provide uninterrupted 4k x 4k-pixel Doppler-shift images of the Sun with approximately 40 sec cadence. These data will have a unique potential for advancing local helioseismic diagnostics of the Sun's interior structure and dynamics. They will help to understand the basic mechanisms of solar activity and develop predictive capabilities for NASA's Living with a Star program. Because of the tremendous amount of data, the HMI team is developing a data analysis pipeline, which will provide maps of subsurface flows and sound-speed distributions inferred from the Doppler data by the time-distance technique. We discuss the development plan, methods, and algorithms, and present the status of the pipeline, testing results and examples of the data products.

  18. Enhanced cortical thickness measurements for rodent brains via Lagrangian-based RK4 streamline computation

    NASA Astrophysics Data System (ADS)

    Lee, Joohwi; Kim, Sun Hyung; Oguz, Ipek; Styner, Martin

    2016-03-01

    The cortical thickness of the mammalian brain is an important morphological characteristic that can be used to investigate and observe the brain's developmental changes that might be caused by biologically toxic substances such as ethanol or cocaine. Although various cortical thickness analysis methods have been proposed that are applicable for human brain and have developed into well-validated open-source software packages, cortical thickness analysis methods for rodent brains have not yet become as robust and accurate as those designed for human brains. Based on a previously proposed cortical thickness measurement pipeline for rodent brain analysis, we present an enhanced cortical thickness pipeline in terms of accuracy and anatomical consistency. First, we propose a Lagrangian-based computational approach in the thickness measurement step in order to minimize local truncation error using the fourth-order Runge-Kutta method. Second, by constructing a line object for each streamline of the thickness measurement, we can visualize the way the thickness is measured and achieve sub-voxel accuracy by performing geometric post-processing. Last, with emphasis on the importance of an anatomically consistent partial differential equation (PDE) boundary map, we propose an automatic PDE boundary map generation algorithm that is specific to rodent brain anatomy, which does not require manual labeling. The results show that the proposed cortical thickness pipeline can produce statistically significant regions that are not observed in the previous cortical thickness analysis pipeline.
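
    The Lagrangian thickness measurement integrates streamlines with the classical fourth-order Runge-Kutta scheme. A generic sketch of RK4 streamline tracing through a vector field (illustrative only, not the paper's pipeline code):

        import numpy as np

        def rk4_step(position, velocity_field, h):
            """Advance one streamline point by step size h using RK4.

            velocity_field: callable mapping a position to a direction vector
            (e.g. the normalized gradient of a Laplace solution between the
            inner and outer cortical boundaries).
            """
            k1 = velocity_field(position)
            k2 = velocity_field(position + 0.5 * h * k1)
            k3 = velocity_field(position + 0.5 * h * k2)
            k4 = velocity_field(position + h * k3)
            return position + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

        def trace_streamline(start, velocity_field, h=0.1, n_steps=1000):
            """Accumulate streamline length, i.e. a thickness estimate."""
            pos, length = np.asarray(start, float), 0.0
            for _ in range(n_steps):
                nxt = rk4_step(pos, velocity_field, h)
                length += np.linalg.norm(nxt - pos)
                pos = nxt
            return length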

  19. Multiscale image analysis reveals structural heterogeneity of the cell microenvironment in homotypic spheroids.

    PubMed

    Schmitz, Alexander; Fischer, Sabine C; Mattheyer, Christian; Pampaloni, Francesco; Stelzer, Ernst H K

    2017-03-03

    Three-dimensional multicellular aggregates such as spheroids provide reliable in vitro substitutes for tissues. Quantitative characterization of spheroids at the cellular level is fundamental. We present the first pipeline that provides three-dimensional, high-quality images of intact spheroids at cellular resolution and a comprehensive image analysis that completes traditional image segmentation by algorithms from other fields. The pipeline combines light sheet-based fluorescence microscopy of optically cleared spheroids with automated nuclei segmentation (F score: 0.88) and concepts from graph analysis and computational topology. Incorporating cell graphs and alpha shapes provided more than 30 features of individual nuclei, the cellular neighborhood and the spheroid morphology. The application of our pipeline to a set of breast carcinoma spheroids revealed two concentric layers of different cell density for more than 30,000 cells. The thickness of the outer cell layer depends on a spheroid's size and varies between 50% and 75% of its radius. In differently-sized spheroids, we detected patches of different cell densities ranging from 5 × 10⁵ to 1 × 10⁶ cells/mm³. Since cell density affects cell behavior in tissues, structural heterogeneities need to be incorporated into existing models. Our image analysis pipeline provides a multiscale approach to obtain the relevant data for a system-level understanding of tissue architecture.

  20. PIG's Speed Estimated with Pressure Transducers and Hall Effect Sensor: An Industrial Application of Sensors to Validate a Testing Laboratory.

    PubMed

    Lima, Gustavo F; Freitas, Victor C G; Araújo, Renan P; Maitelli, André L; Salazar, Andrés O

    2017-09-15

    Pipeline inspection using a device called a Pipeline Inspection Gauge (PIG) is safe and reliable when the PIG moves at low speeds during inspection. We built a Testing Laboratory, containing a testing loop and supervisory system, to study speed control techniques for PIGs. The objective of this work is to present and validate the Testing Laboratory, which will allow development of a speed controller for PIGs and solve an existing problem in the oil industry. The experimental methodology used throughout the project is also presented. We installed pressure transducers on the pipeline outer walls to detect the PIG's movement and, with data from the supervisory system, calculated an average speed of 0.43 m/s. At the same time, the electronic board inside the PIG received data from the odometer and calculated an average speed of 0.45 m/s. We found an error of 4.44%, which is experimentally acceptable. The results showed that it is possible to successfully build a Testing Laboratory to detect the PIG's passage and estimate its speed. The validation of the Testing Laboratory using data from the odometer and its auxiliary electronics was very successful. Lastly, we hope to develop more research in the oil industry using this Testing Laboratory.
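
    For reference, the quoted 4.44% follows from taking the odometer speed as the reference value:

        \frac{|0.45 - 0.43|}{0.45} \times 100\% \approx 4.44\%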

  1. Contributions to modeling functionality of a high frequency damper system

    NASA Astrophysics Data System (ADS)

    Sirbu, E. A.; Horga, S.; Vrabioiu, G.

    2016-08-01

    Due to the necessity of improving the handling performance of a motor vehicle, it is imperative to understand the suspension properties that affect ride and directional response. The construction of a ferro-magnetic shock absorber is based on two bellows interconnected by a pipeline, through which the ferro-magnetic fluid is carried between the two bellows. The damping characteristic of the shock absorber depends on the viscosity of the ferro-magnetic fluid, which is controlled through an electric coil mounted on the pipeline connecting the bellows. Modifying the electrical field of the coil changes the viscosity of the fluid and thereby the damping characteristic of the shock absorber. A recent system called "CCD Pothole Suspension" is implemented on Ford vehicles. By modifying the damping characteristic of the shock absorbers, vehicle dynamics can be improved and the risk of damaging the suspension reduced. The approach of this paper is to analyze the behaviour of the ferro-magnetic damper and thus determine how it affects the performance of the vehicle suspension. The experimental research provides a better understanding of the behaviour of the ferro-magnetic shock absorber and the possible advantages of using this system.

  2. PIG’s Speed Estimated with Pressure Transducers and Hall Effect Sensor: An Industrial Application of Sensors to Validate a Testing Laboratory

    PubMed Central

    Freitas, Victor C. G.; Araújo, Renan P.; Maitelli, André L.; Salazar, Andrés O.

    2017-01-01

    Pipeline inspection using a device called a Pipeline Inspection Gauge (PIG) is safe and reliable when the PIG moves at low speeds during inspection. We built a Testing Laboratory, containing a testing loop and supervisory system, to study speed control techniques for PIGs. The objective of this work is to present and validate the Testing Laboratory, which will allow development of a speed controller for PIGs and solve an existing problem in the oil industry. The experimental methodology used throughout the project is also presented. We installed pressure transducers on the pipeline outer walls to detect the PIG's movement and, with data from the supervisory system, calculated an average speed of 0.43 m/s. At the same time, the electronic board inside the PIG received data from the odometer and calculated an average speed of 0.45 m/s. We found an error of 4.44%, which is experimentally acceptable. The results showed that it is possible to successfully build a Testing Laboratory to detect the PIG's passage and estimate its speed. The validation of the Testing Laboratory using data from the odometer and its auxiliary electronics was very successful. Lastly, we hope to develop more research in the oil industry using this Testing Laboratory. PMID:28914757

  3. Kepler Data Validation Time Series File: Description of File Format and Content

    NASA Technical Reports Server (NTRS)

    Mullally, Susan E.

    2016-01-01

    The Kepler space mission searches its time series data for periodic, transit-like signatures. The ephemerides of these events, called Threshold Crossing Events (TCEs), are reported in the TCE tables at the NASA Exoplanet Archive (NExScI). Those TCEs are then further evaluated to create planet candidates and populate the Kepler Objects of Interest (KOI) table, also hosted at the Exoplanet Archive. The search, evaluation and export of TCEs is performed by two pipeline modules, TPS (Transit Planet Search) and DV (Data Validation). TPS searches for the strongest, believable signal and then sends that information to DV to fit a transit model, compute various statistics, and remove the transit events so that the light curve can be searched for other TCEs. More on how this search is done and on the creation of the TCE table can be found in Tenenbaum et al. (2012), Seader et al. (2015), Jenkins (2002). For each star with at least one TCE, the pipeline exports a file that contains the light curves used by TPS and DV to find and evaluate the TCE(s). This document describes the content of these DV time series files, and this introduction provides a bit of context for how the data in these files are used by the pipeline.
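
    The TPS-to-DV loop — find the strongest credible periodic signal, fit and remove it, then search again — is an iterative detect-and-subtract scheme. Sketched generically below (not the actual Kepler pipeline code; the search/fit/remove callables are placeholders, and the 7.1 threshold mirrors the mission's multiple-event statistic cut, an assumption here):

        def find_tces(light_curve, search, fit_transit, remove_transit,
                      threshold=7.1, max_tces=10):
            """Iteratively detect threshold crossing events (TCEs).

            search: returns (ephemeris, detection_statistic) for the strongest
            remaining candidate; fit_transit / remove_transit stand in for the
            DV transit-model fit and the removal of in-transit data.
            """
            tces = []
            for _ in range(max_tces):
                ephemeris, stat = search(light_curve)
                if stat < threshold:
                    break  # nothing left above threshold
                model = fit_transit(light_curve, ephemeris)
                tces.append((ephemeris, stat, model))
                light_curve = remove_transit(light_curve, model)
            return tces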

  4. VIV analysis of pipelines under complex span conditions

    NASA Astrophysics Data System (ADS)

    Wang, James; Steven Wang, F.; Duan, Gang; Jukes, Paul

    2009-06-01

    Spans occur when a pipeline is laid on a rough undulating seabed or when upheaval buckling occurs due to constrained thermal expansion. This not only results in static and dynamic loads on the flowline at span sections, but also generates vortex induced vibration (VIV), which can lead to fatigue issues. The phenomenon, if not predicted and controlled properly, will negatively affect pipeline integrity, leading to expensive remediation and intervention work. Span analysis can be complicated by: long span lengths, a large number of spans caused by a rough seabed, and multi-span interactions. In addition, the complexity can be more onerous and challenging when soil uncertainty, concrete degradation and unknown residual lay tension are considered in the analysis. This paper describes the latest developments and a ‘state-of-the-art’ finite element analysis program that has been developed to simulate the span response of a flowline under complex boundary and loading conditions. Both VIV and direct wave loading are captured in the analysis and the results are sequentially used for the ultimate limit state (ULS) check and fatigue life calculation.

  5. ReSeqTools: an integrated toolkit for large-scale next-generation sequencing based resequencing analysis.

    PubMed

    He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z

    2013-12-04

    Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space and facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and abundant sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.

  6. Computerized image analysis for quantitative neuronal phenotyping in zebrafish.

    PubMed

    Liu, Tianming; Lu, Jianfeng; Wang, Ye; Campbell, William A; Huang, Ling; Zhu, Jinmin; Xia, Weiming; Wong, Stephen T C

    2006-06-15

    An integrated microscope image analysis pipeline is developed for automatic analysis and quantification of phenotypes in zebrafish with altered expression of Alzheimer's disease (AD)-linked genes. We hypothesize that a slight impairment of neuronal integrity in a large number of zebrafish carrying the mutant genotype can be detected through the computerized image analysis method. Key functionalities of our zebrafish image processing pipeline include quantification of neuron loss in zebrafish embryos due to knockdown of AD-linked genes, automatic detection of defective somites, and quantitative measurement of gene expression levels in zebrafish with altered expression of AD-linked genes or treatment with a chemical compound. These quantitative measurements enable the archival of analyzed results and relevant meta-data. The structured database is organized for statistical analysis and data modeling to better understand neuronal integrity and phenotypic changes of zebrafish under different perturbations. Our results show that the computerized analysis is comparable to manual counting with equivalent accuracy and improved efficacy and consistency. Development of such an automated data analysis pipeline represents a significant step forward to achieve accurate and reproducible quantification of neuronal phenotypes in large scale or high-throughput zebrafish imaging studies.

  7. Pressurizing the STEM Pipeline: An Expectancy-Value Theory Analysis of Youths' STEM Attitudes

    ERIC Educational Resources Information Center

    Ball, Christopher; Huang, Kuo-Ting; Cotten, Shelia R.; Rikard, R. V.

    2017-01-01

    Over the past decade, there has been a strong national push to increase minority students' positive attitudes towards STEM-related careers. However, despite this focus, minority students have remained underrepresented in these fields. Some researchers have directed their attention towards improving the STEM pipeline which carries students through…

  8. Landslide hazard analysis for pipelines: The case of the Simonette river crossing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grivas, D.A.; Schultz, B.C.; O`Neil, G.

    1995-12-31

    The overall objective of this study is to develop a probabilistic methodology to analyze landslide hazards and their effects on the safety of buried pipelines. The methodology incorporates a range of models that can accommodate differences in the ground movement modes and the amount and type of information available at various site locations. Two movement modes are considered, namely (a) instantaneous (catastrophic) slides, and (b) gradual ground movement which may result in cumulative displacements over the pipeline design life (30–40 years) that are in excess of allowable values. Probabilistic analysis is applied in each case to address the uncertainties associated with important factors that control slope stability. Availability of information ranges from relatively well-studied, instrumented installations to cases where data are limited to what can be derived from topographic and geologic maps. The methodology distinguishes between procedures applied where there is little information and those that can be used when relatively extensive data are available. Important aspects of the methodology are illustrated in a case study involving a pipeline located in Northern Alberta, Canada, in the Simonette river valley.

  9. SAMSA2: a standalone metatranscriptome analysis pipeline.

    PubMed

    Westreich, Samuel T; Treiber, Michelle L; Mills, David A; Korf, Ian; Lemay, Danielle G

    2018-05-21

    Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.

  10. Clinical detection of deletion structural variants in whole-genome sequences

    PubMed Central

    Noll, Aaron C; Miller, Neil A; Smith, Laurie D; Yoo, Byunggil; Fiedler, Stephanie; Cooley, Linda D; Willig, Laurel K; Petrikin, Josh E; Cakici, Julie; Lesko, John; Newton, Angela; Detherage, Kali; Thiffault, Isabelle; Saunders, Carol J; Farrow, Emily G; Kingsmore, Stephen F

    2016-01-01

    Optimal management of acutely ill infants with monogenetic diseases requires rapid identification of causative haplotypes. Whole-genome sequencing (WGS) has been shown to identify pathogenic nucleotide variants in such infants. Deletion structural variants (DSVs, >50 nt) are implicated in many genetic diseases, and tools have been designed to identify DSVs using short-read WGS. Optimisation and integration of these tools into a WGS pipeline could improve diagnostic sensitivity and specificity of WGS. In addition, it may improve turnaround time when compared with current CNV assays, enhancing utility in acute settings. Here we describe DSV detection methods for use in WGS for rapid diagnosis in acutely ill infants: SKALD (Screening Konsensus and Annotation of Large Deletions) combines calls from two tools (Breakdancer and GenomeStrip) with calibrated filters and clinical interpretation rules. In four WGS runs, the average analytic precision (positive predictive value) of SKALD was 78%, and recall (sensitivity) was 27%, when compared with validated reference DSV calls. When retrospectively applied to a cohort of 36 families with acutely ill infants SKALD identified causative DSVs in two. The first was heterozygous deletion of exons 1–3 of MMP21 in trans with a heterozygous frame-shift deletion in two siblings with transposition of the great arteries and heterotaxy. In a newborn female with dysmorphic features, ventricular septal defect and persistent pulmonary hypertension, SKALD identified the breakpoints of a heterozygous, de novo 1p36.32p36.13 deletion. In summary, consensus DSV calling, implemented in an 8-h computational pipeline with parameterised filtering, has the potential to increase the diagnostic yield of WGS in acutely ill neonates and discover novel disease genes. PMID:29263817
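
    For reference, the precision and recall figures quoted above are the standard ratios over true positives (TP), false positives (FP) and false negatives (FN):

        \mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}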

  11. Extending the Fermi-LAT data processing pipeline to the grid

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zimmer, S.; Arrabito, L.; Glanzman, T.

    2015-05-12

    The Data Handling Pipeline ("Pipeline") has been developed for the Fermi Gamma-Ray Space Telescope (Fermi) Large Area Telescope (LAT) which launched in June 2008. Since then it has been in use to completely automate the production of data quality monitoring quantities, reconstruction and routine analysis of all data received from the satellite and to deliver science products to the collaboration and the Fermi Science Support Center. Aside from the reconstruction of raw data from the satellite (Level 1), data reprocessing and various event-level analyses are also reasonably heavy loads on the pipeline and computing resources. These other loads, unlike Level 1, can run continuously for weeks or months at a time. Additionally, it receives heavy use in performing production Monte Carlo tasks.

  12. Creating Data that Never Die: Building a Spectrograph Data Pipeline in the Virtual Observatory Era

    NASA Astrophysics Data System (ADS)

    Mink, D. J.; Wyatt, W. F.; Roll, J. B.; Tokarz, S. P.; Conroy, M. A.; Caldwell, N.; Kurtz, M.; Geller, M. J.

    2005-12-01

    Data pipelines for modern complex astronomical instruments do not begin when the data is taken and end when it is delivered to the user. Information must flow between the observatory and the observer from the time a project is conceived and between the observatory and the world well past the time when the original observers have extracted all the information they want from the data. For the 300-fiber Hectospec low dispersion spectrograph on the MMT, the SAO Telescope Data Center is constructing a data pipeline which provides assistance from preparing and submitting observing proposals through observation, reduction, and analysis to publication and an afterlife in the Virtual Observatory. We will describe our semi-automatic pipeline and how it has evolved over the first nine months of operation.

  13. Biocorrosive activity analysis of the oil pipeline soil in the Khanty-Mansiysk Autonomous Region of Ugra and the Krasnodar Territory of the Russian Federation

    NASA Astrophysics Data System (ADS)

    Chesnokova, M. G.; Shalay, V. V.; Kriga, A. S.

    2017-08-01

    The purpose of the study was to assess the biocorrosive activity of oil pipeline soil in the Khanty-Mansiysk Autonomous Region of Yugra and the Krasnodar Territory of the Russian Federation, arising from the action of a complex of factors, and to analyze the content of sulfate-reducing and thionic bacteria. The number of sulfur-cycle bacteria (autotrophic thionic and sulfate-reducing bacteria), the total concentration of sulfur and iron in soil samples adjacent to the surface of underground pipelines, and the specific electrical resistivity of the soil were determined. A criterion for the biocorrosive activity of the soil (CBA) was established. The study revealed distinct features of soil biocorrosive activity in the areas of oil pipeline construction in the two compared territories. In the soil of the Krasnodar Territory pipeline, aggressive samples were recorded in 5.75% of cases, moderately aggressive samples in 49.43%, weakly aggressive samples in 42.53%, and potentially aggressive samples in 2.30%. In the Khanty-Mansiysk Autonomous Region of Yugra, weakly aggressive samples prevailed (55.17% of cases), followed by moderately aggressive samples (34.5%). Multiple regression analysis of the variables describing soil biocorrosive activity showed that the indicator "content of thiobacteria in soil" could be modeled informatively. The results of the research show the need for dynamic monitoring and the development of preventive measures against biocorrosion.

  14. Integration of a neuroimaging processing pipeline into a pan-canadian computing grid

    NASA Astrophysics Data System (ADS)

    Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.

    2012-02-01

    The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.

  15. Seismic hazard evaluation of the Oman India pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campbell, K.W.; Thenhaus, P.C.; Mullee, J.E.

    1996-12-31

    The proposed Oman India pipeline will traverse approximately 1,135 km of the northern Arabian Sea floor and adjacent continental shelves at depths of over 3 km on its route from Ra`s al Jifan, Oman, to Rapar Gadhwali, India. The western part of the route crosses active faults that form the transform boundary between the Arabian and Indian tectonic plates. The eastern terminus of the route lies in the vicinity of the great (M ~ 8) 1829 Kutch, India earthquake. A probabilistic seismic hazard analysis was used to estimate the values of peak ground acceleration (PGA) with return periods of 200, 500 and 1,000 years at selected locations along the pipeline route and the submarine Indus Canyon -- a possible source of large turbidity flows. The results defined the ground-shaking hazard along the pipeline route and Indus Canyon for evaluation of risks to the pipeline from potential earthquake-induced geologic hazards such as liquefaction, slope instability, and turbidity flows. 44 refs.
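
    The return periods quoted above map onto exceedance probabilities through the usual Poisson assumption: the probability of at least one exceedance of the corresponding PGA during an exposure time t is

        P = 1 - e^{-t/T}

    so, for example, a 200-year return period over a 40-year pipeline design life gives P ≈ 1 - e^{-0.2} ≈ 18%.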

  16. pyAmpli: an amplicon-based variant filter pipeline for targeted resequencing data.

    PubMed

    Beyens, Matthias; Boeckx, Nele; Van Camp, Guy; Op de Beeck, Ken; Vandeweyer, Geert

    2017-12-14

    Haloplex targeted resequencing is a popular method to analyze both germline and somatic variants in gene panels. However, the wet-lab procedures involved may introduce false positives that need to be considered in subsequent data analysis. To our knowledge, no all-in-one package exists that provides a variant filtering rationale addressing amplicon-enrichment-related systematic errors. We present pyAmpli, a platform-independent, parallelized Python package that implements an amplicon-based germline and somatic variant filtering strategy for Haloplex data. pyAmpli can filter variants for systematic errors by user pre-defined criteria. We show that pyAmpli significantly increases specificity, without reducing sensitivity, essential for reporting true positive clinical relevant mutations in gene panel data. pyAmpli is an easy-to-use software tool which increases the true positive variant call rate in targeted resequencing data. It specifically reduces errors related to PCR-based enrichment of targeted regions.
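
    One common amplicon-aware criterion — requiring a variant to be supported by reads from more than one overlapping amplicon, since systematic enrichment errors tend to be confined to a single amplicon — can be sketched as follows (a generic illustration of the idea, not pyAmpli's actual rules):

        def passes_amplicon_filter(variant_reads, min_amplicons=2, min_frac=0.1):
            """Keep a variant only if supported from multiple amplicons.

            variant_reads: dict amplicon_id -> (alt_reads, total_reads) for
            reads covering the variant position.
            """
            supporting = [
                amp for amp, (alt, total) in variant_reads.items()
                if total > 0 and alt / total >= min_frac
            ]
            return len(supporting) >= min_amplicons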

  17. Measuring the CMB Polarization at 94 GHz with the QUIET Pseudo-Cl Pipeline

    NASA Astrophysics Data System (ADS)

    Buder, Immanuel; QUIET Collaboration

    2012-01-01

    The Q/U Imaging ExperimenT (QUIET) aims to limit or detect cosmic microwave background (CMB) B-mode polarization from inflation. This talk is part of a 3-talk series on QUIET. The previous talk describes the QUIET science and instrument. QUIET has two parallel analysis pipelines which are part of an effort to validate the analysis and confirm the result. In this talk, I will describe the analysis methods of one of these: the pseudo-Cl pipeline. Calibration, noise modeling, filtering, and data-selection choices are made following a blind-analysis strategy. Central to this strategy is a suite of 30 null tests, each motivated by a possible instrumental problem or systematic effect. The systematic errors are also evaluated through full-season simulations in the blind stage of the analysis before the result is known. The CMB power spectra are calculated using a pseudo-Cl cross-correlation technique which suppresses contamination and makes the result insensitive to noise bias. QUIET will detect the first three peaks of the even-parity (E-mode) spectrum at high significance. I will show forecasts of the systematic errors for these results and for the upper limit on B-mode polarization. The very low systematic errors in these forecasts show that the technology is ready to be applied in a more sensitive next-generation experiment. The next and final talk in this series covers the other parallel analysis pipeline, based on maximum likelihood methods. This work was supported by NSF and the Department of Education.
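
    The noise-bias insensitivity comes from cross-correlating maps with independent noise: in the cross-power spectrum

        C_\ell^{AB} = \frac{1}{2\ell + 1} \sum_{m=-\ell}^{\ell} a_{\ell m}^{A}\, a_{\ell m}^{B\,*}

    the noise of maps A and B averages away because it is uncorrelated between them, so only the common sky signal contributes on average.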

  18. CRISPRED: CRISP imaging spectropolarimeter data reduction pipeline

    NASA Astrophysics Data System (ADS)

    de la Cruz Rodríguez, J.; Löfdahl, M. G.; Sütterlin, P.; Hillberg, T.; Rouppe van der Voort, L.

    2017-08-01

    CRISPRED reduces data from the CRISP imaging spectropolarimeter at the Swedish 1 m Solar Telescope (SST). It performs fitting routines, corrects optical aberrations from atmospheric turbulence as well as from the optics, and compensates for inter-camera misalignments, field-dependent and time-varying instrumental polarization, and spatial variation in the detector gain and in the zero level offset (bias). It has an object-oriented IDL structure with computationally demanding routines performed in C subprograms called as dynamically loadable modules (DLMs).

  19. CallFUSE Version 3: A Data Reduction Pipeline for the Far Ultraviolet Spectroscopic Explorer

    DTIC Science & Technology

    2007-05-01

    Earth orbit with an inclination of 25° to the equator and an approximately 100 minute orbital period. Data obtained with the instrument are reduced...throughout the mission reveal that the gratings' orbital motion depends on three parameters: beta angle (the angle between the target and the anti-Sun ...University, Baltimore, MD; wvd@pha.jhu.edu. 3 Space Telescope Science Institute, ESS/SSG, Baltimore, MD. 4 Current address: Earth Orientation Department

  20. IN13B-1660: Analytics and Visualization Pipelines for Big Data on the NASA Earth Exchange (NEX) and OpenNEX

    NASA Technical Reports Server (NTRS)

    Chaudhary, Aashish; Votava, Petr; Nemani, Ramakrishna R.; Michaelis, Andrew; Kotfila, Chris

    2016-01-01

    We are developing capabilities for an integrated petabyte-scale Earth science collaborative analysis and visualization environment. The ultimate goal is to deploy this environment within the NASA Earth Exchange (NEX) and OpenNEX in order to enhance existing science data production pipelines in both high-performance computing (HPC) and cloud environments. Bridging of HPC and cloud is a fairly new concept under active research, and this system significantly enhances the ability of the scientific community to accelerate analysis and visualization of Earth science data from NASA missions, model outputs and other sources. We have developed a web-based system that seamlessly interfaces with both high-performance computing (HPC) and cloud environments, providing tools that enable science teams to develop and deploy large-scale analysis, visualization and QA pipelines for both the production process and the data products, and to share results with the community. Our project is developed in several stages, each addressing a separate challenge: workflow integration, parallel execution in either cloud or HPC environments, and big-data analytics and visualization. This work benefits a number of existing and upcoming projects supported by NEX, such as the Web Enabled Landsat Data (WELD) project, where we are developing a new QA pipeline for the 25 PB system.

  1. Analytics and Visualization Pipelines for Big ­Data on the NASA Earth Exchange (NEX) and OpenNEX

    NASA Astrophysics Data System (ADS)

    Chaudhary, A.; Votava, P.; Nemani, R. R.; Michaelis, A.; Kotfila, C.

    2016-12-01

    We are developing capabilities for an integrated petabyte-scale Earth science collaborative analysis and visualization environment. The ultimate goal is to deploy this environment within the NASA Earth Exchange (NEX) and OpenNEX in order to enhance existing science data production pipelines in both high-performance computing (HPC) and cloud environments. Bridging of HPC and cloud is a fairly new concept under active research, and this system significantly enhances the ability of the scientific community to accelerate analysis and visualization of Earth science data from NASA missions, model outputs and other sources. We have developed a web-based system that seamlessly interfaces with both high-performance computing (HPC) and cloud environments, providing tools that enable science teams to develop and deploy large-scale analysis, visualization and QA pipelines for both the production process and the data products, and to share results with the community. Our project is developed in several stages, each addressing a separate challenge: workflow integration, parallel execution in either cloud or HPC environments, and big-data analytics and visualization. This work benefits a number of existing and upcoming projects supported by NEX, such as the Web Enabled Landsat Data (WELD) project, where we are developing a new QA pipeline for the 25 PB system.

  2. One target-multiple indications: a call for an integrated common mechanisms strategy.

    PubMed

    Nielsch, Ulrich; Schäfer, Stefan; Wild, Hanno; Busch, Andreas

    2007-12-01

    Ever-increasing research and development costs are putting constant pressure on the pharmaceutical industry to improve its efficiency. Efforts to increase the output of the research pipeline have yielded limited success. Traditionally, maximization of the value of a drug is attempted through life-cycle management, which is initiated late in development, or when the drug is already on the market. Validated targets can be exploited further through development of a follow-up drug, which may offer advantages regarding safety or convenience. In this article, we propose to systematically evaluate the full therapeutic potential of a drug target, proprietary chemical lead structure, or drug candidate as broadly and as early as possible; we call this the 'common mechanism' approach.

  3. Optimizing Preprocessing and Analysis Pipelines for Single-Subject FMRI. I. Standard Temporal Motion and Physiological Noise Correction Methods

    PubMed Central

    Churchill, Nathan W.; Oder, Anita; Abdi, Hervé; Tam, Fred; Lee, Wayne; Thomas, Christopher; Ween, Jon E.; Graham, Simon J.; Strother, Stephen C.

    2016-01-01

    Subject-specific artifacts caused by head motion and physiological noise are major confounds in BOLD fMRI analyses. However, there is little consensus on the optimal choice of data preprocessing steps to minimize these effects. To evaluate the effects of various preprocessing strategies, we present a framework which comprises a combination of (1) nonparametric testing including reproducibility and prediction metrics of the data-driven NPAIRS framework (Strother et al. [2002]: NeuroImage 15:747–771), and (2) intersubject comparison of SPM effects, using DISTATIS (a three-way version of metric multidimensional scaling; Abdi et al. [2009]: NeuroImage 45:89–95). It is shown that the quality of brain activation maps may be significantly limited by sub-optimal choices of data preprocessing steps (or “pipeline”) in a clinical task-design, an fMRI adaptation of the widely used Trail-Making Test. The relative importance of motion correction, physiological noise correction, motion parameter regression, and temporal detrending was examined for fMRI data acquired in young, healthy adults. Analysis performance and the quality of activation maps were evaluated based on Penalized Discriminant Analysis (PDA). The relative importance of different preprocessing steps was assessed by (1) a nonparametric Friedman rank test for fixed sets of preprocessing steps, applied to all subjects; and (2) evaluating pipelines chosen specifically for each subject. Results demonstrate that preprocessing choices have significant, but subject-dependent effects, and that individually-optimized pipelines may significantly improve the reproducibility of fMRI results over fixed pipelines. This was demonstrated by the detection of a significant interaction between motion parameter regression and physiological noise correction, even though the range of subject head motion was small across the group (≪ 1 voxel). Optimizing pipelines on an individual-subject basis also revealed brain activation patterns either weak or absent under fixed pipelines, which has implications for the overall interpretation of fMRI data and the relative importance of preprocessing methods. PMID:21455942

  4. Workflows for microarray data processing in the Kepler environment.

    PubMed

    Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

    2012-05-17

    Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.

  5. Open source pipeline for ESPaDOnS reduction and analysis

    NASA Astrophysics Data System (ADS)

    Martioli, Eder; Teeple, Doug; Manset, Nadine; Devost, Daniel; Withington, Kanoa; Venne, Andre; Tannock, Megan

    2012-09-01

    OPERA is a Canada-France-Hawaii Telescope (CFHT) open source collaborative software project currently under development for an ESPaDOnS echelle spectro-polarimetric image reduction pipeline. OPERA is designed to be fully automated, performing calibrations and reduction, producing one-dimensional intensity and polarimetric spectra. The calibrations are performed on two-dimensional images. Spectra are extracted using an optimal extraction algorithm. While primarily designed for CFHT ESPaDOnS data, the pipeline is being written to be extensible to other echelle spectrographs. A primary design goal is to make use of fast, modern object-oriented technologies. Processing is controlled by a harness, which manages a set of processing modules that make use of a collection of native OPERA software libraries and standard external software libraries. The harness and modules are completely parametrized by site configuration and instrument parameters. The software is open-ended, permitting users of OPERA to extend the pipeline capabilities. All these features have been designed to provide a portable infrastructure that facilitates collaborative development, code re-usability and extensibility. OPERA is free software with support for both GNU/Linux and MacOSX platforms. The pipeline is hosted on SourceForge under the name "opera-pipeline".

  6. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data.

    PubMed

    Chan, Kuang-Lim; Rosli, Rozana; Tatarinova, Tatiana V; Hogan, Michael; Firdaus-Raih, Mohd; Low, Eng-Ti Leslie

    2017-01-27

    Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines developed with various computing techniques are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion. We present an automated gene prediction pipeline, Seqping, that uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using GlimmerHMM, SNAP, and AUGUSTUS, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure). Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that Seqping's predictions are more accurate than gene predictions using the other three approaches with their default or available HMMs.

  7. Numerical Analysis of Flow-Induced Vibrations in Closed Side Branches

    NASA Astrophysics Data System (ADS)

    KníŽat, Branislav; Troják, Michal

    2011-12-01

    Vibrations occurring in closed side branches connected to a main pipe are a frequent problem during pipeline system operation. At the design stage of pipeline systems, this problem is sometimes overlooked or underestimated, which can later lead to a shortening of the system's life cycle or may even cause injury. The aim of this paper is a numerical analysis of the onset of self-induced vibrations at the edge of a closed side branch. Calculation conditions and the obtained results are presented herein.

  8. Sensitivity Analysis of Fatigue Crack Growth Model for API Steels in Gaseous Hydrogen.

    PubMed

    Amaro, Robert L; Rustagi, Neha; Drexler, Elizabeth S; Slifka, Andrew J

    2014-01-01

    A model to predict fatigue crack growth of API pipeline steels in high pressure gaseous hydrogen has been developed and is presented elsewhere. The model currently has several parameters that must be calibrated for each pipeline steel of interest. This work provides a sensitivity analysis of the model parameters in order to provide (a) insight into the underlying mathematical and mechanistic aspects of the model, and (b) guidance for model calibration of other API steels.
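
    As a generic illustration of one-at-a-time parameter sensitivity (the simplest screening approach; the published model and its calibrated parameters are not reproduced here), consider a Paris-law-type crack growth relation da/dN = C*(dK)^m. The names and values below are hypothetical.

      # One-at-a-time (OAT) sensitivity sketch on a generic Paris-law model:
      # perturb each parameter by +10% and report the relative change in the
      # predicted crack growth rate at a fixed stress intensity range dK.
      def growth_rate(C, m, dK):
          return C * dK ** m

      def oat_sensitivity(base, perturb=0.10, dK=20.0):
          y0 = growth_rate(base["C"], base["m"], dK)
          return {p: (growth_rate(**{**base, p: base[p] * (1 + perturb)}, dK=dK) - y0) / y0
                  for p in base}

      # The exponent m dominates: a 10% change in m moves the output far more
      # than a 10% change in the prefactor C.
      print(oat_sensitivity({"C": 1e-11, "m": 3.0}))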

  9. HiCUP: pipeline for mapping and processing Hi-C data.

    PubMed

    Wingett, Steven; Ewels, Philip; Furlan-Magaril, Mayra; Nagano, Takashi; Schoenfelder, Stefan; Fraser, Peter; Andrews, Simon

    2015-01-01

    HiCUP is a pipeline for processing sequence data generated by Hi-C and Capture Hi-C (CHi-C) experiments, which are techniques used to investigate three-dimensional genomic organisation. The pipeline maps data to a specified reference genome and removes artefacts that would otherwise hinder subsequent analysis. HiCUP also produces an easy-to-interpret yet detailed quality control (QC) report that assists in refining experimental protocols for future studies. The software is freely available and has already been used for processing Hi-C and CHi-C data in several recently published peer-reviewed studies.

  10. Best practices for evaluating single nucleotide variant calling methods for microbial genomics

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Colman, Rebecca E.; Foster, Jeffrey T.; Sahl, Jason W.; Schupp, James M.; Keim, Paul; Morrow, Jayne B.; Salit, Marc L.; Zook, Justin M.

    2015-01-01

    Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards. PMID:26217378

  11. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

    PubMed

    Scheuch, Matthias; Höper, Dirk; Beer, Martin

    2015-03-03

    Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.

  12. Quantitative risk assessment of CO2 transport by pipelines--a review of uncertainties and their impacts.

    PubMed

    Koornneef, Joris; Spruijt, Mark; Molag, Menso; Ramírez, Andrea; Turkenburg, Wim; Faaij, André

    2010-05-15

    A systematic assessment, based on an extensive literature review, of the impact of gaps and uncertainties on the results of quantitative risk assessments (QRAs) for CO2 pipelines is presented. Sources of uncertainties that have been assessed are: failure rates, pipeline pressure, temperature, section length, diameter, orifice size, type and direction of release, meteorological conditions, jet diameter, vapour mass fraction in the release and the dose-effect relationship for CO2. A sensitivity analysis with these parameters is performed using release, dispersion and impact models. The results show that the knowledge gaps and uncertainties have a large effect on the accuracy of the assessed risks of CO2 pipelines. In this study it is found that the individual risk contour can vary between 0 and 204 m from the pipeline depending on assumptions made. In existing studies this range is found to be between <1 m and 7.2 km. Mitigating the relevant risks is part of current practice, making them controllable. It is concluded that QRA for CO2 pipelines can be improved by validation of release and dispersion models for high-pressure CO2 releases, definition and adoption of a universal dose-effect relationship and development of a good practice guide for QRAs for CO2 pipelines. Copyright (c) 2009 Elsevier B.V. All rights reserved.

  13. Neuroimaging Study Designs, Computational Analyses and Data Provenance Using the LONI Pipeline

    PubMed Central

    Dinov, Ivo; Lozev, Kamen; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Pierce, Jonathan; Zamanyan, Alen; Chakrapani, Shruthi; Van Horn, John; Parker, D. Stott; Magsipoc, Rico; Leung, Kelvin; Gutman, Boris; Woods, Roger; Toga, Arthur

    2010-01-01

    Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges—management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu. PMID:20927408

  14. Assessment of ecological and human health risks of heavy metal contamination in agriculture soils disturbed by pipeline construction.

    PubMed

    Shi, Peng; Xiao, Jun; Wang, Yafeng; Chen, Liding

    2014-02-28

    The construction of large-scale infrastructures such as natural gas/oil pipelines involves extensive disturbance to regional ecosystems. Few studies have documented the soil degradation and heavy metal contamination caused by pipeline construction. In this study, chromium (Cr), cadmium (Cd), copper (Cu), nickel (Ni), lead (Pb) and zinc (Zn) levels were evaluated using Index of Geo-accumulation (Igeo) and Potential Ecological Risk Index (RI) values, and human health risk assessments were used to elucidate the level and spatial variation of heavy metal pollution risks. The results showed that the impact zone of pipeline installation on soil heavy metal contamination was restricted to the pipeline right-of-way (RoW), which had higher Igeo values for Cd, Cu, Ni and Pb than the zones 20 m and 50 m away. RI showed a declining tendency across zones as follows: trench > working zone > piling area > 20 m > 50 m. The pipeline RoW posed higher human health risks than the 20 m and 50 m zones, and children were more susceptible to non-carcinogenic hazard risk. Cluster analysis showed that Cu, Ni, Pb and Cd had similar sources, pointing to anthropogenic activity. The findings of this study should help in better understanding the type, degree, scope and sources of heavy metal pollution from pipeline construction so as to reduce pollutant emissions, and provide a scientific basis for future risk management.
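
    The Index of Geo-accumulation used above has a standard closed form, Igeo = log2(Cn / (1.5 * Bn)), where Cn is the measured concentration and Bn the geochemical background. A minimal sketch follows; the background values are placeholders for illustration, not those used in the study.

      import math

      # Illustrative background concentrations (mg/kg); replace with the
      # regional background values appropriate to the study area.
      BACKGROUND = {"Cd": 0.10, "Cu": 22.6, "Ni": 26.9, "Pb": 26.0}

      def igeo(element, concentration_mg_kg):
          """Igeo = log2(Cn / (1.5 * Bn)); > 0 means accumulation above background."""
          return math.log2(concentration_mg_kg / (1.5 * BACKGROUND[element]))

      print(round(igeo("Cd", 0.45), 2))  # 0.45 mg/kg Cd -> Igeo = 1.58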

  15. Domain atrophy creates rare cases of functional partial protein domains.

    PubMed

    Prakash, Ananth; Bateman, Alex

    2015-04-30

    Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements. Here, we implement a new pipeline to systematically identify new cases of domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true domain atrophy due to misannotations or un-annotated sequence fragments. We identify 75 cases of domain atrophy, of which eight cases are found in a three-dimensional protein structure and 67 cases have been inferred based on mapping to a known homologous structure. Domains with structural variations include ancient folds such as the TIM-barrel and Rossmann folds. Most of these domains are observed to show structural loss that does not affect their functional sites. Our analysis has significantly increased the known cases of domain atrophy. We discuss specific instances of domain atrophy and see that there has often been a compensatory mechanism that helps to maintain the stability of the partial domain. Our study indicates that although domain atrophy is an extremely rare phenomenon, protein domains under certain circumstances can tolerate extreme mutations giving rise to partial, but functional, domains.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos

    The success of automatic classification of variable stars depends strongly on the lightcurve representation. Usually, lightcurves are represented as a vector of many descriptors designed by astronomers called features. These descriptors are expensive in terms of computing, require substantial research effort to develop, and do not guarantee a good classification. Today, lightcurve representation is not entirely automatic; algorithms must be designed and manually tuned up for every survey. The amounts of data that will be generated in the future mean astronomers must develop scalable and automated analysis pipelines. In this work we present a feature learning algorithm designed for variable objects. Our method works by extracting a large number of lightcurve subsequences from a given set, which are then clustered to find common local patterns in the time series. Representatives of these common patterns are then used to transform lightcurves of a labeled set into a new representation that can be used to train a classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias using only labeled data. We test our method on data sets from the Massive Compact Halo Object survey and the Optical Gravitational Lensing Experiment; the results show that our classification performance is as good as and in some cases better than the performance achieved using traditional statistical features, while the computational cost is significantly lower. With these promising results, we believe that our method constitutes a significant step toward the automation of the lightcurve classification pipeline.

  17. Clustering-based Feature Learning on Variable Stars

    NASA Astrophysics Data System (ADS)

    Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos

    2016-04-01

    The success of automatic classification of variable stars depends strongly on the lightcurve representation. Usually, lightcurves are represented as a vector of many descriptors designed by astronomers called features. These descriptors are expensive in terms of computing, require substantial research effort to develop, and do not guarantee a good classification. Today, lightcurve representation is not entirely automatic; algorithms must be designed and manually tuned up for every survey. The amounts of data that will be generated in the future mean astronomers must develop scalable and automated analysis pipelines. In this work we present a feature learning algorithm designed for variable objects. Our method works by extracting a large number of lightcurve subsequences from a given set, which are then clustered to find common local patterns in the time series. Representatives of these common patterns are then used to transform lightcurves of a labeled set into a new representation that can be used to train a classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias using only labeled data. We test our method on data sets from the Massive Compact Halo Object survey and the Optical Gravitational Lensing Experiment; the results show that our classification performance is as good as and in some cases better than the performance achieved using traditional statistical features, while the computational cost is significantly lower. With these promising results, we believe that our method constitutes a significant step toward the automation of the lightcurve classification pipeline.
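
    A rough sketch of the subsequence-clustering representation described above: slide a window over each lightcurve, cluster the pooled subsequences, then describe a lightcurve by the histogram of its subsequences' cluster assignments. The window length, step, and cluster count below are illustrative, not the paper's settings.

      import numpy as np
      from sklearn.cluster import KMeans

      def subsequences(lc, w=20, step=5):
          return np.array([lc[i:i + w] for i in range(0, len(lc) - w, step)])

      def fit_codebook(lightcurves, k=50):
          # k must not exceed the number of pooled subsequences
          pool = np.vstack([subsequences(lc) for lc in lightcurves])
          return KMeans(n_clusters=k, n_init=10).fit(pool)

      def represent(lc, codebook):
          # Histogram of local-pattern occurrences: the learned feature vector
          labels = codebook.predict(subsequences(lc))
          return np.bincount(labels, minlength=codebook.n_clusters)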

  18. Facilitating admissions of diverse students: A six-point, evidence-informed framework for pipeline and program development.

    PubMed

    Young, Meredith E; Thomas, Aliki; Varpio, Lara; Razack, Saleem I; Hanson, Mark D; Slade, Steve; Dayem, Katharine L; McKnight, David J

    2017-04-01

    Several national level calls have encouraged reconsideration of diversity issues in medical education. Particular interest has been placed on admissions, as decisions made here shape the nature of the future physician workforce. Critical analysis of current practices paired with evidence-informed policies may counter some of the barriers impeding access for underrepresented groups. We present a framework for diversity-related program development and evaluation grounded within a knowledge translation framework, and supported by the initiation of longitudinal collection of diversity-related data. We provide an illustrative case study for each component of the framework. Descriptive analyses are presented of pre/post intervention diversity metrics if applicable and available. The framework's focal points are: 1) data-driven identification of underrepresented groups, 2) pipeline development and targeted recruitment, 3) ensuring an inclusive process, 4) ensuring inclusive assessment, 5) ensuring inclusive selection, and 6) iterative use of diversity-related data. Case studies ranged from wording changes on admissions websites to the establishment of educational and administrative offices addressing needs of underrepresented populations. We propose that diversity-related data must be collected on a variety of markers, developed in partnership with stakeholders who are most likely to facilitate implementation of best practices and new policies. These data can facilitate the design, implementation, and evaluation of evidence-informed diversity initiatives and provide a structure for continued investigation into 'interventions' supporting diversity-related initiatives.

  19. BLINKER: Automated Extraction of Ocular Indices from EEG Enabling Large-Scale Analysis.

    PubMed

    Kleifges, Kelly; Bigdely-Shamlo, Nima; Kerick, Scott E; Robbins, Kay A

    2017-01-01

    Electroencephalography (EEG) offers a platform for studying the relationships between behavioral measures, such as blink rate and duration, with neural correlates of fatigue and attention, such as theta and alpha band power. Further, the existence of EEG studies covering a variety of subjects and tasks provides opportunities for the community to better characterize variability of these measures across tasks and subjects. We have implemented an automated pipeline (BLINKER) for extracting ocular indices such as blink rate, blink duration, and blink velocity-amplitude ratios from EEG channels, EOG channels, and/or independent components (ICs). To illustrate the use of our approach, we have applied the pipeline to a large corpus of EEG data (comprising more than 2000 datasets acquired at eight different laboratories) in order to characterize variability of certain ocular indicators across subjects. We also investigate dependence of ocular indices on task in a shooter study. We have implemented our algorithms in a freely available MATLAB toolbox called BLINKER. The toolbox, which is easy to use and can be applied to collections of data without user intervention, can automatically discover which channels or ICs capture blinks. The tools extract blinks, calculate common ocular indices, generate a report for each dataset, dump labeled images of the individual blinks, and provide summary statistics across collections. Users can run BLINKER as a script or as a plugin for EEGLAB. The toolbox is available at https://github.com/VisLab/EEG-Blinks. User documentation and examples appear at http://vislab.github.io/EEG-Blinks/.
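
    A heavily simplified version of blink-interval extraction from a blink-dominated trace (for example a vertical EOG channel): smooth, threshold, and read off run boundaries. This toy sketch only illustrates the flavor of such an analysis; it is not BLINKER's algorithm, and the threshold and smoothing values are arbitrary.

      import numpy as np

      def blink_intervals(signal, fs, thresh_sd=1.5, smooth_s=0.1):
          """Return approximate (start, end) times in seconds of above-threshold runs."""
          k = max(1, int(fs * smooth_s))
          smoothed = np.convolve(signal, np.ones(k) / k, mode="same")
          above = smoothed > smoothed.mean() + thresh_sd * smoothed.std()
          # Pad with False so every run has a clean rise and fall edge.
          padded = np.concatenate(([False], above, [False])).astype(int)
          edges = np.flatnonzero(np.diff(padded))
          return [(s / fs, e / fs) for s, e in zip(edges[0::2], edges[1::2])]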

  20. Head-to-Head Comparison of Two Popular Cortical Thickness Extraction Algorithms: A Cross-Sectional and Longitudinal Study

    PubMed Central

    Redolfi, Alberto; Manset, David; Barkhof, Frederik; Wahlund, Lars-Olof; Glatard, Tristan; Mangin, Jean-François; Frisoni, Giovanni B.

    2015-01-01

    Background and Purpose The measurement of cortical shrinkage is a candidate marker of disease progression in Alzheimer's disease. This study evaluated the performance of two pipelines: Civet-CLASP (v1.1.9) and Freesurfer (v5.3.0). Methods Images from 185 ADNI1 cases (69 elderly controls (CTR), 37 stable MCI (sMCI), 27 progressive MCI (pMCI), and 52 Alzheimer (AD) patients) scanned at baseline, month 12, and month 24 were processed using the two pipelines and two interconnected e-infrastructures: neuGRID (https://neugrid4you.eu) and VIP (http://vip.creatis.insa-lyon.fr). The vertex-by-vertex cross-algorithm comparison was made possible applying the 3D gradient vector flow (GVF) and closest point search (CPS) techniques. Results The cortical thickness measured with Freesurfer was systematically about one third lower than Civet's. Cross-sectionally, Freesurfer's effect size was significantly different in the posterior division of the temporal fusiform cortex. Both pipelines were weakly or mildly correlated with the Mini Mental State Examination score (MMSE) and the hippocampal volumetry. Civet differed significantly from Freesurfer in large frontal, parietal, temporal and occipital regions (p<0.05). In a discriminant analysis with cortical ROIs having effect size larger than 0.8, both pipelines gave no significant differences in area under the curve (AUC). Longitudinally, effect sizes were not significantly different in any of the 28 ROIs tested. Both pipelines weakly correlated with MMSE decay, showing no significant differences. Freesurfer mildly correlated with hippocampal thinning rate and differed in the supramarginal gyrus, temporal gyrus, and in the lateral occipital cortex compared to Civet (p<0.05). In a discriminant analysis with ROIs having effect size larger than 0.6, both pipelines yielded no significant differences in the AUC. Conclusions Civet appears slightly more sensitive to the typical AD atrophic pattern at the MCI stage, but both pipelines can accurately characterize the topography of cortical thinning at the dementia stage. PMID:25781983

  1. Disrupting the Carceral State through Education Journey Mapping

    ERIC Educational Resources Information Center

    Annamma, Subini

    2016-01-01

    The School-to-Prison Pipeline is an alarming trend of funneling children of color out of schools and into incarceration. Yet the focus on the Pipeline neglects the ways society is imbued with a commitment to criminalizing unwanted bodies. In this empirical article I foreground a spatial analysis, making connections to the socio-spatial dialectic,…

  2. Pipeline oil fire detection with MODIS active fire products

    NASA Astrophysics Data System (ADS)

    Ogungbuyi, M. G.; Martinez, P.; Eckardt, F. D.

    2017-12-01

    We investigate 85,129 MODIS satellite active fire events from 2007 to 2015 in the Niger Delta of Nigeria. The region is the oil base of the Nigerian economy and the hub of oil exploration, where oil facilities (i.e. flowlines, flow stations, trunklines, oil wells and oil fields) are domiciled, and from where crude oil and refined products are transported to different Nigerian locations through a network of pipeline systems. Pipelines and other oil facilities are consistently susceptible to oil leaks due to operational or maintenance error and acts of deliberate sabotage of the pipeline equipment, which often result in explosions and fire outbreaks. We used ground oil spill reports obtained from the National Oil Spill Detection and Response Agency (NOSDRA) database (see www.oilspillmonitor.ng) to validate the MODIS satellite data. The NOSDRA database shows an estimated 10,000 spill events from 2007 - 2015. The spill events were filtered to include the largest spills by volume and events occurring only in the Niger Delta (i.e. 386 spills). By projecting both MODIS fire and spill events as `input vector' layers with `Points' geometry, and the Nigerian pipeline networks as `from vector' layers with `LineString' geometry in a geographical information system, we extracted the MODIS events nearest to the pipelines (i.e. 2192) within a 1000 m distance in a spatial vector analysis. The extraction distance is based on the global Right of Way (ROW) practice in pipeline management, which earmarks a 30 m strip of land for the pipeline. The KML files of the extracted fires in a Google map confirmed that their sources were oil facilities. Land cover mapping confirmed the fire anomalies. The aim of the study is to propose near-real-time monitoring of spill events along pipeline routes using the 250 m spatial resolution of the MODIS active fire detection sensor when such spills are accompanied by fire events in the study location.
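
    The proximity extraction step can be sketched with shapely, assuming fire points and pipeline geometries are already in a metre-based projected coordinate system. The geometries below are placeholders, not the Nigerian network.

      from shapely.geometry import LineString, Point

      pipeline = LineString([(0, 0), (5000, 0), (9000, 3000)])   # toy pipeline trace
      fires = [Point(1200, 400), Point(4000, 2500), Point(8800, 2900)]

      # Keep only fire detections within 1000 m of the pipeline, mirroring
      # the 1000 m buffer used in the study's spatial vector analysis.
      near = [p for p in fires if pipeline.distance(p) <= 1000.0]
      print(len(near), "fire events within 1000 m")   # -> 2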

  3. Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus.

    PubMed

    Hansen, Peter; Hecht, Jochen; Ibn-Salem, Jonas; Menkuec, Benjamin S; Roskosch, Sebastian; Truss, Matthias; Robinson, Peter N

    2016-11-04

    ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal of PCR duplicates and for quality control. Furthermore, we developed bespoke methods to estimate the width of the protected region resulting from protein-DNA binding and to infer binding positions from ChIP-nexus data. Finally, we applied our peak calling method as well as the two other methods MACE and MACS2 to the available ChIP-nexus data. The Q-nexus software is efficient and easy to use. Novel statistics about duplication rates in consideration of random barcodes are calculated. Our method for the estimation of the width of the protected region yields unbiased signatures that are highly reproducible for biological replicates and at the same time very specific for the respective factors analyzed. As judged by the irreproducible discovery rate (IDR), our peak calling algorithm shows a substantially better reproducibility. An implementation of Q-nexus is available at http://charite.github.io/Q/ .
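
    The barcode-aware duplicate removal that distinguishes ChIP-nexus preprocessing from standard deduplication can be sketched in a few lines: two reads are duplicates only if they agree on mapping position, strand, and random barcode. The field names here are illustrative, not Q-nexus's data model.

      # Keep the first read per (chrom, pos, strand, barcode) key; identical
      # positions with different random barcodes are retained as independent
      # molecules rather than discarded as PCR duplicates.
      def dedup(reads):
          seen, unique = set(), []
          for r in reads:
              key = (r["chrom"], r["pos"], r["strand"], r["barcode"])
              if key not in seen:
                  seen.add(key)
                  unique.append(r)
          return unique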

  4. Mapping of Brain Activity by Automated Volume Analysis of Immediate Early Genes.

    PubMed

    Renier, Nicolas; Adams, Eliza L; Kirst, Christoph; Wu, Zhuhao; Azevedo, Ricardo; Kohl, Johannes; Autry, Anita E; Kadiri, Lolahon; Umadevi Venkataraju, Kannan; Zhou, Yu; Wang, Victoria X; Tang, Cheuk Y; Olsen, Olav; Dulac, Catherine; Osten, Pavel; Tessier-Lavigne, Marc

    2016-06-16

    Understanding how neural information is processed in physiological and pathological states would benefit from precise detection, localization, and quantification of the activity of all neurons across the entire brain, which has not, to date, been achieved in the mammalian brain. We introduce a pipeline for high-speed acquisition of brain activity at cellular resolution through profiling immediate early gene expression using immunostaining and light-sheet fluorescence imaging, followed by automated mapping and analysis of activity by an open-source software program we term ClearMap. We validate the pipeline first by analysis of brain regions activated in response to haloperidol. Next, we report new cortical regions downstream of whisker-evoked sensory processing during active exploration. Last, we combine activity mapping with axon tracing to uncover new brain regions differentially activated during parenting behavior. This pipeline is widely applicable to different experimental paradigms, including animal species for which transgenic activity reporters are not readily available. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline

    PubMed Central

    2014-01-01

    Background Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. Results To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. Conclusions By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples. PMID:24475911

  6. Mapping of brain activity by automated volume analysis of immediate early genes

    PubMed Central

    Renier, Nicolas; Adams, Eliza L.; Kirst, Christoph; Wu, Zhuhao; Azevedo, Ricardo; Kohl, Johannes; Autry, Anita E.; Kadiri, Lolahon; Venkataraju, Kannan Umadevi; Zhou, Yu; Wang, Victoria X.; Tang, Cheuk Y.; Olsen, Olav; Dulac, Catherine; Osten, Pavel; Tessier-Lavigne, Marc

    2016-01-01

    Summary Understanding how neural information is processed in physiological and pathological states would benefit from precise detection, localization and quantification of the activity of all neurons across the entire brain, which has not to date been achieved in the mammalian brain. We introduce a pipeline for high speed acquisition of brain activity at cellular resolution through profiling immediate early gene expression using immunostaining and light-sheet fluorescence imaging, followed by automated mapping and analysis of activity by an open-source software program we term ClearMap. We validate the pipeline first by analysis of brain regions activated in response to Haloperidol. Next, we report new cortical regions downstream of whisker-evoked sensory processing during active exploration. Lastly, we combine activity mapping with axon tracing to uncover new brain regions differentially activated during parenting behavior. This pipeline is widely applicable to different experimental paradigms, including animal species for which transgenic activity reporters are not readily available. PMID:27238021

  7. Comparison of software packages for detecting differential expression in RNA-seq studies

    PubMed Central

    Seyednasrollah, Fatemeh; Laiho, Asta

    2015-01-01

    RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. PMID:24300110

  8. Comparison of software packages for detecting differential expression in RNA-seq studies.

    PubMed

    Seyednasrollah, Fatemeh; Laiho, Asta; Elo, Laura L

    2015-01-01

    RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. © The Author 2013. Published by Oxford University Press.

  9. Feasibility study for wax deposition imaging in oil pipelines by PGNAA technique.

    PubMed

    Cheng, Can; Jia, Wenbao; Hei, Daqian; Wei, Zhiyong; Wang, Hongtao

    2017-10-01

    Wax deposition in pipelines is a crucial problem in the oil industry. A method based on the prompt gamma-ray neutron activation analysis technique was applied to reconstruct the image of wax deposition in oil pipelines. The 2.223 MeV hydrogen capture gamma rays were used to reconstruct the wax deposition image. To validate the method, both MCNP simulation and experiments were performed for wax deposited with a maximum thickness of 20 cm. The performance of the method was simulated using the MCNP code. The experiment was conducted with a 252Cf neutron source and a LaBr3:Ce detector. A good correspondence between the simulations and the experiments was observed. The results obtained indicate that the present approach is efficient for wax deposition imaging in oil pipelines. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Development and application of an information-analytic system on the problem of flow accelerated corrosion of pipeline elements in the secondary coolant circuit of VVER-440-based power units at the Novovoronezh nuclear power plant

    NASA Astrophysics Data System (ADS)

    Tomarov, G. V.; Povarov, V. P.; Shipkov, A. A.; Gromov, A. F.; Kiselev, A. N.; Shepelev, S. V.; Galanin, A. V.

    2015-02-01

    Specific features relating to development of the information-analytical system on the problem of flow-accelerated corrosion of pipeline elements in the secondary coolant circuit of the VVER-440-based power units at the Novovoronezh nuclear power plant are considered. The results from a statistical analysis of data on the quantity, location, and operating conditions of the elements and preinserted segments of pipelines used in the condensate-feedwater and wet steam paths are presented. The principles of preparing and using the information-analytical system for determining the lifetime to reaching inadmissible wall thinning in elements of pipelines used in the secondary coolant circuit of the VVER-440-based power units at the Novovoronezh NPP are considered.

  11. Statistical method to compare massive parallel sequencing pipelines.

    PubMed

    Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P

    2017-03-01

    Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS), which drastically cuts sequencing time and expense. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data has prompted the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with a single odds ratio for all patients, and a model with a single intercept. Then a log-linear mixed model was fitted, treating the biological variability as a random effect. Among the 390,339 base-pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Similar results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method thus allows pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.
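
    The gold-standard part of such a comparison reduces to counting concordant and discordant calls per pipeline. A minimal sketch with toy call sets follows; the paper's log-linear agreement models are not reproduced here.

      # Sensitivity and specificity of one pipeline against a gold standard,
      # over a fixed number of assessed positions.
      def sens_spec(calls, truth, n_positions):
          tp = len(calls & truth)
          fn = len(truth - calls)
          fp = len(calls - truth)
          tn = n_positions - tp - fn - fp
          return tp / (tp + fn), tn / (tn + fp)

      pipeline_a = {(101, "A"), (250, "T"), (388, "G")}   # (position, alt allele)
      gold = {(101, "A"), (250, "T"), (400, "C")}
      print(sens_spec(pipeline_a, gold, n_positions=1000))  # -> (0.667, 0.999)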

  12. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    PubMed Central

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486
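
    The reliable diploid genotype calls obtained "when using additional quality filtering" can be illustrated with a toy caller that applies depth and allele-ratio filters; the thresholds are hypothetical, not those of the study or the UNEAK pipeline.

      # Toy diploid genotype call from allele read counts at one SNP locus.
      def call_genotype(ref_reads, alt_reads, min_depth=20, het_band=(0.2, 0.8)):
          depth = ref_reads + alt_reads
          if depth < min_depth:
              return None              # too shallow for a confident call
          ratio = alt_reads / depth
          if ratio < het_band[0]:
              return "ref/ref"
          if ratio > het_band[1]:
              return "alt/alt"
          return "ref/alt"

      print(call_genotype(14, 16))     # -> 'ref/alt'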

  13. Pump-stopping water hammer simulation based on RELAP5

    NASA Astrophysics Data System (ADS)

    Yi, W. S.; Jiang, J.; Li, D. D.; Lan, G.; Zhao, Z.

    2013-12-01

    RELAP5 was originally designed to analyze complex thermal-hydraulic interactions that occur during either postulated large or small loss-of-coolant accidents in PWRs. However, as development continued, the code was expanded to include many of the transient scenarios that might occur in thermal-hydraulic systems. When liquid in a pipe decelerates rapidly, its kinetic energy is converted into potential energy, producing a temporary pressure rise; this phenomenon is called water hammer. Water hammer can occur in any thermal-hydraulic system and becomes extremely dangerous when the pressure surges grow large: if the pressure exceeds the critical pressure that the pipe or the fittings along the pipeline can withstand, the integrity of the whole pipeline may fail. The purpose of this article is to apply RELAP5 to the simulation and analysis of water hammer. Drawing on the RELAP5 code manuals and related documents, the authors use RELAP5 to model a water-supply system driven by an impeller pump and to simulate pump-stopping water hammer. The simulation of this sample case and the subsequent analysis of its results give a better understanding both of water hammer itself and of the suitability of RELAP5 for water-hammer studies. In addition, by comparing the results of the RELAP5-based model with those of other fluid-transient analysis software (e.g., PIPENET), the authors draw conclusions about the peculiarities of RELAP5 when applied to water-hammer research and offer several modelling tips for simulating water-hammer-related cases.
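
    A standard first-order check before (or alongside) a full transient simulation is the Joukowsky estimate of the surge caused by a rapid velocity change, dP = rho * a * dV. The values below are illustrative only.

      # Joukowsky pressure surge for an instantaneous velocity change.
      rho = 1000.0   # water density, kg/m^3
      a = 1200.0     # pressure wave speed in the pipe, m/s (pipe/fluid dependent)
      dV = 2.0       # velocity change at pump stop, m/s

      dP = rho * a * dV                       # pressure rise in Pa
      print(f"surge ~ {dP / 1e6:.1f} MPa")    # ~2.4 MPa for these values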

  14. A Play on Words: Using Cognitive Computing as a Basis for AI Solvers in Word Puzzles

    NASA Astrophysics Data System (ADS)

    Manzini, Thomas; Ellis, Simon; Hendler, James

    2015-12-01

    In this paper we offer a model, drawing inspiration from human cognition and based upon the pipeline developed for IBM's Watson, that solves clues in a type of word puzzle called syllacrostics. We briefly discuss its place within the broader field of artificial general intelligence (AGI) and how this process and model might be applied to other types of word puzzles. We present an overview of a system that has been developed to solve syllacrostics.

  15. Massive stereo-based DTM production for Mars on cloud computers

    NASA Astrophysics Data System (ADS)

    Tao, Y.; Muller, J.-P.; Sidiropoulos, P.; Xiong, Si-Ting; Putri, A. R. D.; Walter, S. H. G.; Veitch-Michaelis, J.; Yershov, V.

    2018-05-01

    Digital Terrain Model (DTM) creation is essential to improving our understanding of the formation processes of the Martian surface. Although there have been previous demonstrations of open-source or commercial planetary 3D reconstruction software, planetary scientists still struggle to create good quality DTMs that meet their science needs, especially when a large number of high quality DTMs must be produced using "free" software. In this paper, we describe a new open-source system that overcomes many of these obstacles, demonstrating results in the context of issues found from experience with several planetary DTM pipelines. We introduce a new, fully automated, multi-resolution DTM processing chain for NASA Mars Reconnaissance Orbiter (MRO) Context Camera (CTX) and High Resolution Imaging Science Experiment (HiRISE) stereo processing, called the Co-registration Ames Stereo Pipeline (ASP) Gotcha Optimised (CASP-GO), based on the open-source NASA ASP. CASP-GO employs tie-point based multi-resolution image co-registration, and Gotcha sub-pixel refinement and densification. The CASP-GO pipeline is used to produce planet-wide CTX and HiRISE DTMs that guarantee global geo-referencing compliance with respect to the High Resolution Stereo Camera (HRSC), and thence to the Mars Orbiter Laser Altimeter (MOLA), while providing improved stereo-matching completeness and accuracy. All software and good quality products introduced in this paper are being made open-source to the planetary science community through collaboration with NASA Ames, the United States Geological Survey (USGS) and the Jet Propulsion Laboratory (JPL), Advanced Multi-Mission Operations System (AMMOS) Planetary Data System (PDS) Pipeline Service (APPS-PDS4), and are browseable and visualisable through the iMars web-based Geographic Information System (webGIS).

  16. Simultaneous Analysis and Quality Assurance for Diffusion Tensor Imaging

    PubMed Central

    Lauzon, Carolyn B.; Asman, Andrew J.; Esparza, Michael L.; Burns, Scott S.; Fan, Qiuyun; Gao, Yurui; Anderson, Adam W.; Davis, Nicole; Cutting, Laurie E.; Landman, Bennett A.

    2013-01-01

    Diffusion tensor imaging (DTI) enables non-invasive, cyto-architectural mapping of in vivo tissue microarchitecture through voxel-wise mathematical modeling of multiple magnetic resonance imaging (MRI) acquisitions, each differently sensitized to water diffusion. DTI computations are fundamentally estimation processes and are sensitive to noise and artifacts. Despite widespread adoption in the neuroimaging community, maintaining consistent DTI data quality remains challenging given the propensity for patient motion, artifacts associated with fast imaging techniques, and the possibility of hardware changes/failures. Furthermore, the quantity of data acquired per voxel, the non-linear estimation process, and numerous potential use cases complicate traditional visual data inspection approaches. Currently, quality inspection of DTI data has relied on visual inspection and individual processing in DTI analysis software programs (e.g. DTIPrep, DTI-studio). However, recent advances in applied statistical methods have yielded several different metrics to assess noise level, artifact propensity, quality of tensor fit, variance of estimated measures, and bias in estimated measures. To date, these metrics have been largely studied in isolation. Herein, we select complementary metrics for integration into an automatic DTI analysis and quality assurance pipeline. The pipeline completes in 24 hours, stores statistical outputs, and produces a graphical summary quality analysis (QA) report. We assess the utility of this streamlined approach for empirical quality assessment on 608 DTI datasets from pediatric neuroimaging studies. The efficiency and accuracy of quality analysis using the proposed pipeline is compared with quality analysis based on visual inspection. The unified pipeline is found to save a statistically significant amount of time (over 70%) while improving the consistency of QA between a DTI expert and a pool of research associates. Projection of QA metrics to a low-dimensional manifold reveals qualitative, but clear, QA-study associations and suggests that automated outlier/anomaly detection would be feasible. PMID:23637895

  17. Multi-scale imaging and informatics pipeline for in situ pluripotent stem cell analysis.

    PubMed

    Gorman, Bryan R; Lu, Junjie; Baccei, Anna; Lowry, Nathan C; Purvis, Jeremy E; Mangoubi, Rami S; Lerou, Paul H

    2014-01-01

    Human pluripotent stem (hPS) cells are a potential source of cells for medical therapy and an ideal system to study fate decisions in early development. However, hPS cells cultured in vitro exhibit a high degree of heterogeneity, presenting an obstacle to clinical translation. hPS cells grow in spatially patterned colony structures, necessitating quantitative single-cell image analysis. We offer a tool for analyzing the spatial population context of hPS cells that integrates automated fluorescent microscopy with an analysis pipeline. It enables high-throughput detection of colonies at low resolution, with single-cellular and sub-cellular analysis at high resolutions, generating seamless in situ maps of single-cellular data organized by colony. We demonstrate the tool's utility by analyzing inter- and intra-colony heterogeneity of hPS cell cycle regulation and pluripotency marker expression. We measured the heterogeneity within individual colonies by analyzing cell cycle as a function of distance. Cells loosely associated with the outside of the colony are more likely to be in G1, reflecting a less pluripotent state, while cells within the first pluripotent layer are more likely to be in G2, possibly reflecting a G2/M block. Our multi-scale analysis tool groups colony regions into density classes, and cells belonging to those classes have distinct distributions of pluripotency markers and respond differently to DNA damage induction. Lastly, we demonstrate that our pipeline can robustly handle high-content, high-resolution single-molecule mRNA FISH data by using novel image processing techniques. Overall, the imaging informatics pipeline presented offers a novel approach to the analysis of hPS cells that includes not only single-cell features but also colony-wide and, more generally, multi-scale spatial configuration.

  18. Fuzzy-based propagation of prior knowledge to improve large-scale image analysis pipelines

    PubMed Central

    Mikut, Ralf

    2017-01-01

    Many automatically analyzable scientific questions are well-posed and a variety of information about expected outcomes is available a priori. Although often neglected, this prior knowledge can be systematically exploited to make automated analysis operations sensitive to a desired phenomenon or to evaluate extracted content with respect to this prior knowledge. For instance, the performance of processing operators can be greatly enhanced by a more focused detection strategy and by direct information about the ambiguity inherent in the extracted data. We present a new concept that increases the result quality awareness of image analysis operators by estimating and distributing the degree of uncertainty involved in their output based on prior knowledge. This allows the use of simple processing operators that are suitable for analyzing large-scale spatiotemporal (3D+t) microscopy images without compromising result quality. On the foundation of fuzzy set theory, we transform available prior knowledge into a mathematical representation and extensively use it to enhance the result quality of various processing operators. These concepts are illustrated on a typical bioimage analysis pipeline comprised of seed point detection, segmentation, multiview fusion and tracking. The functionality of the proposed approach is further validated on a comprehensive simulated 3D+t benchmark data set that mimics embryonic development and on large-scale light-sheet microscopy data of a zebrafish embryo. The general concept introduced in this contribution represents a new approach to efficiently exploit prior knowledge to improve the result quality of image analysis pipelines. The generality of the concept makes it applicable to practically any field with processing strategies that are arranged as linear pipelines. The automated analysis of terabyte-scale microscopy data will especially benefit from sophisticated and efficient algorithms that enable a quantitative and fast readout. PMID:29095927
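
    To make the idea concrete: prior knowledge such as an expected object size can be encoded as a fuzzy membership function, and each operator output annotated with its degree of agreement with that prior. A hypothetical Python sketch (names and thresholds invented for illustration; this is not the paper's implementation):

      def trapezoid(x, a, b, c, d):
          """Trapezoidal fuzzy membership: 0 outside [a, d], 1 on [b, c], linear in between."""
          if x <= a or x >= d:
              return 0.0
          if b <= x <= c:
              return 1.0
          return (x - a) / (b - a) if x < b else (d - x) / (d - c)

      # Hypothetical prior: valid seed points have radii of roughly 4-8 pixels.
      detections = [{"radius": 2.5}, {"radius": 5.5}, {"radius": 9.0}]
      for det in detections:
          det["prior_agreement"] = trapezoid(det["radius"], 2.0, 4.0, 8.0, 10.0)
      print(detections)   # downstream operators can propagate these weights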

  19. Simultaneous analysis and quality assurance for diffusion tensor imaging.

    PubMed

    Lauzon, Carolyn B; Asman, Andrew J; Esparza, Michael L; Burns, Scott S; Fan, Qiuyun; Gao, Yurui; Anderson, Adam W; Davis, Nicole; Cutting, Laurie E; Landman, Bennett A

    2013-01-01

    Diffusion tensor imaging (DTI) enables non-invasive, cyto-architectural mapping of in vivo tissue microarchitecture through voxel-wise mathematical modeling of multiple magnetic resonance imaging (MRI) acquisitions, each differently sensitized to water diffusion. DTI computations are fundamentally estimation processes and are sensitive to noise and artifacts. Despite widespread adoption in the neuroimaging community, maintaining consistent DTI data quality remains challenging given the propensity for patient motion, artifacts associated with fast imaging techniques, and the possibility of hardware changes/failures. Furthermore, the quantity of data acquired per voxel, the non-linear estimation process, and numerous potential use cases complicate traditional visual data inspection approaches. Currently, quality inspection of DTI data has relied on visual inspection and individual processing in DTI analysis software programs (e.g. DTIPrep, DTI-studio). However, recent advances in applied statistical methods have yielded several different metrics to assess noise level, artifact propensity, quality of tensor fit, variance of estimated measures, and bias in estimated measures. To date, these metrics have been largely studied in isolation. Herein, we select complementary metrics for integration into an automatic DTI analysis and quality assurance pipeline. The pipeline completes in 24 hours, stores statistical outputs, and produces a graphical summary quality analysis (QA) report. We assess the utility of this streamlined approach for empirical quality assessment on 608 DTI datasets from pediatric neuroimaging studies. The efficiency and accuracy of quality analysis using the proposed pipeline is compared with quality analysis based on visual inspection. The unified pipeline is found to save a statistically significant amount of time (over 70%) while improving the consistency of QA between a DTI expert and a pool of research associates. Projection of QA metrics to a low-dimensional manifold reveals qualitative, but clear, QA-study associations and suggests that automated outlier/anomaly detection would be feasible.

  20. Group Analysis in FieldTrip of Time-Frequency Responses: A Pipeline for Reproducibility at Every Step of Processing, Going From Individual Sensor Space Representations to an Across-Group Source Space Representation.

    PubMed

    Andersen, Lau M

    2018-01-01

    An important aim of an analysis pipeline for magnetoencephalographic (MEG) data is to let the researcher spend maximal effort on making the statistical comparisons that will answer his or her questions. The example question answered here is whether the so-called beta rebound differs between novel and repeated stimulations. Two analyses are presented, going from individual sensor space representations to, respectively, an across-group sensor space representation and an across-group source space representation. The data analyzed are neural responses to tactile stimulations of the right index finger in a group of 20 healthy participants acquired from an Elekta Neuromag System. The processing steps covered for the first analysis are: MaxFiltering the raw data; defining, preprocessing and epoching the data; cleaning the data; finding and removing independent components related to eye blinks, eye movements and heartbeats; calculating participants' individual evoked responses by averaging over epoched data and subsequently removing the average response from single epochs; calculating a time-frequency representation and baselining it with non-stimulation trials; and finally calculating a grand average, an across-group sensor space representation. The second analysis starts from the grand average sensor space representation, and after identification of the beta rebound the neural origin is imaged using beamformer source reconstruction. This analysis covers: reading in co-registered magnetic resonance images; segmenting the data; creating a volume conductor; creating a forward model; cutting out MEG data of interest in the time and frequency domains; getting Fourier transforms; and estimating source activity with a beamformer model where power is expressed relative to MEG data measured during periods of non-stimulation. Finally, morphing the source estimates onto a common template and performing group-level statistics on the data are covered. Functions for saving relevant figures in an automated and structured manner are also included. The protocol presented here can be applied to any research protocol where the emphasis is on source reconstruction of induced responses and the underlying sources are not coherent.
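
    The relative-baselining step lends itself to a compact illustration. Below is a numpy sketch under assumed array shapes; in FieldTrip itself this corresponds to a 'relative' baseline as applied by ft_freqbaseline, and all shapes and values here are invented:

      import numpy as np

      rng = np.random.default_rng(0)
      # Hypothetical power arrays: (trials, channels, frequencies, times).
      stim = rng.random((60, 102, 30, 200)) + 0.5   # stimulation epochs
      rest = rng.random((60, 102, 30, 200)) + 0.5   # non-stimulation epochs

      # Average non-stimulation power over trials and time, per channel and
      # frequency, then express stimulation power relative to that baseline.
      baseline = rest.mean(axis=(0, 3), keepdims=True)        # (1, 102, 30, 1)
      relative = stim.mean(axis=0, keepdims=True) / baseline  # >1 = power increase
      print(relative.shape)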

  1. System for corrosion monitoring in pipeline applying fuzzy logic mathematics

    NASA Astrophysics Data System (ADS)

    Kuzyakov, O. N.; Kolosova, A. L.; Andreeva, M. A.

    2018-05-01

    A list of factors influencing the corrosion rate on the external side of an underground pipeline is determined. Principles of constructing a corrosion monitoring system are described, and the system's performance algorithm and program are elaborated. A comparative analysis of methods for calculating the corrosion rate is undertaken. Fuzzy logic mathematics is applied to reduce the amount of calculation while considering a wider range of corrosion factors.

  2. To Math or Not to Math: The Algebra-Calculus Pipeline and Postsecondary Mathematics Remediation

    ERIC Educational Resources Information Center

    Showalter, Daniel A.

    2017-01-01

    This article reports on a study designed to estimate the effect of high school coursetaking in the algebra-calculus pipeline on the likelihood of placing out of postsecondary remedial mathematics. A nonparametric variant of propensity score analysis was used on a nationally representative data set to remove selection bias and test for an effect…

  3. Approaches to automatic parameter fitting in a microscopy image segmentation pipeline: An exploratory parameter space analysis.

    PubMed

    Held, Christian; Nattkemper, Tim; Palmisano, Ralf; Wittenberg, Thomas

    2013-01-01

    Research and diagnosis in medicine and biology often require the assessment of a large amount of microscopy image data. Although digital pathology and new bioimaging technologies are finding their way into clinical practice and pharmaceutical research, some general methodological issues in automated image analysis are still open. In this study, we address the problem of fitting the parameters in a microscopy image segmentation pipeline. We propose to fit the parameters of the pipeline's modules with optimization algorithms, such as genetic algorithms or coordinate descent, and show how visual exploration of the parameter space can help to identify sub-optimal parameter settings that need to be avoided. This is of significant help in the design of our automatic parameter fitting framework, which enables us to tune the pipeline for large sets of micrographs. The underlying parameter spaces pose a challenge for manual as well as automated parameter optimization, as they can exhibit several local performance maxima. Hence, optimization strategies that cannot jump out of local performance maxima, like the hill climbing algorithm, often get stuck in a local maximum.
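
    A coordinate descent over pipeline parameters is simple to sketch: sweep one parameter at a time over a grid, keeping the others fixed, and accept the best value. In the Python sketch below the scoring function is a synthetic stand-in; in practice it would run the segmentation pipeline and compare the result against annotated micrographs:

      def score(params):
          # Synthetic stand-in for pipeline quality (e.g. Dice overlap with
          # annotations); smooth with a single maximum so the sketch runs as-is.
          t, s = params["threshold"], params["smoothing"]
          return -((t - 0.42) ** 2) - 0.1 * ((s - 3.0) ** 2)

      grids = {"threshold": [i / 20 for i in range(1, 20)],
               "smoothing": [0.5 * k for k in range(1, 13)]}
      params = {"threshold": 0.5, "smoothing": 1.0}
      for _ in range(5):                      # a few sweeps over the coordinates
          for name, grid in grids.items():
              params[name] = max(grid, key=lambda v: score({**params, name: v}))
      print(params, round(score(params), 4))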

  4. Approaches to automatic parameter fitting in a microscopy image segmentation pipeline: An exploratory parameter space analysis

    PubMed Central

    Held, Christian; Nattkemper, Tim; Palmisano, Ralf; Wittenberg, Thomas

    2013-01-01

    Introduction: Research and diagnosis in medicine and biology often require the assessment of a large amount of microscopy image data. Although digital pathology and new bioimaging technologies are finding their way into clinical practice and pharmaceutical research, some general methodological issues in automated image analysis are still open. Methods: In this study, we address the problem of fitting the parameters in a microscopy image segmentation pipeline. We propose to fit the parameters of the pipeline's modules with optimization algorithms, such as genetic algorithms or coordinate descent, and show how visual exploration of the parameter space can help to identify sub-optimal parameter settings that need to be avoided. Results: This is of significant help in the design of our automatic parameter fitting framework, which enables us to tune the pipeline for large sets of micrographs. Conclusion: The underlying parameter spaces pose a challenge for manual as well as automated parameter optimization, as they can exhibit several local performance maxima. Hence, optimization strategies that cannot jump out of local performance maxima, like the hill climbing algorithm, often get stuck in a local maximum. PMID:23766941

  5. Text processing through Web services: calling Whatizit.

    PubMed

    Rebholz-Schuhmann, Dietrich; Arregui, Miguel; Gaudan, Sylvain; Kirsch, Harald; Jimeno, Antonio

    2008-01-15

    Text-mining (TM) solutions are developing into efficient services for researchers in the biomedical research community. Such solutions have to scale with the growing number and size of resources (e.g. available controlled vocabularies), with the amount of literature to be processed (e.g. about 17 million documents in PubMed) and with the demands of the user community (e.g. different methods for fact extraction). These demands motivated the development of a server-based solution for literature analysis. Whatizit is a suite of modules that analyse text for contained information, e.g. any scientific publication or Medline abstracts. Special modules identify terms and then link them to the corresponding entries in bioinformatics databases, such as UniProtKB/Swiss-Prot data entries and Gene Ontology concepts. Other modules identify a set of selected annotation types, like the set produced by the EBIMed analysis pipeline for proteins. In the case of Medline abstracts, Whatizit offers access to EBI's in-house installation via PMID or term query. For large quantities of the user's own text, the server can be operated in a streaming mode (http://www.ebi.ac.uk/webservices/whatizit).

  6. Astrometry with A-Track Using Gaia DR1 Catalogue

    NASA Astrophysics Data System (ADS)

    Kılıç, Yücel; Erece, Orhan; Kaplan, Murat

    2018-04-01

    In this work, we built all-sky index files from the Gaia DR1 catalogue for high-precision astrometric field solutions and precise WCS coordinates of moving objects. For this, we used the build-astrometry-index program, part of the astrometry.net code suite. Additionally, we added astrometry.net's WCS solution tool to our previously developed software A-Track, a fast and robust pipeline for detecting moving objects such as asteroids and comets in sequential FITS images. Moreover, an MPC module was added to A-Track. This module is linked to an asteroid database to name the found objects and prepare the MPC file for reporting the results. After these additions, we tested the new version of the A-Track code on photometric data taken with the SI-1100 CCD on the 1-meter telescope at the TÜBİTAK National Observatory, Antalya. The pipeline can be used to analyse large data archives or daily sequential data. The code is hosted on GitHub under the GNU GPL v3 license.

  7. Chinese-American headway on some environmental issues

    NASA Astrophysics Data System (ADS)

    Showstack, Randy

    Although Chinese Premier Zhu Rongji may have failed to gain entrance for his country into the World Trade Organization during his April visit to the United States, the two countries concluded a series of agreements as part of the Second Session of the 2-year-old U.S.-China Policy Forum on Environment and Development. A memorandum of understanding on a $100 million clean energy program accelerates the export of clean U.S. environmental technologies in the areas of energy efficiency, renewable energy, and pollution reduction. A statement of intent on the development of a Sulfur Dioxide (SO2) Emissions Trading Feasibility Study calls for China to develop a study to test the effectiveness of emissions trading in China as a market-based approach to reducing greenhouse gas emissions. And a Memorandum of Understanding on a natural gas pipeline project, signed by the Enron Corporation and the China National Petroleum Corporation, opens the way to jointly developing a natural gas pipeline to help offer an alternative to fossil fuels.

  8. Riding the Banzai Pipeline at Jupiter: Balancing Low Delta-V and Low Radiation to Reach Europa

    NASA Technical Reports Server (NTRS)

    McElrath, Timothy P.; Campagnola, Stefano; Strange, Nathan J.

    2012-01-01

    Europa's tantalizing allure as a possible haven for life comes cloaked in a myriad of challenges for robotic spacecraft exploration. Not only are the propulsive requirements high and the solar illumination low, but the radiation environment at Jupiter administers its inexorable death sentence on any electronics dispatched to closely examine the satellite. So to the usual trades of mass, delta-V, and cost, we must add radiation dose, which tugs the trajectory solution in a contrary direction. Previous studies have concluded that adding radiation shielding mass is more efficient than using delta-V to reduce the exposure time, but that position was recently challenged by a study focusing on delivering simple landers to the Europa surface. During this work, a new trajectory option was found to occupy a strategic location in the delta-V/radiation continuum - we call it the "Banzai pipeline" due to the visual similarity with the end-on view down a breaking wave, as shown in the following figures.

  9. Weight-of-evidence environmental risk assessment of dumped chemical weapons after WWII along the Nord-Stream gas pipeline in the Bornholm Deep.

    PubMed

    Sanderson, Hans; Fauser, Patrik; Thomsen, Marianne; Larsen, Jørn Bo

    2012-05-15

    In connection with the installation of two natural gas pipelines through the Baltic Sea between Russia and Germany, there has been concern regarding potential re-suspension of historically dumped chemical warfare agents (CWA) in a nearby dump site and the potential environmental risks associated with it. In 2008 and 2010, 192 sediment and 11 porewater samples were analyzed for CWA residues, both parent compounds and metabolites, along the pipeline corridor next to the dump site. Macrozoobenthos and background variables were also collected and compared to the observed CWA levels and predicted potential risks. Detection frequencies and levels of intact CWA were low, whereas CWA metabolites were found more frequently. Re-suspension of CWA residue-containing sediment from installation of the pipelines contributes marginally to the overall background CWA residue exposure and risk along the pipeline route. The multivariate weight-of-evidence analysis showed that physical and background parameters of the sediment were of higher importance for the biota than the observed CWA levels.

  10. Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics

    PubMed Central

    Deutsch, Eric W.; Mendoza, Luis; Shteynberg, David; Slagel, Joseph; Sun, Zhi; Moritz, Robert L.

    2015-01-01

    Democratization of genomics technologies has enabled the rapid determination of genotypes. More recently the democratization of comprehensive proteomics technologies is enabling the determination of the cellular phenotype and the molecular events that define its dynamic state. Core proteomic technologies include mass spectrometry to define protein sequence, protein:protein interactions, and protein post-translational modifications. Key enabling technologies for proteomics are bioinformatic pipelines to identify, quantitate, and summarize these events. The Trans-Proteomics Pipeline (TPP) is a robust open-source standardized data processing pipeline for large-scale reproducible quantitative mass spectrometry proteomics. It supports all major operating systems and instrument vendors via open data formats. Here we provide a review of the overall proteomics workflow supported by the TPP, its major tools, and how it can be used in its various modes from desktop to cloud computing. We describe new features for the TPP, including data visualization functionality. We conclude by describing some common perils that affect the analysis of tandem mass spectrometry datasets, as well as some major upcoming features. PMID:25631240

  11. Using simulated fluorescence cell micrographs for the evaluation of cell image segmentation algorithms.

    PubMed

    Wiesmann, Veit; Bergler, Matthias; Palmisano, Ralf; Prinzen, Martin; Franz, Daniela; Wittenberg, Thomas

    2017-03-18

    Manual assessment and evaluation of fluorescent micrograph cell experiments is time-consuming and tedious. Automated segmentation pipelines can ensure efficient and reproducible evaluation and analysis with constant high quality for all images of an experiment. Such cell segmentation approaches are usually validated and rated in comparison to manually annotated micrographs. Nevertheless, manual annotations are prone to errors and display inter- and intra-observer variability, which influences the validation results of automated cell segmentation pipelines. We present a new approach to simulating fluorescent cell micrographs that provides an objective ground truth for the validation of cell segmentation methods. The cell simulation was evaluated twofold: (1) an expert observer study shows that the proposed approach generates realistic fluorescent cell micrograph simulations; (2) an automated segmentation pipeline on the simulated fluorescent cell micrographs reproduces the segmentation performance of that pipeline on real fluorescent cell micrographs. The proposed simulation approach produces realistic fluorescent cell micrographs with corresponding ground truth. The simulated data are suited to evaluating image segmentation pipelines more efficiently and reproducibly than is possible with manually annotated real micrographs.
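
    The core idea, rendering synthetic cells so that the ground-truth mask is known exactly, can be illustrated in a few lines. A deliberately crude numpy sketch (the paper's simulation models realistic cell shapes and imaging physics; everything below is invented for illustration):

      import numpy as np

      rng = np.random.default_rng(1)
      h = w = 256
      img = np.zeros((h, w), dtype=np.float32)     # simulated micrograph
      truth = np.zeros((h, w), dtype=bool)         # exact ground-truth mask
      yy, xx = np.mgrid[0:h, 0:w]
      for _ in range(10):                          # ten synthetic "cells"
          cy, cx = rng.integers(20, h - 20, size=2)
          r = int(rng.integers(6, 14))
          dist2 = (yy - cy) ** 2 + (xx - cx) ** 2
          img += np.exp(-dist2 / (2.0 * (r / 2.0) ** 2))  # soft fluorescent blob
          truth |= dist2 <= r * r
      img += rng.normal(0.0, 0.05, img.shape)      # camera noise
      print(img.shape, truth.sum(), "ground-truth cell pixels")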

  12. Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics.

    PubMed

    Deutsch, Eric W; Mendoza, Luis; Shteynberg, David; Slagel, Joseph; Sun, Zhi; Moritz, Robert L

    2015-08-01

    Democratization of genomics technologies has enabled the rapid determination of genotypes. More recently the democratization of comprehensive proteomics technologies is enabling the determination of the cellular phenotype and the molecular events that define its dynamic state. Core proteomic technologies include MS to define protein sequence, protein:protein interactions, and protein PTMs. Key enabling technologies for proteomics are bioinformatic pipelines to identify, quantitate, and summarize these events. The Trans-Proteomics Pipeline (TPP) is a robust open-source standardized data processing pipeline for large-scale reproducible quantitative MS proteomics. It supports all major operating systems and instrument vendors via open data formats. Here, we provide a review of the overall proteomics workflow supported by the TPP, its major tools, and how it can be used in its various modes from desktop to cloud computing. We describe new features for the TPP, including data visualization functionality. We conclude by describing some common perils that affect the analysis of MS/MS datasets, as well as some major upcoming features.

  13. A Pipeline for 3D Digital Optical Phenotyping Plant Root System Architecture

    NASA Astrophysics Data System (ADS)

    Davis, T. W.; Shaw, N. M.; Schneider, D. J.; Shaff, J. E.; Larson, B. G.; Craft, E. J.; Liu, Z.; Kochian, L. V.; Piñeros, M. A.

    2017-12-01

    This work presents a new pipeline for 3D digital optical phenotyping of the root system architecture of agricultural crops. The pipeline begins with a 3D root-system imaging apparatus for hydroponically grown crop lines of interest. The apparatus acts as a self-contained darkroom, which includes an imaging tank, a motorized rotating bearing and a digital camera. The pipeline continues with the Plant Root Imaging and Data Acquisition (PRIDA) software, which is responsible for image capture and storage. Once root images have been captured, image post-processing is performed using the Plant Root Imaging Analysis (PRIA) command-line tool, which extracts root pixels from color images. Following this pre-processing binarization of the digital root images, 3D trait characterization is performed using the next-generation RootReader3D software. RootReader3D measures global root system architecture traits, such as total root system volume and length, total number of roots, and maximum rooting depth and width. While designed to work together, the four stages of the phenotyping pipeline are modular and stand-alone, which provides flexibility and adaptability for various research endeavors.
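
    The binarization step, separating bright root pixels from the dark imaging tank, reduces to a thresholding operation. A toy numpy sketch (this is not the actual PRIA implementation; the threshold rule is assumed):

      import numpy as np

      def extract_root_pixels(rgb):
          """Toy binarization: roots imaged against a dark tank appear bright,
          so threshold a grayscale projection (not the actual PRIA algorithm)."""
          gray = rgb.astype(np.float32).mean(axis=2)
          thresh = gray.mean() + 2.0 * gray.std()   # assumed global threshold
          return gray > thresh

      frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
      mask = extract_root_pixels(frame)
      print(mask.sum(), "candidate root pixels")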

  14. An evolutionary approach to the architecture of effective healthcare delivery systems.

    PubMed

    Towill, D R; Christopher, M

    2005-01-01

    Aims to show that material flow concepts developed and successfully applied to commercial products and services can equally well form the architectural infrastructure of effective healthcare delivery systems. The methodology is based on the "power of analogy", which demonstrates that healthcare pipelines may be classified via the Time-Space Matrix. A small number (circa 4) of substantially different healthcare delivery pipelines will cover the vast majority of patient needs and simultaneously create adequate added value from the patients' perspective. The emphasis is firmly placed on total process mapping and analysis via established identification techniques. Healthcare delivery pipelines must be properly engineered and matched to life cycle phase if the service is to be effective. This small family of healthcare delivery pipelines needs to be designed via adherence to very specific-to-purpose principles, which vary from "lean production" through to "agile delivery". The proposition for a strategic approach to healthcare delivery pipeline design is novel and positions much currently isolated research into a comprehensive organisational framework. It therefore provides a synthesis of the needs of global healthcare.

  15. Applicability of interferometric SAR technology to ground movement and pipeline monitoring

    NASA Astrophysics Data System (ADS)

    Grivas, Dimitri A.; Bhagvati, Chakravarthy; Schultz, B. C.; Trigg, Alan; Rizkalla, Moness

    1998-03-01

    This paper summarizes the findings of a cooperative effort between NOVA Gas Transmission Ltd. (NGTL), the Italian Natural Gas Transmission Company (SNAM), and Arista International, Inc., to determine whether current remote sensing technologies can be utilized to monitor small-scale ground movements over vast geographical areas. This topic is of interest due to the potential for small ground movements to cause strain accumulation in buried pipeline facilities. Ground movements are difficult to monitor continuously, but their cumulative effect over time can have a significant impact on the safety of buried pipelines. Interferometric synthetic aperture radar (InSAR or SARI) is identified as the most promising technique of those considered. InSAR analysis involves combining multiple images from consecutive passes of a radar imaging platform. The resulting composite image can detect changes as small as 2.5 to 5.0 centimeters (based on current analysis methods and radar satellite data of 5 centimeter wavelength). Research currently in progress shows potential for measuring ground movements as small as a few millimeters. Data needed for InSAR analysis is currently commercially available from four satellites, and additional satellites are planned for launch in the near future. A major conclusion of the present study is that InSAR technology is potentially useful for pipeline integrity monitoring. A pilot project is planned to test operational issues.
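
    The quantitative basis for those detection limits is the phase-to-displacement relation for repeat-pass InSAR: along the line of sight, d = φ·λ/(4π), since a displacement d changes the two-way path by 2d. A small Python sketch using the 5-centimeter wavelength cited above (the phase value is invented):

      import math

      wavelength_cm = 5.0      # radar wavelength cited in the study
      phase_rad = math.pi / 2  # unwrapped interferometric phase difference (assumed)
      # Two-way path: displacement d shifts the round-trip phase by 4*pi*d/lambda.
      d_cm = phase_rad * wavelength_cm / (4 * math.pi)
      print(f"line-of-sight displacement: {d_cm:.2f} cm")   # ~0.62 cm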

  16. A Bioinformatic Pipeline for Monitoring of the Mutational Stability of Viral Drug Targets with Deep-Sequencing Technology.

    PubMed

    Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai

    2017-11-23

    The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short deep-sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines available programs with ad hoc scripts based on an original algorithm for searching for conserved targets in the deep-sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.
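
    The central computation, estimating per-position mutation frequencies at a candidate target site from the reads covering it, can be sketched simply. A hypothetical Python illustration (sequences, threshold and helper names are invented; the actual VirMut pipeline adds alignment and the statistical error handling described above):

      def site_mutation_frequencies(reads, reference):
          """Per-position mismatch frequency across reads covering the site."""
          counts = [0] * len(reference)
          for read in reads:
              for i, (base, ref_base) in enumerate(zip(read, reference)):
                  counts[i] += base != ref_base
          return [c / len(reads) for c in counts]

      # Invented 9-nt target and four pre-aligned reads covering it.
      ref = "ACCAGGAGG"
      reads = ["ACCAGGAGG", "ACCAGTAGG", "ACCAGGAGG", "ACCAGGAGC"]
      freqs = site_mutation_frequencies(reads, ref)
      conserved = all(f <= 0.05 for f in freqs)    # assumed 5% tolerance
      print(freqs, "conserved target" if conserved else "target drifting")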

  17. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    PubMed

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. In addition, our programming framework empowers developers to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  18. HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

    PubMed Central

    David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. In addition, our programming framework empowers developers to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057

  19. Are special read alignment strategies necessary and cost-effective when handling sequencing reads from patient-derived tumor xenografts?

    PubMed

    Tso, Kai-Yuen; Lee, Sau Dan; Lo, Kwok-Wai; Yip, Kevin Y

    2014-12-23

    Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subject to DNA sequencing, the samples could contain various amounts of mouse DNA. It has been unclear how the mouse reads would affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data. We found that the "filtering" and "combined reference" strategies performed better than aligning reads directly to the human reference in terms of alignment and variant calling accuracies. The combined reference strategy was particularly good at reducing false negative variant calls without significantly increasing the false positive rate. In some scenarios the performance gain of these two special handling strategies was too small for special handling to be cost-effective, but it was found crucial when false non-synonymous SNVs must be minimized, especially in exome sequencing. Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts. Our findings provide information for designing data analysis pipelines for these data.
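
    A minimal sketch of the combined-reference idea: align the reads once against a concatenated human-plus-mouse FASTA, then keep only reads whose best alignment lands on a human contig. Shown with pysam under assumed file and contig names (the study's actual pipelines and filtering thresholds differ):

      import pysam

      # BAM from aligning all reads to a concatenated human+mouse reference;
      # human contigs are assumed to carry an "hs_" prefix here (the naming
      # depends on how the combined FASTA was actually built).
      with pysam.AlignmentFile("xenograft.bam", "rb") as bam, \
           pysam.AlignmentFile("human_only.bam", "wb", template=bam) as out:
          for read in bam:
              if not read.is_unmapped and read.reference_name.startswith("hs_"):
                  out.write(read)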

  20. Low-level processing for real-time image analysis

    NASA Technical Reports Server (NTRS)

    Eskenazi, R.; Wilf, J. M.

    1979-01-01

    A system that detects object outlines in television images in real time is described. A high-speed pipeline processor transforms the raw image into an edge map, and a microprocessor integrated into the system clusters the edges and represents them as chain codes. Image statistics, useful for higher-level tasks such as pattern recognition, are computed by the microprocessor. Peak intensity and peak gradient values are extracted within a programmable window and are used for iris and focus control. The algorithms implemented in hardware and the pipeline processor architecture are described. The strategy for partitioning functions in the pipeline was chosen to make the implementation modular. The microprocessor interface allows flexible and adaptive control of the feature extraction process. The software algorithms for clustering edge segments, creating chain codes, and computing image statistics are also discussed. A strategy for real-time image analysis that uses this system is given.
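
    Chain codes are compact enough to illustrate directly: each step between successive 8-connected boundary pixels is replaced by a direction index 0-7 (Freeman coding). A small Python sketch, using image coordinates with the y axis pointing down:

      # Freeman 8-direction codes for steps (dx, dy), y axis pointing down.
      DIRS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
              (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

      def chain_code(contour):
          """Encode an ordered list of adjacent (x, y) boundary pixels."""
          return [DIRS[(x1 - x0, y1 - y0)]
                  for (x0, y0), (x1, y1) in zip(contour, contour[1:])]

      # A closed 1x1 square traced clockwise in image coordinates.
      print(chain_code([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]))  # [0, 6, 4, 2]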
