Sample records for extensive bioinformatics analysis

  1. The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

    PubMed Central

    Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

    2006-01-01

    We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980

  2. An integrated bioinformatics approach to improve two-color microarray quality-control: impact on biological conclusions.

    PubMed

    van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A

    2009-06-01

    Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work pointed out the need of an extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point to point standard analysis procedure, based on a combination of clustering and data visualization for the analysis of microarray data.

  3. Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics

    PubMed Central

    Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.

    2012-01-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849

  4. Relax with CouchDB--into the non-relational DBMS era of bioinformatics.

    PubMed

    Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R

    2012-07-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards combined with unfriendly interfaces make it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or 3rd-party viewers are built-in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.

  6. XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services

    PubMed Central

    Wagener, Johannes; Spjuth, Ola; Willighagen, Egon L; Wikberg, Jarl ES

    2009-01-01

    Background Life sciences make heavily use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. Results We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. That XMPP cloud services are capable of asynchronous communication implies that clients do not have to poll repetitively for status, but the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. Conclusion XMPP with its extensions is a powerful protocol for cloud services that demonstrate several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allows for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics. PMID:19732427

  7. XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services.

    PubMed

    Wagener, Johannes; Spjuth, Ola; Willighagen, Egon L; Wikberg, Jarl E S

    2009-09-04

    Life sciences make heavily use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. That XMPP cloud services are capable of asynchronous communication implies that clients do not have to poll repetitively for status, but the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. XMPP with its extensions is a powerful protocol for cloud services that demonstrate several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allows for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.

  8. Practical applications of the bioinformatics toolbox for narrowing quantitative trait loci.

    PubMed

    Burgess-Herbert, Sarah L; Cox, Allison; Tsaih, Shirng-Wern; Paigen, Beverly

    2008-12-01

    Dissecting the genes involved in complex traits can be confounded by multiple factors, including extensive epistatic interactions among genes, the involvement of epigenetic regulators, and the variable expressivity of traits. Although quantitative trait locus (QTL) analysis has been a powerful tool for localizing the chromosomal regions underlying complex traits, systematically identifying the causal genes remains challenging. Here, through its application to plasma levels of high-density lipoprotein cholesterol (HDL) in mice, we demonstrate a strategy for narrowing QTL that utilizes comparative genomics and bioinformatics techniques. We show how QTL detected in multiple crosses are subjected to both combined cross analysis and haplotype block analysis; how QTL from one species are mapped to the concordant regions in another species; and how genomewide scans associating haplotype groups with their phenotypes can be used to prioritize the narrowed regions. Then we illustrate how these individual methods for narrowing QTL can be systematically integrated for mouse chromosomes 12 and 15, resulting in a significantly reduced number of candidate genes, often from hundreds to <10. Finally, we give an example of how additional bioinformatics resources can be combined with experiments to determine the most likely quantitative trait genes.

  9. Bioinformatics Pipelines for Targeted Resequencing and Whole-Exome Sequencing of Human and Mouse Genomes: A Virtual Appliance Approach for Instant Deployment

    PubMed Central

    Saeed, Isaam; Wong, Stephen Q.; Mar, Victoria; Goode, David L.; Caramia, Franco; Doig, Ken; Ryland, Georgina L.; Thompson, Ella R.; Hunter, Sally M.; Halgamuge, Saman K.; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G.; Papenfuss, Anthony T.; McArthur, Grant A.; Tothill, Richard W.

    2014-01-01

    Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/. PMID:24752294

  10. MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers.

    PubMed

    Ohue, Masahito; Shimoda, Takehiro; Suzuki, Shuji; Matsuzaki, Yuri; Ishida, Takashi; Akiyama, Yutaka

    2014-11-15

    The application of protein-protein docking in large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance of >97% strong scaling. MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at: http://www.bi.cs.titech.ac.jp/megadock. akiyama@cs.titech.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  11. Bioconductor: open software development for computational biology and bioinformatics

    PubMed Central

    Gentleman, Robert C; Carey, Vincent J; Bates, Douglas M; Bolstad, Ben; Dettling, Marcel; Dudoit, Sandrine; Ellis, Byron; Gautier, Laurent; Ge, Yongchao; Gentry, Jeff; Hornik, Kurt; Hothorn, Torsten; Huber, Wolfgang; Iacus, Stefano; Irizarry, Rafael; Leisch, Friedrich; Li, Cheng; Maechler, Martin; Rossini, Anthony J; Sawitzki, Gunther; Smith, Colin; Smyth, Gordon; Tierney, Luke; Yang, Jean YH; Zhang, Jianhua

    2004-01-01

    The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples. PMID:15461798

  12. PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species ▿ ‡ #

    PubMed Central

    Gillespie, Joseph J.; Wattam, Alice R.; Cammer, Stephen A.; Gabbard, Joseph L.; Shukla, Maulik P.; Dalay, Oral; Driscoll, Timothy; Hix, Deborah; Mane, Shrinivasrao P.; Mao, Chunhong; Nordberg, Eric K.; Scott, Mark; Schulman, Julie R.; Snyder, Eric E.; Sullivan, Daniel E.; Wang, Chunxia; Warren, Andrew; Williams, Kelly P.; Xue, Tian; Seung Yoo, Hyun; Zhang, Chengdong; Zhang, Yan; Will, Rebecca; Kenyon, Ronald W.; Sobral, Bruno W.

    2011-01-01

    Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided. PMID:21896772

  13. Glossary of bioinformatics terms.

    PubMed

    2007-06-01

    This collection of terms and definitions commonly encountered in the bioinformatics literature will be updated periodically as Current Protocols in Bioinformatics grows. In addition, an extensive glossary of genetic terms can be found on the Web site of the National Human Genome Research Institute (http://www.genome.gov/glossary.cfm). The entries in that online glossary provide a brief written definition of the term; the user can also listen to an informative explanation of the term using RealAudio or the Windows Media Player.

  14. Efficient population-scale variant analysis and prioritization with VAPr.

    PubMed

    Birmingham, Amanda; Mark, Adam M; Mazzaferro, Carlo; Xu, Guorong; Fisch, Kathleen M

    2018-04-06

    With the growing availability of population-scale whole-exome and whole-genome sequencing, demand for reproducible, scalable variant analysis has spread within genomic research communities. To address this need, we introduce the Python package VAPr (Variant Analysis and Prioritization). VAPr leverages existing annotation tools ANNOVAR and MyVariant.info with MongoDB-based flexible storage and filtering functionality. It offers biologists and bioinformatics generalists easy-to-use and scalable analysis and prioritization of genomic variants from large cohort studies. VAPr is developed in Python and is available for free use and extension under the MIT License. An install package is available on PyPi at https://pypi.python.org/pypi/VAPr, while source code and extensive documentation are on GitHub at https://github.com/ucsd-ccbb/VAPr. kfisch@ucsd.edu.

  15. Public data and open source tools for multi-assay genomic investigation of disease.

    PubMed

    Kannan, Lavanya; Ramos, Marcel; Re, Angela; El-Hachem, Nehme; Safikhani, Zhaleh; Gendoo, Deena M A; Davis, Sean; Gomez-Cabrero, David; Castelo, Robert; Hansen, Kasper D; Carey, Vincent J; Morgan, Martin; Culhane, Aedín C; Haibe-Kains, Benjamin; Waldron, Levi

    2016-07-01

    Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics and other assays, has the potential to provide a systems level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in the areas of clinical oncology, pharmacogenomics and other perturbation experiments, population genomics and regulatory genomics and other areas, and tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods. © The Author 2015. Published by Oxford University Press.

  16. Experimental Design and Bioinformatics Analysis for the Application of Metagenomics in Environmental Sciences and Biotechnology.

    PubMed

    Ju, Feng; Zhang, Tong

    2015-11-03

    Recent advances in DNA sequencing technologies have prompted the widespread application of metagenomics for the investigation of novel bioresources (e.g., industrial enzymes and bioactive molecules) and unknown biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. This review discusses the rigorous experimental design and sample preparation in the context of applying metagenomics in environmental sciences and biotechnology. Moreover, this review summarizes the principles, methodologies, and state-of-the-art bioinformatics procedures, tools and database resources for metagenomics applications and discusses two popular strategies (analysis of unassembled reads versus assembled contigs/draft genomes) for quantitative or qualitative insights of microbial community structure and functions. Overall, this review aims to facilitate more extensive application of metagenomics in the investigation of uncultured microorganisms, novel enzymes, microbe-environment interactions, and biohazards in biotechnological applications where microbial communities are engineered for bioenergy production, wastewater treatment, and bioremediation.

  17. Identification of a Novel Rhabdovirus in Spodoptera frugiperda Cell Lines

    PubMed Central

    Ma, Hailun; Galvin, Teresa A.; Glasner, Dustin R.; Shaheduzzaman, Syed

    2014-01-01

    ABSTRACT The Sf9 cell line, derived from Spodoptera frugiperda, is used as a cell substrate for biological products, and no viruses have been reported in this cell line after extensive testing. We used degenerate PCR assays and massively parallel sequencing (MPS) to identify a novel RNA virus belonging to the order Mononegavirales in Sf9 cells. Sequence analysis of the assembled virus genome showed the presence of five open reading frames (ORFs) corresponding to the genes for the N, P, M, G, and L proteins in other rhabdoviruses and an unknown ORF of 111 amino acids located between the G- and L-protein genes. BLAST searches indicated that the S. frugiperda rhabdovirus (Sf-rhabdovirus) was related in a limited region of the L-protein gene to Taastrup virus, a newly discovered member of the Mononegavirales from a leafhopper (Hemiptera), and also to plant rhabdoviruses, particularly in the genus Cytorhabdovirus. Phylogenetic analysis of sequences in the L-protein gene indicated that Sf-rhabdovirus is a novel virus that branched with Taastrup virus. Rhabdovirus morphology was confirmed by transmission electron microscopy of filtered supernatant samples from Sf9 cells. Infectivity studies indicated potential transient infection by Sf-rhabdovirus in other insect cell lines, but there was no evidence of entry or virus replication in human cell lines. Sf-rhabdovirus sequences were also found in the Sf21 parental cell line of Sf9 cells but not in other insect cell lines, such as BT1-TN-5B1-4 (Tn5; High Five) cells and Schneider's Drosophila line 2 [D.Mel.(2); SL2] cells, indicating a species-specific infection. The results indicate that conventional methods may be complemented by state-of-the-art technologies with extensive bioinformatics analysis for identification of novel viruses. IMPORTANCE The Spodoptera frugiperda Sf9 cell line is used as a cell substrate for the development and manufacture of biological products. Extensive testing has not previously identified any viruses in this cell line. This paper reports on the identification and characterization of a novel rhabdovirus in Sf9 cells. This was accomplished through the use of next-generation sequencing platforms, de novo assembly tools, and extensive bioinformatics analysis. Rhabdovirus identification was further confirmed by transmission electron microscopy. Infectivity studies showed the lack of replication of Sf-rhabdovirus in human cell lines. The overall study highlights the use of a combinatorial testing approach including conventional methods and new technologies for evaluation of cell lines for unexpected viruses and use of comprehensive bioinformatics strategies for obtaining confident next-generation sequencing results. PMID:24672045

  18. Improvement of Student Understanding of How Kinetic Data Facilitates the Determination of Amino Acid Catalytic Function through an Alkaline Phosphatase Structure/Mechanism Bioinformatics Exercise

    ERIC Educational Resources Information Center

    Grunwald, Sandra K.; Krueger, Katherine J.

    2008-01-01

    Laboratory exercises, which utilize alkaline phosphatase as a model enzyme, have been developed and used extensively in undergraduate biochemistry courses to illustrate enzyme steady-state kinetics. A bioinformatics laboratory exercise for the biochemistry laboratory, which complements the traditional alkaline phosphatase kinetics exercise, was…

  19. Educational websites--Bioinformatics Tools II.

    PubMed

    Lomberk, Gwen

    2009-01-01

    In this issue, the highlighted websites are a continuation of a series of educational websites; this one in particular from a couple of years ago, Bioinformatics Tools [Pancreatology 2005;5:314-315]. These include sites that are valuable resources for many research needs in genomics and proteomics. Bioinformatics has become a laboratory tool to map sequences to databases, develop models of molecular interactions, evaluate structural compatibilities, describe differences between normal and disease-associated DNA, identify conserved motifs within proteins, and chart extensive signaling networks, all in silico. Copyright 2008 S. Karger AG, Basel and IAP.

  20. COMAN: a web server for comprehensive metatranscriptomics analysis.

    PubMed

    Ni, Yueqiong; Li, Jun; Panagiotou, Gianni

    2016-08-11

    Microbiota-oriented studies based on metagenomic or metatranscriptomic sequencing have revolutionised our understanding on microbial ecology and the roles of both clinical and environmental microbes. The analysis of massive metatranscriptomic data requires extensive computational resources, a collection of bioinformatics tools and expertise in programming. We developed COMAN (Comprehensive Metatranscriptomics Analysis), a web-based tool dedicated to automatically and comprehensively analysing metatranscriptomic data. COMAN pipeline includes quality control of raw reads, removal of reads derived from non-coding RNA, followed by functional annotation, comparative statistical analysis, pathway enrichment analysis, co-expression network analysis and high-quality visualisation. The essential data generated by COMAN are also provided in tabular format for additional analysis and integration with other software. The web server has an easy-to-use interface and detailed instructions, and is freely available at http://sbb.hku.hk/COMAN/ CONCLUSIONS: COMAN is an integrated web server dedicated to comprehensive functional analysis of metatranscriptomic data, translating massive amount of reads to data tables and high-standard figures. It is expected to facilitate the researchers with less expertise in bioinformatics in answering microbiota-related biological questions and to increase the accessibility and interpretation of microbiota RNA-Seq data.

  1. Agile parallel bioinformatics workflow management using Pwrake.

    PubMed

    Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

    2011-09-08

    In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error.Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.

  2. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774

  3. Microsoft Biology Initiative: .NET Bioinformatics Platform and Tools

    PubMed Central

    Diaz Acosta, B.

    2011-01-01

    The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative is comprised of two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework—initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, documentation and training materials are freely downloadable from http://research.microsoft.com/bio. MBT is a collection of tools that enable biology and bioinformatics researchers to be more productive in making scientific discoveries.

  4. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    PubMed Central

    2011-01-01

    Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538

  5. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.

    PubMed

    Cieślik, Marcin; Mura, Cameron

    2011-02-25

    Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.

  6. Seahawk: moving beyond HTML in Web-based bioinformatics analysis.

    PubMed

    Gordon, Paul M K; Sensen, Christoph W

    2007-06-18

    Traditional HTML interfaces for input to and output from Bioinformatics analysis on the Web are highly variable in style, content and data formats. Combining multiple analyses can therefore be an onerous task for biologists. Semantic Web Services allow automated discovery of conceptual links between remote data analysis servers. A shared data ontology and service discovery/execution framework is particularly attractive in Bioinformatics, where data and services are often both disparate and distributed. Instead of biologists copying, pasting and reformatting data between various Web sites, Semantic Web Service protocols such as MOBY-S hold out the promise of seamlessly integrating multi-step analysis. We have developed a program (Seahawk) that allows biologists to intuitively and seamlessly chain together Web Services using a data-centric, rather than the customary service-centric approach. The approach is illustrated with a ferredoxin mutation analysis. Seahawk concentrates on lowering entry barriers for biologists: no prior knowledge of the data ontology, or relevant services is required. In stark contrast to other MOBY-S clients, in Seahawk users simply load Web pages and text files they already work with. Underlying the familiar Web-browser interaction is an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements which import existing user data into the MOBY-S format. As an easily accessible applet, Seahawk moves beyond standard Web browser interaction, providing mechanisms for the biologist to concentrate on the analytical task rather than on the technical details of data formats and Web forms. As the MOBY-S protocol nears a 1.0 specification, we expect more biologists to adopt these new semantic-oriented ways of doing Web-based analysis, which empower them to do more complicated, ad hoc analysis workflow creation without the assistance of a programmer.

  7. Seahawk: moving beyond HTML in Web-based bioinformatics analysis

    PubMed Central

    Gordon, Paul MK; Sensen, Christoph W

    2007-01-01

    Background Traditional HTML interfaces for input to and output from Bioinformatics analysis on the Web are highly variable in style, content and data formats. Combining multiple analyses can therfore be an onerous task for biologists. Semantic Web Services allow automated discovery of conceptual links between remote data analysis servers. A shared data ontology and service discovery/execution framework is particularly attractive in Bioinformatics, where data and services are often both disparate and distributed. Instead of biologists copying, pasting and reformatting data between various Web sites, Semantic Web Service protocols such as MOBY-S hold out the promise of seamlessly integrating multi-step analysis. Results We have developed a program (Seahawk) that allows biologists to intuitively and seamlessly chain together Web Services using a data-centric, rather than the customary service-centric approach. The approach is illustrated with a ferredoxin mutation analysis. Seahawk concentrates on lowering entry barriers for biologists: no prior knowledge of the data ontology, or relevant services is required. In stark contrast to other MOBY-S clients, in Seahawk users simply load Web pages and text files they already work with. Underlying the familiar Web-browser interaction is an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements which import existing user data into the MOBY-S format. Conclusion As an easily accessible applet, Seahawk moves beyond standard Web browser interaction, providing mechanisms for the biologist to concentrate on the analytical task rather than on the technical details of data formats and Web forms. As the MOBY-S protocol nears a 1.0 specification, we expect more biologists to adopt these new semantic-oriented ways of doing Web-based analysis, which empower them to do more complicated, ad hoc analysis workflow creation without the assistance of a programmer. PMID:17577405

  8. Identification of a novel rhabdovirus in Spodoptera frugiperda cell lines.

    PubMed

    Ma, Hailun; Galvin, Teresa A; Glasner, Dustin R; Shaheduzzaman, Syed; Khan, Arifa S

    2014-06-01

    The Sf9 cell line, derived from Spodoptera frugiperda, is used as a cell substrate for biological products, and no viruses have been reported in this cell line after extensive testing. We used degenerate PCR assays and massively parallel sequencing (MPS) to identify a novel RNA virus belonging to the order Mononegavirales in Sf9 cells. Sequence analysis of the assembled virus genome showed the presence of five open reading frames (ORFs) corresponding to the genes for the N, P, M, G, and L proteins in other rhabdoviruses and an unknown ORF of 111 amino acids located between the G- and L-protein genes. BLAST searches indicated that the S. frugiperda rhabdovirus (Sf-rhabdovirus) was related in a limited region of the L-protein gene to Taastrup virus, a newly discovered member of the Mononegavirales from a leafhopper (Hemiptera), and also to plant rhabdoviruses, particularly in the genus Cytorhabdovirus. Phylogenetic analysis of sequences in the L-protein gene indicated that Sf-rhabdovirus is a novel virus that branched with Taastrup virus. Rhabdovirus morphology was confirmed by transmission electron microscopy of filtered supernatant samples from Sf9 cells. Infectivity studies indicated potential transient infection by Sf-rhabdovirus in other insect cell lines, but there was no evidence of entry or virus replication in human cell lines. Sf-rhabdovirus sequences were also found in the Sf21 parental cell line of Sf9 cells but not in other insect cell lines, such as BT1-TN-5B1-4 (Tn5; High Five) cells and Schneider's Drosophila line 2 [D.Mel.(2); SL2] cells, indicating a species-specific infection. The results indicate that conventional methods may be complemented by state-of-the-art technologies with extensive bioinformatics analysis for identification of novel viruses. The Spodoptera frugiperda Sf9 cell line is used as a cell substrate for the development and manufacture of biological products. Extensive testing has not previously identified any viruses in this cell line. This paper reports on the identification and characterization of a novel rhabdovirus in Sf9 cells. This was accomplished through the use of next-generation sequencing platforms, de novo assembly tools, and extensive bioinformatics analysis. Rhabdovirus identification was further confirmed by transmission electron microscopy. Infectivity studies showed the lack of replication of Sf-rhabdovirus in human cell lines. The overall study highlights the use of a combinatorial testing approach including conventional methods and new technologies for evaluation of cell lines for unexpected viruses and use of comprehensive bioinformatics strategies for obtaining confident next-generation sequencing results. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  9. Toward Personalized Pressure Ulcer Care Planning: Development of a Bioinformatics System for Individualized Prioritization of Clinical Pratice Guideline

    DTIC Science & Technology

    2016-10-01

    and text data mining . A Spinal Cord Injury Pressure Ulcer and Deep tissue injury ontology, SCIPUDO, will be developed to ensure robust and extensive...on natural language programming and the need to convert text in to data for analysis. In progress c) Define Physio-MIMI based SCIPUD+ Resource...information extraction from the free text clinical note. 3) Significant Results Nothing to report 4) Other Achievements Nothing to report

  10. Analysing the performance of personal computers based on Intel microprocessors for sequence aligning bioinformatics applications.

    PubMed

    Nair, Pradeep S; John, Eugene B

    2007-01-01

    Aligning specific sequences against a very large number of other sequences is a central aspect of bioinformatics. With the widespread availability of personal computers in biology laboratories, sequence alignment is now often performed locally. This makes it necessary to analyse the performance of personal computers for sequence aligning bioinformatics benchmarks. In this paper, we analyse the performance of a personal computer for the popular BLAST and FASTA sequence alignment suites. Results indicate that these benchmarks have a large number of recurring operations and use memory operations extensively. It seems that the performance can be improved with a bigger L1-cache.

  11. [Construction and application of bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer].

    PubMed

    Fang, Xiang; Li, Ning-qiu; Fu, Xiao-zhe; Li, Kai-bin; Lin, Qiang; Liu, Li-hui; Shi, Cun-bin; Wu, Shu-qin

    2015-07-01

    As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules, including genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on out membrane protein A of Aeromonas hydrophila, and the changes of system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogen has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platform for other subjects.

  12. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.

  13. KMgene: a unified R package for gene-based association analysis for complex traits.

    PubMed

    Yan, Qi; Fang, Zhou; Chen, Wei; Stegle, Oliver

    2018-02-09

    In this report, we introduce an R package KMgene for performing gene-based association tests for familial, multivariate or longitudinal traits using kernel machine (KM) regression under a generalized linear mixed model (GLMM) framework. Extensive simulations were performed to evaluate the validity of the approaches implemented in KMgene. http://cran.r-project.org/web/packages/KMgene. qi.yan@chp.edu or wei.chen@chp.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.

  14. Scientific Workflow Management in Proteomics

    PubMed Central

    de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus

    2012-01-01

    Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703

  15. A label distance maximum-based classifier for multi-label learning.

    PubMed

    Liu, Xiaoli; Bao, Hang; Zhao, Dazhe; Cao, Peng

    2015-01-01

    Multi-label classification is useful in many bioinformatics tasks such as gene function prediction and protein site localization. This paper presents an improved neural network algorithm, Max Label Distance Back Propagation Algorithm for Multi-Label Classification. The method was formulated by modifying the total error function of the standard BP by adding a penalty term, which was realized by maximizing the distance between the positive and negative labels. Extensive experiments were conducted to compare this method against state-of-the-art multi-label methods on three popular bioinformatic benchmark datasets. The results illustrated that this proposed method is more effective for bioinformatic multi-label classification compared to commonly used techniques.

  16. jORCA: easily integrating bioinformatics Web Services.

    PubMed

    Martín-Requena, Victoria; Ríos, Javier; García, Maximiliano; Ramírez, Sergio; Trelles, Oswaldo

    2010-02-15

    Web services technology is becoming the option of choice to deploy bioinformatics tools that are universally available. One of the major strengths of this approach is that it supports machine-to-machine interoperability over a network. However, a weakness of this approach is that various Web Services differ in their definition and invocation protocols, as well as their communication and data formats-and this presents a barrier to service interoperability. jORCA is a desktop client aimed at facilitating seamless integration of Web Services. It does so by making a uniform representation of the different web resources, supporting scalable service discovery, and automatic composition of workflows. Usability is at the top of the jORCA agenda; thus it is a highly customizable and extensible application that accommodates a broad range of user skills featuring double-click invocation of services in conjunction with advanced execution-control, on the fly data standardization, extensibility of viewer plug-ins, drag-and-drop editing capabilities, plus a file-based browsing style and organization of favourite tools. The integration of bioinformatics Web Services is made easier to support a wider range of users. .

  17. ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis

    PubMed Central

    Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas

    2016-01-01

    Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475

  18. Bioinformatics data distribution and integration via Web Services and XML.

    PubMed

    Li, Xiao; Zhang, Yizheng

    2003-11-01

    It is widely recognized that exchange, distribution, and integration of biological data are the keys to improve bioinformatics and genome biology in post-genomic era. However, the problem of exchanging and integrating biology data is not solved satisfactorily. The eXtensible Markup Language (XML) is rapidly spreading as an emerging standard for structuring documents to exchange and integrate data on the World Wide Web (WWW). Web service is the next generation of WWW and is founded upon the open standards of W3C (World Wide Web Consortium) and IETF (Internet Engineering Task Force). This paper presents XML and Web Services technologies and their use for an appropriate solution to the problem of bioinformatics data exchange and integration.

  19. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  20. LXtoo: an integrated live Linux distribution for the bioinformatics community.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu

    2012-07-19

    Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.

  1. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology.

    PubMed

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-12-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.

  2. Workflows for microarray data processing in the Kepler environment.

    PubMed

    Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

    2012-05-17

    Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.

  3. BioQueue: a novel pipeline framework to accelerate bioinformatics analysis.

    PubMed

    Yao, Li; Wang, Heming; Song, Yuanyuan; Sui, Guangchao

    2017-10-15

    With the rapid development of Next-Generation Sequencing, a large amount of data is now available for bioinformatics research. Meanwhile, the presence of many pipeline frameworks makes it possible to analyse these data. However, these tools concentrate mainly on their syntax and design paradigms, and dispatch jobs based on users' experience about the resources needed by the execution of a certain step in a protocol. As a result, it is difficult for these tools to maximize the potential of computing resources, and avoid errors caused by overload, such as memory overflow. Here, we have developed BioQueue, a web-based framework that contains a checkpoint before each step to automatically estimate the system resources (CPU, memory and disk) needed by the step and then dispatch jobs accordingly. BioQueue possesses a shell command-like syntax instead of implementing a new script language, which means most biologists without computer programming background can access the efficient queue system with ease. BioQueue is freely available at https://github.com/liyao001/BioQueue. The extensive documentation can be found at http://bioqueue.readthedocs.io. li_yao@outlook.com or gcsui@nefu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  4. The role of networks and artificial intelligence in nanotechnology design and analysis.

    PubMed

    Hudson, D L; Cohen, M E

    2004-05-01

    Techniques with their origins in artificial intelligence have had a great impact on many areas of biomedicine. Expert-based systems have been used to develop computer-assisted decision aids. Neural networks have been used extensively in disease classification and more recently in many bioinformatics applications including genomics and drug design. Network theory in general has proved useful in modeling all aspects of biomedicine from healthcare organizational structure to biochemical pathways. These methods show promise in applications involving nanotechnology both in the design phase and in interpretation of system functioning.

  5. ELISA-BASE: An Integrated Bioinformatics Tool for Analyzing and Tracking ELISA Microarray Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, Amanda M.; Collett, James L.; Seurynck-Servoss, Shannon L.

    ELISA-BASE is an open-source database for capturing, organizing and analyzing protein enzyme-linked immunosorbent assay (ELISA) microarray data. ELISA-BASE is an extension of the BioArray Soft-ware Environment (BASE) database system, which was developed for DNA microarrays. In order to make BASE suitable for protein microarray experiments, we developed several plugins for importing and analyzing quantitative ELISA microarray data. Most notably, our Protein Microarray Analysis Tool (ProMAT) for processing quantita-tive ELISA data is now available as a plugin to the database.

  6. Bioinformatics programs are 31-fold over-represented among the highest impact scientific papers of the past two decades.

    PubMed

    Wren, Jonathan D

    2016-09-01

    To analyze the relative proportion of bioinformatics papers and their non-bioinformatics counterparts in the top 20 most cited papers annually for the past two decades. When defining bioinformatics papers as encompassing both those that provide software for data analysis or methods underlying data analysis software, we find that over the past two decades, more than a third (34%) of the most cited papers in science were bioinformatics papers, which is approximately a 31-fold enrichment relative to the total number of bioinformatics papers published. More than half of the most cited papers during this span were bioinformatics papers. Yet, the average 5-year JIF of top 20 bioinformatics papers was 7.7, whereas the average JIF for top 20 non-bioinformatics papers was 25.8, significantly higher (P < 4.5 × 10(-29)). The 20-year trend in the average JIF between the two groups suggests the gap does not appear to be significantly narrowing. For a sampling of the journals producing top papers, bioinformatics journals tended to have higher Gini coefficients, suggesting that development of novel bioinformatics resources may be somewhat 'hit or miss'. That is, relative to other fields, bioinformatics produces some programs that are extremely widely adopted and cited, yet there are fewer of intermediate success. jdwren@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. Bioinformatics Approaches to Classifying Allergens and Predicting Cross-Reactivity

    PubMed Central

    Schein, Catherine H.; Ivanciuc, Ovidiu; Braun, Werner

    2007-01-01

    The major advances in understanding why patients respond to several seemingly different stimuli have been through the isolation, sequencing and structural analysis of proteins that induce an IgE response. The most significant finding is that allergenic proteins from very different sources can have nearly identical sequences and structures, and that this similarity can account for clinically observed cross-reactivity. The increasing amount of information on the sequence, structure and IgE epitopes of allergens is now available in several databases and powerful bioinformatics search tools allow user access to relevant information. Here, we provide an overview of these databases and describe state-of-the art bioinformatics tools to identify the common proteins that may be at the root of multiple allergy syndromes. Progress has also been made in quantitatively defining characteristics that discriminate allergens from non-allergens. Search and software tools for this purpose have been developed and implemented in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/). SDAP contains information for over 800 allergens and extensive bibliographic references in a relational database with links to other publicly available databases. SDAP is freely available on the Web to clinicians and patients, and can be used to find structural and functional relations among known allergens and to identify potentially cross-reacting antigens. Here we illustrate how these bioinformatics tools can be used to group allergens, and to detect areas that may account for common patterns of IgE binding and cross-reactivity. Such results can be used to guide treatment regimens for allergy sufferers. PMID:17276876

  8. Bioinformatics/biostatistics: microarray analysis.

    PubMed

    Eichler, Gabriel S

    2012-01-01

    The quantity and complexity of the molecular-level data generated in both research and clinical settings require the use of sophisticated, powerful computational interpretation techniques. It is for this reason that bioinformatic analysis of complex molecular profiling data has become a fundamental technology in the development of personalized medicine. This chapter provides a high-level overview of the field of bioinformatics and outlines several, classic bioinformatic approaches. The highlighted approaches can be aptly applied to nearly any sort of high-dimensional genomic, proteomic, or metabolomic experiments. Reviewed technologies in this chapter include traditional clustering analysis, the Gene Expression Dynamics Inspector (GEDI), GoMiner (GoMiner), Gene Set Enrichment Analysis (GSEA), and the Learner of Functional Enrichment (LeFE).

  9. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data

    PubMed Central

    Kearse, Matthew; Moir, Richard; Wilson, Amy; Stones-Havas, Steven; Cheung, Matthew; Sturrock, Shane; Buxton, Simon; Cooper, Alex; Markowitz, Sidney; Duran, Chris; Thierer, Tobias; Ashton, Bruce; Meintjes, Peter; Drummond, Alexei

    2012-01-01

    Summary: The two main functions of bioinformatics are the organization and analysis of biological data using computational resources. Geneious Basic has been designed to be an easy-to-use and flexible desktop software application framework for the organization and analysis of biological data, with a focus on molecular sequences and related data types. It integrates numerous industry-standard discovery analysis tools, with interactive visualizations to generate publication-ready images. One key contribution to researchers in the life sciences is the Geneious public application programming interface (API) that affords the ability to leverage the existing framework of the Geneious Basic software platform for virtually unlimited extension and customization. The result is an increase in the speed and quality of development of computation tools for the life sciences, due to the functionality and graphical user interface available to the developer through the public API. Geneious Basic represents an ideal platform for the bioinformatics community to leverage existing components and to integrate their own specific requirements for the discovery, analysis and visualization of biological data. Availability and implementation: Binaries and public API freely available for download at http://www.geneious.com/basic, implemented in Java and supported on Linux, Apple OSX and MS Windows. The software is also available from the Bio-Linux package repository at http://nebc.nerc.ac.uk/news/geneiousonbl. Contact: peter@biomatters.com PMID:22543367

  10. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

    PubMed

    Kearse, Matthew; Moir, Richard; Wilson, Amy; Stones-Havas, Steven; Cheung, Matthew; Sturrock, Shane; Buxton, Simon; Cooper, Alex; Markowitz, Sidney; Duran, Chris; Thierer, Tobias; Ashton, Bruce; Meintjes, Peter; Drummond, Alexei

    2012-06-15

    The two main functions of bioinformatics are the organization and analysis of biological data using computational resources. Geneious Basic has been designed to be an easy-to-use and flexible desktop software application framework for the organization and analysis of biological data, with a focus on molecular sequences and related data types. It integrates numerous industry-standard discovery analysis tools, with interactive visualizations to generate publication-ready images. One key contribution to researchers in the life sciences is the Geneious public application programming interface (API) that affords the ability to leverage the existing framework of the Geneious Basic software platform for virtually unlimited extension and customization. The result is an increase in the speed and quality of development of computation tools for the life sciences, due to the functionality and graphical user interface available to the developer through the public API. Geneious Basic represents an ideal platform for the bioinformatics community to leverage existing components and to integrate their own specific requirements for the discovery, analysis and visualization of biological data. Binaries and public API freely available for download at http://www.geneious.com/basic, implemented in Java and supported on Linux, Apple OSX and MS Windows. The software is also available from the Bio-Linux package repository at http://nebc.nerc.ac.uk/news/geneiousonbl.

  11. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses

    PubMed Central

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi

    2018-01-01

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625

  12. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses.

    PubMed

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V; Ma'ayan, Avi

    2018-02-27

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.

  13. Opportunities at the Intersection of Bioinformatics and Health Informatics

    PubMed Central

    Miller, Perry L.

    2000-01-01

    This paper provides a “viewpoint discussion” based on a presentation made to the 2000 Symposium of the American College of Medical Informatics. It discusses potential opportunities for researchers in health informatics to become involved in the rapidly growing field of bioinformatics, using the activities of the Yale Center for Medical Informatics as a case study. One set of opportunities occurs where bioinformatics research itself intersects with the clinical world. Examples include the correlations between individual genetic variation with clinical risk factors, disease presentation, and differential response to treatment; and the implications of including genetic test results in the patient record, which raises clinical decision support issues as well as legal and ethical issues. A second set of opportunities occurs where bioinformatics research can benefit from the technologic expertise and approaches that informaticians have used extensively in the clinical arena. Examples include database organization and knowledge representation, data mining, and modeling and simulation. Microarray technology is discussed as a specific potential area for collaboration. Related questions concern how best to establish collaborations with bioscientists so that the interests and needs of both sets of researchers can be met in a synergistic fashion, and the most appropriate home for bioinformatics in an academic medical center. PMID:10984461

  14. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    ERIC Educational Resources Information Center

    Shachak, Aviv; Ophir, Ron; Rubin, Eitan

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of…

  15. Rapid phylogenetic dissection of prokaryotic community structure in tidal flat using pyrosequencing.

    PubMed

    Kim, Bong-Soo; Kim, Byung Kwon; Lee, Jae-Hak; Kim, Myungjin; Lim, Young Woon; Chun, Jongsik

    2008-08-01

    Dissection of prokaryotic community structure is prerequisite to understand their ecological roles. Various methods are available for such a purpose which amplification and sequencing of 16S rRNA genes gained its popularity. However, conventional methods based on Sanger sequencing technique require cloning process prior to sequencing, and are expensive and labor-intensive. We investigated prokaryotic community structure in tidal flat sediments, Korea, using pyrosequencing and a subsequent automated bioinformatic pipeline for the rapid and accurate taxonomic assignment of each amplicon. The combination of pyrosequencing and bioinformatic analysis showed that bacterial and archaeal communities were more diverse than previously reported in clone library studies. Pyrosequencing analysis revealed 21 bacterial divisions and 37 candidate divisions. Proteobacteria was the most abundant division in the bacterial community, of which Gamma-and Delta-Proteobacteria were the most abundant. Similarly, 4 archaeal divisions were found in tidal flat sediments. Euryarchaeota was the most abundant division in the archaeal sequences, which were further divided into 8 classes and 11 unclassified euryarchaeota groups. The system developed here provides a simple, in-depth and automated way of dissecting a prokaryotic community structure without extensive pretreatment such as cloning.

  16. A Web-based assessment of bioinformatics end-user support services at US universities.

    PubMed

    Messersmith, Donna J; Benson, Dennis A; Geer, Renata C

    2006-07-01

    This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group.

  17. snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.

    PubMed

    Leekitcharoenphon, Pimlapas; Kaas, Rolf S; Thomsen, Martin Christen Frølund; Friis, Carsten; Rasmussen, Simon; Aarestrup, Frank M

    2012-01-01

    The advances and decreasing economical cost of whole genome sequencing (WGS), will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucletide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic tree from WGS data. Here we introduce snpTree, a server for online-automatic SNPs analysis. This tool is composed of different SNPs analysis suites, perl and python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on reference genome and a tree is constructed from concatenated SNPs using FastTree and a perl script. The online server was implemented by HTML, Java and python script.The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases was consistent and concordant for both raw reads and assembled genomes. In the latter case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. The snpTree server is an easy to use option for rapid standardised and automatic SNP analysis in epidemiological studies also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.

  18. Application of machine learning methods in bioinformatics

    NASA Astrophysics Data System (ADS)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technology have enabled biology to enter the era of big data. [1] Bioinformatics is an interdisciplinary, including the acquisition, management, analysis, interpretation and application of biological information, etc. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.

  19. 5th HUPO BPP Bioinformatics Meeting at the European Bioinformatics Institute in Hinxton, UK--Setting the analysis frame.

    PubMed

    Stephan, Christian; Hamacher, Michael; Blüggel, Martin; Körting, Gerhard; Chamrad, Daniel; Scheer, Christian; Marcus, Katrin; Reidegeld, Kai A; Lohaus, Christiane; Schäfer, Heike; Martens, Lennart; Jones, Philip; Müller, Michael; Auyeung, Kevin; Taylor, Chris; Binz, Pierre-Alain; Thiele, Herbert; Parkinson, David; Meyer, Helmut E; Apweiler, Rolf

    2005-09-01

    The Bioinformatics Committee of the HUPO Brain Proteome Project (HUPO BPP) meets regularly to execute the post-lab analyses of the data produced in the HUPO BPP pilot studies. On July 7, 2005 the members came together for the 5th time at the European Bioinformatics Institute (EBI) in Hinxton, UK, hosted by Rolf Apweiler. As a main result, the parameter set of the semi-automated data re-analysis of MS/MS spectra has been elaborated and the subsequent work steps have been defined.

  20. ImageJ-MATLAB: a bidirectional framework for scientific image analysis interoperability.

    PubMed

    Hiner, Mark C; Rueden, Curtis T; Eliceiri, Kevin W

    2017-02-15

    ImageJ-MATLAB is a lightweight Java library facilitating bi-directional interoperability between MATLAB and ImageJ. By defining a standard for translation between matrix and image data structures, researchers are empowered to select the best tool for their image-analysis tasks. Freely available extension to ImageJ2 ( http://imagej.net/Downloads ). Installation and use instructions available at http://imagej.net/MATLAB_Scripting. Tested with ImageJ 2.0.0-rc-54 , Java 1.8.0_66 and MATLAB R2015b. eliceiri@wisc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  1. Making authentic science accessible—the benefits and challenges of integrating bioinformatics into a high-school science curriculum

    PubMed Central

    Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled ‘Bioinformatics in the Service of Biotechnology’. Students’ learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students’ difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students’ cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students’ scientific ‘toolbox’. For students, questions stemming from the ‘old world’ biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers’ prediction. Analysis of students’ affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher’s role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. PMID:26801769

  2. Making authentic science accessible-the benefits and challenges of integrating bioinformatics into a high-school science curriculum.

    PubMed

    Machluf, Yossy; Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled 'Bioinformatics in the Service of Biotechnology'. Students' learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students' difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students' cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students' scientific 'toolbox'. For students, questions stemming from the 'old world' biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers' prediction. Analysis of students' affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher's role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. © The Author 2016. Published by Oxford University Press.

  3. Computational method and system for modeling, analyzing, and optimizing DNA amplification and synthesis

    DOEpatents

    Vandersall, Jennifer A.; Gardner, Shea N.; Clague, David S.

    2010-05-04

    A computational method and computer-based system of modeling DNA synthesis for the design and interpretation of PCR amplification, parallel DNA synthesis, and microarray chip analysis. The method and system include modules that address the bioinformatics, kinetics, and thermodynamics of DNA amplification and synthesis. Specifically, the steps of DNA selection, as well as the kinetics and thermodynamics of DNA hybridization and extensions, are addressed, which enable the optimization of the processing and the prediction of the products as a function of DNA sequence, mixing protocol, time, temperature and concentration of species.

  4. Indentification and Analysis of Occludin Phosphosites: A Combined Mass Spectroscoy and Bioinformatics Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sundstrom, J.; Tash, B; Murakami, T

    2009-01-01

    The molecular function of occludin, an integral membrane component of tight junctions, remains unclear. VEGF-induced phosphorylation sites were mapped on occludin by combining MS data analysis with bioinformatics. In vivo phosphorylation of Ser490 was validated and protein interaction studies combined with crystal structure analysis suggest that Ser490 phosphorylation attenuates the interaction between occludin and ZO-1. This study demonstrates that combining MS data and bioinformatics can successfully identify novel phosphorylation sites from limiting samples.

  5. Systematic Characterization and Comparative Analysis of the Rabbit Immunoglobulin Repertoire

    PubMed Central

    Lavinder, Jason J.; Hoi, Kam Hon; Reddy, Sai T.; Wine, Yariv; Georgiou, George

    2014-01-01

    Rabbits have been used extensively as a model system for the elucidation of the mechanism of immunoglobulin diversification and for the production of antibodies. We employed Next Generation Sequencing to analyze Ig germline V and J gene usage, CDR3 length and amino acid composition, and gene conversion frequencies within the functional (transcribed) IgG repertoire of the New Zealand white rabbit (Oryctolagus cuniculus). Several previously unannotated rabbit heavy chain variable (VH) and light chain variable (VL) germline elements were deduced bioinformatically using multidimensional scaling and k-means clustering methods. We estimated the gene conversion frequency in the rabbit at 23% of IgG sequences with a mean gene conversion tract length of 59±36 bp. Sequencing and gene conversion analysis of the chicken, human, and mouse repertoires revealed that gene conversion occurs much more extensively in the chicken (frequency 70%, tract length 79±57 bp), was observed to a small, yet statistically significant extent in humans, but was virtually absent in mice. PMID:24978027

  6. A Web-based assessment of bioinformatics end-user support services at US universities

    PubMed Central

    Messersmith, Donna J.; Benson, Dennis A.; Geer, Renata C.

    2006-01-01

    Objectives: This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Methods: Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Results: Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. Conclusions: This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group. PMID:16888663

  7. Preliminary Study of Bioinformatics Patents and Their Classifications Registered in the KIPRIS Database.

    PubMed

    Park, Hyun-Seok

    2012-12-01

    Whereas a vast amount of new information on bioinformatics is made available to the public through patents, only a small set of patents are cited in academic papers. A detailed analysis of registered bioinformatics patents, using the existing patent search system, can provide valuable information links between science and technology. However, it is extremely difficult to select keywords to capture bioinformatics patents, reflecting the convergence of several underlying technologies. No single word or even several words are sufficient to identify such patents. The analysis of patent subclasses can provide valuable information. In this paper, I did a preliminary study of the current status of bioinformatics patents and their International Patent Classification (IPC) groups registered in the Korea Intellectual Property Rights Information Service (KIPRIS) database.

  8. Development of Bioinformatics Infrastructure for Genomics Research.

    PubMed

    Mulder, Nicola J; Adebiyi, Ezekiel; Adebiyi, Marion; Adeyemi, Seun; Ahmed, Azza; Ahmed, Rehab; Akanle, Bola; Alibi, Mohamed; Armstrong, Don L; Aron, Shaun; Ashano, Efejiro; Baichoo, Shakuntala; Benkahla, Alia; Brown, David K; Chimusa, Emile R; Fadlelmola, Faisal M; Falola, Dare; Fatumo, Segun; Ghedira, Kais; Ghouila, Amel; Hazelhurst, Scott; Isewon, Itunuoluwa; Jung, Segun; Kassim, Samar Kamal; Kayondo, Jonathan K; Mbiyavanga, Mamana; Meintjes, Ayton; Mohammed, Somia; Mosaku, Abayomi; Moussa, Ahmed; Muhammd, Mustafa; Mungloo-Dilmohamud, Zahra; Nashiru, Oyekanmi; Odia, Trust; Okafor, Adaobi; Oladipo, Olaleye; Osamor, Victor; Oyelade, Jellili; Sadki, Khalid; Salifu, Samson Pandam; Soyemi, Jumoke; Panji, Sumir; Radouani, Fouzia; Souiai, Oussama; Tastan Bishop, Özlem

    2017-06-01

    Although pockets of bioinformatics excellence have developed in Africa, generally, large-scale genomic data analysis has been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet's role has evolved in response to changing needs from the consortium and the African bioinformatics community. H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. Various resources have been developed to address genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types are also being developed and dockerized to enable execution on multiple computing infrastructures. In addition, new tools have been developed for analysis of the uniquely divergent African data and for downstream interpretation of prioritized variants. To provide support for these and other bioinformatics queries, an online bioinformatics helpdesk backed by broad consortium expertise has been established. Further support is provided by means of various modes of bioinformatics training. For the past 4 years, the development of infrastructure support and human capacity through H3ABioNet, have significantly contributed to the establishment of African scientific networks, data analysis facilities, and training programs. Here, we describe the infrastructure and how it has affected genomics and bioinformatics research in Africa. Copyright © 2017 World Heart Federation (Geneva). Published by Elsevier B.V. All rights reserved.

  9. Bioinformatics training: selecting an appropriate learning content management system--an example from the European Bioinformatics Institute.

    PubMed

    Wright, Victoria Ann; Vaughan, Brendan W; Laurent, Thomas; Lopez, Rodrigo; Brooksbank, Cath; Schneider, Maria Victoria

    2010-11-01

    Today's molecular life scientists are well educated in the emerging experimental tools of their trade, but when it comes to training on the myriad of resources and tools for dealing with biological data, a less ideal situation emerges. Often bioinformatics users receive no formal training on how to make the most of the bioinformatics resources and tools available in the public domain. The European Bioinformatics Institute, which is part of the European Molecular Biology Laboratory (EMBL-EBI), holds the world's most comprehensive collection of molecular data, and training the research community to exploit this information is embedded in the EBI's mission. We have evaluated eLearning, in parallel with face-to-face courses, as a means of training users of our data resources and tools. We anticipate that eLearning will become an increasingly important vehicle for delivering training to our growing user base, so we have undertaken an extensive review of Learning Content Management Systems (LCMSs). Here, we describe the process that we used, which considered the requirements of trainees, trainers and systems administrators, as well as taking into account our organizational values and needs. This review describes the literature survey, user discussions and scripted platform testing that we performed to narrow down our choice of platform from 36 to a single platform. We hope that it will serve as guidance for others who are seeking to incorporate eLearning into their bioinformatics training programmes.

  10. Taking Bioinformatics to Systems Medicine.

    PubMed

    van Kampen, Antoine H C; Moerland, Perry D

    2016-01-01

    Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.

  11. Buying in to bioinformatics: an introduction to commercial sequence analysis software

    PubMed Central

    2015-01-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. PMID:25183247

  12. Buying in to bioinformatics: an introduction to commercial sequence analysis software.

    PubMed

    Smith, David Roy

    2015-07-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. © The Author 2014. Published by Oxford University Press.

  13. BioMake: a GNU make-compatible utility for declarative workflow management.

    PubMed

    Holmes, Ian H; Mungall, Christopher J

    2017-11-01

    The Unix 'make' program is widely used in bioinformatics pipelines, but suffers from problems that limit its application to large analysis datasets. These include reliance on file modification times to determine whether a target is stale, lack of support for parallel execution on clusters, and restricted flexibility to extend the underlying logic program. We present BioMake, a make-like utility that is compatible with most features of GNU Make and adds support for popular cluster-based job-queue engines, MD5 signatures as an alternative to timestamps, and logic programming extensions in Prolog. BioMake is available for MacOSX and Linux systems from https://github.com/evoldoers/biomake under the BSD3 license. The only dependency is SWI-Prolog (version 7), available from http://www.swi-prolog.org/. ihholmes + biomake@gmail.com or cmungall + biomake@gmail.com. Feature table comparing BioMake to similar tools. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  14. StrBioLib: a Java library for development of custom computational structural biology applications.

    PubMed

    Chandonia, John-Marc

    2007-08-01

    StrBioLib is a library of Java classes useful for developing software for computational structural biology research. StrBioLib contains classes to represent and manipulate protein structures, biopolymer sequences, sets of biopolymer sequences, and alignments between biopolymers based on either sequence or structure. Interfaces are provided to interact with commonly used bioinformatics applications, including (psi)-blast, modeller, muscle and Primer3, and tools are provided to read and write many file formats used to represent bioinformatic data. The library includes a general-purpose neural network object with multiple training algorithms, the Hooke and Jeeves non-linear optimization algorithm, and tools for efficient C-style string parsing and formatting. StrBioLib is the basis for the Pred2ary secondary structure prediction program, is used to build the astral compendium for sequence and structure analysis, and has been extensively tested through use in many smaller projects. Examples and documentation are available at the site below. StrBioLib may be obtained under the terms of the GNU LGPL license from http://strbio.sourceforge.net/

  15. Is there room for ethics within bioinformatics education?

    PubMed

    Taneri, Bahar

    2011-07-01

    When bioinformatics education is considered, several issues are addressed. At the undergraduate level, the main issue revolves around conveying information from two main and different fields: biology and computer science. At the graduate level, the main issue is bridging the gap between biology students and computer science students. However, there is an educational component that is rarely addressed within the context of bioinformatics education: the ethics component. Here, a different perspective is provided on bioinformatics education, and the current status of ethics is analyzed within the existing bioinformatics programs. Analysis of the existing undergraduate and graduate programs, in both Europe and the United States, reveals the minimal attention given to ethics within bioinformatics education. Given that bioinformaticians speedily and effectively shape the biomedical sciences and hence their implications for society, here redesigning of the bioinformatics curricula is suggested in order to integrate the necessary ethics education. Unique ethical problems awaiting bioinformaticians and bioinformatics ethics as a separate field of study are discussed. In addition, a template for an "Ethics in Bioinformatics" course is provided.

  16. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  17. Edge Bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Chien-Chi

    2015-08-03

    Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human-resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed based on following USE CASES and specific to illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen ormore » co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance« less

  18. Human Disease Insight: An integrated knowledge-based platform for disease-gene-drug information.

    PubMed

    Tasleem, Munazzah; Ishrat, Romana; Islam, Asimul; Ahmad, Faizan; Hassan, Md Imtaiyaz

    2016-01-01

    The scope of the Human Disease Insight (HDI) database is not limited to researchers or physicians as it also provides basic information to non-professionals and creates disease awareness, thereby reducing the chances of patient suffering due to ignorance. HDI is a knowledge-based resource providing information on human diseases to both scientists and the general public. Here, our mission is to provide a comprehensive human disease database containing most of the available useful information, with extensive cross-referencing. HDI is a knowledge management system that acts as a central hub to access information about human diseases and associated drugs and genes. In addition, HDI contains well-classified bioinformatics tools with helpful descriptions. These integrated bioinformatics tools enable researchers to annotate disease-specific genes and perform protein analysis, search for biomarkers and identify potential vaccine candidates. Eventually, these tools will facilitate the analysis of disease-associated data. The HDI provides two types of search capabilities and includes provisions for downloading, uploading and searching disease/gene/drug-related information. The logistical design of the HDI allows for regular updating. The database is designed to work best with Mozilla Firefox and Google Chrome and is freely accessible at http://humandiseaseinsight.com. Copyright © 2015 King Saud Bin Abdulaziz University for Health Sciences. Published by Elsevier Ltd. All rights reserved.

  19. AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis

    PubMed Central

    Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud

    2003-01-01

    Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565

  20. Green Fluorescent Protein-Focused Bioinformatics Laboratory Experiment Suitable for Undergraduates in Biochemistry Courses

    ERIC Educational Resources Information Center

    Rowe, Laura

    2017-01-01

    An introductory bioinformatics laboratory experiment focused on protein analysis has been developed that is suitable for undergraduate students in introductory biochemistry courses. The laboratory experiment is designed to be potentially used as a "stand-alone" activity in which students are introduced to basic bioinformatics tools and…

  1. Comparison of Next-Generation Sequencing Systems

    PubMed Central

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With fast development and wide applications of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve life qualities. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. At last, applications of NGS are summarized. PMID:22829749

  2. pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment.

    PubMed

    Warris, Sven; Timal, N Roshan N; Kempenaar, Marcel; Poortinga, Arne M; van de Geest, Henri; Varbanescu, Ana L; Nap, Jan-Peter

    2018-01-01

    Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore adopted less than could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension more simple through more and better application of high-level languages commonly used in bioinformatics, such as Python. The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packed in Python. It is a generic SW implementation running on several hardware platforms with multi-core systems and/or GPUs that provides accurate sequence alignments that also can be inspected for alignment details. Additionally, pyPaSWAS support the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. This way, the Python environment will stimulate further extension and use of pyPaSWAS. pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.

  3. Data mining in bioinformatics using Weka.

    PubMed

    Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H

    2004-10-12

    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection-common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. http://www.cs.waikato.ac.nz/ml/weka.

  4. Bioinformatics and peptidomics approaches to the discovery and analysis of food-derived bioactive peptides.

    PubMed

    Agyei, Dominic; Tsopmo, Apollinaire; Udenigwe, Chibuike C

    2018-06-01

    There are emerging advancements in the strategies used for the discovery and development of food-derived bioactive peptides because of their multiple food and health applications. Bioinformatics and peptidomics are two computational and analytical techniques that have the potential to speed up the development of bioactive peptides from bench to market. Structure-activity relationships observed in peptides form the basis for bioinformatics and in silico prediction of bioactive sequences encrypted in food proteins. Peptidomics, on the other hand, relies on "hyphenated" (liquid chromatography-mass spectrometry-based) techniques for the detection, profiling, and quantitation of peptides. Together, bioinformatics and peptidomics approaches provide a low-cost and effective means of predicting, profiling, and screening bioactive protein hydrolysates and peptides from food. This article discuses the basis, strengths, and limitations of bioinformatics and peptidomics approaches currently used for the discovery and analysis of food-derived bioactive peptides.

  5. Teaching bioinformatics and neuroinformatics by using free web-based tools.

    PubMed

    Grisham, William; Schottler, Natalie A; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson

    2010-01-01

    This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with anatomy (Mouse Brain Library), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and gene expression analyses (University of California, Santa Cruz Genome Browser, National Center for Biotechnology Information's Entrez Gene, and the Allen Brain Atlas), and information resources (PubMed). Instructors can use these various websites in concert to teach genetics from the phenotypic level to the molecular level, aspects of neuroanatomy and histology, statistics, quantitative trait locus analysis, and molecular biology (including in situ hybridization and microarray analysis), and to introduce bioinformatic resources. Students use these resources to discover 1) the region(s) of chromosome(s) influencing the phenotypic trait, 2) a list of candidate genes-narrowed by expression data, 3) the in situ pattern of a given gene in the region of interest, 4) the nucleotide sequence of the candidate gene, and 5) articles describing the gene. Teaching materials such as a detailed student/instructor's manual, PowerPoints, sample exams, and links to free Web resources can be found at http://mdcune.psych.ucla.edu/modules/bioinformatics.

  6. Analysis of requirements for teaching materials based on the course bioinformatics for plant metabolism

    NASA Astrophysics Data System (ADS)

    Balqis, Widodo, Lukiati, Betty; Amin, Mohamad

    2017-05-01

    A way to improve the quality of learning in the course of Plant Metabolism in the Department of Biology, State University of Malang, is to develop teaching materials. This research evaluates the needs of bioinformatics-based teaching material in the course Plant Metabolism by the Analyze, Design, Develop, Implement, and Evaluate (ADDIE) development model. Data were collected through questionnaires distributed to the students in the Plant Metabolism course of the Department of Biology, University of Malang, and analysis of the plan of lectures semester (RPS). Learning gains of this course show that it is not yet integrated into the field of bioinformatics. All respondents stated that plant metabolism books do not include bioinformatics and fail to explain the metabolism of a chemical compound of a local plant in Indonesia. Respondents thought that bioinformatics can explain examples and metabolism of a secondary metabolite analysis techniques and discuss potential medicinal compounds from local plants. As many as 65% of the respondents said that the existing metabolism book could not be used to understand secondary metabolism in lectures of plant metabolism. Therefore, the development of teaching materials including plant metabolism-based bioinformatics is important to improve the understanding of the lecture material in plant metabolism.

  7. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    PubMed Central

    2010-01-01

    Background Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. Description An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Conclusions Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms. PMID:21210976

  8. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.

    PubMed

    Taylor, Ronald C

    2010-12-21

    Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms.

  9. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    ERIC Educational Resources Information Center

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  10. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    PubMed Central

    Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  11. Bioinformatics.

    PubMed

    Moore, Jason H

    2007-11-01

    Bioinformatics is an interdisciplinary field that blends computer science and biostatistics with biological and biomedical sciences such as biochemistry, cell biology, developmental biology, genetics, genomics, and physiology. An important goal of bioinformatics is to facilitate the management, analysis, and interpretation of data from biological experiments and observational studies. The goal of this review is to introduce some of the important concepts in bioinformatics that must be considered when planning and executing a modern biological research study. We review database resources as well as data mining software tools.

  12. ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging

    PubMed Central

    Ovesný, Martin; Křížek, Pavel; Borkovec, Josef; Švindrych, Zdeněk; Hagen, Guy M.

    2014-01-01

    Summary: ThunderSTORM is an open-source, interactive and modular plug-in for ImageJ designed for automated processing, analysis and visualization of data acquired by single-molecule localization microscopy methods such as photo-activated localization microscopy and stochastic optical reconstruction microscopy. ThunderSTORM offers an extensive collection of processing and post-processing methods so that users can easily adapt the process of analysis to their data. ThunderSTORM also offers a set of tools for creation of simulated data and quantitative performance evaluation of localization algorithms using Monte Carlo simulations. Availability and implementation: ThunderSTORM and the online documentation are both freely accessible at https://code.google.com/p/thunder-storm/ Contact: guy.hagen@lf1.cuni.cz Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24771516

  13. Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

    PubMed

    Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

    2011-06-01

    Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will ''become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 CodexAlimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth. Copyright © 2011 Elsevier Inc. All rights reserved.

  14. Bioinformatics clouds for big data manipulation.

    PubMed

    Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang

    2012-11-28

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.

  15. An integrated bioinformatics analysis to dissect kinase dependency in triple negative breast cancer.

    PubMed

    Ryall, Karen A; Kim, Jihye; Klauck, Peter J; Shin, Jimin; Yoo, Minjae; Ionkina, Anastasia; Pitts, Todd M; Tentler, John J; Diamond, Jennifer R; Eckhardt, S Gail; Heasley, Lynn E; Kang, Jaewoo; Tan, Aik Choon

    2015-01-01

    Triple-Negative Breast Cancer (TNBC) is an aggressive disease with a poor prognosis. Clinically, TNBC patients have limited treatment options besides chemotherapy. The goal of this study was to determine the kinase dependency in TNBC cell lines and to predict compounds that could inhibit these kinases using integrative bioinformatics analysis. We integrated publicly available gene expression data, high-throughput pharmacological profiling data, and quantitative in vitro kinase binding data to determine the kinase dependency in 12 TNBC cell lines. We employed Kinase Addiction Ranker (KAR), a novel bioinformatics approach, which integrated these data sources to dissect kinase dependency in TNBC cell lines. We then used the kinase dependency predicted by KAR for each TNBC cell line to query K-Map for compounds targeting these kinases. We validated our predictions using published and new experimental data. In summary, we implemented an integrative bioinformatics analysis that determines kinase dependency in TNBC. Our analysis revealed candidate kinases as potential targets in TNBC for further pharmacological and biological studies.

  16. Causal inference in nonlinear systems: Granger causality versus time-delayed mutual information

    NASA Astrophysics Data System (ADS)

    Li, Songting; Xiao, Yanyang; Zhou, Douglas; Cai, David

    2018-05-01

    The Granger causality (GC) analysis has been extensively applied to infer causal interactions in dynamical systems arising from economy and finance, physics, bioinformatics, neuroscience, social science, and many other fields. In the presence of potential nonlinearity in these systems, the validity of the GC analysis in general is questionable. To illustrate this, here we first construct minimal nonlinear systems and show that the GC analysis fails to infer causal relations in these systems—it gives rise to all types of incorrect causal directions. In contrast, we show that the time-delayed mutual information (TDMI) analysis is able to successfully identify the direction of interactions underlying these nonlinear systems. We then apply both methods to neuroscience data collected from experiments and demonstrate that the TDMI analysis but not the GC analysis can identify the direction of interactions among neuronal signals. Our work exemplifies inference hazards in the GC analysis in nonlinear systems and suggests that the TDMI analysis can be an appropriate tool in such a case.

  17. A longitudinal social network analysis of the editorial boards of medical informatics and bioinformatics journals.

    PubMed

    Malin, Bradley; Carley, Kathleen

    2007-01-01

    The goal of this research is to learn how the editorial staffs of bioinformatics and medical informatics journals provide support for cross-community exposure. Models such as co-citation and co-author analysis measure the relationships between researchers; but they do not capture how environments that support knowledge transfer across communities are organized. In this paper, we propose a social network analysis model to study how editorial boards integrate researchers from disparate communities. We evaluate our model by building relational networks based on the editorial boards of approximately 40 journals that serve as research outlets in medical informatics and bioinformatics. We track the evolution of editorial relationships through a longitudinal investigation over the years 2000 through 2005. Our findings suggest that there are research journals that support the collocation of editorial board members from the bioinformatics and medical informatics communities. Network centrality metrics indicate that editorial board members are located in the intersection of the communities and that the number of individuals in the intersection is growing with time. Social network analysis methods provide insight into the relationships between the medical informatics and bioinformatics communities. The number of editorial board members facilitating the publication intersection of the communities has grown, but the intersection remains dependent on a small group of individuals and fragile.

  18. ExPASy: SIB bioinformatics resource portal.

    PubMed

    Artimo, Panu; Jonnalagedda, Manohar; Arnold, Konstantin; Baratin, Delphine; Csardi, Gabor; de Castro, Edouard; Duvaud, Séverine; Flegel, Volker; Fortier, Arnaud; Gasteiger, Elisabeth; Grosdidier, Aurélien; Hernandez, Céline; Ioannidis, Vassilios; Kuznetsov, Dmitry; Liechti, Robin; Moretti, Sébastien; Mostaguir, Khaled; Redaschi, Nicole; Rossier, Grégoire; Xenarios, Ioannis; Stockinger, Heinz

    2012-07-01

    ExPASy (http://www.expasy.org) has worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can henceforth access seamlessly a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed for both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.

  19. Bioclipse: an open source workbench for chemo- and bioinformatics.

    PubMed

    Spjuth, Ola; Helmus, Tobias; Willighagen, Egon L; Kuhn, Stefan; Eklund, Martin; Wagener, Johannes; Murray-Rust, Peter; Steinbeck, Christoph; Wikberg, Jarl E S

    2007-02-22

    There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction. Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source plugins as well as commercial ones. Bioclipse is freely available at http://www.bioclipse.net.

  20. Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

    PubMed

    Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S

    2017-01-01

    Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.

  1. No-boundary thinking in bioinformatics research

    PubMed Central

    2013-01-01

    Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT). PMID:24192339

  2. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    PubMed Central

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of Gagne's Conditions of Learning instructional design theory. This theory, although first published in the early 1970s, is still fundamental in instructional design and instructional technology. First, top-level as well as prerequisite learning objectives for a microarray analysis workshop and a primer design workshop were defined. Then a hierarchy of objectives for each workshop was created. Hands-on tutorials were designed to meet these objectives. Finally, events of learning proposed by Gagne's theory were incorporated into the hands-on tutorials. The resultant manuals were tested on a small number of trainees, revised, and applied in 1-day bioinformatics workshops. Based on this experience and on observations made during the workshops, we conclude that Gagne's Conditions of Learning instructional design theory provides a useful framework for developing bioinformatics training, but may not be optimal as a method for teaching it. PMID:16220141

  3. Revealing biological information using data structuring and automated learning.

    PubMed

    Mohorianu, Irina; Moulton, Vincent

    2010-11-01

    The intermediary steps between a biological hypothesis, concretized in the input data, and meaningful results, validated using biological experiments, commonly employ bioinformatics tools. Starting with storage of the data and ending with a statistical analysis of the significance of the results, every step in a bioinformatics analysis has been intensively studied and the resulting methods and models patented. This review summarizes the bioinformatics patents that have been developed mainly for the study of genes, and points out the universal applicability of bioinformatics methods to other related studies such as RNA interference. More specifically, we overview the steps undertaken in the majority of bioinformatics analyses, highlighting, for each, various approaches that have been developed to reveal details from different perspectives. First we consider data warehousing, the first task that has to be performed efficiently, optimizing the structure of the database, in order to facilitate both the subsequent steps and the retrieval of information. Next, we review data mining, which occupies the central part of most bioinformatics analyses, presenting patents concerning differential expression, unsupervised and supervised learning. Last, we discuss how networks of interactions of genes or other players in the cell may be created, which help draw biological conclusions and have been described in several patents.

  4. Computational intelligence techniques in bioinformatics.

    PubMed

    Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I

    2013-12-01

    Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machine, deep belief network, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function prediction and its structure. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included. Copyright © 2013 Elsevier Ltd. All rights reserved.

  5. Web-based bioinformatics workflows for end-to-end RNA-seq data computation and analysis in agricultural animal species

    USDA-ARS?s Scientific Manuscript database

    Remarkable advances in next-generation sequencing (NGS) technologies, bioinformatics algorithms, and computational technologies have significantly accelerated genomic research. However, complicated NGS data analysis still remains as a major bottleneck. RNA-seq, as one of the major area in the NGS fi...

  6. Assessing an effective undergraduate module teaching applied bioinformatics to biology students

    PubMed Central

    2018-01-01

    Applied bioinformatics skills are becoming ever more indispensable for biologists, yet incorporation of these skills into the undergraduate biology curriculum is lagging behind, in part due to a lack of instructors willing and able to teach basic bioinformatics in classes that don’t specifically focus on quantitative skill development, such as statistics or computer sciences. To help undergraduate course instructors who themselves did not learn bioinformatics as part of their own education and are hesitant to plunge into teaching big data analysis, a module was developed that is written in plain-enough language, using publicly available computing tools and data, to allow novice instructors to teach next-generation sequence analysis to upper-level undergraduate students. To determine if the module allowed students to develop a better understanding of and appreciation for applied bioinformatics, various tools were developed and employed to assess the impact of the module. This article describes both the module and its assessment. Students found the activity valuable for their education and, in focus group discussions, emphasized that they saw a need for more and earlier instruction of big data analysis as part of the undergraduate biology curriculum. PMID:29324777

  7. Bioinformatics clouds for big data manipulation

    PubMed Central

    2012-01-01

    Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475

  8. SYMBIOmatics: synergies in Medical Informatics and Bioinformatics--exploring current scientific literature for emerging topics.

    PubMed

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-03-08

    The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to http://www.symbiomatics.org). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000-2005 ("recent") and 1990-1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science.

  9. SYMBIOmatics: Synergies in Medical Informatics and Bioinformatics – exploring current scientific literature for emerging topics

    PubMed Central

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-01-01

    Background The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to ). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. Results This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000–2005 ("recent") and 1990–1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. Conclusion We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science. PMID:17430562

  10. High-throughput bioinformatics with the Cyrille2 pipeline system

    PubMed Central

    Fiers, Mark WEJ; van der Burgt, Ate; Datema, Erwin; de Groot, Joost CW; van Ham, Roeland CHJ

    2008-01-01

    Background Modern omics research involves the application of high-throughput technologies that generate vast volumes of data. These data need to be pre-processed, analyzed and integrated with existing knowledge through the use of diverse sets of software tools, models and databases. The analyses are often interdependent and chained together to form complex workflows or pipelines. Given the volume of the data used and the multitude of computational resources available, specialized pipeline software is required to make high-throughput analysis of large-scale omics datasets feasible. Results We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web based, graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system and which tracks what data enters the system and determines what jobs must be scheduled for execution, and; 3) the Executor, which searches for scheduled jobs and executes these on a compute cluster. Conclusion The Cyrille2 system is an extensible, modular system, implementing the stated requirements. Cyrille2 enables easy creation and execution of high throughput, flexible bioinformatics pipelines. PMID:18269742

  11. Analyzing the field of bioinformatics with the multi-faceted topic modeling technique.

    PubMed

    Heo, Go Eun; Kang, Keun Young; Song, Min; Lee, Jeong-Hoon

    2017-05-31

    Bioinformatics is an interdisciplinary field at the intersection of molecular biology and computing technology. To characterize the field as convergent domain, researchers have used bibliometrics, augmented with text-mining techniques for content analysis. In previous studies, Latent Dirichlet Allocation (LDA) was the most representative topic modeling technique for identifying topic structure of subject areas. However, as opposed to revealing the topic structure in relation to metadata such as authors, publication date, and journals, LDA only displays the simple topic structure. In this paper, we adopt the Tang et al.'s Author-Conference-Topic (ACT) model to study the field of bioinformatics from the perspective of keyphrases, authors, and journals. The ACT model is capable of incorporating the paper, author, and conference into the topic distribution simultaneously. To obtain more meaningful results, we use journals and keyphrases instead of conferences and bag-of-words.. For analysis, we use PubMed to collected forty-six bioinformatics journals from the MEDLINE database. We conducted time series topic analysis over four periods from 1996 to 2015 to further examine the interdisciplinary nature of bioinformatics. We analyze the ACT Model results in each period. Additionally, for further integrated analysis, we conduct a time series analysis among the top-ranked keyphrases, journals, and authors according to their frequency. We also examine the patterns in the top journals by simultaneously identifying the topical probability in each period, as well as the top authors and keyphrases. The results indicate that in recent years diversified topics have become more prevalent and convergent topics have become more clearly represented. The results of our analysis implies that overtime the field of bioinformatics becomes more interdisciplinary where there is a steady increase in peripheral fields such as conceptual, mathematical, and system biology. These results are confirmed by integrated analysis of topic distribution as well as top ranked keyphrases, authors, and journals.

  12. Bioinformatic identification and experimental validation of miRNAs from foxtail millet (Setaria italica).

    PubMed

    Han, Jun; Xie, Hao; Sun, Qingpeng; Wang, Jun; Lu, Min; Wang, Weixiang; Guo, Erhu; Pan, Jinbao

    2014-08-10

    MiRNAs are a novel group of non-coding small RNAs that negatively regulate gene expression. Many miRNAs have been identified and investigated extensively in plant species with sequenced genomes. However, few miRNAs have been identified in foxtail millet (Setaria italica), which is an ancient cereal crop of great importance for dry land agriculture. In this study, 271 foxtail millet miRNAs belonging to 44 families were identified using a bioinformatics approach. Twenty-three pairs of sense/antisense miRNAs belonging to 13 families, and 18 miRNA clusters containing members of 8 families were discovered in foxtail millet. We identified 432 potential targets for 38 miRNA families, most of which were predicted to be involved in plant development, signal transduction, metabolic pathways, disease resistance, and environmental stress responses. Gene ontology (GO) analysis revealed that 101, 56, and 23 target genes were involved in molecular functions, biological processes, and cellular components, respectively. We investigated the expression patterns of 43 selected miRNAs using qRT-PCR analysis. All of the miRNAs were expressed ubiquitously with many exhibiting different expression levels in different tissues. We validated five predicted targets of four miRNAs using the RNA ligase mediated rapid amplification of cDNA end (5'-RLM-RACE) method. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. METABOLOMICS IN MEDICAL SCIENCES--TRENDS, CHALLENGES AND PERSPECTIVES.

    PubMed

    Klupczyńska, Agnieszka; Dereziński, Paweł; Kokot, Zenon J

    2015-01-01

    Metabolomics is the latest of the "omic" technologies that involves comprehensive analysis of small molecule metabolites of an organism or a specific biological sample. Metabolomics provides an insight into the cell status and describes an actual health condition of organisms. Analysis of metabolome offers a unique opportunity to study the influence of genetic variation, disease, applied treatment or diet on endogenous metabolic state of organisms. There are many areas that might benefit from metabolomic research. In the article some applications of this novel "omic" technology in the field of medical sciences are presented. One of the most popular aims of metabolomic studies is biomarker discovery. Despite using the state-of-art analytical techniques along with advanced bioinformatic tools, metabolomic experiments encounter numerous difficulties and pitfalls. Challenges that researchers in the field of analysis of metabolome have to face include i.a., technical limitations, bioinformatic challenges and integration with other "omic" sciences. One of the grand challenges for studies in the field of metabolomics is to tackle the problem of data analysis, which is probably the most time consuming stage of metabolomic workflow and requires close collaboration between analysts, clinicians and experts in chemometric analysis. Implementation of metabolomics into clinical practice will be dependent on establishment of standardized protocols in analytical performance and data analysis and development of fit-for-purpose biomarker method validation. Metabolomics allows to achieve a sophisticated level of information about biological systems and opens up new perspectives in many fields of medicine, especially in oncology. Apart from its extensive cognitive significance, metabolomics manifests also a practical importance as it may lead to design of new non-invasive, sensitive and specific diagnostic techniques and development of new therapies.

  14. Intact glycopeptide characterization using mass spectrometry.

    PubMed

    Cao, Li; Qu, Yi; Zhang, Zhaorui; Wang, Zhe; Prytkova, Iya; Wu, Si

    2016-05-01

    Glycosylation is one of the most prominent and extensively studied protein post-translational modifications. However, traditional proteomic studies at the peptide level (bottom-up) rarely characterize intact glycopeptides (glycosylated peptides without removing glycans), so no glycoprotein heterogeneity information is retained. Intact glycopeptide characterization, on the other hand, provides opportunities to simultaneously elucidate the glycan structure and the glycosylation site needed to reveal the actual biological function of protein glycosylation. Recently, significant improvements have been made in the characterization of intact glycopeptides, ranging from enrichment and separation, mass spectroscopy (MS) detection, to bioinformatics analysis. In this review, we recapitulated currently available intact glycopeptide characterization methods with respect to their advantages and limitations as well as their potential applications.

  15. Bioinformatics Knowledge Map for Analysis of Beta-Catenin Function in Cancer

    PubMed Central

    Arighi, Cecilia N.; Wu, Cathy H.

    2015-01-01

    Given the wealth of bioinformatics resources and the growing complexity of biological information, it is valuable to integrate data from disparate sources to gain insight into the role of genes/proteins in health and disease. We have developed a bioinformatics framework that combines literature mining with information from biomedical ontologies and curated databases to create knowledge “maps” of genes/proteins of interest. We applied this approach to the study of beta-catenin, a cell adhesion molecule and transcriptional regulator implicated in cancer. The knowledge map includes post-translational modifications (PTMs), protein-protein interactions, disease-associated mutations, and transcription factors co-activated by beta-catenin and their targets and captures the major processes in which beta-catenin is known to participate. Using the map, we generated testable hypotheses about beta-catenin biology in normal and cancer cells. By focusing on proteins participating in multiple relation types, we identified proteins that may participate in feedback loops regulating beta-catenin transcriptional activity. By combining multiple network relations with PTM proteoform-specific functional information, we proposed a mechanism to explain the observation that the cyclin dependent kinase CDK5 positively regulates beta-catenin co-activator activity. Finally, by overlaying cancer-associated mutation data with sequence features, we observed mutation patterns in several beta-catenin PTM sites and PTM enzyme binding sites that varied by tissue type, suggesting multiple mechanisms by which beta-catenin mutations can contribute to cancer. The approach described, which captures rich information for molecular species from genes and proteins to PTM proteoforms, is extensible to other proteins and their involvement in disease. PMID:26509276

  16. Analyses of Brucella Pathogenesis, Host Immunity, and Vaccine Targets using Systems Biology and Bioinformatics

    PubMed Central

    He, Yongqun

    2011-01-01

    Brucella is a Gram-negative, facultative intracellular bacterium that causes zoonotic brucellosis in humans and various animals. Out of 10 classified Brucella species, B. melitensis, B. abortus, B. suis, and B. canis are pathogenic to humans. In the past decade, the mechanisms of Brucella pathogenesis and host immunity have been extensively investigated using the cutting edge systems biology and bioinformatics approaches. This article provides a comprehensive review of the applications of Omics (including genomics, transcriptomics, and proteomics) and bioinformatics technologies for the analysis of Brucella pathogenesis, host immune responses, and vaccine targets. Based on more than 30 sequenced Brucella genomes, comparative genomics is able to identify gene variations among Brucella strains that help to explain host specificity and virulence differences among Brucella species. Diverse transcriptomics and proteomics gene expression studies have been conducted to analyze gene expression profiles of wild type Brucella strains and mutants under different laboratory conditions. High throughput Omics analyses of host responses to infections with virulent or attenuated Brucella strains have been focused on responses by mouse and cattle macrophages, bovine trophoblastic cells, mouse and boar splenocytes, and ram buffy coat. Differential serum responses in humans and rams to Brucella infections have been analyzed using high throughput serum antibody screening technology. The Vaxign reverse vaccinology has been used to predict many Brucella vaccine targets. More than 180 Brucella virulence factors and their gene interaction networks have been identified using advanced literature mining methods. The recent development of community-based Vaccine Ontology and Brucellosis Ontology provides an efficient way for Brucella data integration, exchange, and computer-assisted automated reasoning. PMID:22919594

  17. Analyses of Brucella pathogenesis, host immunity, and vaccine targets using systems biology and bioinformatics.

    PubMed

    He, Yongqun

    2012-01-01

    Brucella is a Gram-negative, facultative intracellular bacterium that causes zoonotic brucellosis in humans and various animals. Out of 10 classified Brucella species, B. melitensis, B. abortus, B. suis, and B. canis are pathogenic to humans. In the past decade, the mechanisms of Brucella pathogenesis and host immunity have been extensively investigated using the cutting edge systems biology and bioinformatics approaches. This article provides a comprehensive review of the applications of Omics (including genomics, transcriptomics, and proteomics) and bioinformatics technologies for the analysis of Brucella pathogenesis, host immune responses, and vaccine targets. Based on more than 30 sequenced Brucella genomes, comparative genomics is able to identify gene variations among Brucella strains that help to explain host specificity and virulence differences among Brucella species. Diverse transcriptomics and proteomics gene expression studies have been conducted to analyze gene expression profiles of wild type Brucella strains and mutants under different laboratory conditions. High throughput Omics analyses of host responses to infections with virulent or attenuated Brucella strains have been focused on responses by mouse and cattle macrophages, bovine trophoblastic cells, mouse and boar splenocytes, and ram buffy coat. Differential serum responses in humans and rams to Brucella infections have been analyzed using high throughput serum antibody screening technology. The Vaxign reverse vaccinology has been used to predict many Brucella vaccine targets. More than 180 Brucella virulence factors and their gene interaction networks have been identified using advanced literature mining methods. The recent development of community-based Vaccine Ontology and Brucellosis Ontology provides an efficient way for Brucella data integration, exchange, and computer-assisted automated reasoning.

  18. Diagnosis of autosomal dominant polycystic kidney disease using efficient PKD1 and PKD2 targeted next-generation sequencing.

    PubMed

    Trujillano, Daniel; Bullich, Gemma; Ossowski, Stephan; Ballarín, José; Torra, Roser; Estivill, Xavier; Ars, Elisabet

    2014-09-01

    Molecular diagnostics of autosomal dominant polycystic kidney disease (ADPKD) relies on mutation screening of PKD1 and PKD2, which is complicated by extensive allelic heterogeneity and the presence of six highly homologous sequences of PKD1. To date, specific sequencing of PKD1 requires laborious long-range amplifications. The high cost and long turnaround time of PKD1 and PKD2 mutation analysis using conventional techniques limits its widespread application in clinical settings. We performed targeted next-generation sequencing (NGS) of PKD1 and PKD2. Pooled barcoded DNA patient libraries were enriched by in-solution hybridization with PKD1 and PKD2 capture probes. Bioinformatics analysis was performed using an in-house developed pipeline. We validated the assay in a cohort of 36 patients with previously known PKD1 and PKD2 mutations and five control individuals. Then, we used the same assay and bioinformatics analysis in a discovery cohort of 12 uncharacterized patients. We detected 35 out of 36 known definitely, highly likely, and likely pathogenic mutations in the validation cohort, including two large deletions. In the discovery cohort, we detected 11 different pathogenic mutations in 10 out of 12 patients. This study demonstrates that laborious long-range PCRs of the repeated PKD1 region can be avoided by in-solution enrichment of PKD1 and PKD2 and NGS. This strategy significantly reduces the cost and time for simultaneous PKD1 and PKD2 sequence analysis, facilitating routine genetic diagnostics of ADPKD.

  19. PySCeSToolbox: a collection of metabolic pathway analysis tools.

    PubMed

    Christensen, Carl D; Hofmeyr, Jan-Hendrik S; Rohwer, Johann M

    2018-01-01

    PySCeSToolbox is an extension to the Python Simulator for Cellular Systems (PySCeS) that includes tools for performing generalized supply-demand analysis, symbolic metabolic control analysis, and a framework for investigating the kinetic and thermodynamic aspects of enzyme-catalyzed reactions. Each tool addresses a different aspect of metabolic behaviour, control, and regulation; the tools complement each other and can be used in conjunction to better understand higher level system behaviour. PySCeSToolbox is available on Linux, Mac OS X and Windows. It is licensed under the BSD 3-clause licence. Code, setup instructions and a link to documentation can be found at https://github.com/PySCeS/PyscesToolbox. jr@sun.ac.za. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  20. Two interactive Bioinformatics courses at the Bielefeld University Bioinformatics Server.

    PubMed

    Sczyrba, Alexander; Konermann, Susanne; Giegerich, Robert

    2008-05-01

    Conferences in computational biology continue to provide tutorials on classical and new methods in the field. This can be taken as an indicator that education is still a bottleneck in our field's process of becoming an established scientific discipline. Bielefeld University has been one of the early providers of bioinformatics education, both locally and via the internet. The Bielefeld Bioinformatics Server (BiBiServ) offers a variety of older and new materials. Here, we report on two online courses made available recently, one introductory and one on the advanced level: (i) SADR: Sequence Analysis with Distributed Resources (http://bibiserv.techfak.uni-bielefeld.de/sadr/) and (ii) ADP: Algebraic Dynamic Programming in Bioinformatics (http://bibiserv.techfak.uni-bielefeld.de/dpcourse/).

  1. Bioinformatics-based tools in drug discovery: the cartography from single gene to integrative biological networks.

    PubMed

    Ramharack, Pritika; Soliman, Mahmoud E S

    2018-06-01

    Originally developed for the analysis of biological sequences, bioinformatics has advanced into one of the most widely recognized domains in the scientific community. Despite this technological evolution, there is still an urgent need for nontoxic and efficient drugs. The onus now falls on the 'omics domain to meet this need by implementing bioinformatics techniques that will allow for the introduction of pioneering approaches in the rational drug design process. Here, we categorize an updated list of informatics tools and explore the capabilities of integrative bioinformatics in disease control. We believe that our review will serve as a comprehensive guide toward bioinformatics-oriented disease and drug discovery research. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, Ronald C.

    Bioinformatics researchers are increasingly confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBasemore » project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date.« less

  3. Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.

    PubMed

    Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A

    2005-04-07

    Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.

  4. Omics Metadata Management Software v. 1 (OMMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks, to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand alone-package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible, extensible and easily installedmore » and run by operators with general system administration and scripting language literacy.« less

  5. Biological knowledge bases using Wikis: combining the flexibility of Wikis with the structure of databases.

    PubMed

    Brohée, Sylvain; Barriot, Roland; Moreau, Yves

    2010-09-01

    In recent years, the number of knowledge bases developed using Wiki technology has exploded. Unfortunately, next to their numerous advantages, classical Wikis present a critical limitation: the invaluable knowledge they gather is represented as free text, which hinders their computational exploitation. This is in sharp contrast with the current practice for biological databases where the data is made available in a structured way. Here, we present WikiOpener an extension for the classical MediaWiki engine that augments Wiki pages by allowing on-the-fly querying and formatting resources external to the Wiki. Those resources may provide data extracted from databases or DAS tracks, or even results returned by local or remote bioinformatics analysis tools. This also implies that structured data can be edited via dedicated forms. Hence, this generic resource combines the structure of biological databases with the flexibility of collaborative Wikis. The source code and its documentation are freely available on the MediaWiki website: http://www.mediawiki.org/wiki/Extension:WikiOpener.

  6. BIRCH: a user-oriented, locally-customizable, bioinformatics system.

    PubMed

    Fristensky, Brian

    2007-02-09

    Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere.

  7. BIRCH: A user-oriented, locally-customizable, bioinformatics system

    PubMed Central

    Fristensky, Brian

    2007-01-01

    Background Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. Results BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. Conclusion BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere. PMID:17291351

  8. Bioinformatics meets user-centred design: a perspective.

    PubMed

    Pavelin, Katrina; Cham, Jennifer A; de Matos, Paula; Brooksbank, Cath; Cameron, Graham; Steinbeck, Christoph

    2012-01-01

    Designers have a saying that "the joy of an early release lasts but a short time. The bitterness of an unusable system lasts for years." It is indeed disappointing to discover that your data resources are not being used to their full potential. Not only have you invested your time, effort, and research grant on the project, but you may face costly redesigns if you want to improve the system later. This scenario would be less likely if the product was designed to provide users with exactly what they need, so that it is fit for purpose before its launch. We work at EMBL-European Bioinformatics Institute (EMBL-EBI), and we consult extensively with life science researchers to find out what they need from biological data resources. We have found that although users believe that the bioinformatics community is providing accurate and valuable data, they often find the interfaces to these resources tricky to use and navigate. We believe that if you can find out what your users want even before you create the first mock-up of a system, the final product will provide a better user experience. This would encourage more people to use the resource and they would have greater access to the data, which could ultimately lead to more scientific discoveries. In this paper, we explore the need for a user-centred design (UCD) strategy when designing bioinformatics resources and illustrate this with examples from our work at EMBL-EBI. Our aim is to introduce the reader to how selected UCD techniques may be successfully applied to software design for bioinformatics.

  9. BIAS: Bioinformatics Integrated Application Software.

    PubMed

    Finak, G; Godin, N; Hallett, M; Pepin, F; Rajabi, Z; Srivastava, V; Tang, Z

    2005-04-15

    We introduce a development platform especially tailored to Bioinformatics research and software development. BIAS (Bioinformatics Integrated Application Software) provides the tools necessary for carrying out integrative Bioinformatics research requiring multiple datasets and analysis tools. It follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system and supports standards and data-exchange protocols common to Bioinformatics. BIAS is an OpenSource project and is freely available to all interested users at http://www.mcb.mcgill.ca/~bias/. This website also contains a paper containing a more detailed description of BIAS and a sample implementation of a Bayesian network approach for the simultaneous prediction of gene regulation events and of mRNA expression from combinations of gene regulation events. hallett@mcb.mcgill.ca.

  10. Novel approaches for bioinformatic analysis of salivary RNA sequencing data for development.

    PubMed

    Kaczor-Urbanowicz, Karolina Elzbieta; Kim, Yong; Li, Feng; Galeev, Timur; Kitchen, Rob R; Gerstein, Mark; Koyano, Kikuye; Jeong, Sung-Hee; Wang, Xiaoyan; Elashoff, David; Kang, So Young; Kim, Su Mi; Kim, Kyoung; Kim, Sung; Chia, David; Xiao, Xinshu; Rozowsky, Joel; Wong, David T W

    2018-01-01

    Analysis of RNA sequencing (RNA-Seq) data in human saliva is challenging. Lack of standardization and unification of the bioinformatic procedures undermines saliva's diagnostic potential. Thus, it motivated us to perform this study. We applied principal pipelines for bioinformatic analysis of small RNA-Seq data of saliva of 98 healthy Korean volunteers including either direct or indirect mapping of the reads to the human genome using Bowtie1. Analysis of alignments to exogenous genomes by another pipeline revealed that almost all of the reads map to bacterial genomes. Thus, salivary exRNA has fundamental properties that warrant the design of unique additional steps while performing the bioinformatic analysis. Our pipelines can serve as potential guidelines for processing of RNA-Seq data of human saliva. Processing and analysis results of the experimental data generated by the exceRpt (v4.6.3) small RNA-seq pipeline (github.gersteinlab.org/exceRpt) are available from exRNA atlas (exrna-atlas.org). Alignment to exogenous genomes and their quantification results were used in this paper for the analyses of small RNAs of exogenous origin. dtww@ucla.edu. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  11. ENFIN--A European network for integrative systems biology.

    PubMed

    Kahlem, Pascal; Clegg, Andrew; Reisinger, Florian; Xenarios, Ioannis; Hermjakob, Henning; Orengo, Christine; Birney, Ewan

    2009-11-01

    Integration of biological data of various types and the development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged in developing an adapted infrastructure to connect databases, and platforms to enable both the generation of new bioinformatics tools and the experimental validation of computational predictions. With the aim of bridging the gap existing between standard wet laboratories and bioinformatics, the ENFIN Network runs integrative research projects to bring the latest computational techniques to bear directly on questions dedicated to systems biology in the wet laboratory environment. The Network maintains internally close collaboration between experimental and computational research, enabling a permanent cycling of experimental validation and improvement of computational prediction methods. The computational work includes the development of a database infrastructure (EnCORE), bioinformatics analysis methods and a novel platform for protein function analysis FuncNet.

  12. Technosciences in Academia: Rethinking a Conceptual Framework for Bioinformatics Undergraduate Curricula

    NASA Astrophysics Data System (ADS)

    Symeonidis, Iphigenia Sofia

    This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatics objectives rather than enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty in regards to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinfonnatics educational needs and goals as expressed by different authorities, five undergraduate bioinformatics degrees case-studies, educational implications of bioinformatics as a technoscience and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.

  13. Bioinformatics core competencies for undergraduate life sciences education.

    PubMed

    Wilson Sayres, Melissa A; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G; Smith, Todd M; Triplett, Eric W; Williams, Jason J; Dinsdale, Elizabeth; Morgan, William R; Burnette, James M; Donovan, Samuel S; Drew, Jennifer C; Elgin, Sarah C R; Fowlks, Edison R; Galindo-Gonzalez, Sebastian; Goodman, Anya L; Grandgenett, Nealy F; Goller, Carlos C; Jungck, John R; Newman, Jeffrey D; Pearson, William; Ryder, Elizabeth F; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C; Toro-Martínez, Arlín; Welch, Lonnie R; Wright, Robin; Barone, Lindsay; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C; Pauley, Mark A

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent's degree of training, time since degree earned, and/or the Carnegie Classification of the respondent's institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.

  14. Bioinformatics core competencies for undergraduate life sciences education

    PubMed Central

    Wilson Sayres, Melissa A.; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G.; Smith, Todd M.; Triplett, Eric W.; Williams, Jason J.; Dinsdale, Elizabeth; Morgan, William R.; Burnette, James M.; Donovan, Samuel S.; Drew, Jennifer C.; Elgin, Sarah C. R.; Fowlks, Edison R.; Galindo-Gonzalez, Sebastian; Goodman, Anya L.; Grandgenett, Nealy F.; Goller, Carlos C.; Jungck, John R.; Newman, Jeffrey D.; Pearson, William; Ryder, Elizabeth F.; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C.; Toro-Martínez, Arlín; Welch, Lonnie R.; Wright, Robin; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C.

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent’s degree of training, time since degree earned, and/or the Carnegie Classification of the respondent’s institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula. PMID:29870542

  15. Online Tools for Bioinformatics Analyses in Nutrition Sciences12

    PubMed Central

    Malkaram, Sridhar A.; Hassan, Yousef I.; Zempleni, Janos

    2012-01-01

    Recent advances in “omics” research have resulted in the creation of large datasets that were generated by consortiums and centers, small datasets that were generated by individual investigators, and bioinformatics tools for mining these datasets. It is important for nutrition laboratories to take full advantage of the analysis tools to interrogate datasets for information relevant to genomics, epigenomics, transcriptomics, proteomics, and metabolomics. This review provides guidance regarding bioinformatics resources that are currently available in the public domain, with the intent to provide a starting point for investigators who want to take advantage of the opportunities provided by the bioinformatics field. PMID:22983844

  16. India's Computational Biology Growth and Challenges.

    PubMed

    Chakraborty, Chiranjib; Bandyopadhyay, Sanghamitra; Agoramoorthy, Govindasamy

    2016-09-01

    India's computational science is growing swiftly due to the outburst of internet and information technology services. The bioinformatics sector of India has been transforming rapidly by creating a competitive position in global bioinformatics market. Bioinformatics is widely used across India to address a wide range of biological issues. Recently, computational researchers and biologists are collaborating in projects such as database development, sequence analysis, genomic prospects and algorithm generations. In this paper, we have presented the Indian computational biology scenario highlighting bioinformatics-related educational activities, manpower development, internet boom, service industry, research activities, conferences and trainings undertaken by the corporate and government sectors. Nonetheless, this new field of science faces lots of challenges.

  17. Development of a Web-Enabled Informatics Platform for Manipulation of Gene Expression Data

    DTIC Science & Technology

    2004-12-01

    genomic platforms such as metabolomics and proteomics , and to federated databases for knowledge management. A successful SBIR Phase I completed...measurements that require sophisticated bioinformatic platforms for data archival, management, integration, and analysis if researchers are to derive...web-enabled bioinformatic platform consisting of a Laboratory Information Management System (LIMS), an Analysis Information Management System (AIMS

  18. Evolving Strategies for the Incorporation of Bioinformatics Within the Undergraduate Cell Biology Curriculum

    PubMed Central

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489

  19. In Silico PCR Tools for a Fast Primer, Probe, and Advanced Searching.

    PubMed

    Kalendar, Ruslan; Muterko, Alexandr; Shamekova, Malika; Zhambakin, Kabyl

    2017-01-01

    The polymerase chain reaction (PCR) is fundamental to molecular biology and is the most important practical molecular technique for the research laboratory. The principle of this technique has been further used and applied in plenty of other simple or complex nucleic acid amplification technologies (NAAT). In parallel to laboratory "wet bench" experiments for nucleic acid amplification technologies, in silico or virtual (bioinformatics) approaches have been developed, among which in silico PCR analysis. In silico NAAT analysis is a useful and efficient complementary method to ensure the specificity of primers or probes for an extensive range of PCR applications from homology gene discovery, molecular diagnosis, DNA fingerprinting, and repeat searching. Predicting sensitivity and specificity of primers and probes requires a search to determine whether they match a database with an optimal number of mismatches, similarity, and stability. In the development of in silico bioinformatics tools for nucleic acid amplification technologies, the prospects for the development of new NAAT or similar approaches should be taken into account, including forward-looking and comprehensive analysis that is not limited to only one PCR technique variant. The software FastPCR and the online Java web tool are integrated tools for in silico PCR of linear and circular DNA, multiple primer or probe searches in large or small databases and for advanced search. These tools are suitable for processing of batch files that are essential for automation when working with large amounts of data. The FastPCR software is available for download at http://primerdigital.com/fastpcr.html and the online Java version at http://primerdigital.com/tools/pcr.html .

  20. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa

    PubMed Central

    Mulder, Nicola J.; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M.; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C. Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-01-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985

  1. Planning bioinformatics workflows using an expert system.

    PubMed

    Chen, Xiaoling; Chang, Jeffrey T

    2017-04-15

    Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. https://github.com/jefftc/changlab. jeffrey.t.chang@uth.tmc.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  2. Planning bioinformatics workflows using an expert system

    PubMed Central

    Chen, Xiaoling; Chang, Jeffrey T.

    2017-01-01

    Abstract Motivation: Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. Results: To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability and Implementation: https://github.com/jefftc/changlab Contact: jeffrey.t.chang@uth.tmc.edu PMID:28052928

  3. Prediction of Acute Mountain Sickness using a Blood-Based Test

    DTIC Science & Technology

    2016-01-01

    2015): In quarter 17 we focused on two major tasks: getting the RNA purified and ready for chip analysis and working on the bioinformatics ... bioinformatics organization of all the data we will examine for this study. To remind the reviewer, we have a primary dataset of ~120 subjects who were studied...companion study, AltitudeOmics, to the database of gene studies to be analyzed for AMS prediction • expansion of a bioinformatics team to include an

  4. VitaPad: visualization tools for the analysis of pathway data.

    PubMed

    Holford, Matthew; Li, Naixin; Nadkarni, Prakash; Zhao, Hongyu

    2005-04-15

    Packages that support the creation of pathway diagrams are limited by their inability to be readily extended to new classes of pathway-related data. VitaPad is a cross-platform application that enables users to create and modify biological pathway diagrams and incorporate microarray data with them. It improves on existing software in the following areas: (i) It can create diagrams dynamically through graph layout algorithms. (ii) It is open-source and uses an open XML format to store data, allowing for easy extension or integration with other tools. (iii) It features a cutting-edge user interface with intuitive controls, high-resolution graphics and fully customizable appearance. http://bioinformatics.med.yale.edu matthew.holford@yale.edu; hongyu.zhao@yale.edu.

  5. NGS for the Masses: Empowering Biologists to Improve Bioinformatics Productivity ( 7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Qaadri, Kashef [Biomatters Inc., San Francisco, CA (United States)

    2018-05-21

    Kashef Qaadri on "NGS for the Masses: Empowering biologists to improve bioinformatic productivity" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  6. NGS for the Masses: Empowering Biologists to Improve Bioinformatics Productivity ( 7th Annual SFAF Meeting, 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qaadri, Kashef

    2012-06-01

    Kashef Qaadri on "NGS for the Masses: Empowering biologists to improve bioinformatic productivity" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  7. Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis.

    PubMed

    Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T

    2010-01-01

    Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new insights for the care of critically injured patients.

  8. Semi-quantitative proteomics of mammalian cells upon short-term exposure to non-ionizing electromagnetic fields.

    PubMed

    Kuzniar, Arnold; Laffeber, Charlie; Eppink, Berina; Bezstarosti, Karel; Dekkers, Dick; Woelders, Henri; Zwamborn, A Peter M; Demmers, Jeroen; Lebbink, Joyce H G; Kanaar, Roland

    2017-01-01

    The potential effects of non-ionizing electromagnetic fields (EMFs), such as those emitted by power-lines (in extremely low frequency range), mobile cellular systems and wireless networking devices (in radio frequency range) on human health have been intensively researched and debated. However, how exposure to these EMFs may lead to biological changes underlying possible health effects is still unclear. To reveal EMF-induced molecular changes, unbiased experiments (without a priori focusing on specific biological processes) with sensitive readouts are required. We present the first proteome-wide semi-quantitative mass spectrometry analysis of human fibroblasts, osteosarcomas and mouse embryonic stem cells exposed to three types of non-ionizing EMFs (ELF 50 Hz, UMTS 2.1 GHz and WiFi 5.8 GHz). We performed controlled in vitro EMF exposures of metabolically labeled mammalian cells followed by reliable statistical analyses of differential protein- and pathway-level regulations using an array of established bioinformatics methods. Our results indicate that less than 1% of the quantitated human or mouse proteome responds to the EMFs by small changes in protein abundance. Further network-based analysis of the differentially regulated proteins did not detect significantly perturbed cellular processes or pathways in human and mouse cells in response to ELF, UMTS or WiFi exposure. In conclusion, our extensive bioinformatics analyses of semi-quantitative mass spectrometry data do not support the notion that the short-time exposures to non-ionizing EMFs have a consistent biologically significant bearing on mammalian cells in culture.

  9. Computational design of a pH stable enzyme: understanding molecular mechanism of penicillin acylase's adaptation to alkaline conditions.

    PubMed

    Suplatov, Dmitry; Panin, Nikolay; Kirilin, Evgeny; Shcherbakova, Tatyana; Kudryavtsev, Pavel; Svedas, Vytas

    2014-01-01

    Protein stability provides advantageous development of novel properties and can be crucial in affording tolerance to mutations that introduce functionally preferential phenotypes. Consequently, understanding the determining factors for protein stability is important for the study of structure-function relationship and design of novel protein functions. Thermal stability has been extensively studied in connection with practical application of biocatalysts. However, little work has been done to explore the mechanism of pH-dependent inactivation. In this study, bioinformatic analysis of the Ntn-hydrolase superfamily was performed to identify functionally important subfamily-specific positions in protein structures. Furthermore, the involvement of these positions in pH-induced inactivation was studied. The conformational mobility of penicillin acylase in Escherichia coli was analyzed through molecular modeling in neutral and alkaline conditions. Two functionally important subfamily-specific residues, Gluβ482 and Aspβ484, were found. Ionization of these residues at alkaline pH promoted the collapse of a buried network of stabilizing interactions that consequently disrupted the functional protein conformation. The subfamily-specific position Aspβ484 was selected as a hotspot for mutation to engineer enzyme variant tolerant to alkaline medium. The corresponding Dβ484N mutant was produced and showed 9-fold increase in stability at alkaline conditions. Bioinformatic analysis of subfamily-specific positions can be further explored to study mechanisms of protein inactivation and to design more stable variants for the engineering of homologous Ntn-hydrolases with improved catalytic properties.

  10. Semi-quantitative proteomics of mammalian cells upon short-term exposure to non-ionizing electromagnetic fields

    PubMed Central

    Laffeber, Charlie; Eppink, Berina; Bezstarosti, Karel; Dekkers, Dick; Woelders, Henri; Zwamborn, A. Peter M.; Demmers, Jeroen; Lebbink, Joyce H. G.; Kanaar, Roland

    2017-01-01

    The potential effects of non-ionizing electromagnetic fields (EMFs), such as those emitted by power-lines (in extremely low frequency range), mobile cellular systems and wireless networking devices (in radio frequency range) on human health have been intensively researched and debated. However, how exposure to these EMFs may lead to biological changes underlying possible health effects is still unclear. To reveal EMF-induced molecular changes, unbiased experiments (without a priori focusing on specific biological processes) with sensitive readouts are required. We present the first proteome-wide semi-quantitative mass spectrometry analysis of human fibroblasts, osteosarcomas and mouse embryonic stem cells exposed to three types of non-ionizing EMFs (ELF 50 Hz, UMTS 2.1 GHz and WiFi 5.8 GHz). We performed controlled in vitro EMF exposures of metabolically labeled mammalian cells followed by reliable statistical analyses of differential protein- and pathway-level regulations using an array of established bioinformatics methods. Our results indicate that less than 1% of the quantitated human or mouse proteome responds to the EMFs by small changes in protein abundance. Further network-based analysis of the differentially regulated proteins did not detect significantly perturbed cellular processes or pathways in human and mouse cells in response to ELF, UMTS or WiFi exposure. In conclusion, our extensive bioinformatics analyses of semi-quantitative mass spectrometry data do not support the notion that the short-time exposures to non-ionizing EMFs have a consistent biologically significant bearing on mammalian cells in culture. PMID:28234898

  11. Structure and evolution of N-domains in AAA metalloproteases.

    PubMed

    Scharfenberg, Franka; Serek-Heuberger, Justyna; Coles, Murray; Hartmann, Marcus D; Habeck, Michael; Martin, Jörg; Lupas, Andrei N; Alva, Vikram

    2015-02-27

    Metalloproteases of the AAA (ATPases associated with various cellular activities) family play a crucial role in protein quality control within the cytoplasmic membrane of bacteria and the inner membrane of eukaryotic organelles. These membrane-anchored hexameric enzymes are composed of an N-terminal domain with one or two transmembrane helices, a central AAA ATPase module, and a C-terminal Zn(2+)-dependent protease. While the latter two domains have been well studied, so far, little is known about the N-terminal regions. Here, in an extensive bioinformatic and structural analysis, we identified three major, non-homologous groups of N-domains in AAA metalloproteases. By far, the largest one is the FtsH-like group of bacteria and eukaryotic organelles. The other two groups are specific to Yme1: one found in plants, fungi, and basal metazoans and the other one found exclusively in animals. Using NMR and crystallography, we determined the subunit structure and hexameric assembly of Escherichia coli FtsH-N, exhibiting an unusual α+β fold, and the conserved part of fungal Yme1-N from Saccharomyces cerevisiae, revealing a tetratricopeptide repeat fold. Our bioinformatic analysis showed that, uniquely among these proteins, the N-domain of Yme1 from the cnidarian Hydra vulgaris contains both the tetratricopeptide repeat region seen in basal metazoans and a region of homology to the N-domains of animals. Thus, it is a modern-day representative of an intermediate in the evolution of animal Yme1 from basal eukaryotic precursors. Copyright © 2015. Published by Elsevier Ltd.

  12. Computational Design of a pH Stable Enzyme: Understanding Molecular Mechanism of Penicillin Acylase's Adaptation to Alkaline Conditions

    PubMed Central

    Suplatov, Dmitry; Panin, Nikolay; Kirilin, Evgeny; Shcherbakova, Tatyana; Kudryavtsev, Pavel; Švedas, Vytas

    2014-01-01

    Protein stability provides advantageous development of novel properties and can be crucial in affording tolerance to mutations that introduce functionally preferential phenotypes. Consequently, understanding the determining factors for protein stability is important for the study of structure-function relationship and design of novel protein functions. Thermal stability has been extensively studied in connection with practical application of biocatalysts. However, little work has been done to explore the mechanism of pH-dependent inactivation. In this study, bioinformatic analysis of the Ntn-hydrolase superfamily was performed to identify functionally important subfamily-specific positions in protein structures. Furthermore, the involvement of these positions in pH-induced inactivation was studied. The conformational mobility of penicillin acylase in Escherichia coli was analyzed through molecular modeling in neutral and alkaline conditions. Two functionally important subfamily-specific residues, Gluβ482 and Aspβ484, were found. Ionization of these residues at alkaline pH promoted the collapse of a buried network of stabilizing interactions that consequently disrupted the functional protein conformation. The subfamily-specific position Aspβ484 was selected as a hotspot for mutation to engineer enzyme variant tolerant to alkaline medium. The corresponding Dβ484N mutant was produced and showed 9-fold increase in stability at alkaline conditions. Bioinformatic analysis of subfamily-specific positions can be further explored to study mechanisms of protein inactivation and to design more stable variants for the engineering of homologous Ntn-hydrolases with improved catalytic properties. PMID:24959852

  13. Modern Computational Techniques for the HMMER Sequence Analysis

    PubMed Central

    2013-01-01

    This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of the most important sequence analysis applications—hidden Markov models (HMM). We show the detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics society. The characteristics of the sequence analysis, such as data and compute-intensive natures, make it very attractive to optimize and parallelize by using both traditional software approach and innovated hardware acceleration technologies. PMID:25937944

  14. Phylogeny and evolution of plant cyclic nucleotide-gated ion channel (CNGC) gene family and functional analyses of tomato CNGCs

    PubMed Central

    Saand, Mumtaz Ali; Xu, You-Ping; Munyampundu, Jean-Pierre; Li, Wen; Zhang, Xuan-Rui; Cai, Xin-Zhong

    2015-01-01

    Cyclic nucleotide-gated ion channels (CNGCs) are calcium-permeable channels that are involved in various biological functions. Nevertheless, phylogeny and function of plant CNGCs are not well understood. In this study, 333 CNGC genes from 15 plant species were identified using comprehensive bioinformatics approaches. Extensive bioinformatics analyses demonstrated that CNGCs of Group IVa were distinct to those of other groups in gene structure and amino acid sequence of cyclic nucleotide-binding domain. A CNGC-specific motif that recognizes all identified plant CNGCs was generated. Phylogenetic analysis indicated that CNGC proteins of flowering plant species formed five groups. However, CNGCs of the non-vascular plant Physcomitrella patens clustered only in two groups (IVa and IVb), while those of the vascular non-flowering plant Selaginella moellendorffii gathered in four (IVa, IVb, I and II). These data suggest that Group IV CNGCs are most ancient and Group III CNGCs are most recently evolved in flowering plants. Furthermore, silencing analyses revealed that a set of CNGC genes might be involved in disease resistance and abiotic stress responses in tomato and function of SlCNGCs does not correlate with the group that they are belonging to. Our results indicate that Group IVa CNGCs are structurally but not functionally unique among plant CNGCs. PMID:26546226

  15. Studies in the extensively automatic construction of large odds-based inference networks from structured data. Examples from medical, bioinformatics, and health insurance claims data.

    PubMed

    Robson, B; Boray, S

    2018-04-01

    Theoretical and methodological principles are presented for the construction of very large inference nets for odds calculations, composed of hundreds or many thousands or more of elements, in this paper generated by structured data mining. It is argued that the usual small inference nets can sometimes represent rather simple, arbitrary estimates. Examples of applications in clinical and public health data analysis, medical claims data and detection of irregular entries, and bioinformatics data, are presented. Construction of large nets benefits from application of a theory of expected information for sparse data and the Dirac notation and algebra. The extent to which these are important here is briefly discussed. Purposes of the study include (a) exploration of the properties of large inference nets and a perturbation and tacit conditionality models, (b) using these to propose simpler models including one that a physician could use routinely, analogous to a "risk score", (c) examination of the merit of describing optimal performance in a single measure that combines accuracy, specificity, and sensitivity in place of a ROC curve, and (d) relationship to methods for detecting anomalous and potentially fraudulent data. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. ASaiM: a Galaxy-based framework to analyze microbiota data.

    PubMed

    Batut, Bérénice; Gravouil, Kévin; Defois, Clémence; Hiltemann, Saskia; Brugère, Jean-François; Peyretaillade, Eric; Peyret, Pierre

    2018-05-22

    New generations of sequencing platforms coupled to numerous bioinformatics tools has led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions out of microbiota studies. Modular and user-friendly tools would greatly improve such studies. We therefore developed ASaiM, an Open-Source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore and visualize microbiota information from raw metataxonomic, metagenomic or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets, but also can be used on a normal PC. The associated source code is available under Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io). Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation and training to scientists working on complex microorganism communities. It makes analysis and exploration analyses of microbiota data easy, quick, transparent, reproducible and shareable.

  17. Cake: a bioinformatics pipeline for the integrated analysis of somatic variants in cancer genomes

    PubMed Central

    Rashid, Mamunur; Robles-Espinoza, Carla Daniela; Rust, Alistair G.; Adams, David J.

    2013-01-01

    Summary: We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a stand-alone application. Availabilty: Cake is open-source and is available from http://cakesomatic.sourceforge.net/ Contact: da1@sanger.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:23803469

  18. ballaxy: web services for structural bioinformatics.

    PubMed

    Hildebrandt, Anna Katharina; Stöckel, Daniel; Fischer, Nina M; de la Garza, Luis; Krüger, Jens; Nickels, Stefan; Röttig, Marc; Schärfe, Charlotta; Schumann, Marcel; Thiel, Philipp; Lenhof, Hans-Peter; Kohlbacher, Oliver; Hildebrandt, Andreas

    2015-01-01

    Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large number of molecular data structures and algorithms implemented in BALL allows for elegant and sophisticated development of new approaches in the field. We hence connected the versatile BALL library and its visualization and editing front end BALLView with the Galaxy workflow framework. The result, which we call ballaxy, enables the user to simply and intuitively create sophisticated pipelines for applications in structure-based computational biology, integrated into a standard tool for molecular modelling.  ballaxy consists of three parts: some minor modifications to the Galaxy system, a collection of tools and an integration into the BALL framework and the BALLView application for molecular modelling. Modifications to Galaxy will be submitted to the Galaxy project, and the BALL and BALLView integrations will be integrated in the next major BALL release. After acceptance of the modifications into the Galaxy project, we will publish all ballaxy tools via the Galaxy toolshed. In the meantime, all three components are available from http://www.ball-project.org/ballaxy. Also, docker images for ballaxy are available at https://registry.hub.docker.com/u/anhi/ballaxy/dockerfile/. ballaxy is licensed under the terms of the GPL. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  19. Comprehensive decision tree models in bioinformatics.

    PubMed

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by so called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expected significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.

  20. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by so called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expected significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449

  1. T7 lytic phage-displayed peptide libraries: construction and diversity characterization.

    PubMed

    Krumpe, Lauren R H; Mori, Toshiyuki

    2014-01-01

    In this chapter, we describe the construction of T7 bacteriophage (phage)-displayed peptide libraries and the diversity analyses of random amino acid sequences obtained from the libraries. We used commercially available reagents, Novagen's T7Select system, to construct the libraries. Using a combination of biotinylated extension primer and streptavidin-coupled magnetic beads, we were able to prepare library DNA without applying gel purification, resulting in extremely high ligation efficiencies. Further, we describe the use of bioinformatics tools to characterize library diversity. Amino acid frequency and positional amino acid diversity and hydropathy are estimated using the REceptor LIgand Contacts website http://relic.bio.anl.gov. Peptide net charge analysis and peptide hydropathy analysis are conducted using the Genetics Computer Group Wisconsin Package computational tools. A comprehensive collection of the estimated number of recombinants and titers of T7 phage-displayed peptide libraries constructed in our lab is included.

  2. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters.

    PubMed

    Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.

  3. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

    PubMed Central

    Cheng, Gong; Zhang, Guocai; Xu, Liang

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317

  4. KAnalyze: a fast versatile pipelined K-mer toolkit

    PubMed Central

    Audano, Peter; Vannberg, Fredrik

    2014-01-01

    Motivation: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language. Results: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes. Availability and implementation: KAnalyze is available on SourceForge: https://sourceforge.net/projects/kanalyze/ Contact: fredrik.vannberg@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24642064

  5. Bioinformatics and the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  6. Towards barcode markers in Fungi: an intron map of Ascomycota mitochondria.

    PubMed

    Santamaria, Monica; Vicario, Saverio; Pappadà, Graziano; Scioscia, Gaetano; Scazzocchio, Claudio; Saccone, Cecilia

    2009-06-16

    A standardized and cost-effective molecular identification system is now an urgent need for Fungi owing to their wide involvement in human life quality. In particular the potential use of mitochondrial DNA species markers has been taken in account. Unfortunately, a serious difficulty in the PCR and bioinformatic surveys is due to the presence of mobile introns in almost all the fungal mitochondrial genes. The aim of this work is to verify the incidence of this phenomenon in Ascomycota, testing, at the same time, a new bioinformatic tool for extracting and managing sequence databases annotations, in order to identify the mitochondrial gene regions where introns are missing so as to propose them as species markers. The general trend towards a large occurrence of introns in the mitochondrial genome of Fungi has been confirmed in Ascomycota by an extensive bioinformatic analysis, performed on all the entries concerning 11 mitochondrial protein coding genes and 2 mitochondrial rRNA (ribosomal RNA) specifying genes, belonging to this phylum, available in public nucleotide sequence databases. A new query approach has been developed to retrieve effectively introns information included in these entries. After comparing the new query-based approach with a blast-based procedure, with the aim of designing a faithful Ascomycota mitochondrial intron map, the first method appeared clearly the most accurate. Within this map, despite the large pervasiveness of introns, it is possible to distinguish specific regions comprised in several genes, including the full NADH dehydrogenase subunit 6 (ND6) gene, which could be considered as barcode candidates for Ascomycota due to their paucity of introns and to their length, above 400 bp, comparable to the lower end size of the length range of barcodes successfully used in animals. The development of the new query system described here would answer the pressing requirement to improve drastically the bioinformatics support to the DNA Barcode Initiative. The large scale investigation of Ascomycota mitochondrial introns performed through this tool, allowing to exclude the introns-rich sequences from the barcode candidates exploration, could be the first step towards a mitochondrial barcoding strategy for these organisms, similar to the standard approach employed in metazoans.

  7. Using Kepler for Tool Integration in Microarray Analysis Workflows.

    PubMed

    Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C

    Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which requires complex analysis. More and more bioinformatics analysis tools are being developed by scientist to simplify these analyses. However, different pipelines have been developed using different software environments. This makes integrations of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools including Bioconductor packages, AltAnalyze, a python-based open source tool, and R-based comparison tool to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.

  8. Scalability and Validation of Big Data Bioinformatics Software.

    PubMed

    Yang, Andrian; Troup, Michael; Ho, Joshua W K

    2017-01-01

    This review examines two important aspects that are central to modern big data bioinformatics analysis - software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.

  9. Generalized Centroid Estimators in Bioinformatics

    PubMed Central

    Hamada, Michiaki; Kiryu, Hisanori; Iwasaki, Wataru; Asai, Kiyoshi

    2011-01-01

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accuracy measures (e.g. sensitivity, PPV, MCC and F-score) as well as it can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. Not only the concept presented in this paper gives a useful framework to design MEA-based estimators but also it is highly extendable and sheds new light on many problems in bioinformatics. PMID:21365017

  10. Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sulakhe, D.; Rodriguez, A.; Wilde, M.

    2008-03-01

    Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual datamore » system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of problems involved or the lessons learned in using individual Grid resources as it has already been published in our paper on genome analysis research environment (GNARE) and will focus primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.« less

  11. PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data.

    PubMed

    Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho

    2017-11-01

    High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in public and private sector. These methods provide vast amount of data, which need to be analysed in several steps. Although the bioinformatics may be applied using several public tools, many analytical pipelines allow too few options for the optimal analysis for more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.

  12. PRADA: pipeline for RNA sequencing data analysis.

    PubMed

    Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F; Weinstein, John N; Getz, Gad; Verhaak, Roel G W

    2014-08-01

    Technological advances in high-throughput sequencing necessitate improved computational tools for processing and analyzing large-scale datasets in a systematic automated manner. For that purpose, we have developed PRADA (Pipeline for RNA-Sequencing Data Analysis), a flexible, modular and highly scalable software platform that provides many different types of information available by multifaceted analysis starting from raw paired-end RNA-seq data: gene expression levels, quality metrics, detection of unsupervised and supervised fusion transcripts, detection of intragenic fusion variants, homology scores and fusion frame classification. PRADA uses a dual-mapping strategy that increases sensitivity and refines the analytical endpoints. PRADA has been used extensively and successfully in the glioblastoma and renal clear cell projects of The Cancer Genome Atlas program.  http://sourceforge.net/projects/prada/  gadgetz@broadinstitute.org or rverhaak@mdanderson.org  Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Survey of MapReduce frame operation in bioinformatics.

    PubMed

    Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

    2014-07-01

    Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  14. A case study of tuning MapReduce for efficient Bioinformatics in the cloud

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Lizhen; Wang, Zhong; Yu, Weikuan

    The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this paper, we aim to minimize the cloud computing cost spent on bioinformatics datamore » analysis by optimizing the extracted significant Hadoop parameters. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our study case. Experimental results show that, with the fine-tuned parameters, we achieve a total of 4× speedup compared with the original performance (using the default settings). Finally, this paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.« less

  15. Visualising "Junk" DNA through Bioinformatics

    ERIC Educational Resources Information Center

    Elwess, Nancy L.; Latourelle, Sandra M.; Cauthorn, Olivia

    2005-01-01

    One of the hottest areas of science today is the field in which biology, information technology,and computer science are merged into a single discipline called bioinformatics. This field enables the discovery and analysis of biological data, including nucleotide and amino acid sequences that are easily accessed through the use of computers. As…

  16. A review of bioinformatic methods for forensic DNA analyses.

    PubMed

    Liu, Yao-Yuan; Harbison, SallyAnn

    2018-03-01

    Short tandem repeats, single nucleotide polymorphisms, and whole mitochondrial analyses are three classes of markers which will play an important role in the future of forensic DNA typing. The arrival of massively parallel sequencing platforms in forensic science reveals new information such as insights into the complexity and variability of the markers that were previously unseen, along with amounts of data too immense for analyses by manual means. Along with the sequencing chemistries employed, bioinformatic methods are required to process and interpret this new and extensive data. As more is learnt about the use of these new technologies for forensic applications, development and standardization of efficient, favourable tools for each stage of data processing is being carried out, and faster, more accurate methods that improve on the original approaches have been developed. As forensic laboratories search for the optimal pipeline of tools, sequencer manufacturers have incorporated pipelines into sequencer software to make analyses convenient. This review explores the current state of bioinformatic methods and tools used for the analyses of forensic markers sequenced on the massively parallel sequencing (MPS) platforms currently most widely used. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. A graph-based approach for designing extensible pipelines

    PubMed Central

    2012-01-01

    Background In bioinformatics, it is important to build extensible and low-maintenance systems that are able to deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines involves hardcoding the execution steps into programs or scripts. This approach can lead to problems when a pipeline is expanding because the incorporation of new tools is often error prone and time consuming. Current approaches to pipeline development such as workflow management systems focus on analysis tasks that are systematically repeated without significant changes in their course of execution, such as genome annotation. However, more dynamism on the pipeline composition is necessary when each execution requires a different combination of steps. Results We propose a graph-based approach to implement extensible and low-maintenance pipelines that is suitable for pipeline applications with multiple functionalities that require different combinations of steps in each execution. Here pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components with a directed graph in which components are the graph edges, their inputs and outputs are the graph nodes, and the paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of our approach by implementing a format conversion pipeline for the fields of population genetics and genetic epidemiology, but our approach is also helpful in other fields where the use of multiple software is necessary to perform comprehensive analyses, such as gene expression and proteomics analyses. The project code, documentation and the Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. Conclusions Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible and low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited for implementing applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of possible conversions available to the user and the extensibility of the system to allow for future updates with new file formats. PMID:22788675

  18. Bioinformatics and Microarray Data Analysis on the Cloud.

    PubMed

    Calabrese, Barbara; Cannataro, Mario

    2016-01-01

    High-throughput platforms such as microarray, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that needs large data storage and computing power. Cloud computing offers massive scalable computing and storage, data sharing, on-demand anytime and anywhere access to resources and applications, and thus, it may represent the key technology for facing those issues. In fact, in the recent years it has been adopted for the deployment of different bioinformatics solutions and services both in academia and in the industry. Although this, cloud computing presents several issues regarding the security and privacy of data, that are particularly important when analyzing patients data, such as in personalized medicine. This chapter reviews main academic and industrial cloud-based bioinformatics solutions; with a special focus on microarray data analysis solutions and underlines main issues and problems related to the use of such platforms for the storage and analysis of patients data.

  19. Long Non-coding RNAs and Their Biological Roles in Plants

    PubMed Central

    Liu, Xue; Hao, Lili; Li, Dayong; Zhu, Lihuang; Hu, Songnian

    2015-01-01

    With the development of genomics and bioinformatics, especially the extensive applications of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, the lncRNAs have been considered as important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in few species, our current knowledge of their biological functions is still limited. Here, we have summarized recent studies on their identification, characteristics, classification, bioinformatics, resources, and current exploration of their biological functions in plants. PMID:25936895

  20. FCDD: A Database for Fruit Crops Diseases.

    PubMed

    Chauhan, Rupal; Jasrai, Yogesh; Pandya, Himanshu; Chaudhari, Suman; Samota, Chand Mal

    2014-01-01

    Fruit Crops Diseases Database (FCDD) requires a number of biotechnology and bioinformatics tools. The FCDD is a unique bioinformatics resource that compiles information about 162 details on fruit crops diseases, diseases type, its causal organism, images, symptoms and their control. The FCDD contains 171 phytochemicals from 25 fruits, their 2D images and their 20 possible sequences. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, textbooks and scientific journals. FCDD is fully searchable and supports extensive text search. The main focus of the FCDD is on providing possible information of fruit crops diseases, which will help in discovery of potential drugs from one of the common bioresource-fruits. The database was developed using MySQL. The database interface is developed in PHP, HTML and JAVA. FCDD is freely available. http://www.fruitcropsdd.com/

  1. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920

  2. Density Deconvolution With EPI Splines

    DTIC Science & Technology

    2015-09-01

    effects of various substances on test subjects [11], [12]. Whereas in geophysics, a shot may be fired into the ground, in pharmacokinetics, a signal is...be significant, including medicine, bioinformatics, chemistry, as- tronomy, and econometrics , as well as an extensive review of kernel based methods...demonstrate the effectiveness of our model in simulations motivated by test instances in [32]. We consider an additive measurement model scenario where

  3. Bioinformatics of cardiovascular miRNA biology.

    PubMed

    Kunz, Meik; Xiao, Ke; Liang, Chunguang; Viereck, Janika; Pachel, Christina; Frantz, Stefan; Thum, Thomas; Dandekar, Thomas

    2015-12-01

    MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs and are highly conserved among species. Moreover, miRNAs regulate gene expression of a large number of genes associated with important biological functions and signaling pathways. Recently, several miRNAs have been found to be associated with cardiovascular diseases. Thus, investigating the complex regulatory effect of miRNAs may lead to a better understanding of their functional role in the heart. To achieve this, bioinformatics approaches have to be coupled with validation and screening experiments to understand the complex interactions of miRNAs with the genome. This will boost the subsequent development of diagnostic markers and our understanding of the physiological and therapeutic role of miRNAs in cardiac remodeling. In this review, we focus on and explain different bioinformatics strategies and algorithms for the identification and analysis of miRNAs and their regulatory elements to better understand cardiac miRNA biology. Starting with the biogenesis of miRNAs, we present approaches such as LocARNA and miRBase for combining sequence and structure analysis including phylogenetic comparisons as well as detailed analysis of RNA folding patterns, functional target prediction, signaling pathway as well as functional analysis. We also show how far bioinformatics helps to tackle the unprecedented level of complexity and systemic effects by miRNA, underlining the strong therapeutic potential of miRNA and miRNA target structures in cardiovascular disease. In addition, we discuss drawbacks and limitations of bioinformatics algorithms and the necessity of experimental approaches for miRNA target identification. This article is part of a Special Issue entitled 'Non-coding RNAs'. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. Biological Dynamics Markup Language (BDML): an open format for representing quantitative biological dynamics data

    PubMed Central

    Kyoda, Koji; Tohsato, Yukako; Ho, Kenneth H. L.; Onami, Shuichi

    2015-01-01

    Motivation: Recent progress in live-cell imaging and modeling techniques has resulted in generation of a large amount of quantitative data (from experimental measurements and computer simulations) on spatiotemporal dynamics of biological objects such as molecules, cells and organisms. Although many research groups have independently dedicated their efforts to developing software tools for visualizing and analyzing these data, these tools are often not compatible with each other because of different data formats. Results: We developed an open unified format, Biological Dynamics Markup Language (BDML; current version: 0.2), which provides a basic framework for representing quantitative biological dynamics data for objects ranging from molecules to cells to organisms. BDML is based on Extensible Markup Language (XML). Its advantages are machine and human readability and extensibility. BDML will improve the efficiency of development and evaluation of software tools for data visualization and analysis. Availability and implementation: A specification and a schema file for BDML are freely available online at http://ssbd.qbic.riken.jp/bdml/. Contact: sonami@riken.jp Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:25414366

  5. Biological Dynamics Markup Language (BDML): an open format for representing quantitative biological dynamics data.

    PubMed

    Kyoda, Koji; Tohsato, Yukako; Ho, Kenneth H L; Onami, Shuichi

    2015-04-01

    Recent progress in live-cell imaging and modeling techniques has resulted in generation of a large amount of quantitative data (from experimental measurements and computer simulations) on spatiotemporal dynamics of biological objects such as molecules, cells and organisms. Although many research groups have independently dedicated their efforts to developing software tools for visualizing and analyzing these data, these tools are often not compatible with each other because of different data formats. We developed an open unified format, Biological Dynamics Markup Language (BDML; current version: 0.2), which provides a basic framework for representing quantitative biological dynamics data for objects ranging from molecules to cells to organisms. BDML is based on Extensible Markup Language (XML). Its advantages are machine and human readability and extensibility. BDML will improve the efficiency of development and evaluation of software tools for data visualization and analysis. A specification and a schema file for BDML are freely available online at http://ssbd.qbic.riken.jp/bdml/. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  6. Agents in bioinformatics, computational and systems biology.

    PubMed

    Merelli, Emanuela; Armano, Giuliano; Cannata, Nicola; Corradini, Flavio; d'Inverno, Mark; Doms, Andreas; Lord, Phillip; Martin, Andrew; Milanesi, Luciano; Möller, Steffen; Schroeder, Michael; Luck, Michael

    2007-01-01

    The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meeting provided an opportunity for seeding collaborations between the agent and bioinformatics communities to develop a different (agent-based) approach of computational frameworks both for data analysis and management in bioinformatics and for systems modelling and simulation in computational and systems biology. The collaborations gave rise to applications and integrated tools that we summarize and discuss in context of the state of the art in this area. We investigate on future challenges and argue that the field should still be explored from many perspectives ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages to be used by information agents, and to the adoption of agents for computational grids.

  7. The eBioKit, a stand-alone educational platform for bioinformatics.

    PubMed

    Hernández-de-Diego, Rafael; de Villiers, Etienne P; Klingström, Tomas; Gourlé, Hadrien; Conesa, Ana; Bongcam-Rudloff, Erik

    2017-09-01

    Bioinformatics skills have become essential for many research areas; however, the availability of qualified researchers is usually lower than the demand and training to increase the number of able bioinformaticians is an important task for the bioinformatics community. When conducting training or hands-on tutorials, the lack of control over the analysis tools and repositories often results in undesirable situations during training, as unavailable online tools or version conflicts may delay, complicate, or even prevent the successful completion of a training event. The eBioKit is a stand-alone educational platform that hosts numerous tools and databases for bioinformatics research and allows training to take place in a controlled environment. A key advantage of the eBioKit over other existing teaching solutions is that all the required software and databases are locally installed on the system, significantly reducing the dependence on the internet. Furthermore, the architecture of the eBioKit has demonstrated itself to be an excellent balance between portability and performance, not only making the eBioKit an exceptional educational tool but also providing small research groups with a platform to incorporate bioinformatics analysis in their research. As a result, the eBioKit has formed an integral part of training and research performed by a wide variety of universities and organizations such as the Pan African Bioinformatics Network (H3ABioNet) as part of the initiative Human Heredity and Health in Africa (H3Africa), the Southern Africa Network for Biosciences (SAnBio) initiative, the Biosciences eastern and central Africa (BecA) hub, and the International Glossina Genome Initiative.

  8. The eBioKit, a stand-alone educational platform for bioinformatics

    PubMed Central

    Conesa, Ana; Bongcam-Rudloff, Erik

    2017-01-01

    Bioinformatics skills have become essential for many research areas; however, the availability of qualified researchers is usually lower than the demand and training to increase the number of able bioinformaticians is an important task for the bioinformatics community. When conducting training or hands-on tutorials, the lack of control over the analysis tools and repositories often results in undesirable situations during training, as unavailable online tools or version conflicts may delay, complicate, or even prevent the successful completion of a training event. The eBioKit is a stand-alone educational platform that hosts numerous tools and databases for bioinformatics research and allows training to take place in a controlled environment. A key advantage of the eBioKit over other existing teaching solutions is that all the required software and databases are locally installed on the system, significantly reducing the dependence on the internet. Furthermore, the architecture of the eBioKit has demonstrated itself to be an excellent balance between portability and performance, not only making the eBioKit an exceptional educational tool but also providing small research groups with a platform to incorporate bioinformatics analysis in their research. As a result, the eBioKit has formed an integral part of training and research performed by a wide variety of universities and organizations such as the Pan African Bioinformatics Network (H3ABioNet) as part of the initiative Human Heredity and Health in Africa (H3Africa), the Southern Africa Network for Biosciences (SAnBio) initiative, the Biosciences eastern and central Africa (BecA) hub, and the International Glossina Genome Initiative. PMID:28910280

  9. Open discovery: An integrated live Linux platform of Bioinformatics tools.

    PubMed

    Vetrivel, Umashankar; Pilla, Kalabharath

    2008-01-01

    Historically, live linux distributions for Bioinformatics have paved way for portability of Bioinformatics workbench in a platform independent manner. Moreover, most of the existing live Linux distributions limit their usage to sequence analysis and basic molecular visualization programs and are devoid of data persistence. Hence, open discovery - a live linux distribution has been developed with the capability to perform complex tasks like molecular modeling, docking and molecular dynamics in a swift manner. Furthermore, it is also equipped with complete sequence analysis environment and is capable of running windows executable programs in Linux environment. Open discovery portrays the advanced customizable configuration of fedora, with data persistency accessible via USB drive or DVD. The Open Discovery is distributed free under Academic Free License (AFL) and can be downloaded from http://www.OpenDiscovery.org.in.

  10. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package

    PubMed Central

    2012-01-01

    Background Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941

  11. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.

    PubMed

    El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter

    2012-01-01

    Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.

  12. Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm.

    PubMed

    Skinnider, Michael A; Dejong, Chris A; Franczak, Brian C; McNicholas, Paul D; Magarvey, Nathan A

    2017-08-16

    Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these molecules have been extensively optimized by natural selection. The large and structurally complex scaffolds of natural products distinguish their physical and chemical properties from those of synthetic compounds. However, no analysis of the performance of existing methods for molecular similarity calculation specific to natural products has been reported to date. Here, we present LEMONS, an algorithm for the enumeration of hypothetical modular natural product structures. We leverage this algorithm to conduct a comparative analysis of molecular similarity methods within the unique chemical space occupied by modular natural products using controlled synthetic data, and comprehensively investigate the impact of diverse biosynthetic parameters on similarity search. We additionally investigate a recently described algorithm for natural product retrobiosynthesis and alignment, and find that when rule-based retrobiosynthesis can be applied, this approach outperforms conventional two-dimensional fingerprints, suggesting it may represent a valuable approach for the targeted exploration of natural product chemical space and microbial genome mining. Our open-source algorithm is an extensible method of enumerating hypothetical natural product structures with diverse potential applications in bioinformatics.

  13. A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research

    ERIC Educational Resources Information Center

    Campbell, Chad E.; Nehm, Ross H.

    2013-01-01

    The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students' knowledge, attitudes, or skills. Although assessments are…

  14. MultiSeq: unifying sequence and structure data for evolutionary analysis

    PubMed Central

    Roberts, Elijah; Eargle, John; Wright, Dan; Luthey-Schulten, Zaida

    2006-01-01

    Background Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. Results Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. Conclusion MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software: PMID:16914055

  15. Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.

    PubMed

    Scharfe, Michael; Pielot, Rainer; Schreiber, Falk

    2010-01-11

    Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.

  16. High-throughput protein analysis integrating bioinformatics and experimental assays

    PubMed Central

    del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan

    2004-01-01

    The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins. PMID:14762202

  17. A powerful and flexible approach to the analysis of RNA sequence count data.

    PubMed

    Zhou, Yi-Hui; Xia, Kai; Wright, Fred A

    2011-10-01

    A number of penalization and shrinkage approaches have been proposed for the analysis of microarray gene expression data. Similar techniques are now routinely applied to RNA sequence transcriptional count data, although the value of such shrinkage has not been conclusively established. If penalization is desired, the explicit modeling of mean-variance relationships provides a flexible testing regimen that 'borrows' information across genes, while easily incorporating design effects and additional covariates. We describe BBSeq, which incorporates two approaches: (i) a simple beta-binomial generalized linear model, which has not been extensively tested for RNA-Seq data and (ii) an extension of an expression mean-variance modeling approach to RNA-Seq data, involving modeling of the overdispersion as a function of the mean. Our approaches are flexible, allowing for general handling of discrete experimental factors and continuous covariates. We report comparisons with other alternate methods to handle RNA-Seq data. Although penalized methods have advantages for very small sample sizes, the beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, appears to have favorable characteristics in power and flexibility. An R package containing examples and sample datasets is available at http://www.bios.unc.edu/research/genomic_software/BBSeq yzhou@bios.unc.edu; fwright@bios.unc.edu Supplementary data are available at Bioinformatics online.

  18. The immune epitope database: a historical retrospective of the first decade.

    PubMed

    Salimi, Nima; Fleri, Ward; Peters, Bjoern; Sette, Alessandro

    2012-10-01

    As the amount of biomedical information available in the literature continues to increase, databases that aggregate this information continue to grow in importance and scope. The population of databases can occur either through fully automated text mining approaches or through manual curation by human subject experts. We here report our experiences in populating the National Institute of Allergy and Infectious Diseases sponsored Immune Epitope Database and Analysis Resource (IEDB, http://iedb.org), which was created in 2003, and as of 2012 captures the epitope information from approximately 99% of all papers published to date that describe immune epitopes (with the exception of cancer and HIV data). This was achieved using a hybrid model based on automated document categorization and extensive human expert involvement. This task required automated scanning of over 22 million PubMed abstracts followed by classification and curation of over 13 000 references, including over 7000 infectious disease-related manuscripts, over 1000 allergy-related manuscripts, roughly 4000 related to autoimmunity, and 1000 transplant/alloantigen-related manuscripts. The IEDB curation involves an unprecedented level of detail, capturing for each paper the actual experiments performed for each different epitope structure. Key to enabling this process was the extensive use of ontologies to ensure rigorous and consistent data representation as well as interoperability with other bioinformatics resources, including the Protein Data Bank, Chemical Entities of Biological Interest, and the NIAID Bioinformatics Resource Centers. A growing fraction of the IEDB data derives from direct submissions by research groups engaged in epitope discovery, and is being facilitated by the implementation of novel data submission tools. The present explosion of information contained in biological databases demands effective query and display capabilities to optimize the user experience. Accordingly, the development of original ways to query the database, on the basis of ontologically driven hierarchical trees, and display of epitope data in aggregate in a biologically intuitive yet rigorous fashion is now at the forefront of the IEDB efforts. We also highlight advances made in the realm of epitope analysis and predictive tools available in the IEDB. © 2012 The Authors. Immunology © 2012 Blackwell Publishing Ltd.

  19. KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis.

    PubMed

    Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H Werner; Küffner, Robert

    2017-05-15

    Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, linux-based, toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As basis for this actively developed toolbox we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). robert.kueffner@helmholtz-muenchen.de. Supplementary data are available at Bioinformatics online.

  20. Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.

    PubMed

    Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread--a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.

  1. Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

    PubMed Central

    Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread—a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285

  2. Ambiguity and variability of database and software names in bioinformatics.

    PubMed

    Duck, Geraint; Kovacevic, Aleksandar; Robertson, David L; Stevens, Robert; Nenadic, Goran

    2015-01-01

    There are numerous options available to achieve various tasks in bioinformatics, but until recently, there were no tools that could systematically identify mentions of databases and tools within the literature. In this paper we explore the variability and ambiguity of database and software name mentions and compare dictionary and machine learning approaches to their identification. Through the development and analysis of a corpus of 60 full-text documents manually annotated at the mention level, we report high variability and ambiguity in database and software mentions. On a test set of 25 full-text documents, a baseline dictionary look-up achieved an F-score of 46 %, highlighting not only variability and ambiguity but also the extensive number of new resources introduced. A machine learning approach achieved an F-score of 63 % (with precision of 74 %) and 70 % (with precision of 83 %) for strict and lenient matching respectively. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature. Our analyses show that identification of mentions of databases and tools is a challenging task that cannot be achieved by relying on current manually-curated resource repositories. Although machine learning shows improvement and promise (primarily in precision), more contextual information needs to be taken into account to achieve a good degree of accuracy.

  3. ReadXplorer—visualization and analysis of mapped sequences

    PubMed Central

    Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander

    2014-01-01

    Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157

  4. BioContainers: an open-source and community-driven framework for software standardization.

    PubMed

    da Veiga Leprevost, Felipe; Grüning, Björn A; Alves Aflitos, Saulo; Röst, Hannes L; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I; Perez-Riverol, Yasset

    2017-08-15

    BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on popular open-source projects Docker and rkt frameworks, that allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). The software is freely available at github.com/BioContainers/. yperez@ebi.ac.uk. © The Author(s) 2017. Published by Oxford University Press.

  5. BioContainers: an open-source and community-driven framework for software standardization

    PubMed Central

    da Veiga Leprevost, Felipe; Grüning, Björn A.; Alves Aflitos, Saulo; Röst, Hannes L.; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C.; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I.; Perez-Riverol, Yasset

    2017-01-01

    Abstract Motivation BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on popular open-source projects Docker and rkt frameworks, that allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability and Implementation The software is freely available at github.com/BioContainers/. Contact yperez@ebi.ac.uk PMID:28379341

  6. Genomic big data hitting the storage bottleneck.

    PubMed

    Papageorgiou, Louis; Eleni, Picasi; Raftopoulou, Sofia; Mantaiou, Meropi; Megalooikonomou, Vasileios; Vlachakis, Dimitrios

    2018-01-01

    During the last decades, there is a vast data explosion in bioinformatics. Big data centres are trying to face this data crisis, reaching high storage capacity levels. Although several scientific giants examine how to handle the enormous pile of information in their cupboards, the problem remains unsolved. On a daily basis, there is a massive quantity of permanent loss of extensive information due to infrastructure and storage space problems. The motivation for sequencing has fallen behind. Sometimes, the time that is spent to solve storage space problems is longer than the one dedicated to collect and analyse data. To bring sequencing to the foreground, scientists have to slide over such obstacles and find alternative ways to approach the issue of data volume. Scientific community experiences the data crisis era, where, out of the box solutions may ease the typical research workflow, until technological development meets the needs of Bioinformatics.

  7. Long non-coding RNAs and their biological roles in plants.

    PubMed

    Liu, Xue; Hao, Lili; Li, Dayong; Zhu, Lihuang; Hu, Songnian

    2015-06-01

    With the development of genomics and bioinformatics, especially the extensive applications of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, the lncRNAs have been considered as important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in few species, our current knowledge of their biological functions is still limited. Here, we have summarized recent studies on their identification, characteristics, classification, bioinformatics, resources, and current exploration of their biological functions in plants. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  8. Intrageneric Primer Design: Bringing Bioinformatics Tools to the Class

    ERIC Educational Resources Information Center

    Lima, Andre O. S.; Garces, Sergio P. S.

    2006-01-01

    Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private…

  9. Open discovery: An integrated live Linux platform of Bioinformatics tools

    PubMed Central

    Vetrivel, Umashankar; Pilla, Kalabharath

    2008-01-01

    Historically, live linux distributions for Bioinformatics have paved way for portability of Bioinformatics workbench in a platform independent manner. Moreover, most of the existing live Linux distributions limit their usage to sequence analysis and basic molecular visualization programs and are devoid of data persistence. Hence, open discovery ‐ a live linux distribution has been developed with the capability to perform complex tasks like molecular modeling, docking and molecular dynamics in a swift manner. Furthermore, it is also equipped with complete sequence analysis environment and is capable of running windows executable programs in Linux environment. Open discovery portrays the advanced customizable configuration of fedora, with data persistency accessible via USB drive or DVD. Availability The Open Discovery is distributed free under Academic Free License (AFL) and can be downloaded from http://www.OpenDiscovery.org.in PMID:19238235

  10. Scalable web services for the PSIPRED Protein Analysis Workbench.

    PubMed

    Buchan, Daniel W A; Minneci, Federico; Nugent, Tim C O; Bryson, Kevin; Jones, David T

    2013-07-01

    Here, we present the new UCL Bioinformatics Group's PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features to allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new HADOOP packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.

  11. RImmPort: an R/Bioconductor package that enables ready-for-analysis immunology research data.

    PubMed

    Shankar, Ravi D; Bhattacharya, Sanchita; Jujjavarapu, Chethan; Andorf, Sandra; Wiser, Jeffery A; Butte, Atul J

    2017-04-01

    : Open access to raw clinical and molecular data related to immunological studies has created a tremendous opportunity for data-driven science. We have developed RImmPort that prepares NIAID-funded research study datasets in ImmPort (immport.org) for analysis in R. RImmPort comprises of three main components: (i) a specification of R classes that encapsulate study data, (ii) foundational methods to load data of a specific study and (iii) generic methods to slice and dice data across different dimensions in one or more studies. Furthermore, RImmPort supports open formalisms, such as CDISC standards on the open source bioinformatics platform Bioconductor, to ensure that ImmPort curated study datasets are seamlessly accessible and ready for analysis, thus enabling innovative bioinformatics research in immunology. RImmPort is available as part of Bioconductor (bioconductor.org/packages/RImmPort). rshankar@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  12. Mapping the Extracellular and Membrane Proteome Associated with the Vasculature and the Stroma in the Embryo*

    PubMed Central

    Soulet, Fabienne; Kilarski, Witold W.; Roux-Dalvai, Florence; Herbert, John M. J.; Sacewicz, Izabela; Mouton-Barbosa, Emmanuelle; Bicknell, Roy; Lalor, Patricia; Monsarrat, Bernard; Bikfalvi, Andreas

    2013-01-01

    In order to map the extracellular or membrane proteome associated with the vasculature and the stroma in an embryonic organism in vivo, we developed a biotinylation technique for chicken embryo and combined it with mass spectrometry and bioinformatic analysis. We also applied this procedure to implanted tumors growing on the chorioallantoic membrane or after the induction of granulation tissue. Membrane and extracellular matrix proteins were the most abundant components identified. Relative quantitative analysis revealed differential protein expression patterns in several tissues. Through a bioinformatic approach, we determined endothelial cell protein expression signatures, which allowed us to identify several proteins not yet reported to be associated with endothelial cells or the vasculature. This is the first study reported so far that applies in vivo biotinylation, in combination with robust label-free quantitative proteomics approaches and bioinformatic analysis, to an embryonic organism. It also provides the first description of the vascular and matrix proteome of the embryo that might constitute the starting point for further developments. PMID:23674615

  13. Bioinformatics Identification of Modules of Transcription Factor Binding Sites in Alzheimer's Disease-Related Genes by In Silico Promoter Analysis and Microarrays

    PubMed Central

    Augustin, Regina; Lichtenthaler, Stefan F.; Greeff, Michael; Hansen, Jens; Wurst, Wolfgang; Trümbach, Dietrich

    2011-01-01

    The molecular mechanisms and genetic risk factors underlying Alzheimer's disease (AD) pathogenesis are only partly understood. To identify new factors, which may contribute to AD, different approaches are taken including proteomics, genetics, and functional genomics. Here, we used a bioinformatics approach and found that distinct AD-related genes share modules of transcription factor binding sites, suggesting a transcriptional coregulation. To detect additional coregulated genes, which may potentially contribute to AD, we established a new bioinformatics workflow with known multivariate methods like support vector machines, biclustering, and predicted transcription factor binding site modules by using in silico analysis and over 400 expression arrays from human and mouse. Two significant modules are composed of three transcription factor families: CTCF, SP1F, and EGRF/ZBPF, which are conserved between human and mouse APP promoter sequences. The specific combination of in silico promoter and multivariate analysis can identify regulation mechanisms of genes involved in multifactorial diseases. PMID:21559189

  14. Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes.

    PubMed

    Malik, Rainer; Chauhan, Ganesh; Traylor, Matthew; Sargurupremraj, Muralidharan; Okada, Yukinori; Mishra, Aniket; Rutten-Jacobs, Loes; Giese, Anne-Katrin; van der Laan, Sander W; Gretarsdottir, Solveig; Anderson, Christopher D; Chong, Michael; Adams, Hieab H H; Ago, Tetsuro; Almgren, Peter; Amouyel, Philippe; Ay, Hakan; Bartz, Traci M; Benavente, Oscar R; Bevan, Steve; Boncoraglio, Giorgio B; Brown, Robert D; Butterworth, Adam S; Carrera, Caty; Carty, Cara L; Chasman, Daniel I; Chen, Wei-Min; Cole, John W; Correa, Adolfo; Cotlarciuc, Ioana; Cruchaga, Carlos; Danesh, John; de Bakker, Paul I W; DeStefano, Anita L; den Hoed, Marcel; Duan, Qing; Engelter, Stefan T; Falcone, Guido J; Gottesman, Rebecca F; Grewal, Raji P; Gudnason, Vilmundur; Gustafsson, Stefan; Haessler, Jeffrey; Harris, Tamara B; Hassan, Ahamad; Havulinna, Aki S; Heckbert, Susan R; Holliday, Elizabeth G; Howard, George; Hsu, Fang-Chi; Hyacinth, Hyacinth I; Ikram, M Arfan; Ingelsson, Erik; Irvin, Marguerite R; Jian, Xueqiu; Jiménez-Conde, Jordi; Johnson, Julie A; Jukema, J Wouter; Kanai, Masahiro; Keene, Keith L; Kissela, Brett M; Kleindorfer, Dawn O; Kooperberg, Charles; Kubo, Michiaki; Lange, Leslie A; Langefeld, Carl D; Langenberg, Claudia; Launer, Lenore J; Lee, Jin-Moo; Lemmens, Robin; Leys, Didier; Lewis, Cathryn M; Lin, Wei-Yu; Lindgren, Arne G; Lorentzen, Erik; Magnusson, Patrik K; Maguire, Jane; Manichaikul, Ani; McArdle, Patrick F; Meschia, James F; Mitchell, Braxton D; Mosley, Thomas H; Nalls, Michael A; Ninomiya, Toshiharu; O'Donnell, Martin J; Psaty, Bruce M; Pulit, Sara L; Rannikmäe, Kristiina; Reiner, Alexander P; Rexrode, Kathryn M; Rice, Kenneth; Rich, Stephen S; Ridker, Paul M; Rost, Natalia S; Rothwell, Peter M; Rotter, Jerome I; Rundek, Tatjana; Sacco, Ralph L; Sakaue, Saori; Sale, Michele M; Salomaa, Veikko; Sapkota, Bishwa R; Schmidt, Reinhold; Schmidt, Carsten O; Schminke, Ulf; Sharma, Pankaj; Slowik, Agnieszka; Sudlow, Cathie L M; Tanislav, Christian; Tatlisumak, Turgut; Taylor, Kent D; Thijs, Vincent N S; Thorleifsson, Gudmar; Thorsteinsdottir, Unnur; Tiedt, Steffen; Trompet, Stella; Tzourio, Christophe; van Duijn, Cornelia M; Walters, Matthew; Wareham, Nicholas J; Wassertheil-Smoller, Sylvia; Wilson, James G; Wiggins, Kerri L; Yang, Qiong; Yusuf, Salim; Bis, Joshua C; Pastinen, Tomi; Ruusalepp, Arno; Schadt, Eric E; Koplev, Simon; Björkegren, Johan L M; Codoni, Veronica; Civelek, Mete; Smith, Nicholas L; Trégouët, David A; Christophersen, Ingrid E; Roselli, Carolina; Lubitz, Steven A; Ellinor, Patrick T; Tai, E Shyong; Kooner, Jaspal S; Kato, Norihiro; He, Jiang; van der Harst, Pim; Elliott, Paul; Chambers, John C; Takeuchi, Fumihiko; Johnson, Andrew D; Sanghera, Dharambir K; Melander, Olle; Jern, Christina; Strbian, Daniel; Fernandez-Cadenas, Israel; Longstreth, W T; Rolfs, Arndt; Hata, Jun; Woo, Daniel; Rosand, Jonathan; Pare, Guillaume; Hopewell, Jemma C; Saleheen, Danish; Stefansson, Kari; Worrall, Bradford B; Kittner, Steven J; Seshadri, Sudha; Fornage, Myriam; Markus, Hugh S; Howson, Joanna M M; Kamatani, Yoichiro; Debette, Stephanie; Dichgans, Martin; Malik, Rainer; Chauhan, Ganesh; Traylor, Matthew; Sargurupremraj, Muralidharan; Okada, Yukinori; Mishra, Aniket; Rutten-Jacobs, Loes; Giese, Anne-Katrin; van der Laan, Sander W; Gretarsdottir, Solveig; Anderson, Christopher D; Chong, Michael; Adams, Hieab H H; Ago, Tetsuro; Almgren, Peter; Amouyel, Philippe; Ay, Hakan; Bartz, Traci M; Benavente, Oscar R; Bevan, Steve; Boncoraglio, Giorgio B; Brown, Robert D; Butterworth, Adam S; Carrera, Caty; Carty, Cara L; Chasman, Daniel I; Chen, Wei-Min; Cole, John W; Correa, Adolfo; Cotlarciuc, Ioana; Cruchaga, Carlos; Danesh, John; de Bakker, Paul I W; DeStefano, Anita L; Hoed, Marcel den; Duan, Qing; Engelter, Stefan T; Falcone, Guido J; Gottesman, Rebecca F; Grewal, Raji P; Gudnason, Vilmundur; Gustafsson, Stefan; Haessler, Jeffrey; Harris, Tamara B; Hassan, Ahamad; Havulinna, Aki S; Heckbert, Susan R; Holliday, Elizabeth G; Howard, George; Hsu, Fang-Chi; Hyacinth, Hyacinth I; Ikram, M Arfan; Ingelsson, Erik; Irvin, Marguerite R; Jian, Xueqiu; Jiménez-Conde, Jordi; Johnson, Julie A; Jukema, J Wouter; Kanai, Masahiro; Keene, Keith L; Kissela, Brett M; Kleindorfer, Dawn O; Kooperberg, Charles; Kubo, Michiaki; Lange, Leslie A; Langefeld, Carl D; Langenberg, Claudia; Launer, Lenore J; Lee, Jin-Moo; Lemmens, Robin; Leys, Didier; Lewis, Cathryn M; Lin, Wei-Yu; Lindgren, Arne G; Lorentzen, Erik; Magnusson, Patrik K; Maguire, Jane; Manichaikul, Ani; McArdle, Patrick F; Meschia, James F; Mitchell, Braxton D; Mosley, Thomas H; Nalls, Michael A; Ninomiya, Toshiharu; O'Donnell, Martin J; Psaty, Bruce M; Pulit, Sara L; Rannikmäe, Kristiina; Reiner, Alexander P; Rexrode, Kathryn M; Rice, Kenneth; Rich, Stephen S; Ridker, Paul M; Rost, Natalia S; Rothwell, Peter M; Rotter, Jerome I; Rundek, Tatjana; Sacco, Ralph L; Sakaue, Saori; Sale, Michele M; Salomaa, Veikko; Sapkota, Bishwa R; Schmidt, Reinhold; Schmidt, Carsten O; Schminke, Ulf; Sharma, Pankaj; Slowik, Agnieszka; Sudlow, Cathie L M; Tanislav, Christian; Tatlisumak, Turgut; Taylor, Kent D; Thijs, Vincent N S; Thorleifsson, Gudmar; Thorsteinsdottir, Unnur; Tiedt, Steffen; Trompet, Stella; Tzourio, Christophe; van Duijn, Cornelia M; Walters, Matthew; Wareham, Nicholas J; Wassertheil-Smoller, Sylvia; Wilson, James G; Wiggins, Kerri L; Yang, Qiong; Yusuf, Salim; Amin, Najaf; Aparicio, Hugo S; Arnett, Donna K; Attia, John; Beiser, Alexa S; Berr, Claudine; Buring, Julie E; Bustamante, Mariana; Caso, Valeria; Cheng, Yu-Ching; Choi, Seung Hoan; Chowhan, Ayesha; Cullell, Natalia; Dartigues, Jean-François; Delavaran, Hossein; Delgado, Pilar; Dörr, Marcus; Engström, Gunnar; Ford, Ian; Gurpreet, Wander S; Hamsten, Anders; Heitsch, Laura; Hozawa, Atsushi; Ibanez, Laura; Ilinca, Andreea; Ingelsson, Martin; Iwasaki, Motoki; Jackson, Rebecca D; Jood, Katarina; Jousilahti, Pekka; Kaffashian, Sara; Kalra, Lalit; Kamouchi, Masahiro; Kitazono, Takanari; Kjartansson, Olafur; Kloss, Manja; Koudstaal, Peter J; Krupinski, Jerzy; Labovitz, Daniel L; Laurie, Cathy C; Levi, Christopher R; Li, Linxin; Lind, Lars; Lindgren, Cecilia M; Lioutas, Vasileios; Liu, Yong Mei; Lopez, Oscar L; Makoto, Hirata; Martinez-Majander, Nicolas; Matsuda, Koichi; Minegishi, Naoko; Montaner, Joan; Morris, Andrew P; Muiño, Elena; Müller-Nurasyid, Martina; Norrving, Bo; Ogishima, Soichi; Parati, Eugenio A; Peddareddygari, Leema Reddy; Pedersen, Nancy L; Pera, Joanna; Perola, Markus; Pezzini, Alessandro; Pileggi, Silvana; Rabionet, Raquel; Riba-Llena, Iolanda; Ribasés, Marta; Romero, Jose R; Roquer, Jaume; Rudd, Anthony G; Sarin, Antti-Pekka; Sarju, Ralhan; Sarnowski, Chloe; Sasaki, Makoto; Satizabal, Claudia L; Satoh, Mamoru; Sattar, Naveed; Sawada, Norie; Sibolt, Gerli; Sigurdsson, Ásgeir; Smith, Albert; Sobue, Kenji; Soriano-Tárraga, Carolina; Stanne, Tara; Stine, O Colin; Stott, David J; Strauch, Konstantin; Takai, Takako; Tanaka, Hideo; Tanno, Kozo; Teumer, Alexander; Tomppo, Liisa; Torres-Aguila, Nuria P; Touze, Emmanuel; Tsugane, Shoichiro; Uitterlinden, Andre G; Valdimarsson, Einar M; van der Lee, Sven J; Völzke, Henry; Wakai, Kenji; Weir, David; Williams, Stephen R; Wolfe, Charles D A; Wong, Quenna; Xu, Huichun; Yamaji, Taiki; Sanghera, Dharambir K; Melander, Olle; Jern, Christina; Strbian, Daniel; Fernandez-Cadenas, Israel; Longstreth, W T; Rolfs, Arndt; Hata, Jun; Woo, Daniel; Rosand, Jonathan; Pare, Guillaume; Hopewell, Jemma C; Saleheen, Danish; Stefansson, Kari; Worrall, Bradford B; Kittner, Steven J; Seshadri, Sudha; Fornage, Myriam; Markus, Hugh S; Howson, Joanna M M; Kamatani, Yoichiro; Debette, Stephanie; Dichgans, Martin

    2018-04-01

    Stroke has multiple etiologies, but the underlying genes and pathways are largely unknown. We conducted a multiancestry genome-wide-association meta-analysis in 521,612 individuals (67,162 cases and 454,450 controls) and discovered 22 new stroke risk loci, bringing the total to 32. We further found shared genetic variation with related vascular traits, including blood pressure, cardiac traits, and venous thromboembolism, at individual loci (n = 18), and using genetic risk scores and linkage-disequilibrium-score regression. Several loci exhibited distinct association and pleiotropy patterns for etiological stroke subtypes. Eleven new susceptibility loci indicate mechanisms not previously implicated in stroke pathophysiology, with prioritization of risk variants and genes accomplished through bioinformatics analyses using extensive functional datasets. Stroke risk loci were significantly enriched in drug targets for antithrombotic therapy.

  15. Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes

    PubMed Central

    Malik, Rainer; Chauhan, Ganesh; Traylor, Matthew; Sargurupremraj, Muralidharan; Okada, Yukinori; Mishra, Aniket; Rutten-Jacobs, Loes; Giese, Anne-Katrin; van der Laan, Sander W.; Gretarsdottir, Solveig; Anderson, Christopher D.; Chong, Michael; Adams, Hieab H. H.; Ago, Tetsuro; Almgren, Peter; Amouyel, Philippe; Ay, Hakan; Bartz, Traci M.; Benavente, Oscar R.; Bevan, Steve; Boncoraglio, Giorgio B.; Brown, Robert D.; Butterworth, Adam S.; Carrera, Caty; Carty, Cara L.; Chasman, Daniel I.; Chen, Wei-Min; Cole, John W.; Correa, Adolfo; Cotlarciuc, Ioana; Cruchaga, Carlos; Danesh, John; de Bakker, Paul I. W.; DeStefano, Anita L.; den Hoed, Marcel; Duan, Qing; Engelter, Stefan T.; Falcone, Guido J.; Gottesman, Rebecca F.; Grewal, Raji P.; Gudnason, Vilmundur; Gustafsson, Stefan; Haessler, Jeffrey; Harris, Tamara B.; Hassan, Ahamad; Havulinna, Aki S.; Heckbert, Susan R.; Holliday, Elizabeth G.; Howard, George; Hsu, Fang-Chi; Hyacinth, Hyacinth I.; Ikram, M. Arfan; ingelsson, Erik; Irvin, Marguerite R.; Jian, Xueqiu; Jimenez-Conde, Jordi; Johnson, Julie A.; Jukema, J. Wouter; Kanai, Masahiro; Keene, Keith L.; Kissela, Brett M.; Kleindorfer, Dawn O.; Kooperberg, Charles; Kubo, Michiaki; Lange, Leslie A.; Langefeld, Carl D.; Langenberg, Claudia; Launer, Lenore J.; Lee, Jin-Moo; Lemmens, Robin; Leys, Didier; Lewis, Cathryn M.; Lin, Wei-Yu; Lindgren, Arne G.; Lorentzen, Erik; Magnusson, Patrik K.; Maguire, Jane; Manichaikul, Ani; McArdle, Patrick F.; Meschia, James F.; Mitchell, Braxton D.; Mosley, Thomas H.; Nalls, Michael A.; Ninomiya, Toshiharu; O’Donnell, Martin J.; Psaty, Bruce M.; Pulit, Sara L.; Rannikmäe, Kristiina; Reiner, Alexander P.; Rexrode, Kathryn M.; Rice, Kenneth; Rich, Stephen S.; Ridker, Paul M.; Rost, Natalia S.; Rothwell, Peter M.; Rotter, Jerome I.; Rundek, Tatjana; Sacco, Ralph L.; Sakaue, Saori; Sale, Michele M.; Salomaa, Veikko; Sapkota, Bishwa R.; Schmidt, Reinhold; Schmidt, Carsten O.; Schminke, Ulf; Sharma, Pankaj; Slowik, Agnieszka; Sudlow, Cathie L. M.; Tanislav, Christian; Tatlisumak, Turgut; Taylor, Kent D.; Thijs, Vincent N. S.; Thorleifsson, Gudmar; Thorsteinsdottir, Unnur; Tiedt, Steffen; Trompet, Stella; Tzourio, Christophe; van Duijn, Cornelia M.; Walters, Matthew; Wareham, Nicholas J.; Wassertheil-Smoller, Sylvia; Wilson, James G.; Wiggins, Kerri L.; Yang, Qiong; Yusuf, Salim; Bis, Joshua C.; Pastinen, Tomi; Ruusalepp, Arno; Schadt, Eric E.; Koplev, Simon; Björkegren, Johan L. M.; Codoni, Veronica; Civelek, Mete; Smith, Nicholas L.; Tregouet, David A.; Christophersen, Ingrid E.; Roselli, Carolina; Lubitz, Steven A.; Ellinor, Patrick T.; Tai, E. Shyong; Kooner, Jaspal S.; Kato, Norihiro; He, Jiang; van der Harst, Pim; Elliott, Paul; Chambers, John C.; Takeuchi, Fumihiko; Johnson, Andrew D.; Sanghera, Dharambir K.; Melander, Olle; Jern, Christina; Strbian, Daniel; Fernandez-Cadenas, Israel; Longstreth, W. T.; Rolfs, Arndt; Hata, Jun; Woo, Daniel; Rosand, Jonathan; Pare, Guillaume; Hopewell, Jemma C.; Saleheen, Danish; Stefansson, Kari; Worrall, Bradford B.; Kittner, Steven J.; Seshadri, Sudha; Fornage, Myriam; Markus, Hugh S.; Howson, Joanna M. M.; Kamatani, Yoichiro; Debette, Stephanie; Dichgans, Martin

    2018-01-01

    Stroke has multiple etiologies, but the underlying genes and pathways are largely unknown. We conducted a multiancestry genome-wide-association meta-analysis in 521,612 individuals (67,162 cases and 454,450 controls) and discovered 22 new stroke risk loci, bringing the total to 32. We further found shared genetic variation with related vascular traits, including blood pressure, cardiac traits, and venous thromboembolism, at individual loci (n = 18), and using genetic risk scores and linkage-disequilibrium-score regression. Several loci exhibited distinct association and pleiotropy patterns for etiological stroke subtypes. Eleven new susceptibility loci indicate mechanisms not previously implicated in stroke pathophysiology, with prioritization of risk variants and genes accomplished through bioinformatics analyses using extensive functional datasets. Stroke risk loci were significantly enriched in drug targets for antithrombotic therapy. PMID:29531354

  16. Label-free quantitative proteomic analysis of human plasma-derived microvesicles to find protein signatures of abdominal aortic aneurysms.

    PubMed

    Martinez-Pinna, Roxana; Gonzalez de Peredo, Anne; Monsarrat, Bernard; Burlet-Schiltz, Odile; Martin-Ventura, Jose Luis

    2014-08-01

    To find potential biomarkers of abdominal aortic aneurysms (AAA), we performed a differential proteomic study based on human plasma-derived microvesicles. Exosomes and microparticles isolated from plasma of AAA patients and control subjects (n = 10 each group) were analyzed by a label-free quantitative MS-based strategy. Homemade and publicly available software packages have been used for MS data analysis. The application of two kinds of bioinformatic tools allowed us to find differential protein profiles from AAA patients. Some of these proteins found by the two analysis methods belong to main pathological mechanisms of AAA such as oxidative stress, immune-inflammation, and thrombosis. Data analysis from label-free MS-based experiments requires the use of sophisticated bioinformatic approaches to perform quantitative studies from complex protein mixtures. The application of two of these bioinformatic tools provided us a preliminary list of differential proteins found in plasma-derived microvesicles not previously associated to AAA, which could help us to understand the pathological mechanisms related to this disease. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. A bioinformatics expert system linking functional data to anatomical outcomes in limb regeneration

    PubMed Central

    Lobo, Daniel; Feldman, Erica B.; Shah, Michelle; Malone, Taylor J.

    2014-01-01

    Abstract Amphibians and molting arthropods have the remarkable capacity to regenerate amputated limbs, as described by an extensive literature of experimental cuts, amputations, grafts, and molecular techniques. Despite a rich history of experimental effort, no comprehensive mechanistic model exists that can account for the pattern regulation observed in these experiments. While bioinformatics algorithms have revolutionized the study of signaling pathways, no such tools have heretofore been available to assist scientists in formulating testable models of large‐scale morphogenesis that match published data in the limb regeneration field. Major barriers to preventing an algorithmic approach are the lack of formal descriptions for experimental regenerative information and a repository to centralize storage and mining of functional data on limb regeneration. Establishing a new bioinformatics of shape would significantly accelerate the discovery of key insights into the mechanisms that implement complex regeneration. Here, we describe a novel mathematical ontology for limb regeneration to unambiguously encode phenotype, manipulation, and experiment data. Based on this formalism, we present the first centralized formal database of published limb regeneration experiments together with a user‐friendly expert system tool to facilitate its access and mining. These resources are freely available for the community and will assist both human biologists and artificial intelligence systems to discover testable, mechanistic models of limb regeneration. PMID:25729585

  18. Bioinformatics analysis and detection of gelatinase encoded gene in Lysinibacillussphaericus

    NASA Astrophysics Data System (ADS)

    Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, we performed bioinformatics analysis toward genome sequence of Lysinibacillussphaericus (L. sphaericus) to determine gene encoded for gelatinase. L. sphaericus was isolated from soil and gelatinase species-specific bacterium to porcine and bovine gelatin. This bacterium offers the possibility of enzymes production which is specific to both species of meat, respectively. The main focus of this research is to identify the gelatinase encoded gene within the bacteria of L. Sphaericus using bioinformatics analysis of partially sequence genome. From the research study, three candidate gene were identified which was, gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21 which containing 1563 base pair (bp) in size with 520 amino acids sequence; Secondly, gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17 which containing 1776 bp in size with 591 amino acids sequence; and Thirdly, gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8 containing 1701 bp in size with 566 amino acids sequence. Three pairs of oligonucleotide primers were designed and namely as, F1, R1, F2, R2, F3 and R3 were targeted short sequences of cDNA by PCR. The amplicons were reliably results in 1563 bp in size for candidate gene P1 and 1701 bp in size for candidate gene P3. Therefore, the results of bioinformatics analysis of L. Sphaericus resulting in gene encoded gelatinase were identified.

  19. eMZed: an open source framework in Python for rapid and interactive development of LC/MS data analysis workflows.

    PubMed

    Kiefer, Patrick; Schmitt, Uwe; Vorholt, Julia A

    2013-04-01

    The Python-based, open-source eMZed framework was developed for mass spectrometry (MS) users to create tailored workflows for liquid chromatography (LC)/MS data analysis. The goal was to establish a unique framework with comprehensive basic functionalities that are easy to apply and allow for the extension and modification of the framework in a straightforward manner. eMZed supports the iterative development and prototyping of individual evaluation strategies by providing a computing environment and tools for inspecting and modifying underlying LC/MS data. The framework specifically addresses non-expert programmers, as it requires only basic knowledge of Python and relies largely on existing successful open-source software, e.g. OpenMS. The framework eMZed and its documentation are freely available at http://emzed.biol.ethz.ch/. eMZed is published under the GPL 3.0 license, and an online discussion group is available at https://groups.google.com/group/emzed-users. Supplementary data are available at Bioinformatics online.

  20. RAPIDR: an analysis package for non-invasive prenatal testing of aneuploidy

    PubMed Central

    Lo, Kitty K.; Boustred, Christopher; Chitty, Lyn S.; Plagnol, Vincent

    2014-01-01

    Non-invasive prenatal testing (NIPT) of fetal aneuploidy using cell-free fetal DNA is becoming part of routine clinical practice. RAPIDR (Reliable Accurate Prenatal non-Invasive Diagnosis R package) is an easy-to-use open-source R package that implements several published NIPT analysis methods. The input to RAPIDR is a set of sequence alignment files in the BAM format, and the outputs are calls for aneuploidy, including trisomies 13, 18, 21 and monosomy X as well as fetal sex. RAPIDR has been extensively tested with a large sample set as part of the RAPID project in the UK. The package contains quality control steps to make it robust for use in the clinical setting. Availability and implementation: RAPIDR is implemented in R and can be freely downloaded via CRAN from here: http://cran.r-project.org/web/packages/RAPIDR/index.html. Contact: kitty.lo@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24990604

  1. Genome-wide analysis of TCP family in tobacco.

    PubMed

    Chen, L; Chen, Y Q; Ding, A M; Chen, H; Xia, F; Wang, W F; Sun, Y H

    2016-05-23

    The TCP family is a transcription factor family, members of which are extensively involved in plant growth and development as well as in signal transduction in the response against many physiological and biochemical stimuli. In the present study, 61 TCP genes were identified in tobacco (Nicotiana tabacum) genome. Bioinformatic methods were employed for predicting and analyzing the gene structure, gene expression, phylogenetic analysis, and conserved domains of TCP proteins in tobacco. The 61 NtTCP genes were divided into three diverse groups, based on the division of TCP genes in tomato and Arabidopsis, and the results of the conserved domain and sequence analyses further confirmed the classification of the NtTCP genes. The expression pattern of NtTCP also demonstrated that majority of these genes play important roles in all the tissues, while some special genes exercise their functions only in specific tissues. In brief, the comprehensive and thorough study of the TCP family in other plants provides sufficient resources for studying the structure and functions of TCPs in tobacco.

  2. Quantitative Analysis of Global Proteome and Lysine Acetylome Reveal the Differential Impacts of VPA and SAHA on HL60 Cells.

    PubMed

    Zhu, Xiaoyu; Liu, Xin; Cheng, Zhongyi; Zhu, Jun; Xu, Lei; Wang, Fengsong; Qi, Wulin; Yan, Jiawei; Liu, Ning; Sun, Zimin; Liu, Huilan; Peng, Xiaojun; Hao, Yingchan; Zheng, Nan; Wu, Quan

    2016-01-29

    Valproic acid (VPA) and suberoylanilide hydroxamic acid (SAHA) are both HDAC inhibitors (HDACi). Previous studies indicated that both inhibitors show therapeutic effects on acute myeloid leukaemia (AML), while the differential impacts of the two different HDACi on AML treatment still remains elusive. In this study, using 3-plex SILAC based quantitative proteomics technique, anti-acetyllysine antibody based affinity enrichment, high resolution LC-MS/MS and intensive bioinformatic analysis, the quantitative proteome and acetylome in SAHA and VPA treated AML HL60 cells were extensively studied. In total, 5,775 proteins and 1,124 lysine acetylation sites were successfully obtained in response to VAP and SAHA treatment. It is found that VPA and SAHA treatment differently induced proteome and acetylome profiling in AML HL60 cells. This study revealed the differential impacts of VPA and SAHA on proteome/acetylome in AML cells, deepening our understanding of HDAC inhibitor mediated AML therapeutics.

  3. Sequence analysis and molecular characterization of Wnt4 gene in metacestodes of Taenia solium.

    PubMed

    Hou, Junling; Luo, Xuenong; Wang, Shuai; Yin, Cai; Zhang, Shaohua; Zhu, Xueliang; Dou, Yongxi; Cai, Xuepeng

    2014-04-01

    Wnt proteins are a family of secreted glycoproteins that are evolutionarily conserved and considered to be involved in extensive developmental processes in metazoan organisms. The characterization of wnt genes may improve understanding the parasite's development. In the present study, a wnt4 gene encoding 491amino acids was amplified from cDNA of metacestodes of Taenia solium using reverse transcription PCR (RT-PCR). Bioinformatics tools were used for sequence analysis. The conserved domain of the wnt gene family was predicted. The expression profile of Wnt4 was investigated using real-time PCR. Wnt4 expression was found to be dramatically increased in scolex evaginated cysticerci when compared to invaginated cysticerci. In situ hybridization showed that wnt4 gene was distributed in the posterior end of the worm along the primary body axis in evaginated cysticerci. These findings indicated that wnt4 may take part in the process of cysticerci evagination and play a role in scolex/bladder development of cysticerci of T. solium.

  4. DEsubs: an R package for flexible identification of differentially expressed subpathways using RNA-seq experiments.

    PubMed

    Vrahatis, Aristidis G; Balomenos, Panos; Tsakalidis, Athanasios K; Bezerianos, Anastasios

    2016-12-15

    DEsubs is a network-based systems biology R package that extracts disease-perturbed subpathways within a pathway network as recorded by RNA-seq experiments. It contains an extensive and customized framework with a broad range of operation modes at all stages of the subpathway analysis, enabling so a case-specific approach. The operation modes include pathway network construction and processing, subpathway extraction, visualization and enrichment analysis with regard to various biological and pharmacological features. Its capabilities render DEsubs a tool-guide for both the modeler and experimentalist for the identification of more robust systems-level drug targets and biomarkers for complex diseases. DEsubs is implemented as an R package following Bioconductor guidelines: http://bioconductor.org/packages/DEsubs/ CONTACT: tassos.bezerianos@nus.edu.sgSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  5. [Efflux systems in Serratia marcescens].

    PubMed

    Mardanova, A M; Bogomol'naia, L M; Romanova, Iu D; Sharipova, M R

    2014-01-01

    A widespread bacterium Serratia marcescens (family Enterobacteriaceae) is an opportunistic and exhibits multiple drug resistance. Active removal of antibiotics and other antimicrobials from pathogen and exhibits multiple drug resistance. Active removal of antibiotics and other antimicrobials from the cells by efflux systems is one of the mechanisms responsible for microbial resistance to these compounds. Among enterobacteria, efflux systems of Escherichia coli and Salmonella enterica var. Typhimurium have been studied most extensively. Few efflux systems that belong to different families have been reported for S. marcescens. In this review, we analyzed available literature about S. marcescens efflux systems and carried out the comparative analysis of the genes encoding the RND type systems in different Serratia species and in other enterobacteria. Bioinformatical analysis of the S. marcescens genome allowed us to identify the previously unknown efflux systems based on their homology with the relevant E. coli genes. Identification of additional efflux systems in S. marcescens genome will promote our understanding of physiology of these bacteria, will detect new molecular mechanisms of resistance and will reveal their resistance potential.

  6. SeWeR: a customizable and integrated dynamic HTML interface to bioinformatics services.

    PubMed

    Basu, M K

    2001-06-01

    Sequence analysis using Web Resources (SeWeR) is an integrated, Dynamic HTML (DHTML) interface to commonly used bioinformatics services available on the World Wide Web. It is highly customizable, extendable, platform neutral, completely server-independent and can be hosted as a web page as well as being used as stand-alone software running within a web browser.

  7. An "in silico" Bioinformatics Laboratory Manual for Bioscience Departments: "Prediction of Glycosylation Sites in Phosphoethanolamine Transferases"

    ERIC Educational Resources Information Center

    Alyuruk, Hakan; Cavas, Levent

    2014-01-01

    Genomics and proteomics projects have produced a huge amount of raw biological data including DNA and protein sequences. Although these data have been stored in data banks, their evaluation is strictly dependent on bioinformatics tools. These tools have been developed by multidisciplinary experts for fast and robust analysis of biological data.…

  8. Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem.

    PubMed

    Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan

    2007-01-01

    Modern video cards and game consoles typically have much better performance to price ratios than that of general purpose CPUs. The parallel processing capabilities of game hardware are well-suited for high throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.

  9. VLSI Microsystem for Rapid Bioinformatic Pattern Recognition

    NASA Technical Reports Server (NTRS)

    Fang, Wai-Chi; Lue, Jaw-Chyng

    2009-01-01

    A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).

  10. Bioinformatic pipelines in Python with Leaf

    PubMed Central

    2013-01-01

    Background An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315

  11. Shotgun proteomic analysis of Emiliania huxleyi, a marine phytoplankton species of major biogeochemical importance.

    PubMed

    Jones, Bethan M; Edwards, Richard J; Skipp, Paul J; O'Connor, C David; Iglesias-Rodriguez, M Debora

    2011-06-01

    Emiliania huxleyi is a unicellular marine phytoplankton species known to play a significant role in global biogeochemistry. Through the dual roles of photosynthesis and production of calcium carbonate (calcification), carbon is transferred from the atmosphere to ocean sediments. Almost nothing is known about the molecular mechanisms that control calcification, a process that is tightly regulated within the cell. To initiate proteomic studies on this important and phylogenetically remote organism, we have devised efficient protein extraction protocols and developed a bioinformatics pipeline that allows the statistically robust assignment of proteins from MS/MS data using preexisting EST sequences. The bioinformatics tool, termed BUDAPEST (Bioinformatics Utility for Data Analysis of Proteomics using ESTs), is fully automated and was used to search against data generated from three strains. BUDAPEST increased the number of identifications over standard protein database searches from 37 to 99 proteins when data were amalgamated. Proteins involved in diverse cellular processes were uncovered. For example, experimental evidence was obtained for a novel type I polyketide synthase and for various photosystem components. The proteomic and bioinformatic approaches developed in this study are of wider applicability, particularly to the oceanographic community where genomic sequence data for species of interest are currently scarce.

  12. Towards big data science in the decade ahead from ten years of InCoB and the 1st ISCB-Asia Joint Conference

    PubMed Central

    2011-01-01

    The 2011 International Conference on Bioinformatics (InCoB) conference, which is the annual scientific conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted by Kuala Lumpur, Malaysia, is co-organized with the first ISCB-Asia conference of the International Society for Computational Biology (ISCB). InCoB and the sequencing of the human genome are both celebrating their tenth anniversaries and InCoB’s goalposts for the next decade, implementing standards in bioinformatics and globally distributed computational networks, will be discussed and adopted at this conference. Of the 49 manuscripts (selected from 104 submissions) accepted to BMC Genomics and BMC Bioinformatics conference supplements, 24 are featured in this issue, covering software tools, genome/proteome analysis, systems biology (networks, pathways, bioimaging) and drug discovery and design. PMID:22372736

  13. Identification of novel candidate maternal serum protein markers for Down syndrome by integrated proteomic and bioinformatic analysis.

    PubMed

    Kang, Yuan; Dong, Xinran; Zhou, Qiongjie; Zhang, Ying; Cheng, Yan; Hu, Rong; Su, Cuihong; Jin, Hong; Liu, Xiaohui; Ma, Duan; Tian, Weidong; Li, Xiaotian

    2012-03-01

    This study aimed to identify candidate protein biomarkers from maternal serum for Down syndrome (DS) by integrated proteomic and bioinformatics analysis. A pregnancy DS group of 18 women and a control group with the same number were prepared, and the maternal serum proteins were analyzed by isobaric tags for relative and absolute quantitation and mass spectrometry, to identify DS differentially expressed maternal serum proteins (DS-DEMSPs). Comprehensive bioinformatics analysis was then employed to analyze DS-DEMSPs both in this paper and seven related publications. Down syndrome differentially expressed maternal serum proteins from different studies are significantly enriched with common Gene Ontology functions, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, transcription factor binding sites, and Pfam protein domains, However, the DS-DEMSPs are less functionally related to known DS-related genes. These evidences suggest that common molecular mechanisms induced by secondary effects may be present upon DS carrying. A simple scoring scheme revealed Alpha-2-macroglobulin, Apolipoprotein A1, Apolipoprotein E, Complement C1s subcomponent, Complement component 5, Complement component 8, alpha polypeptide, Complement component 8, beta polypeptide and Fibronectin as potential DS biomarkers. The integration of proteomics and bioinformatics studies provides a novel approach to develop new prenatal screening methods for noninvasive yet accurate diagnosis of DS. Copyright © 2012 John Wiley & Sons, Ltd.

  14. Crimean-Congo Hemorrhagic Fever Virus Gn Bioinformatic Analysis and Construction of a Recombinant Bacmid in Order to Express Gn by Baculovirus Expression System.

    PubMed

    Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza

    2015-11-01

    Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the nairovirus, a genus in the Bunyaviridae family, which causes a life threatening disease in human. Currently, there is no vaccine against CCHFV and detailed structural analysis of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. The current study aimed to investigate structural bioinformatics of CCHFV Gn protein and design a construct to make a recombinant bacmid to express by baculovirus system. To express the Gn protein in insect cells that can be used as antigen in animal model vaccine studies. Bioinformatic analysis of CCHFV Gn protein was performed and designed a construct and cloned into pFastBacHTb vector and a recombinant Gn-bacmid was generated by Bac to Bac system. Primary, secondary, and 3D structure of CCHFV Gn were obtained and PCR reaction with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring Gn coding region under polyhedron promoter. Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV.

  15. SOBA: sequence ontology bioinformatics analysis.

    PubMed

    Moore, Barry; Fan, Guozhen; Eilbeck, Karen

    2010-07-01

    The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.

  16. Freshwater Metaviromics and Bacteriophages: A Current Assessment of the State of the Art in Relation to Bioinformatic Challenges

    PubMed Central

    Bruder, Katherine; Malki, Kema; Cooper, Alexandria; Sible, Emily; Shapiro, Jason W.; Watkins, Siobhan C.; Putonti, Catherine

    2016-01-01

    Advances in bioinformatics and sequencing technologies have allowed for the analysis of complex microbial communities at an unprecedented rate. While much focus is often placed on the cellular members of these communities, viruses play a pivotal role, particularly bacteria-infecting viruses (bacteriophages); phages mediate global biogeochemical processes and drive microbial evolution through bacterial grazing and horizontal gene transfer. Despite their importance and ubiquity in nature, very little is known about the diversity and structure of viral communities. Though the need for culture-based methods for viral identification has been somewhat circumvented through metagenomic techniques, the analysis of metaviromic data is marred with many unique issues. In this review, we examine the current bioinformatic approaches for metavirome analyses and the inherent challenges facing the field as illustrated by the ongoing efforts in the exploration of freshwater phage populations. PMID:27375355

  17. Biosynthetic multitasking facilitates thalassospiramide structural diversity in marine bacteria.

    PubMed

    Ross, Avena C; Xu, Ying; Lu, Liang; Kersten, Roland D; Shao, Zongze; Al-Suwailem, Abdulaziz M; Dorrestein, Pieter C; Qian, Pei-Yuan; Moore, Bradley S

    2013-01-23

    Thalassospiramides A and B are immunosuppressant cyclic lipopeptides first reported from the marine α-proteobacterium Thalassospira sp. CNJ-328. We describe here the discovery and characterization of an extended family of 14 new analogues from four Tistrella and Thalassospira isolates. These potent calpain 1 protease inhibitors belong to six structure classes in which the length and composition of the acylpeptide side chain varies extensively. Genomic sequence analysis of the thalassospiramide-producing microbes revealed related, genus-specific biosynthetic loci encoding hybrid nonribosomal peptide synthetase/polyketide synthases consistent with thalassospiramide assembly. The bioinformatics analysis of the gene clusters suggests that structural diversity, which ranges from the 803.4 Da thalassospiramide C to the 1291.7 Da thalassospiramide F, results from a complex sequence of reactions involving amino acid substrate channeling and enzymatic multimodule skipping and iteration. Preliminary biochemical analysis of the N-terminal nonribosomal peptide synthetase module from the Thalassospira TtcA megasynthase supports a biosynthetic model in which in cis amino acid activation competes with in trans activation to increase the range of amino acid substrates incorporated at the N terminus.

  18. Biosynthetic Multitasking Facilitates Thalassospiramide Structural Diversity in Marine Bacteria

    PubMed Central

    Ross, Avena C.; Xu, Ying; Lu, Liang; Kersten, Roland D.; Shao, Zongze; Al-Suwailem, Abdulaziz M.; Dorrestein, Pieter C.; Qian, Pei-Yuan; Moore, Bradley S.

    2013-01-01

    Thalassospiramides A and B are immunosuppressant cyclic lipopeptides first reported from the marine α-proteobacterium Thalassospira sp. CNJ-328. We describe here the discovery and characterization of an extended family of 14 new analogues from four Tistrella and Thalassospira isolates. These potent calpain 1 protease inhibitors belong to six structure classes in which the length and composition of the acylpeptide side chain varies extensively. Genomic sequence analysis of the thalassospiramide-producing microbes revealed related, genus-specific biosynthetic loci encoding hybrid nonribosomal peptide synthetase/polyketide synthases consistent with thalassospiramide assembly. The bioinformatics analysis of the gene clusters suggests that structural diversity, which ranges from the 803.4 Da thalassospiramide C to the 1291.7 Da thalassospiramide F, results from a complex sequence of reactions involving amino acid substrate channeling and enzymatic multi-module skipping and iteration. Preliminary biochemical analysis of the N-terminal NRPS module from the Thalassospira TtcA megasynthase supports a biosynthetic model in which in cis amino acid activation competes with in trans activation to increase the range of amino acid substrates incorporated at the N-terminus. PMID:23270364

  19. Integrated analysis of germline and somatic variants in ovarian cancer.

    PubMed

    Kanchi, Krishna L; Johnson, Kimberly J; Lu, Charles; McLellan, Michael D; Leiserson, Mark D M; Wendl, Michael C; Zhang, Qunyuan; Koboldt, Daniel C; Xie, Mingchao; Kandoth, Cyriac; McMichael, Joshua F; Wyczalkowski, Matthew A; Larson, David E; Schmidt, Heather K; Miller, Christopher A; Fulton, Robert S; Spellman, Paul T; Mardis, Elaine R; Druley, Todd E; Graubert, Timothy A; Goodfellow, Paul J; Raphael, Benjamin J; Wilson, Richard K; Ding, Li

    2014-01-01

    We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyse germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2 and PALB2. In addition, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B and MLL3). Evidence for loss of heterozygosity was found in 100 and 76% of cases with germline BRCA1 and BRCA2 truncations, respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 222 candidate functional germline truncation and missense variants, including two pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK and MLL pathways.

  20. “Exosomics”—A Review of Biophysics, Biology and Biochemistry of Exosomes With a Focus on Human Breast Milk

    PubMed Central

    de la Torre Gomez, Carolina; Goreham, Renee V.; Bech Serra, Joan J.; Nann, Thomas; Kussmann, Martin

    2018-01-01

    Exosomes are biomolecular nanostructures released from cells. They carry specific biomolecular information and are mainly researched for their exquisite properties as a biomarker source and delivery system. We introduce exosomes in the context of other extracellular vesicles, describe their biophysical isolation and characterisation and discuss their biochemical profiling. Motivated by our interest in early-life nutrition and health, and corresponding studies enrolling lactating mothers and their infants, we zoom into exosomes derived from human breast milk. We argue that these should be more extensively studied at proteomic and micronutrient profiling level, because breast milk exosomes provide a more specific window into breast milk quality from an immunological (proteomics) and nutritional (micronutrient) perspective. Such enhanced breast milk exosome profiling would thereby complement and enrich the more classical whole breast milk analysis and is expected to deliver more functional insights than the rather descriptive analysis of human milk, or larger fractions thereof, such as milk fat globule membrane. We substantiate our arguments by a bioinformatic analysis of two published proteomic data sets of human breast milk exosomes. PMID:29636770

  1. "Exosomics"-A Review of Biophysics, Biology and Biochemistry of Exosomes With a Focus on Human Breast Milk.

    PubMed

    de la Torre Gomez, Carolina; Goreham, Renee V; Bech Serra, Joan J; Nann, Thomas; Kussmann, Martin

    2018-01-01

    Exosomes are biomolecular nanostructures released from cells. They carry specific biomolecular information and are mainly researched for their exquisite properties as a biomarker source and delivery system. We introduce exosomes in the context of other extracellular vesicles, describe their biophysical isolation and characterisation and discuss their biochemical profiling. Motivated by our interest in early-life nutrition and health, and corresponding studies enrolling lactating mothers and their infants, we zoom into exosomes derived from human breast milk. We argue that these should be more extensively studied at proteomic and micronutrient profiling level, because breast milk exosomes provide a more specific window into breast milk quality from an immunological (proteomics) and nutritional (micronutrient) perspective. Such enhanced breast milk exosome profiling would thereby complement and enrich the more classical whole breast milk analysis and is expected to deliver more functional insights than the rather descriptive analysis of human milk, or larger fractions thereof, such as milk fat globule membrane. We substantiate our arguments by a bioinformatic analysis of two published proteomic data sets of human breast milk exosomes.

  2. Data Standards for Flow Cytometry

    PubMed Central

    SPIDLEN, JOSEF; GENTLEMAN, ROBERT C.; HAALAND, PERRY D.; LANGILLE, MORGAN; MEUR, NOLWENN LE; OCHS, MICHAEL F.; SCHMITT, CHARLES; SMITH, CLAYTON A.; TREISTER, ADAM S.; BRINKMAN, RYAN R.

    2009-01-01

    Flow cytometry (FCM) is an analytical tool widely used for cancer and HIV/AIDS research, and treatment, stem cell manipulation and detecting microorganisms in environmental samples. Current data standards do not capture the full scope of FCM experiments and there is a demand for software tools that can assist in the exploration and analysis of large FCM datasets. We are implementing a standardized approach to capturing, analyzing, and disseminating FCM data that will facilitate both more complex analyses and analysis of datasets that could not previously be efficiently studied. Initial work has focused on developing a community-based guideline for recording and reporting the details of FCM experiments. Open source software tools that implement this standard are being created, with an emphasis on facilitating reproducible and extensible data analyses. As well, tools for electronic collaboration will assist the integrated access and comprehension of experiments to empower users to collaborate on FCM analyses. This coordinated, joint development of bioinformatics standards and software tools for FCM data analysis has the potential to greatly facilitate both basic and clinical research—impacting a notably diverse range of medical and environmental research areas. PMID:16901228

  3. IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis

    PubMed Central

    Safonova, Yana; Bonissone, Stefano; Kurpilyansky, Eugene; Starostina, Ekaterina; Lapidus, Alla; Stinson, Jeremy; DePalatis, Laura; Sandoval, Wendy; Lill, Jennie; Pevzner, Pavel A.

    2015-01-01

    The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires [next generation sequencing (NGS) and mass spectrometry (MS)] present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. Availability and implementation: IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools. Contact: ppevzner@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26072509

  4. Open source tools and toolkits for bioinformatics: significance, and where are we?

    PubMed

    Stajich, Jason E; Lapp, Hilmar

    2006-09-01

    This review summarizes important work in open-source bioinformatics software that has occurred over the past couple of years. The survey is intended to illustrate how programs and toolkits whose source code has been developed or released under an Open Source license have changed informatics-heavy areas of life science research. Rather than creating a comprehensive list of all tools developed over the last 2-3 years, we use a few selected projects encompassing toolkit libraries, analysis tools, data analysis environments and interoperability standards to show how freely available and modifiable open-source software can serve as the foundation for building important applications, analysis workflows and resources.

  5. Genomics for Everyone

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chain, Patrick

    Genomics — the genetic mapping and DNA sequencing of sets of genes or the complete genomes of organisms, along with related genome analysis and database work — is emerging as one of the transformative sciences of the 21st century. But current bioinformatics tools are not accessible to most biological researchers. Now, a new computational and web-based tool called EDGE Bioinformatics is working to fulfill the promise of democratizing genomics.

  6. Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets

    PubMed Central

    2010-01-01

    Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262

  7. Bioinformatic Analysis of Pathogenic Missense Mutations of Activin Receptor Like Kinase 1 Ectodomain

    PubMed Central

    Scotti, Claudia; Olivieri, Carla; Boeri, Laura; Canzonieri, Cecilia; Ornati, Federica; Buscarini, Elisabetta; Pagella, Fabio; Danesino, Cesare

    2011-01-01

    Activin A receptor, type II-like kinase 1 (also called ALK1), is a serine-threonine kinase predominantly expressed on endothelial cells surface. Mutations in its ACVRL1 encoding gene (12q11-14) cause type 2 Hereditary Haemorrhagic Telangiectasia (HHT2), an autosomal dominant multisystem vascular dysplasia. The study of the structural effects of mutations is crucial to understand their pathogenic mechanism. However, while an X-ray structure of ALK1 intracellular domain has recently become available (PDB ID: 3MY0), structure determination of ALK1 ectodomain (ALK1EC) has been elusive so far. We here describe the building of a homology model for ALK1EC, followed by an extensive bioinformatic analysis, based on a set of 38 methods, of the effect of missense mutations at the sequence and structural level. ALK1EC potential interaction mode with its ligand BMP9 was then predicted combining modelling and docking data. The calculated model of the ALK1EC allowed mapping and a preliminary characterization of HHT2 associated mutations. Major structural changes and loss of stability of the protein were predicted for several mutations, while others were found to interfere mainly with binding to BMP9 or other interactors, like Endoglin (CD105), whose encoding ENG gene (9q34) mutations are known to cause type 1 HHT. This study gives a preliminary insight into the potential structure of ALK1EC and into the structural effects of HHT2 associated mutations, which can be useful to predict the potential effect of each single mutation, to devise new biological experiments and to interpret the biological significance of new mutations, private mutations, or non-synonymous polymorphisms. PMID:22028876

  8. Thorough analysis of unorthodox ABO deletions called by the 1000 Genomes project.

    PubMed

    Möller, M; Hellberg, Å; Olsson, M L

    2018-02-01

    ABO remains the clinically most important blood group system, but despite earlier extensive research, significant findings are still being made. The vast majority of catalogued ABO null alleles are based on the c.261delG polymorphism. Apart from c.802G>A, other mechanisms for O alleles are rare. While analysing the data set from the 1000 Genomes (1000G) project, we encountered two previously uncharacterized deletions, which needed further exploration. The Erythrogene database, complemented with bioinformatics software, was used to analyse ABO in 2504 individuals from 1000G. DNA samples from selected 1000G donors and African blood donors were examined by allele-specific PCR and Sanger sequencing to characterize predicted deletions. A 5821-bp deletion encompassing exons 5-7 was called in twenty 1000G individuals, predominantly Africans. This allele was confirmed and its exact deletion point defined by bioinformatic analyses and in vitro experiments. A PCR assay was developed, and screening of African samples revealed three donors heterozygous for this deletion, which was thereby phenotypically established as an O allele. Analysis of upstream genetic markers indicated an ancestral origin from ABO*O.01.02. We estimate this deletion as the 3rd most common mechanism behind O alleles. A 24-bp deletion was called in nine individuals and showed greater diversity regarding ethnic distribution and allelic background. It could neither be confirmed by in silico nor in vitro experiments. A previously uncharacterized ABO deletion among Africans was comprehensively mapped and a genotyping strategy devised. The false prediction of another deletion emphasizes the need for cautious interpretation of NGS data and calls for strict validation routines. © 2017 International Society of Blood Transfusion.

  9. Rare genetic variants in Tunisian Jewish patients suffering from age-related macular degeneration.

    PubMed

    Pras, Eran; Kristal, Dana; Shoshany, Nadav; Volodarsky, Dina; Vulih, Inna; Celniker, Gershon; Isakov, Ofer; Shomron, Noam; Pras, Elon

    2015-07-01

    To explore the molecular basis of familial, early onset, age-related macular degeneration (AMD) with diverse phenotypes, using whole exome sequencing (WES). We performed WES on four patients (two sibs from two families) manifesting early-onset AMD and searched for disease-causing genetic variants in previously identified macular degeneration related genes. Validation studies of the variants included bioinformatics tools, segregation analysis of mutations within the families and mutation screening in an AMD cohort of patients. The index patients were in their 50s when diagnosed and displayed a wide variety of clinical AMD presentations: from limited drusen in the posterior pole to multiple basal-laminar drusen extending peripherally. Severe visual impairment due to extensive geographic atrophy and/or choroidal-neovascularisation was common by the age of 75 years. Approximately, 400 000 genomic variants for each DNA sample were included in the downstream bioinformatics analysis, which ended in the discovery of two novel variants; in one family a single bp deletion was identified in the Hemicentin (HMCN1) gene (c.4162delC), whereas in the other, a missense variant (p.V412M) in the Complement Factor-I (CFI) gene was found. Screening for these variants in a cohort of patients with AMD identified another family with the CFI variant. This report uses WES to uncover rare genetic variants in AMD. A null-variant in HMCN1 has been identified in one AMD family, and a missense variant in CFI was discovered in two other families. These variants confirm the genetic complexity and significance of rare genetic variants in the pathogenesis of AMD. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  10. An architecture for genomics analysis in a clinical setting using Galaxy and Docker

    PubMed Central

    Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A

    2017-01-01

    Abstract Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Along the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. PMID:29048555

  11. An architecture for genomics analysis in a clinical setting using Galaxy and Docker.

    PubMed

    Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A; Rance, B

    2017-11-01

    Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Along the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. © The Author 2017. Published by Oxford University Press.

  12. An overview of topic modeling and its current applications in bioinformatics.

    PubMed

    Liu, Lin; Tang, Lin; Dong, Wen; Yao, Shaowen; Zhou, Wei

    2016-01-01

    With the rapid accumulation of biological datasets, machine learning methods designed to automate data analysis are urgently needed. In recent years, so-called topic models that originated from the field of natural language processing have been receiving much attention in bioinformatics because of their interpretability. Our aim was to review the application and development of topic models for bioinformatics. This paper starts with the description of a topic model, with a focus on the understanding of topic modeling. A general outline is provided on how to build an application in a topic model and how to develop a topic model. Meanwhile, the literature on application of topic models to biological data was searched and analyzed in depth. According to the types of models and the analogy between the concept of document-topic-word and a biological object (as well as the tasks of a topic model), we categorized the related studies and provided an outlook on the use of topic models for the development of bioinformatics applications. Topic modeling is a useful method (in contrast to the traditional means of data reduction in bioinformatics) and enhances researchers' ability to interpret biological information. Nevertheless, due to the lack of topic models optimized for specific biological data, the studies on topic modeling in biological data still have a long and challenging road ahead. We believe that topic models are a promising method for various applications in bioinformatics research.

  13. A generally applicable lightweight method for calculating a value structure for tools and services in bioinformatics infrastructure projects.

    PubMed

    Mayer, Gerhard; Quast, Christian; Felden, Janine; Lange, Matthias; Prinz, Manuel; Pühler, Alfred; Lawerenz, Chris; Scholz, Uwe; Glöckner, Frank Oliver; Müller, Wolfgang; Marcus, Katrin; Eisenacher, Martin

    2017-10-30

    Sustainable noncommercial bioinformatics infrastructures are a prerequisite to use and take advantage of the potential of big data analysis for research and economy. Consequently, funders, universities and institutes as well as users ask for a transparent value model for the tools and services offered. In this article, a generally applicable lightweight method is described by which bioinformatics infrastructure projects can estimate the value of tools and services offered without determining exactly the total costs of ownership. Five representative scenarios for value estimation from a rough estimation to a detailed breakdown of costs are presented. To account for the diversity in bioinformatics applications and services, the notion of service-specific 'service provision units' is introduced together with the factors influencing them and the main underlying assumptions for these 'value influencing factors'. Special attention is given on how to handle personnel costs and indirect costs such as electricity. Four examples are presented for the calculation of the value of tools and services provided by the German Network for Bioinformatics Infrastructure (de.NBI): one for tool usage, one for (Web-based) database analyses, one for consulting services and one for bioinformatics training events. Finally, from the discussed values, the costs of direct funding and the costs of payment of services by funded projects are calculated and compared. © The Author 2017. Published by Oxford University Press.

  14. BIOINFORMATICS IN THE K-8 CLASSROOM: DESIGNING INNOVATIVE ACTIVITIES FOR TEACHER IMPLEMENTATION

    PubMed Central

    Shuster, Michele; Claussen, Kira; Locke, Melly; Glazewski, Krista

    2016-01-01

    At the intersection of biology and computer science is the growing field of bioinformatics—the analysis of complex datasets of biological relevance. Despite the increasing importance of bioinformatics and associated practical applications, these are not standard topics in elementary and middle school classrooms. We report on a pilot project and its evolution to support implementation of bioinformatics-based activities in elementary and middle school classrooms. Specifically, we ultimately designed a multi-day summer teacher professional development workshop, in which teachers design innovative classroom activities. By focusing on teachers, our design leverages enhanced teacher knowledge and confidence to integrate innovative instructional materials into K-8 classrooms and contributes to capacity building in STEM instruction. PMID:27429860

  15. Bioinformatics Analysis Reveals Genes Involved in the Pathogenesis of Ameloblastoma and Keratocystic Odontogenic Tumor.

    PubMed

    Santos, Eliane Macedo Sobrinho; Santos, Hércules Otacílio; Dos Santos Dias, Ivoneth; Santos, Sérgio Henrique; Batista de Paula, Alfredo Maurício; Feltenberger, John David; Sena Guimarães, André Luiz; Farias, Lucyana Conceição

    2016-01-01

    Pathogenesis of odontogenic tumors is not well known. It is important to identify genetic deregulations and molecular alterations. This study aimed to investigate, through bioinformatic analysis, the possible genes involved in the pathogenesis of ameloblastoma (AM) and keratocystic odontogenic tumor (KCOT). Genes involved in the pathogenesis of AM and KCOT were identified in GeneCards. Gene list was expanded, and the gene interactions network was mapped using the STRING software. "Weighted number of links" (WNL) was calculated to identify "leader genes" (highest WNL). Genes were ranked by K-means method and Kruskal-Wallis test was used (P<0.001). Total interactions score (TIS) was also calculated using all interaction data generated by the STRING database, in order to achieve global connectivity for each gene. The topological and ontological analyses were performed using Cytoscape software and BinGO plugin. Literature review data was used to corroborate the bioinformatics data. CDK1 was identified as leader gene for AM. In KCOT group, results show PCNA and TP53 . Both tumors exhibit a power law behavior. Our topological analysis suggested leader genes possibly important in the pathogenesis of AM and KCOT, by clustering coefficient calculated for both odontogenic tumors (0.028 for AM, zero for KCOT). The results obtained in the scatter diagram suggest an important relationship of these genes with the molecular processes involved in AM and KCOT. Ontological analysis for both AM and KCOT demonstrated different mechanisms. Bioinformatics analyzes were confirmed through literature review. These results may suggest the involvement of promising genes for a better understanding of the pathogenesis of AM and KCOT.

  16. A powerful and flexible approach to the analysis of RNA sequence count data

    PubMed Central

    Zhou, Yi-Hui; Xia, Kai; Wright, Fred A.

    2011-01-01

    Motivation: A number of penalization and shrinkage approaches have been proposed for the analysis of microarray gene expression data. Similar techniques are now routinely applied to RNA sequence transcriptional count data, although the value of such shrinkage has not been conclusively established. If penalization is desired, the explicit modeling of mean–variance relationships provides a flexible testing regimen that ‘borrows’ information across genes, while easily incorporating design effects and additional covariates. Results: We describe BBSeq, which incorporates two approaches: (i) a simple beta-binomial generalized linear model, which has not been extensively tested for RNA-Seq data and (ii) an extension of an expression mean–variance modeling approach to RNA-Seq data, involving modeling of the overdispersion as a function of the mean. Our approaches are flexible, allowing for general handling of discrete experimental factors and continuous covariates. We report comparisons with other alternate methods to handle RNA-Seq data. Although penalized methods have advantages for very small sample sizes, the beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, appears to have favorable characteristics in power and flexibility. Availability: An R package containing examples and sample datasets is available at http://www.bios.unc.edu/research/genomic_software/BBSeq Contact: yzhou@bios.unc.edu; fwright@bios.unc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21810900

  17. Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

    PubMed

    Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas

    2017-10-09

    Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.

  18. Classification of mislabelled microarrays using robust sparse logistic regression.

    PubMed

    Bootkrajang, Jakramate; Kabán, Ata

    2013-04-01

    Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters. In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the bad effects of labelling errors in terms of predictive performance, it is effective at identifying marker genes and simultaneously it detects mislabelled arrays to high accuracy. The code is available from http://cs.bham.ac.uk/∼jxb008. Supplementary data are available at Bioinformatics online.

  19. Global Analysis of Protein Lysine Succinylation Profiles and Their Overlap with Lysine Acetylation in the Marine Bacterium Vibrio parahemolyticus.

    PubMed

    Pan, Jianyi; Chen, Ran; Li, Chuchu; Li, Weiyan; Ye, Zhicang

    2015-10-02

    Protein lysine acylation, including acetylation and succinylation, has been found to be a major post-translational modification (PTM) and is associated with the regulation of cellular processes that are widespread in bacteria. Vibrio parahemolyticus is a model marine bacterium that causes seafood-borne illness in humans worldwide. The lysine acetylation of V. parahemolyticus has been extensively characterized in our previous work, and here, we report the first global analysis of lysine succinylation and the overlap between the two types of acylation in this bacterium. Using high-accuracy nano liquid chromatography-tandem mass spectrometry combined with affinity purification, we identified 1931 lysine succinylated peptides matched on 642 proteins, with the quantity of the succinyl-proteins accounting for 13.3% of the total proteins in cells. Bioinformatics analysis results showed that these succinylated proteins are involved in almost every cellular process, particularly in protein biosynthesis and metabolism, and are distributed in diverse subcellular compartments. Moreover, several sequence motifs were identified, including succinyl-lysine flanked by a lysine or arginine residue at the -8, -7, or +7 position and without these residues at the -1 or +2 position, and these motifs differ from those found in other bacteria and eukaryotic cells. Furthermore, a total of 517 succinyl-lysine sites (26.7%) on 288 proteins (44.9%) were also found to be acetylated, suggesting extensive overlap between succinylation and acetylation in this bacterium. This systematic analysis provides a promising starting point for further investigations of the physiologic and pathogenic roles of lysine succinylation and acetylation in V. parahemolyticus.

  20. PRINCESS: Privacy-protecting Rare disease International Network Collaboration via Encryption through Software guard extensionS

    PubMed Central

    Chen, Feng; Wang, Shuang; Jiang, Xiaoqian; Ding, Sijie; Lu, Yao; Kim, Jihoon; Sahinalp, S. Cenk; Shimizu, Chisato; Burns, Jane C.; Wright, Victoria J.; Png, Eileen; Hibberd, Martin L.; Lloyd, David D.; Yang, Hai; Telenti, Amalio; Bloss, Cinnamon S.; Fox, Dov; Lauter, Kristin; Ohno-Machado, Lucila

    2017-01-01

    Abstract Motivation: We introduce PRINCESS, a privacy-preserving international collaboration framework for analyzing rare disease genetic data that are distributed across different continents. PRINCESS leverages Software Guard Extensions (SGX) and hardware for trustworthy computation. Unlike a traditional international collaboration model, where individual-level patient DNA are physically centralized at a single site, PRINCESS performs a secure and distributed computation over encrypted data, fulfilling institutional policies and regulations for protected health information. Results: To demonstrate PRINCESS’ performance and feasibility, we conducted a family-based allelic association study for Kawasaki Disease, with data hosted in three different continents. The experimental results show that PRINCESS provides secure and accurate analyses much faster than alternative solutions, such as homomorphic encryption and garbled circuits (over 40 000× faster). Availability and Implementation: https://github.com/achenfengb/PRINCESS_opensource Contact: shw070@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28065902

  1. Metabolomics for laboratory diagnostics.

    PubMed

    Bujak, Renata; Struck-Lewicka, Wiktoria; Markuszewski, Michał J; Kaliszan, Roman

    2015-09-10

    Metabolomics is an emerging approach in a systems biology field. Due to continuous development in advanced analytical techniques and in bioinformatics, metabolomics has been extensively applied as a novel, holistic diagnostic tool in clinical and biomedical studies. Metabolome's measurement, as a chemical reflection of a current phenotype of a particular biological system, is nowadays frequently implemented to understand pathophysiological processes involved in disease progression as well as to search for new diagnostic or prognostic biomarkers of various organism's disorders. In this review, we discussed the research strategies and analytical platforms commonly applied in the metabolomics studies. The applications of the metabolomics in laboratory diagnostics in the last 5 years were also reviewed according to the type of biological sample used in the metabolome's analysis. We also discussed some limitations and further improvements which should be considered taking in mind potential applications of metabolomic research and practice. Copyright © 2014 Elsevier B.V. All rights reserved.

  2. On Utilizing Optimal and Information Theoretic Syntactic Modeling for Peptide Classification

    NASA Astrophysics Data System (ADS)

    Aygün, Eser; Oommen, B. John; Cataltepe, Zehra

    Syntactic methods in pattern recognition have been used extensively in bioinformatics, and in particular, in the analysis of gene and protein expressions, and in the recognition and classification of bio-sequences. These methods are almost universally distance-based. This paper concerns the use of an Optimal and Information Theoretic (OIT) probabilistic model [11] to achieve peptide classification using the information residing in their syntactic representations. The latter has traditionally been achieved using the edit distances required in the respective peptide comparisons. We advocate that one can model the differences between compared strings as a mutation model consisting of random Substitutions, Insertions and Deletions (SID) obeying the OIT model. Thus, in this paper, we show that the probability measure obtained from the OIT model can be perceived as a sequence similarity metric, using which a Support Vector Machine (SVM)-based peptide classifier, referred to as OIT_SVM, can be devised.

  3. Dynamic Coupling and Allosteric Networks in the α Subunit of Heterotrimeric G Proteins.

    PubMed

    Yao, Xin-Qiu; Malik, Rabia U; Griggs, Nicholas W; Skjærven, Lars; Traynor, John R; Sivaramakrishnan, Sivaraj; Grant, Barry J

    2016-02-26

    G protein α subunits cycle between active and inactive conformations to regulate a multitude of intracellular signaling cascades. Important structural transitions occurring during this cycle have been characterized from extensive crystallographic studies. However, the link between observed conformations and the allosteric regulation of binding events at distal sites critical for signaling through G proteins remain unclear. Here we describe molecular dynamics simulations, bioinformatics analysis, and experimental mutagenesis that identifies residues involved in mediating the allosteric coupling of receptor, nucleotide, and helical domain interfaces of Gαi. Most notably, we predict and characterize novel allosteric decoupling mutants, which display enhanced helical domain opening, increased rates of nucleotide exchange, and constitutive activity in the absence of receptor activation. Collectively, our results provide a framework for explaining how binding events and mutations can alter internal dynamic couplings critical for G protein function. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  4. Genomics for Everyone

    ScienceCinema

    Chain, Patrick

    2018-05-31

    Genomics — the genetic mapping and DNA sequencing of sets of genes or the complete genomes of organisms, along with related genome analysis and database work — is emerging as one of the transformative sciences of the 21st century. But current bioinformatics tools are not accessible to most biological researchers. Now, a new computational and web-based tool called EDGE Bioinformatics is working to fulfill the promise of democratizing genomics.

  5. Integrative workflows for metagenomic analysis

    PubMed Central

    Ladoukakis, Efthymios; Kolisis, Fragiskos N.; Chatziioannou, Aristotelis A.

    2014-01-01

    The rapid evolution of all sequencing technologies, described by the term Next Generation Sequencing (NGS), have revolutionized metagenomic analysis. They constitute a combination of high-throughput analytical protocols, coupled to delicate measuring techniques, in order to potentially discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many GB of data being generated from each single sequencing experiment, rendering the management or even the storage, critical bottlenecks with respect to the overall analytical endeavor. The enormous complexity is even more aggravated by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential, for each analytical task, in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, nonetheless non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, requesting a high level of expertise for their proper application or the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale, requires grand computational resources, imposing as the sole realistic solution, the utilization of cloud computing infrastructures. In this review article we discuss different, integrative, bioinformatic solutions available, which address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control, and annotation of metagenomic data, embracing various, major sequencing technologies and applications. PMID:25478562

  6. Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.

    PubMed

    Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis; Krampis, Konstantinos

    2017-08-01

    Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep, bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and from a user perspective, running the pipelines as simply as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enables developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. © The Authors 2017. Published by Oxford University Press.

  7. Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

    PubMed Central

    Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis

    2017-01-01

    Abstract Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep, bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and from a user perspective, running the pipelines as simply as running a single bioinformatics tool. This is achieved using a “meta-script” that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enables developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. PMID:28854616

  8. Crimean-Congo Hemorrhagic Fever Virus Gn Bioinformatic Analysis and Construction of a Recombinant Bacmid in Order to Express Gn by Baculovirus Expression System

    PubMed Central

    Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza

    2015-01-01

    Background Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the nairovirus, a genus in the Bunyaviridae family, which causes a life threatening disease in human. Currently, there is no vaccine against CCHFV and detailed structural analysis of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. Objectives The current study aimed to investigate structural bioinformatics of CCHFV Gn protein and design a construct to make a recombinant bacmid to express by baculovirus system. Materials and Methods To express the Gn protein in insect cells that can be used as antigen in animal model vaccine studies. Bioinformatic analysis of CCHFV Gn protein was performed and designed a construct and cloned into pFastBacHTb vector and a recombinant Gn-bacmid was generated by Bac to Bac system. Results Primary, secondary, and 3D structure of CCHFV Gn were obtained and PCR reaction with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring Gn coding region under polyhedron promoter. Conclusions Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV. PMID:26862379

  9. A mechanistic insight into the amyloidogenic structure of hIAPP peptide revealed from sequence analysis and molecular dynamics simulation.

    PubMed

    Chakraborty, Sandipan; Chatterjee, Barnali; Basu, Soumalee

    2012-07-01

    A collective approach of sequence analysis, phylogenetic tree and in silico prediction of amyloidogenecity using bioinformatics tools have been used to correlate the observed species-specific variations in IAPP sequences with the amyloid forming propensity. Observed substitution patterns indicate that probable changes in local hydrophobicity are instrumental in altering the aggregation propensity of the peptide. In particular, residues at 17th, 22nd and 23rd positions of the IAPP peptide are found to be crucial for amyloid formation. Proline25 primarily dictates the observed non-amyloidogenecity in rodents. Furthermore, extensive molecular dynamics simulation of 0.24 μs have been carried out with human IAPP (hIAPP) fragment 19-27, the portion showing maximum sequence variation across different species, to understand the native folding characteristic of this region. Principal component analysis in combination with free energy landscape analysis illustrates a four residue turn spanning from residue 22 to 25. The results provide a structural insight into the intramolecular β-sheet structure of amylin which probably is the template for nucleation of fibril formation and growth, a pathogenic feature of type II diabetes. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.

    PubMed

    Krampis, Konstantinos; Booth, Tim; Chapman, Brad; Tiwari, Bela; Bicak, Mesude; Field, Dawn; Nelson, Karen E

    2012-03-19

    A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.

  11. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community

    PubMed Central

    2012-01-01

    Background A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Results Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Conclusions Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them. PMID:22429538

  12. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable in the analysis of millions of nucleic acids sequences, which are currently deposited in international databases (public or private); these databases contain information of genes, RNA, ORF, proteins, intergenic regions, including entire genomes from some species. The analysis of this information requires computer programs; which were renewed in the use of new mathematical methods, and the introduction of the use of artificial intelligence. In addition to the constant creation of supercomputing units trained to withstand the heavy workload of sequence analysis. However, it is still necessary the innovation on platforms that allow genomic analyses, faster and more effectively, with a technological understanding of all biological processes.

  13. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    PubMed

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.

  14. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

  15. RNA-Rocket: an RNA-Seq analysis resource for infectious disease research

    PubMed Central

    Warren, Andrew S.; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I.; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B.; Wattam, Alice R.; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno

    2015-01-01

    Motivation: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. Results: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. Availability and implementation: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: anwarren@vt.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:25573919

  16. RNA-Rocket: an RNA-Seq analysis resource for infectious disease research.

    PubMed

    Warren, Andrew S; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B; Wattam, Alice R; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno

    2015-05-01

    RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. anwarren@vt.edu Supplementary materials are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  17. Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants

    PubMed Central

    Byrska-Bishop, Marta; Wallace, John; Frase, Alexander T; Ritchie, Marylyn D

    2018-01-01

    Abstract Motivation BioBin is an automated bioinformatics tool for the multi-level biological binning of sequence variants. Herein, we present a significant update to BioBin which expands the software to facilitate a comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, as well as incorporating novel analysis features providing for a robust, highly customizable, and unified rare variant analysis tool. Availability and implementation The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin-download Contact mdritchie@geisinger.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:28968757

  18. Combining medical informatics and bioinformatics toward tools for personalized medicine.

    PubMed

    Sarachan, B D; Simmons, M K; Subramanian, P; Temkin, J M

    2003-01-01

    Key bioinformatics and medical informatics research areas need to be identified to advance knowledge and understanding of disease risk factors and molecular disease pathology in the 21 st century toward new diagnoses, prognoses, and treatments. Three high-impact informatics areas are identified: predictive medicine (to identify significant correlations within clinical data using statistical and artificial intelligence methods), along with pathway informatics and cellular simulations (that combine biological knowledge with advanced informatics to elucidate molecular disease pathology). Initial predictive models have been developed for a pilot study in Huntington's disease. An initial bioinformatics platform has been developed for the reconstruction and analysis of pathways, and work has begun on pathway simulation. A bioinformatics research program has been established at GE Global Research Center as an important technology toward next generation medical diagnostics. We anticipate that 21 st century medical research will be a combination of informatics tools with traditional biology wet lab research, and that this will translate to increased use of informatics techniques in the clinic.

  19. Application of bioinformatics in chronobiology research.

    PubMed

    Lopes, Robson da Silva; Resende, Nathalia Maria; Honorio-França, Adenilda Cristina; França, Eduardo Luzía

    2013-01-01

    Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through "omics" projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research.

  20. GOBLET: The Global Organisation for Bioinformatics Learning, Education and Training

    PubMed Central

    Atwood, Teresa K.; Bongcam-Rudloff, Erik; Brazas, Michelle E.; Corpas, Manuel; Gaudet, Pascale; Lewitter, Fran; Mulder, Nicola; Palagi, Patricia M.; Schneider, Maria Victoria; van Gelder, Celia W. G.

    2015-01-01

    In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy—paradoxically, many are actually closing “niche” bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all. PMID:25856076

  1. Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets.

    PubMed

    Rideout, Jai Ram; Chase, John H; Bolyen, Evan; Ackermann, Gail; González, Antonio; Knight, Rob; Caporaso, J Gregory

    2016-06-13

    Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.

  2. Empowered genome community: leveraging a bioinformatics platform as a citizen-scientist collaboration tool.

    PubMed

    Wendelsdorf, Katherine; Shah, Sohela

    2015-09-01

    There is on-going effort in the biomedical research community to leverage Next Generation Sequencing (NGS) technology to identify genetic variants that affect our health. The main challenge facing researchers is getting enough samples from individuals either sick or healthy - to be able to reliably identify the few variants that are causal for a phenotype among all other variants typically seen among individuals. At the same time, more and more individuals are having their genome sequenced either out of curiosity or to identify the cause of an illness. These individuals may benefit from of a way to view and understand their data. QIAGEN's Ingenuity Variant Analysis is an online application that allows users with and without extensive bioinformatics training to incorporate information from published experiments, genetic databases, and a variety of statistical models to identify variants, from a long list of candidates, that are most likely causal for a phenotype as well as annotate variants with what is already known about them in the literature and databases. Ingenuity Variant Analysis is also an information sharing platform where users may exchange samples and analyses. The Empowered Genome Community (EGC) is a new program in which QIAGEN is making this on-line tool freely available to any individual who wishes to analyze their own genetic sequence. EGC members are then able to make their data available to other Ingenuity Variant Analysis users to be used in research. Here we present and describe the Empowered Genome Community in detail. We also present a preliminary, proof-of-concept study that utilizes the 200 genomes currently available through the EGC. The goal of this program is to allow individuals to access and understand their own data as well as facilitate citizen-scientist collaborations that can drive research forward and spur quality scientific dialogue in the general public.

  3. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    PubMed

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.

  4. BpWrapper: BioPerl-based sequence and tree utilities for rapid prototyping of bioinformatics pipelines.

    PubMed

    Hernández, Yözen; Bernstein, Rocky; Pagan, Pedro; Vargas, Levy; McCaig, William; Ramrattan, Girish; Akther, Saymon; Larracuente, Amanda; Di, Lia; Vieira, Filipe G; Qiu, Wei-Gang

    2018-03-02

    Automated bioinformatics workflows are more robust, easier to maintain, and results more reproducible when built with command-line utilities than with custom-coded scripts. Command-line utilities further benefit by relieving bioinformatics developers to learn the use of, or to interact directly with, biological software libraries. There is however a lack of command-line utilities that leverage popular Open Source biological software toolkits such as BioPerl ( http://bioperl.org ) to make many of the well-designed, robust, and routinely used biological classes available for a wider base of end users. Designed as standard utilities for UNIX-family operating systems, BpWrapper makes functionality of some of the most popular BioPerl modules readily accessible on the command line to novice as well as to experienced bioinformatics practitioners. The initial release of BpWrapper includes four utilities with concise command-line user interfaces, bioseq, bioaln, biotree, and biopop, specialized for manipulation of molecular sequences, sequence alignments, phylogenetic trees, and DNA polymorphisms, respectively. Over a hundred methods are currently available as command-line options and new methods are easily incorporated. Performance of BpWrapper utilities lags that of precompiled utilities while equivalent to that of other utilities based on BioPerl. BpWrapper has been tested on BioPerl Release 1.6, Perl versions 5.10.1 to 5.25.10, and operating systems including Apple macOS, Microsoft Windows, and GNU/Linux. Release code is available from the Comprehensive Perl Archive Network (CPAN) at https://metacpan.org/pod/Bio::BPWrapper . Source code is available on GitHub at https://github.com/bioperl/p5-bpwrapper . BpWrapper improves on existing sequence utilities by following the design principles of Unix text utilities such including a concise user interface, extensive command-line options, and standard input/output for serialized operations. Further, dozens of novel methods for manipulation of sequences, alignments, and phylogenetic trees, unavailable in existing utilities (e.g., EMBOSS, Newick Utilities, and FAST), are provided. Bioinformaticians should find BpWrapper useful for rapid prototyping of workflows on the command-line without creating custom scripts for comparative genomics and other bioinformatics applications.

  5. Distant plant homologues: don't throw out the baby.

    PubMed

    Gardiner, John; Overall, Robyn; Marc, Jan

    2012-03-01

    Plants and metazoans share many similarities in terms of conserved proteins. Antibodies have been used extensively to detect remote homologues, many of which are yet to be identified conclusively. Genome sequencing and the creation of novel sequence or structure comparison programs have assisted greatly in the identification of distant protein homologues. The continuing development of new software algorithms and the combining of bioinformatics with proteomics offer hope that remaining homologues will be soon identified. Copyright © 2011 Elsevier Ltd. All rights reserved.

  6. CSB: a Python framework for structural bioinformatics.

    PubMed

    Kalev, Ivan; Mechelke, Martin; Kopec, Klaus O; Holder, Thomas; Carstens, Simeon; Habeck, Michael

    2012-11-15

    Computational Structural Biology Toolbox (CSB) is a cross-platform Python class library for reading, storing and analyzing biomolecular structures with rich support for statistical analyses. CSB is designed for reusability and extensibility and comes with a clean, well-documented API following good object-oriented engineering practice. Stable release packages are available for download from the Python Package Index (PyPI) as well as from the project's website http://csb.codeplex.com. ivan.kalev@gmail.com or michael.habeck@tuebingen.mpg.de

  7. A comparative proteomic strategy for subcellular proteome research: ICAT approach coupled with bioinformatics prediction to ascertain rat liver mitochondrial proteins and indication of mitochondrial localization for catalase.

    PubMed

    Jiang, Xiao-Sheng; Dai, Jie; Sheng, Quan-Hu; Zhang, Lei; Xia, Qi-Chang; Wu, Jia-Rui; Zeng, Rong

    2005-01-01

    Subcellular proteomics, as an important step to functional proteomics, has been a focus in proteomic research. However, the co-purification of "contaminating" proteins has been the major problem in all the subcellular proteomic research including all kinds of mitochondrial proteome research. It is often difficult to conclude whether these "contaminants" represent true endogenous partners or artificial associations induced by cell disruption or incomplete purification. To solve such a problem, we applied a high-throughput comparative proteome experimental strategy, ICAT approach performed with two-dimensional LC-MS/MS analysis, coupled with combinational usage of different bioinformatics tools, to study the proteome of rat liver mitochondria prepared with traditional centrifugation (CM) or further purified with a Nycodenz gradient (PM). A total of 169 proteins were identified and quantified convincingly in the ICAT analysis, in which 90 proteins have an ICAT ratio of PM:CM>1.0, while another 79 proteins have an ICAT ratio of PM:CM<1.0. Almost all the proteins annotated as mitochondrial according to Swiss-Prot annotation, bioinformatics prediction, and literature reports have a ratio of PM:CM>1.0, while proteins annotated as extracellular or secreted, cytoplasmic, endoplasmic reticulum, ribosomal, and so on have a ratio of PM:CM<1.0. Catalase and AP endonuclease 1, which have been known as peroxisomal and nuclear, respectively, have shown a ratio of PM:CM>1.0, confirming the reports about their mitochondrial location. Moreover, the 125 proteins with subcellular location annotation have been used as a testing dataset to evaluate the efficiency for ascertaining mitochondrial proteins by ICAT analysis and the bioinformatics tools such as PSORT, TargetP, SubLoc, MitoProt, and Predotar. The results indicated that ICAT analysis coupled with combinational usage of different bioinformatics tools could effectively ascertain mitochondrial proteins and distinguish contaminant proteins and even multilocation proteins. Using such a strategy, many novel proteins, known proteins without subcellular location annotation, and even known proteins that have been annotated as other locations have been strongly indicated for their mitochondrial location.

  8. Passenger mutations and aberrant gene expression in congenic tissue plasminogen activator-deficient mouse strains.

    PubMed

    Szabo, R; Samson, A L; Lawrence, D A; Medcalf, R L; Bugge, T H

    2016-08-01

    Essentials C57BL/6J-tissue plasminogen activator (tPA)-deficient mice are widely used to study tPA function. Congenic C57BL/6J-tPA-deficient mice harbor large 129-derived chromosomal segments. The 129-derived chromosomal segments contain gene mutations that may confound data interpretation. Passenger mutation-free isogenic tPA-deficient mice were generated for study of tPA function. Background The ability to generate defined null mutations in mice revolutionized the analysis of gene function in mammals. However, gene-deficient mice generated by using 129-derived embryonic stem cells may carry large segments of 129 DNA, even when extensively backcrossed to reference strains, such as C57BL/6J, and this may confound interpretation of experiments performed in these mice. Tissue plasminogen activator (tPA), encoded by the PLAT gene, is a fibrinolytic serine protease that is widely expressed in the brain. A number of neurological abnormalities have been reported in tPA-deficient mice. Objectives To study genetic contamination of tPA-deficient mice. Materials and methods Whole genome expression array analysis, RNAseq expression profiling, low- and high-density single nucleotide polymorphism (SNP) analysis, bioinformatics and genome editing were used to analyze gene expression in tPA-deficient mouse brains. Results and conclusions Genes differentially expressed in the brain of Plat(-/-) mice from two independent colonies highly backcrossed onto the C57BL/6J strain clustered near Plat on chromosome 8. SNP analysis attributed this anomaly to about 20 Mbp of DNA flanking Plat being of 129 origin in both strains. Bioinformatic analysis of these 129-derived chromosomal segments identified a significant number of mutations in genes co-segregating with the targeted Plat allele, including several potential null mutations. Using zinc finger nuclease technology, we generated novel 'passenger mutation'-free isogenic C57BL/6J-Plat(-/-) and FVB/NJ-Plat(-/-) mouse strains by introducing an 11 bp deletion into the exon encoding the signal peptide. These novel mouse strains will be a useful community resource for further exploration of tPA function in physiological and pathological processes. © 2016 International Society on Thrombosis and Haemostasis.

  9. Bioinformatics: indispensable, yet hidden in plain sight?

    PubMed

    Bartlett, Andrew; Penders, Bart; Lewis, Jamie

    2017-06-21

    Bioinformatics has multitudinous identities, organisational alignments and disciplinary links. This variety allows bioinformaticians and bioinformatic work to contribute to much (if not most) of life science research in profound ways. The multitude of bioinformatic work also translates into a multitude of credit-distribution arrangements, apparently dismissing that work. We report on the epistemic and social arrangements that characterise the relationship between bioinformatics and life science. We describe, in sociological terms, the character, power and future of bioinformatic work. The character of bioinformatic work is such that its cultural, institutional and technical structures allow for it to be black-boxed easily. The result is that bioinformatic expertise and contributions travel easily and quickly, yet remain largely uncredited. The power of bioinformatic work is shaped by its dependency on life science work, which combined with the black-boxed character of bioinformatic expertise further contributes to situating bioinformatics on the periphery of the life sciences. Finally, the imagined futures of bioinformatic work suggest that bioinformatics will become ever more indispensable without necessarily becoming more visible, forcing bioinformaticians into difficult professional and career choices. Bioinformatic expertise and labour is epistemically central but often institutionally peripheral. In part, this is a result of the ways in which the character, power distribution and potential futures of bioinformatics are constituted. However, alternative paths can be imagined.

  10. Bellman’s GAP—a language and compiler for dynamic programming in sequence analysis

    PubMed Central

    Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert

    2013-01-01

    Motivation: Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman’s GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. Results: In Bellman’s GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive to carefully hand-crafted implementations. This article introduces the Bellman’s GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman’s GAP as an implementation platform of ‘real-world’ bioinformatics tools. Availability: Bellman’s GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics. Contact: robert@techfak.uni-bielefeld.de Supplementary information: Supplementary data are available at Bioinformatics online PMID:23355290

  11. Broad issues to consider for library involvement in bioinformatics*

    PubMed Central

    Geer, Renata C.

    2006-01-01

    Background: The information landscape in biological and medical research has grown far beyond literature to include a wide variety of databases generated by research fields such as molecular biology and genomics. The traditional role of libraries to collect, organize, and provide access to information can expand naturally to encompass these new data domains. Methods: This paper discusses the current and potential role of libraries in bioinformatics using empirical evidence and experience from eleven years of work in user services at the National Center for Biotechnology Information. Findings: Medical and science libraries over the last decade have begun to establish educational and support programs to address the challenges users face in the effective and efficient use of a plethora of molecular biology databases and retrieval and analysis tools. As more libraries begin to establish a role in this area, the issues they face include assessment of user needs and skills, identification of existing services, development of plans for new services, recruitment and training of specialized staff, and establishment of collaborations with bioinformatics centers at their institutions. Conclusions: Increasing library involvement in bioinformatics can help address information needs of a broad range of students, researchers, and clinicians and ultimately help realize the power of bioinformatics resources in making new biological discoveries. PMID:16888662

  12. Open Reading Frame Phylogenetic Analysis on the Cloud

    PubMed Central

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843

  13. SoS Notebook: An Interactive Multi-Language Data Analysis Environment.

    PubMed

    Peng, Bo; Wang, Gao; Ma, Jun; Leong, Man Chong; Wakefield, Chris; Melott, James; Chiu, Yulun; Du, Di; Weinstein, John N

    2018-05-22

    Complex bioinformatic data analysis workflows involving multiple scripts in different languages can be difficult to consolidate, share, and reproduce. An environment that streamlines the entire processes of data collection, analysis, visualization and reporting of such multi-language analyses is currently lacking. We developed Script of Scripts (SoS) Notebook, a web-based notebook environment that allows the use of multiple scripting language in a single notebook, with data flowing freely within and across languages. SoS Notebook enables researchers to perform sophisticated bioinformatic analysis using the most suitable tools for different parts of the workflow, without the limitations of a particular language or complications of cross-language communications. SoS Notebook is hosted at http://vatlab.github.io/SoS/ and is distributed under a BSD license. bpeng@mdanderson.org.

  14. Robust Bioinformatics Recognition with VLSI Biochip Microsystem

    NASA Technical Reports Server (NTRS)

    Lue, Jaw-Chyng L.; Fang, Wai-Chi

    2006-01-01

    A microsystem architecture for real-time, on-site, robust bioinformatic patterns recognition and analysis has been proposed. This system is compatible with on-chip DNA analysis means such as polymerase chain reaction (PCR)amplification. A corresponding novel artificial neural network (ANN) learning algorithm using new sigmoid-logarithmic transfer function based on error backpropagation (EBP) algorithm is invented. Our results show the trained new ANN can recognize low fluorescence patterns better than the conventional sigmoidal ANN does. A differential logarithmic imaging chip is designed for calculating logarithm of relative intensities of fluorescence signals. The single-rail logarithmic circuit and a prototype ANN chip are designed, fabricated and characterized.

  15. Ergatis: a web interface and scalable software system for bioinformatics workflows

    PubMed Central

    Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

    2010-01-01

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634

  16. deepTools: a flexible platform for exploring deep-sequencing data.

    PubMed

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

    2014-07-01

    We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Delta: a new web-based 3D genome visualization and analysis platform.

    PubMed

    Tang, Bixia; Li, Feifei; Li, Jing; Zhao, Wenming; Zhang, Zhihua

    2018-04-15

    Delta is an integrative visualization and analysis platform to facilitate visually annotating and exploring the 3D physical architecture of genomes. Delta takes Hi-C or ChIA-PET contact matrix as input and predicts the topologically associating domains and chromatin loops in the genome. It then generates a physical 3D model which represents the plausible consensus 3D structure of the genome. Delta features a highly interactive visualization tool which enhances the integration of genome topology/physical structure with extensive genome annotation by juxtaposing the 3D model with diverse genomic assay outputs. Finally, by visually comparing the 3D model of the β-globin gene locus and its annotation, we speculated a plausible transitory interaction pattern in the locus. Experimental evidence was found to support this speculation by literature survey. This served as an example of intuitive hypothesis testing with the help of Delta. Delta is freely accessible from http://delta.big.ac.cn, and the source code is available at https://github.com/zhangzhwlab/delta. zhangzhihua@big.ac.cn. Supplementary data are available at Bioinformatics online.

  18. PetriScape - A plugin for discrete Petri net simulations in Cytoscape.

    PubMed

    Almeida, Diogo; Azevedo, Vasco; Silva, Artur; Baumbach, Jan

    2016-06-04

    Systems biology plays a central role for biological network analysis in the post-genomic era. Cytoscape is the standard bioinformatics tool offering the community an extensible platform for computational analysis of the emerging cellular network together with experimental omics data sets. However, only few apps/plugins/tools are available for simulating network dynamics in Cytoscape 3. Many approaches of varying complexity exist but none of them have been integrated into Cytoscape as app/plugin yet. Here, we introduce PetriScape, the first Petri net simulator for Cytoscape. Although discrete Petri nets are quite simplistic models, they are capable of modeling global network properties and simulating their behaviour. In addition, they are easily understood and well visualizable. PetriScape comes with the following main functionalities: (1) import of biological networks in SBML format, (2) conversion into a Petri net, (3) visualization as Petri net, and (4) simulation and visualization of the token flow in Cytoscape. PetriScape is the first Cytoscape plugin for Petri nets. It allows a straightforward Petri net model creation, simulation and visualization with Cytoscape, providing clues about the activity of key components in biological networks.

  19. PetriScape - A plugin for discrete Petri net simulations in Cytoscape.

    PubMed

    Almeida, Diogo; Azevedo, Vasco; Silva, Artur; Baumbach, Jan

    2016-03-01

    Systems biology plays a central role for biological network analysis in the post-genomic era. Cytoscape is the standard bioinformatics tool offering the community an extensible platform for computational analysis of the emerging cellular network together with experimental omics data sets. However, only few apps/plugins/tools are available for simulating network dynamics in Cytoscape 3. Many approaches of varying complexity exist but none of them have been integrated into Cytoscape as app/plugin yet. Here, we introduce PetriScape, the first Petri net simulator for Cytoscape. Although discrete Petri nets are quite simplistic models, they are capable of modeling global network properties and simulating their behaviour. In addition, they are easily understood and well visualizable. PetriScape comes with the following main functionalities: (1) import of biological networks in SBML format, (2) conversion into a Petri net, (3) visualization as Petri net, and (4) simulation and visualization of the token flow in Cytoscape. PetriScape is the first Cytoscape plugin for Petri nets. It allows a straightforward Petri net model creation, simulation and visualization with Cytoscape, providing clues about the activity of key components in biological networks.

  20. Structured association analysis leads to insight into Saccharomyces cerevisiae gene regulation by finding multiple contributing eQTL hotspots associated with functional gene modules.

    PubMed

    Curtis, Ross E; Kim, Seyoung; Woolford, John L; Xu, Wenjie; Xing, Eric P

    2013-03-21

    Association analysis using genome-wide expression quantitative trait locus (eQTL) data investigates the effect that genetic variation has on cellular pathways and leads to the discovery of candidate regulators. Traditional analysis of eQTL data via pairwise statistical significance tests or linear regression does not leverage the availability of the structural information of the transcriptome, such as presence of gene networks that reveal correlation and potentially regulatory relationships among the study genes. We employ a new eQTL mapping algorithm, GFlasso, which we have previously developed for sparse structured regression, to reanalyze a genome-wide yeast dataset. GFlasso fully takes into account the dependencies among expression traits to suppress false positives and to enhance the signal/noise ratio. Thus, GFlasso leverages the gene-interaction network to discover the pleiotropic effects of genetic loci that perturb the expression level of multiple (rather than individual) genes, which enables us to gain more power in detecting previously neglected signals that are marginally weak but pleiotropically significant. While eQTL hotspots in yeast have been reported previously as genomic regions controlling multiple genes, our analysis reveals additional novel eQTL hotspots and, more interestingly, uncovers groups of multiple contributing eQTL hotspots that affect the expression level of functional gene modules. To our knowledge, our study is the first to report this type of gene regulation stemming from multiple eQTL hotspots. Additionally, we report the results from in-depth bioinformatics analysis for three groups of these eQTL hotspots: ribosome biogenesis, telomere silencing, and retrotransposon biology. We suggest candidate regulators for the functional gene modules that map to each group of hotspots. Not only do we find that many of these candidate regulators contain mutations in the promoter and coding regions of the genes, in the case of the Ribi group, we provide experimental evidence suggesting that the identified candidates do regulate the target genes predicted by GFlasso. Thus, this structured association analysis of a yeast eQTL dataset via GFlasso, coupled with extensive bioinformatics analysis, discovers a novel regulation pattern between multiple eQTL hotspots and functional gene modules. Furthermore, this analysis demonstrates the potential of GFlasso as a powerful computational tool for eQTL studies that exploit the rich structural information among expression traits due to correlation, regulation, or other forms of biological dependencies.

  1. IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis.

    PubMed

    Safonova, Yana; Bonissone, Stefano; Kurpilyansky, Eugene; Starostina, Ekaterina; Lapidus, Alla; Stinson, Jeremy; DePalatis, Laura; Sandoval, Wendy; Lill, Jennie; Pevzner, Pavel A

    2015-06-15

    The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires [next generation sequencing (NGS) and mass spectrometry (MS)] present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools. ppevzner@ucsd.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  2. HUNT: launch of a full-length cDNA database from the Helix Research Institute.

    PubMed

    Yudate, H T; Suwa, M; Irie, R; Matsui, H; Nishikawa, T; Nakamura, Y; Yamaguchi, D; Peng, Z Z; Yamamoto, T; Nagai, K; Hayashi, K; Otsuki, T; Sugiyama, T; Ota, T; Suzuki, Y; Sugano, S; Isogai, T; Masuho, Y

    2001-01-01

    The Helix Research Institute (HRI) in Japan is releasing 4356 HUman Novel Transcripts and related information in the newly established HUNT database. The institute is a joint research project principally funded by the Japanese Ministry of International Trade and Industry, and the clones were sequenced in the governmental New Energy and Industrial Technology Development Organization (NEDO) Human cDNA Sequencing Project. The HUNT database contains an extensive amount of annotation from advanced analysis and represents an essential bioinformatics contribution towards understanding of the gene function. The HRI human cDNA clones were obtained from full-length enriched cDNA libraries constructed with the oligo-capping method and have resulted in novel full-length cDNA sequences. A large fraction has little similarity to any proteins of known function and to obtain clues about possible function we have developed original analysis procedures. Any putative function deduced here can be validated or refuted by complementary analysis results. The user can also extract information from specific categories like PROSITE patterns, PFAM domains, PSORT localization, transmembrane helices and clones with GENIUS structure assignments. The HUNT database can be accessed at http://www.hri.co.jp/HUNT.

  3. Use of mutational pattern in 5'-NCR and VP1 regions of polioviruses for molecular diagnosis.

    PubMed

    Pliaka, V; Dedepsidis, E; Kyriakopoulou, Z; Papadopoulou, I; Levidiotou, S; Markoulatos, P

    2007-08-01

    Polioviruses are members of the enterovirus genus, belonging to the Picornaviridae family. They are the causative agents of poliomyelitis, a paralytic and sometimes fatal disease in humans. The number of poliomyelitis cases caused by wild poliovirus infections has been dramatically reduced by the extensive use of two available vaccines: the inactivated poliovirus vaccine (IPV) and the oral poliovirus vaccine (OPV). Despite the importance of OPV in the reduction of poliomyelitis cases, one of the disadvantages associated with this vaccine is the rare occurrence of vaccine-associated paralytic poliomyelitis (VAPP) in vaccinees or their healthy contacts through the accumulation of mutations and/or recombination in Sabin strains genome. Thirteen clinical isolates originating from healthy vaccinees and VAPP cases were investigated in order to identify genomic modifications in 5' non-coding region (5'-NCR) and VP1 genomic regions. The analysis of samples was conducted by RT-PCR, RFLP, sequencing and bioinformatics analysis. All clinical isolates were characterized as OPV-like viruses. Our results showed that analysis of 5'-NCR and VP1 regions of Poliovirus Sabin strains is important in order to identify mutations that increase the neurovirulence conducting to the eventuality of emergence of VAPP cases.

  4. Extensions of PDZ domains as important structural and functional elements.

    PubMed

    Wang, Conan K; Pan, Lifeng; Chen, Jia; Zhang, Mingjie

    2010-08-01

    'Divide and conquer' has been the guiding strategy for the study of protein structure and function. Proteins are divided into domains with each domain having a canonical structural definition depending on its type. In this review, we push forward with the interesting observation that many domains have regions outside of their canonical definition that affect their structure and function; we call these regions 'extensions'. We focus on the highly abundant PDZ (PSD-95, DLG1 and ZO-1) domain. Using bioinformatics, we find that many PDZ domains have potential extensions and we developed an openly-accessible website to display our results ( http://bcz102.ust.hk/pdzex/ ). We propose, using well-studied PDZ domains as illustrative examples, that the roles of PDZ extensions can be classified into at least four categories: 1) protein dynamics-based modulation of target binding affinity, 2) provision of binding sites for macro-molecular assembly, 3) structural integration of multi-domain modules, and 4) expansion of the target ligand-binding pocket. Our review highlights the potential structural and functional importance of domain extensions, highlighting the significance of looking beyond the canonical boundaries of protein domains in general.

  5. Single Pass Streaming BLAST on FPGAs*†

    PubMed Central

    Herbordt, Martin C.; Model, Josh; Sukhwani, Bharat; Gu, Yongfeng; VanCourt, Tom

    2008-01-01

    Approximate string matching is fundamental to bioinformatics and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic-programming- (DP) based methods. Our primary contribution is a new algorithm for emulating the seeding and extension phases of BLAST. This operates in a single pass through a database at streaming rate, and with no preprocessing other than loading the query string. Moreover, it emulates parameters turned to maximum possible sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations showing order of magnitude acceleration over serial reference code. A simple extension assures compatibility with NCBI BLAST. PMID:19081828

  6. In the loop: promoter–enhancer interactions and bioinformatics

    PubMed Central

    Mora, Antonio; Sandve, Geir Kjetil; Gabrielsen, Odd Stokke

    2016-01-01

    Enhancer–promoter regulation is a fundamental mechanism underlying differential transcriptional regulation. Spatial chromatin organization brings remote enhancers in contact with target promoters in cis to regulate gene expression. There is considerable evidence for promoter–enhancer interactions (PEIs). In the recent years, genome-wide analyses have identified signatures and mapped novel enhancers; however, being able to precisely identify their target gene(s) requires massive biological and bioinformatics efforts. In this review, we give a short overview of the chromatin landscape and transcriptional regulation. We discuss some key concepts and problems related to chromatin interaction detection technologies, and emerging knowledge from genome-wide chromatin interaction data sets. Then, we critically review different types of bioinformatics analysis methods and tools related to representation and visualization of PEI data, raw data processing and PEI prediction. Lastly, we provide specific examples of how PEIs have been used to elucidate a functional role of non-coding single-nucleotide polymorphisms. The topic is at the forefront of epigenetic research, and by highlighting some future bioinformatics challenges in the field, this review provides a comprehensive background for future PEI studies. PMID:26586731

  7. mORCA: sailing bioinformatics world with mobile devices.

    PubMed

    Díaz-Del-Pino, Sergio; Falgueras, Juan; Perez-Wohlfeil, Esteban; Trelles, Oswaldo

    2018-03-01

    Nearly 10 years have passed since the first mobile apps appeared. Given the fact that bioinformatics is a web-based world and that mobile devices are endowed with web-browsers, it seemed natural that bioinformatics would transit from personal computers to mobile devices but nothing could be further from the truth. The transition demands new paradigms, designs and novel implementations. Throughout an in-depth analysis of requirements of existing bioinformatics applications we designed and deployed an easy-to-use web-based lightweight mobile client. Such client is able to browse, select, compose automatically interface parameters, invoke services and monitor the execution of Web Services using the service's metadata stored in catalogs or repositories. mORCA is available at http://bitlab-es.com/morca/app as a web-app. It is also available in the App store by Apple and Play Store by Google. The software will be available for at least 2 years. ortrelles@uma.es. Source code, final web-app, training material and documentation is available at http://bitlab-es.com/morca. © The Author(s) 2017. Published by Oxford University Press.

  8. A primer to frequent itemset mining for bioinformatics

    PubMed Central

    Naulaerts, Stefan; Meysman, Pieter; Bittremieux, Wout; Vu, Trung Nghia; Vanden Berghe, Wim; Goethals, Bart

    2015-01-01

    Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and their derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences. PMID:24162173

  9. p3d--Python module for structural bioinformatics.

    PubMed

    Fufezan, Christian; Specht, Michael

    2009-08-21

    High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this the Python scripting language is the optimal choice since its philosophy is to write an understandable source code. p3d is an object oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three dimensional protein structure files (PDB files). p3d's strength arises from the combination of a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, b) set theory and c) functions that allow to combine a and b and that use human readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.

  10. Best practices in bioinformatics training for life scientists.

    PubMed

    Via, Allegra; Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K

    2013-09-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.

  11. Bellman's GAP--a language and compiler for dynamic programming in sequence analysis.

    PubMed

    Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert

    2013-03-01

    Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman's GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. In Bellman's GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive to carefully hand-crafted implementations. This article introduces the Bellman's GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman's GAP as an implementation platform of 'real-world' bioinformatics tools. Bellman's GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics.

  12. Best practices in bioinformatics training for life scientists

    PubMed Central

    Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D.; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L.; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C.; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K.

    2013-01-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists. PMID:23803301

  13. Crowdsourcing for bioinformatics

    PubMed Central

    Good, Benjamin M.; Su, Andrew I.

    2013-01-01

    Motivation: Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Results: Here, we provide a framework for understanding and applying several different types of crowdsourcing. The framework considers two broad classes: systems for solving large-volume ‘microtasks’ and systems for solving high-difficulty ‘megatasks’. Within these classes, we discuss system types, including volunteer labor, games with a purpose, microtask markets and open innovation contests. We illustrate each system type with successful examples in bioinformatics and conclude with a guide for matching problems to crowdsourcing solutions that highlights the positives and negatives of different approaches. Contact: bgood@scripps.edu PMID:23782614

  14. Navigating the changing learning landscape: perspective from bioinformatics.ca

    PubMed Central

    Ouellette, B. F. Francis

    2013-01-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468

  15. Serum proteome profiling in canine idiopathic dilated cardiomyopathy using TMT-based quantitative proteomics approach.

    PubMed

    Bilić, Petra; Guillemin, Nicolas; Kovačević, Alan; Beer Ljubić, Blanka; Jović, Ines; Galan, Asier; Eckersall, Peter David; Burchmore, Richard; Mrljak, Vladimir

    2018-05-15

    Idiopathic dilated cardiomyopathy (iDCM) is a primary myocardial disorder with an unknown aetiology, characterized by reduced contractility and ventricular dilation of the left or both ventricles. Naturally occurring canine iDCM was used herein to identify serum proteomic signature of the disease compared to the healthy state, providing an insight into underlying mechanisms and revealing proteins with biomarker potential. To achieve this, we used high-throughput label-based quantitative LC-MS/MS proteomics approach and bioinformatics analysis of the in silico inferred interactome protein network created from the initial list of differential proteins. To complement the proteomic analysis, serum biochemical parameters and levels of know biomarkers of cardiac function were measured. Several proteins with biomarker potential were identified, such as inter-alpha-trypsin inhibitor heavy chain H4, microfibril-associated glycoprotein 4 and apolipoprotein A-IV, which were validated using an independent method (Western blotting) and showed high specificity and sensitivity according to the receiver operating characteristic curve analysis. Bioinformatics analysis revealed involvement of different pathways in iDCM, such as complement cascade activation, lipoprotein particles dynamics, elastic fibre formation, GPCR signalling and respiratory electron transport chain. Idiopathic dilated cardiomyopathy is a severe primary myocardial disease of unknown cause, affecting both humans and dogs. This study is a contribution to the canine heart disease research by means of proteomic and bioinformatic state of the art analyses, following similar approach in human iDCM research. Importantly, we used serum as non-invasive and easily accessible biological source of information and contributed to the scarce data on biofluid proteome research on this topic. Bioinformatics analysis revealed biological pathways modulated in canine iDCM with potential of further targeted research. Also, several proteins with biomarker potential have been identified and successfully validated. Copyright © 2018 Elsevier B.V. All rights reserved.

  16. BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS.

    PubMed

    Fosso, Bruno; Santamaria, Monica; Marzano, Marinella; Alonso-Alemany, Daniel; Valiente, Gabriel; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-07-01

    Substantial advances in microbiology, molecular evolution and biodiversity have been carried out in recent years thanks to Metagenomics, which allows to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the microbiome taxonomic structure, a target-based metagenomic approach, here also referred as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. The accessibility to proper computational systems for the large-scale bioinformatic analysis of HTS data represents, currently, one of the major challenges in advanced Meta-barcoding projects. BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities by a completely automated workflow, comprehensive of all the fundamental steps, from raw sequence data upload and cleaning to final taxonomic identification, that are absolutely required in an appropriately designed Meta-barcoding HTS-based experiment. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data from either Roche 454 or Illumina HTS platforms, following two alternative paths, respectively. BioMaS is implemented into a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). BioMaS is a friendly pipeline for Meta-barcoding HTS data analysis specifically designed for users without particular computing skills. A comparative benchmark, carried out by using a simulated dataset suitably designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.

  17. A bioinformatics approach for precision medicine off-label drug drug selection among triple negative breast cancer patients

    PubMed Central

    Cheng, Lijun; Schneider, Bryan P

    2016-01-01

    Background Cancer has been extensively characterized on the basis of genomics. The integration of genetic information about cancers with data on how the cancers respond to target based therapy to help to optimum cancer treatment. Objective The increasing usage of sequencing technology in cancer research and clinical practice has enormously advanced our understanding of cancer mechanisms. The cancer precision medicine is becoming a reality. Although off-label drug usage is a common practice in treating cancer, it suffers from the lack of knowledge base for proper cancer drug selections. This eminent need has become even more apparent considering the upcoming genomics data. Methods In this paper, a personalized medicine knowledge base is constructed by integrating various cancer drugs, drug-target database, and knowledge sources for the proper cancer drugs and their target selections. Based on the knowledge base, a bioinformatics approach for cancer drugs selection in precision medicine is developed. It integrates personal molecular profile data, including copy number variation, mutation, and gene expression. Results By analyzing the 85 triple negative breast cancer (TNBC) patient data in the Cancer Genome Altar, we have shown that 71.7% of the TNBC patients have FDA approved drug targets, and 51.7% of the patients have more than one drug target. Sixty-five drug targets are identified as TNBC treatment targets and 85 candidate drugs are recommended. Many existing TNBC candidate targets, such as Poly (ADP-Ribose) Polymerase 1 (PARP1), Cell division protein kinase 6 (CDK6), epidermal growth factor receptor, etc., were identified. On the other hand, we found some additional targets that are not yet fully investigated in the TNBC, such as Gamma-Glutamyl Hydrolase (GGH), Thymidylate Synthetase (TYMS), Protein Tyrosine Kinase 6 (PTK6), Topoisomerase (DNA) I, Mitochondrial (TOP1MT), Smoothened, Frizzled Class Receptor (SMO), etc. Our additional analysis of target and drug selection strategy is also fully supported by the drug screening data on TNBC cell lines in the Cancer Cell Line Encyclopedia. Conclusions The proposed bioinformatics approach lays a foundation for cancer precision medicine. It supplies much needed knowledge base for the off-label cancer drug usage in clinics. PMID:27107440

  18. Towards a career in bioinformatics

    PubMed Central

    2009-01-01

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection to tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to go impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508

  19. Towards a career in bioinformatics.

    PubMed

    Ranganathan, Shoba

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection to tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to go impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010.

  20. CoVaCS: a consensus variant calling system.

    PubMed

    Chiara, Matteo; Gioiosa, Silvia; Chillemi, Giovanni; D'Antonio, Mattia; Flati, Tiziano; Picardi, Ernesto; Zambelli, Federico; Horner, David Stephen; Pesole, Graziano; Castrignanò, Tiziana

    2018-02-05

    The advent and ongoing development of next generation sequencing technologies (NGS) has led to a rapid increase in the rate of human genome re-sequencing data, paving the way for personalized genomics and precision medicine. The body of genome resequencing data is progressively increasing underlining the need for accurate and time-effective bioinformatics systems for genotyping - a crucial prerequisite for identification of candidate causal mutations in diagnostic screens. Here we present CoVaCS, a fully automated, highly accurate system with a web based graphical interface for genotyping and variant annotation. Extensive tests on a gold standard benchmark data-set -the NA12878 Illumina platinum genome- confirm that call-sets based on our consensus strategy are completely in line with those attained by similar command line based approaches, and far more accurate than call-sets from any individual tool. Importantly our system exhibits better sensitivity and higher specificity than equivalent commercial software. CoVaCS offers optimized pipelines integrating state of the art tools for variant calling and annotation for whole genome sequencing (WGS), whole-exome sequencing (WES) and target-gene sequencing (TGS) data. The system is currently hosted at Cineca, and offers the speed of a HPC computing facility, a crucial consideration when large numbers of samples must be analysed. Importantly, all the analyses are performed automatically allowing high reproducibility of the results. As such, we believe that CoVaCS can be a valuable tool for the analysis of human genome resequencing studies. CoVaCS is available at: https://bioinformatics.cineca.it/covacs .

  1. Illuminate Knowledge Elements in Geoscience Literature

    NASA Astrophysics Data System (ADS)

    Ma, X.; Zheng, J. G.; Wang, H.; Fox, P. A.

    2015-12-01

    There are numerous dark data hidden in geoscience literature. Efficient retrieval and reuse of those data will greatly benefit geoscience researches of nowadays. Among the works of data rescue, a topic of interest is illuminating the knowledge framework, i.e. entities and relationships, embedded in documents. Entity recognition and linking have received extensive attention in news and social media analysis, as well as in bioinformatics. In the domain of geoscience, however, such works are limited. We will present our work on how to use knowledge bases on the Web, such as ontologies and vocabularies, to facilitate entity recognition and linking in geoscience literature. The work deploys an un-supervised collective inference approach [1] to link entity mentions in unstructured texts to a knowledge base, which leverages the meaningful information and structures in ontologies and vocabularies for similarity computation and entity ranking. Our work is still in the initial stage towards the detection of knowledge frameworks in literature, and we have been collecting geoscience ontologies and vocabularies in order to build a comprehensive geoscience knowledge base [2]. We hope the work will initiate new ideas and collaborations on dark data rescue, as well as on the synthesis of data and knowledge from geoscience literature. References: 1. Zheng, J., Howsmon, D., Zhang, B., Hahn, J., McGuinness, D.L., Hendler, J., and Ji, H. 2014. Entity linking for biomedical literature. In Proceedings of ACM 8th International Workshop on Data and Text Mining in Bioinformatics, Shanghai, China. 2. Ma, X. Zheng, J., 2015. Linking geoscience entity mentions to the Web of Data. ESIP 2015 Summer Meeting, Pacific Grove, CA.

  2. Transcriptome Analysis of Human Immune Responses Following Live Vaccine Strain (LVS) Francisella Tularensis Vaccination

    DTIC Science & Technology

    2007-03-08

    with CD3D 50848 PAR1/UBE3A Prader–Willi syndrome chromosome region 1, GMCSFRalpha precursor, IL3Ralpha precursor (CD123) Brain development...intervention programs justifiable? Emerg. Infect. Dis. 3, 83–94. iebel, U., Kindler , B., Pepperkok, R., 2004. ‘Harvester’: a fast meta search engine of human...protein resources. Bioinformatics 20, 1962–1963. iebel, U., Kindler , B., Pepperkok, R., 2005. Bioinformatic “Harvester”: a search engine for genome

  3. Special Focus

    PubMed Central

    Nawrocki, Eric P.; Burge, Sarah W.

    2013-01-01

    The development of RNA bioinformatic tools began more than 30 y ago with the description of the Nussinov and Zuker dynamic programming algorithms for single sequence RNA secondary structure prediction. Since then, many tools have been developed for various RNA sequence analysis problems such as homology search, multiple sequence alignment, de novo RNA discovery, read-mapping, and many more. In this issue, we have collected a sampling of reviews and original research that demonstrate some of the many ways bioinformatics is integrated with current RNA biology research. PMID:23948768

  4. Advantages and disadvantages in usage of bioinformatic programs in promoter region analysis

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena E.; Skarzyńska, Agnieszka; Posyniak, Kacper; ZiÄ bska, Karolina; PlÄ der, Wojciech; Przybecki, Zbigniew

    2015-09-01

    An important computational challenge is finding the regulatory elements across the promotor region. In this work we present the advantages and disadvantages from the application of different bioinformatics programs for localization of transcription factor binding sites in the upstream region of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promotor regions. The results have been compared and possible function of chosen motifs has been described.

  5. Reactome diagram viewer: data structures and strategies to boost performance.

    PubMed

    Fabregat, Antonio; Sidiropoulos, Konstantinos; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

    2018-04-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. For web-based pathway visualization, Reactome uses a custom pathway diagram viewer that has been evolved over the past years. Here, we present comprehensive enhancements in usability and performance based on extensive usability testing sessions and technology developments, aiming to optimize the viewer towards the needs of the community. The pathway diagram viewer version 3 achieves consistently better performance, loading and rendering of 97% of the diagrams in Reactome in less than 1 s. Combining the multi-layer html5 canvas strategy with a space partitioning data structure minimizes CPU workload, enabling the introduction of new features that further enhance user experience. Through the use of highly optimized data structures and algorithms, Reactome has boosted the performance and usability of the new pathway diagram viewer, providing a robust, scalable and easy-to-integrate solution to pathway visualization. As graph-based visualization of complex data is a frequent challenge in bioinformatics, many of the individual strategies presented here are applicable to a wide range of web-based bioinformatics resources. Reactome is available online at: https://reactome.org. The diagram viewer is part of the Reactome pathway browser (https://reactome.org/PathwayBrowser/) and also available as a stand-alone widget at: https://reactome.org/dev/diagram/. The source code is freely available at: https://github.com/reactome-pwp/diagram. fabregat@ebi.ac.uk or hhe@ebi.ac.uk. Supplementary data are available at Bioinformatics online.

  6. Validation of Methods to Assess the Immunoglobulin Gene Repertoire in Tissues Obtained from Mice on the International Space Station.

    PubMed

    Rettig, Trisha A; Ward, Claire; Pecaut, Michael J; Chapes, Stephen K

    2017-07-01

    Spaceflight is known to affect immune cell populations. In particular, splenic B cell numbers decrease during spaceflight and in ground-based physiological models. Although antibody isotype changes have been assessed during and after space flight, an extensive characterization of the impact of spaceflight on antibody composition has not been conducted in mice. Next Generation Sequencing and bioinformatic tools are now available to assess antibody repertoires. We can now identify immunoglobulin gene- segment usage, junctional regions, and modifications that contribute to specificity and diversity. Due to limitations on the International Space Station, alternate sample collection and storage methods must be employed. Our group compared Illumina MiSeq sequencing data from multiple sample preparation methods in normal C57Bl/6J mice to validate that sample preparation and storage would not bias the outcome of antibody repertoire characterization. In this report, we also compared sequencing techniques and a bioinformatic workflow on the data output when we assessed the IgH and Igκ variable gene usage. This included assessments of our bioinformatic workflow on Illumina HiSeq and MiSeq datasets and is specifically designed to reduce bias, capture the most information from Ig sequences, and produce a data set that provides other data mining options. We validated our workflow by comparing our normal mouse MiSeq data to existing murine antibody repertoire studies validating it for future antibody repertoire studies.

  7. Scalable computing for evolutionary genomics.

    PubMed

    Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

    2012-01-01

    Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.

  8. A bioinformatic survey of RNA-binding proteins in Plasmodium.

    PubMed

    Reddy, B P Niranjan; Shrestha, Sony; Hart, Kevin J; Liang, Xiaoying; Kemirembe, Karen; Cui, Liwang; Lindner, Scott E

    2015-11-02

    The malaria parasites in the genus Plasmodium have a very complicated life cycle involving an invertebrate vector and a vertebrate host. RNA-binding proteins (RBPs) are critical factors involved in every aspect of the development of these parasites. However, very few RBPs have been functionally characterized to date in the human parasite Plasmodium falciparum. Using different bioinformatic methods and tools we searched P. falciparum genome to list and annotate RBPs. A representative 3D models for each of the RBD domain identified in P. falciparum was created using I-TESSAR and SWISS-MODEL. Microarray and RNAseq data analysis pertaining PfRBPs was performed using MeV software. Finally, Cytoscape was used to create protein-protein interaction network for CITH-Dozi and Caf1-CCR4-Not complexes. We report the identification of 189 putative RBP genes belonging to 13 different families in Plasmodium, which comprise 3.5% of all annotated genes. Almost 90% (169/189) of these genes belong to six prominent RBP classes, namely RNA recognition motifs, DEAD/H-box RNA helicases, K homology, Zinc finger, Puf and Alba gene families. Interestingly, almost all of the identified RNA-binding helicases and KH genes have cognate homologs in model species, suggesting their evolutionary conservation. Exploration of the existing P. falciparum blood-stage transcriptomes revealed that most RBPs have peak mRNA expression levels early during the intraerythrocytic development cycle, which taper off in later stages. Nearly 27% of RBPs have elevated expression in gametocytes, while 47 and 24% have elevated mRNA expression in ookinete and asexual stages. Comparative interactome analyses using human and Plasmodium protein-protein interaction datasets suggest extensive conservation of the PfCITH/PfDOZI and PfCaf1-CCR4-NOT complexes. The Plasmodium parasites possess a large number of putative RBPs belonging to most of RBP families identified so far, suggesting the presence of extensive post-transcriptional regulation in these parasites. Taken together, in silico identification of these putative RBPs provides a foundation for future functional studies aimed at defining a unique network of post-transcriptional regulation in P. falciparum.

  9. GProX, a user-friendly platform for bioinformatics analysis and visualization of quantitative proteomics data.

    PubMed

    Rigbolt, Kristoffer T G; Vanselow, Jens T; Blagoev, Blagoy

    2011-08-01

    Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiment. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data we developed the Graphical Proteomics Data Explorer (GProX)(1). The program requires no special bioinformatics training, as all functions of GProX are accessible within its graphical user-friendly interface which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests for e.g. GO terms and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data is available and most analysis functions in GProX create customizable high quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net.

  10. GProX, a User-Friendly Platform for Bioinformatics Analysis and Visualization of Quantitative Proteomics Data*

    PubMed Central

    Rigbolt, Kristoffer T. G.; Vanselow, Jens T.; Blagoev, Blagoy

    2011-01-01

    Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiment. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data we developed the Graphical Proteomics Data Explorer (GProX)1. The program requires no special bioinformatics training, as all functions of GProX are accessible within its graphical user-friendly interface which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests for e.g. GO terms and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data is available and most analysis functions in GProX create customizable high quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net. PMID:21602510

  11. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis.

    PubMed

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented.

  12. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis

    PubMed Central

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented. PMID:18971242

  13. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform, Version 1.5 and 1.x.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chain, Patrick; Lo, Chien-Chi; Li, Po-E

    EDGE bioinformatics was developed to help biologists process Next Generation Sequencing data (in the form of raw FASTQ files), even if they have little to no bioinformatics expertise. EDGE is a highly integrated and interactive web-based platform that is capable of running many of the standard analyses that biologists require for viral, bacterial/archaeal, and metagenomic samples. EDGE provides the following analytical workflows: quality trimming and host removal, assembly and annotation, comparisons against known references, taxonomy classification of reads and contigs, whole genome SNP-based phylogenetic analysis, and PCR analysis. EDGE provides an intuitive web-based interface for user input, allows users tomore » visualize and interact with selected results (e.g. JBrowse genome browser), and generates a final detailed PDF report. Results in the form of tables, text files, graphic files, and PDFs can be downloaded. A user management system allows tracking of an individual’s EDGE runs, along with the ability to share, post publicly, delete, or archive their results.« less

  14. Using the QCM Biosensor-Based T7 Phage Display Combined with Bioinformatics Analysis for Target Identification of Bioactive Small Molecule.

    PubMed

    Takakusagi, Yoichi; Takakusagi, Kaori; Sugawara, Fumio; Sakaguchi, Kengo

    2018-01-01

    Identification of target proteins that directly bind to bioactive small molecule is of great interest in terms of clarifying the mode of action of the small molecule as well as elucidating the biological phenomena at the molecular level. Of the experimental technologies available, T7 phage display allows comprehensive screening of small molecule-recognizing amino acid sequence from the peptide libraries displayed on the T7 phage capsid. Here, we describe the T7 phage display strategy that is combined with quartz-crystal microbalance (QCM) biosensor for affinity selection platform and bioinformatics analysis for small molecule-recognizing short peptides. This method dramatically enhances efficacy and throughput of the screening for small molecule-recognizing amino acid sequences without repeated rounds of selection. Subsequent execution of bioinformatics programs allows combinatorial and comprehensive target protein discovery of small molecules with its binding site, regardless of protein sample insolubility, instability, or inaccessibility of the fixed small molecules to internally located binding site on larger target proteins when conventional proteomics approaches are used.

  15. Bioinformatics Education—Perspectives and Challenges out of Africa

    PubMed Central

    Adebiyi, Ezekiel F.; Alzohairy, Ahmed M.; Everett, Dean; Ghedira, Kais; Ghouila, Amel; Kumuthini, Judit; Mulder, Nicola J.; Panji, Sumir; Patterton, Hugh-G.

    2015-01-01

    The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of virtually every field in the life sciences. This has placed a scientific premium on the availability of skilled bioinformaticians, a qualification that is extremely scarce on the African continent. The reasons for this are numerous, although the absence of a skilled bioinformatician at academic institutions to initiate a training process and build sustained capacity seems to be a common African shortcoming. This dearth of bioinformatics expertise has had a knock-on effect on the establishment of many modern high-throughput projects at African institutes, including the comprehensive and systematic analysis of genomes from African populations, which are among the most genetically diverse anywhere on the planet. Recent funding initiatives from the National Institutes of Health and the Wellcome Trust are aimed at ameliorating this shortcoming. In this paper, we discuss the problems that have limited the establishment of the bioinformatics field in Africa, as well as propose specific actions that will help with the education and training of bioinformaticians on the continent. This is an absolute requirement in anticipation of a boom in high-throughput approaches to human health issues unique to data from African populations. PMID:24990350

  16. The structural bioinformatics library: modeling in biomolecular science and beyond.

    PubMed

    Cazals, Frédéric; Dreyfus, Tom

    2017-04-01

    Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library ( SBL , http://sbl.inria.fr ), a generic C ++/python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performances. The SBL involves four software components (1-4 thereafter). For end-users, the SBL provides ready to use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, to model macro-molecular assemblies. These applications can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C ++ toolbox with modular design, involving core (2) algorithms , (3) biophysical models and (4) modules , the latter being especially suited to develop novel applications. The SBL comes with a thorough documentation consisting of user and reference manuals, and a bugzilla platform to handle community feedback. The SBL is available from http://sbl.inria.fr. Frederic.Cazals@inria.fr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  17. PRINCESS: Privacy-protecting Rare disease International Network Collaboration via Encryption through Software guard extensionS.

    PubMed

    Chen, Feng; Wang, Shuang; Jiang, Xiaoqian; Ding, Sijie; Lu, Yao; Kim, Jihoon; Sahinalp, S Cenk; Shimizu, Chisato; Burns, Jane C; Wright, Victoria J; Png, Eileen; Hibberd, Martin L; Lloyd, David D; Yang, Hai; Telenti, Amalio; Bloss, Cinnamon S; Fox, Dov; Lauter, Kristin; Ohno-Machado, Lucila

    2017-03-15

    We introduce PRINCESS, a privacy-preserving international collaboration framework for analyzing rare disease genetic data that are distributed across different continents. PRINCESS leverages Software Guard Extensions (SGX) and hardware for trustworthy computation. Unlike a traditional international collaboration model, where individual-level patient DNA are physically centralized at a single site, PRINCESS performs a secure and distributed computation over encrypted data, fulfilling institutional policies and regulations for protected health information. To demonstrate PRINCESS' performance and feasibility, we conducted a family-based allelic association study for Kawasaki Disease, with data hosted in three different continents. The experimental results show that PRINCESS provides secure and accurate analyses much faster than alternative solutions, such as homomorphic encryption and garbled circuits (over 40 000× faster). https://github.com/achenfengb/PRINCESS_opensource. shw070@ucsd.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  18. Composable languages for bioinformatics: the NYoSh experiment

    PubMed Central

    Simi, Manuele

    2014-01-01

    Language WorkBenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful to develop bioinformatics software solutions. In order to evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and Language Composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist the programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment, supporting context dependent intentions and can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system. The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh is distributed at http://nyosh.campagnelab.org. PMID:24482760

  19. Composable languages for bioinformatics: the NYoSh experiment.

    PubMed

    Simi, Manuele; Campagne, Fabien

    2014-01-01

    Language WorkBenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful to develop bioinformatics software solutions. In order to evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and Language Composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist the programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment, supporting context dependent intentions and can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system. The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh is distributed at http://nyosh.campagnelab.org.

  20. Unix Survival Guide.

    PubMed

    Stein, Lincoln D

    2015-09-03

    Most bioinformatics software has been designed to run on Linux and other Unix-like systems. Unix is different from most desktop operating systems because it makes extensive use of a text-only command-line interface. It can be a challenge to become familiar with the command line, but once a person becomes used to it, there are significant rewards, such as the ability to string a commonly used series of commands together with a script. This appendix will get you started with the command line and other Unix essentials. Copyright © 2015 John Wiley & Sons, Inc.

  1. A Bioinformatics Facility for NASA

    NASA Technical Reports Server (NTRS)

    Schweighofer, Karl; Pohorille, Andrew

    2006-01-01

    Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill strategic NASA s bioinformatics needs in astrobiology and space exploration. . As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.

  2. Data-driven advice for applying machine learning to bioinformatics problems

    PubMed Central

    Olson, Randal S.; La Cava, William; Mustahsan, Zairah; Varik, Akshay; Moore, Jason H.

    2017-01-01

    As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems. PMID:29218881

  3. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics.

    PubMed

    Dutheil, Julien; Gaillard, Sylvain; Bazin, Eric; Glémin, Sylvain; Ranwez, Vincent; Galtier, Nicolas; Belkhir, Khalid

    2006-04-04

    A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/output methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications. We present Bio++, a set of Object Oriented libraries written in C++. Available components include classes for data storage and handling (nucleotide/amino-acid/codon sequences, trees, distance matrices, population genetics datasets), various input/output formats, basic sequence manipulation (concatenation, transcription, translation, etc.), phylogenetic analysis (maximum parsimony, markov models, distance methods, likelihood computation and maximization), population genetics/genomics (diversity statistics, neutrality tests, various multi-locus analyses) and various algorithms for numerical calculus. Implementation of methods aims at being both efficient and user-friendly. A special concern was given to the library design to enable easy extension and new methods development. We defined a general hierarchy of classes that allow the developer to implement its own algorithms while remaining compatible with the rest of the libraries. Bio++ source code is distributed free of charge under the CeCILL general public licence from its website http://kimura.univ-montp2.fr/BioPP.

  4. Biowep: a workflow enactment portal for bioinformatics applications.

    PubMed

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-03-08

    The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.

  5. Biowep: a workflow enactment portal for bioinformatics applications

    PubMed Central

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-01-01

    Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. Results We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563

  6. How to use and integrate bioinformatics tools to compare proteomic data from distinct conditions? A tutorial using the pathological similarities between Aortic Valve Stenosis and Coronary Artery Disease as a case-study.

    PubMed

    Trindade, Fábio; Ferreira, Rita; Magalhães, Beatriz; Leite-Moreira, Adelino; Falcão-Pires, Inês; Vitorino, Rui

    2018-01-16

    Nowadays we are surrounded by a plethora of bioinformatics tools, powerful enough to deal with the large amounts of data arising from proteomic studies, but whose application is sometimes hard to find. Therefore, we used a specific clinical problem - to discriminate pathophysiology and potential biomarkers between two similar cardiovascular diseases, aortic valve stenosis (AVS) and coronary artery disease (CAD) - to make a step-by-step guide through four bioinformatics tools: STRING, DisGeNET, Cytoscape and ClueGO. Proteome data was collected from articles available on PubMed centered on proteomic studies enrolling subjects with AVS or CAD. Through the analysis of gene ontology provided by STRING and ClueGO we could find specific biological phenomena associated with AVS, such as down-regulation of elastic fiber assembly, and with CAD, such as up-regulation of plasminogen activation. Moreover, through Cytoscape and DisGeNET we could pinpoint surrogate markers either for AVS (e.g. popeye domain containing protein 2 and 28S ribosomal protein S36, mitochondrial) or for CAD (e.g. ankyrin repeat and SOCS box protein 7) which deserve future validation. Data recycling and integration as well as research orientation are among the main advantages of resorting to bioinformatics analysis, hence these tutorials can be of great convenience for proteomics investigators. As we saw for aortic valve stenosis and coronary artery disease, it can be of great relevance to perform preliminary bioinformatics analysis with already published proteomics data. It not only saves us time in the lab (avoiding work duplication) as it points out new hypothesis to explain the phenotypical presentation of the diseases as well as new surrogate markers with clinical relevance, deserving future scrutiny. These essential steps can be easily overcome if one follows the steps proposed in our tutorial for STRING, DisGeNET, Cytoscape and ClueGO utilization. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Charged groups at binding interfaces of the PsbO subunit of photosystem II: A combined bioinformatics and simulation study.

    PubMed

    Del Val, Coral; Bondar, Ana-Nicoleta

    2017-06-01

    PsbO is an extrinsic subunit of photosystem II engaged in complex binding interactions within photosystem II. At the interface between PsbO, D1 and D2 subunits of photosystem II, a cluster of charged and polar groups of PsbO is part of an extended hydrogen-bond network thought to participate in proton transfer. The precise role of specific amino acid residues at this complex binding interface remains a key open question. Here, we address this question by carrying out extensive bioinformatics analyses and molecular dynamics simulations of PsbO proteins with mutations at the binding interface. We find that PsbO proteins from cyanobacteria vs. plants have specific preferences for the number and composition of charged amino acid residues that may ensure that PsbO proteins avoid aggregation and expose long unstructured loops for binding to photosystem II. A cluster of conserved charged groups with dynamic hydrogen bonds provides PsbO with structural plasticity at the binding interface with photosystem II. Copyright © 2017. Published by Elsevier B.V.

  8. KISS for STRAP: user extensions for a protein alignment editor.

    PubMed

    Gille, Christoph; Lorenzen, Stephan; Michalsky, Elke; Frömmel, Cornelius

    2003-12-12

    The Structural Alignment Program STRAP is a comfortable comprehensive editor and analyzing tool for protein alignments. A wide range of functions related to protein sequences and protein structures are accessible with an intuitive graphical interface. Recent features include mapping of mutations and polymorphisms onto structures and production of high quality figures for publication. Here we address the general problem of multi-purpose program packages to keep up with the rapid development of bioinformatical methods and the demand for specific program functions. STRAP was remade implementing a novel design which aims at Keeping Interfaces in STRAP Simple (KISS). KISS renders STRAP extendable to bio-scientists as well as to bio-informaticians. Scientists with basic computer skills are capable of implementing statistical methods or embedding existing bioinformatical tools in STRAP themselves. For bio-informaticians STRAP may serve as an environment for rapid prototyping and testing of complex algorithms such as automatic alignment algorithms or phylogenetic methods. Further, STRAP can be applied as an interactive web applet to present data related to a particular protein family and as a teaching tool. JAVA-1.4 or higher. http://www.charite.de/bioinf/strap/

  9. From trace evidence to bioinformatics: putting bryophytes into molecular biology education.

    PubMed

    Fuselier, Linda; Bougary, Azhar; Malott, Michelle

    2011-01-01

    Students benefit most from their science education when they participate fully in the process of science in the context of real-world problems. We describe a student-directed open-inquiry lab experience that has no predetermined outcomes and requires students to engage in all components of scientific inquiry from posing a question through evaluating and reporting results. Over 5 weeks, students learn how bryophytes are used in forensics and become proficient in important molecular biology lab skills including DNA isolation, polymerase chain reaction, gel electrophoresis, capillary electrophoresis, and genotyping. For this portion of the experience, there is no specialized equipment necessary outside of gel electrophoresis supplies and a thermocycler. In an optional extension of the experience, students sequence a plastid intron and use introductory bioinformatics skills to identify species related to their forensics case. Students who participated in the lab experience performed well on content-based assessment, and student attitudes toward the experience were positive and indicative of engaged learning. The lab experience is easily modified for higher or lower level courses and can be used in secondary education. Copyright © 2011 Wiley Periodicals, Inc.

  10. Biophysics and bioinformatics of transcription regulation in bacteria and bacteriophages

    NASA Astrophysics Data System (ADS)

    Djordjevic, Marko

    2005-11-01

    Due to rapid accumulation of biological data, bioinformatics has become a very important branch of biological research. In this thesis, we develop novel bioinformatic approaches and aid design of biological experiments by using ideas and methods from statistical physics. Identification of transcription factor binding sites within the regulatory segments of genomic DNA is an important step towards understanding of the regulatory circuits that control expression of genes. We propose a novel, biophysics based algorithm, for the supervised detection of transcription factor (TF) binding sites. The method classifies potential binding sites by explicitly estimating the sequence-specific binding energy and the chemical potential of a given TF. In contrast with the widely used information theory based weight matrix method, our approach correctly incorporates saturation in the transcription factor/DNA binding probability. This results in a significant reduction in the number of expected false positives, and in the explicit appearance---and determination---of a binding threshold. The new method was used to identify likely genomic binding sites for the Escherichia coli TFs, and to examine the relationship between TF binding specificity and degree of pleiotropy (number of regulatory targets). We next address how parameters of protein-DNA interactions can be obtained from data on protein binding to random oligos under controlled conditions (SELEX experiment data). We show that 'robust' generation of an appropriate data set is achieved by a suitable modification of the standard SELEX procedure, and propose a novel bioinformatic algorithm for analysis of such data. Finally, we use quantitative data analysis, bioinformatic methods and kinetic modeling to analyze gene expression strategies of bacterial viruses. We study bacteriophage Xp10 that infects rice pathogen Xanthomonas oryzae. Xp10 is an unusual bacteriophage, which has morphology and genome organization that most closely resembles temperate phages, such as lambda. It, however, encodes its own T7-like RNA polymerase (characteristic of virulent phages), whose role in gene expression was unclear. Our analysis resulted in quantitative understanding of the role of both host and phage RNA polymerase, and in the identification of the previously unknown promoter sequence for Xp10 RNA polymerase. More generally, an increasing number of phage genomes are being sequenced every year, and we expect that methods of quantitative data analysis that we introduced will provide an efficient way to study gene expression strategies of novel bacterial viruses.

  11. Scientists' sense making when hypothesizing about disease mechanisms from expression data and their needs for visualization support.

    PubMed

    Mirel, Barbara; Görg, Carsten

    2014-04-26

    A common class of biomedical analysis is to explore expression data from high throughput experiments for the purpose of uncovering functional relationships that can lead to a hypothesis about mechanisms of a disease. We call this analysis expression driven, -omics hypothesizing. In it, scientists use interactive data visualizations and read deeply in the research literature. Little is known, however, about the actual flow of reasoning and behaviors (sense making) that scientists enact in this analysis, end-to-end. Understanding this flow is important because if bioinformatics tools are to be truly useful they must support it. Sense making models of visual analytics in other domains have been developed and used to inform the design of useful and usable tools. We believe they would be helpful in bioinformatics. To characterize the sense making involved in expression-driven, -omics hypothesizing, we conducted an in-depth observational study of one scientist as she engaged in this analysis over six months. From findings, we abstracted a preliminary sense making model. Here we describe its stages and suggest guidelines for developing visualization tools that we derived from this case. A single case cannot be generalized. But we offer our findings, sense making model and case-based tool guidelines as a first step toward increasing interest and further research in the bioinformatics field on scientists' analytical workflows and their implications for tool design.

  12. Zebra: a web server for bioinformatic analysis of diverse protein families.

    PubMed

    Suplatov, Dmitry; Kirilin, Evgeny; Takhaveev, Vakil; Svedas, Vytas

    2014-01-01

    During evolution of proteins from a common ancestor, one functional property can be preserved while others can vary leading to functional diversity. A systematic study of the corresponding adaptive mutations provides a key to one of the most challenging problems of modern structural biology - understanding the impact of amino acid substitutions on protein function. The subfamily-specific positions (SSPs) are conserved within functional subfamilies but are different between them and, therefore, seem to be responsible for functional diversity in protein superfamilies. Consequently, a corresponding method to perform the bioinformatic analysis of sequence and structural data has to be implemented in the common laboratory practice to study the structure-function relationship in proteins and develop novel protein engineering strategies. This paper describes Zebra web server - a powerful remote platform that implements a novel bioinformatic analysis algorithm to study diverse protein families. It is the first application that provides specificity determinants at different levels of functional classification, therefore addressing complex functional diversity of large superfamilies. Statistical analysis is implemented to automatically select a set of highly significant SSPs to be used as hotspots for directed evolution or rational design experiments and analyzed studying the structure-function relationship. Zebra results are provided in two ways - (1) as a single all-in-one parsable text file and (2) as PyMol sessions with structural representation of SSPs. Zebra web server is available at http://biokinet.belozersky.msu.ru/zebra .

  13. Scientists’ sense making when hypothesizing about disease mechanisms from expression data and their needs for visualization support

    PubMed Central

    2014-01-01

    A common class of biomedical analysis is to explore expression data from high throughput experiments for the purpose of uncovering functional relationships that can lead to a hypothesis about mechanisms of a disease. We call this analysis expression driven, -omics hypothesizing. In it, scientists use interactive data visualizations and read deeply in the research literature. Little is known, however, about the actual flow of reasoning and behaviors (sense making) that scientists enact in this analysis, end-to-end. Understanding this flow is important because if bioinformatics tools are to be truly useful they must support it. Sense making models of visual analytics in other domains have been developed and used to inform the design of useful and usable tools. We believe they would be helpful in bioinformatics. To characterize the sense making involved in expression-driven, -omics hypothesizing, we conducted an in-depth observational study of one scientist as she engaged in this analysis over six months. From findings, we abstracted a preliminary sense making model. Here we describe its stages and suggest guidelines for developing visualization tools that we derived from this case. A single case cannot be generalized. But we offer our findings, sense making model and case-based tool guidelines as a first step toward increasing interest and further research in the bioinformatics field on scientists’ analytical workflows and their implications for tool design. PMID:24766796

  14. Statistics and bioinformatics in nutritional sciences: analysis of complex data in the era of systems biology⋆

    PubMed Central

    Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao

    2009-01-01

    Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have the advanced methodologies for the analysis of DNA, RNA, protein, low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlighted various sources of experimental variations in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provided guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine fetal retardation). PMID:20233650

  15. Improving data workflow systems with cloud services and use of open data for bioinformatics research.

    PubMed

    Karim, Md Rezaul; Michel, Audrey; Zappa, Achille; Baranov, Pavel; Sahay, Ratnesh; Rebholz-Schuhmann, Dietrich

    2017-04-16

    Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community. © The Author 2017. Published by Oxford University Press.

  16. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315

  17. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

  18. Integration of QTL and bioinformatic tools to identify candidate genes for triglycerides in mice[S

    PubMed Central

    Leduc, Magalie S.; Hageman, Rachael S.; Verdugo, Ricardo A.; Tsaih, Shirng-Wern; Walsh, Kenneth; Churchill, Gary A.; Paigen, Beverly

    2011-01-01

    To identify genetic loci influencing lipid levels, we performed quantitative trait loci (QTL) analysis between inbred mouse strains MRL/MpJ and SM/J, measuring triglyceride levels at 8 weeks of age in F2 mice fed a chow diet. We identified one significant QTL on chromosome (Chr) 15 and three suggestive QTL on Chrs 2, 7, and 17. We also carried out microarray analysis on the livers of parental strains of 282 F2 mice and used these data to find cis-regulated expression QTL. We then narrowed the list of candidate genes under significant QTL using a “toolbox” of bioinformatic resources, including haplotype analysis; parental strain comparison for gene expression differences and nonsynonymous coding single nucleotide polymorphisms (SNP); cis-regulated eQTL in livers of F2 mice; correlation between gene expression and phenotype; and conditioning of expression on the phenotype. We suggest Slc25a7 as a candidate gene for the Chr 7 QTL and, based on expression differences, five genes (Polr3 h, Cyp2d22, Cyp2d26, Tspo, and Ttll12) as candidate genes for Chr 15 QTL. This study shows how bioinformatics can be used effectively to reduce candidate gene lists for QTL related to complex traits. PMID:21622629

  19. Multi-loci diagnosis of acute lymphoblastic leukaemia with high-throughput sequencing and bioinformatics analysis.

    PubMed

    Ferret, Yann; Caillault, Aurélie; Sebda, Shéhérazade; Duez, Marc; Grardel, Nathalie; Duployez, Nicolas; Villenet, Céline; Figeac, Martin; Preudhomme, Claude; Salson, Mikaël; Giraud, Mathieu

    2016-05-01

    High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of the replacement of the first steps of the approach based on DNA isolation and Sanger sequencing, using a HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients. © 2016 John Wiley & Sons Ltd.

  20. A bioinformatics prediction approach towards analyzing the glycosylation, co-expression and interaction patterns of epithelial membrane antigen (EMA/MUC1)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kalra, Rajkumar S., E-mail: renu-wadhwa@aist.go.jp; Wadhwa, Renu, E-mail: renu-wadhwa@aist.go.jp

    2015-02-27

    Epithelial membrane antigen (EMA or MUC1) is a heavily glycosylated, type I transmembrane glycoprotein commonly expressed by epithelial cells of duct organs. It has been shown to be aberrantly glycosylated in several diseases including cancer. Protein sequence based annotation and analysis of glycosylation profile of glycoproteins by robust computational and comprehensive algorithms provides possible insights to the mechanism(s) of anomalous glycosylation. In present report, by using a number of bioinformatics applications we studied EMA/MUC1 and explored its trans-membrane structural domain sequence that is widely subjected to glycosylation. Exploration of different extracellular motifs led to prediction of N and O-linked glycosylationmore » target sites. Based on the putative O-linked target sites, glycosylated moieties and pathways were envisaged. Furthermore, Protein network analysis demonstrated physical interaction of EMA with a number of proteins and confirmed its functional involvement in cell growth and proliferation pathways. Gene Ontology analysis suggested an involvement of EMA in a number of functions including signal transduction, protein binding, processing and transport along with glycosylation. Thus, present study explored potential of bioinformatics prediction approach in analyzing glycosylation, co-expression and interaction patterns of EMA/MUC1 glycoprotein.« less

  1. Serial analysis of gene expression in a rat lung model of asthma.

    PubMed

    Yin, Lei-Miao; Jiang, Gong-Hao; Wang, Yu; Wang, Yan; Liu, Yan-Yan; Jin, Wei-Rong; Zhang, Zen; Xu, Yu-Dong; Yang, Yong-Qing

    2008-11-01

    The pathogenesis and molecular mechanism underlying asthma remain undetermined. The purpose of this study was to identify genes and pathways involved in the early airway response (EAR) phase of asthma by using serial analysis of gene expression (SAGE). Two SAGE tag libraries of lung tissues derived from a rat model of asthma and controls were generated. Bioinformatic analyses were carried out using the Database for Annotation, Visualization and IntegratedDiscovery Functional Annotation Tool, Gene Ontology (GO) TreeMachine and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. A total of 26 552 SAGE tags of asthmatic rat lung were obtained, of which 12 221 were unique tags. Of the unique tags, 55.5% were matched with known genes. By comparison of the two libraries, 186 differentially expressed tags (P < 0.05) were identified, of which 103 were upregulated and 83 were downregulated. Using the bioinformatic tools these genes were classified into 23 functional groups, 15 KEGG pathways and 37 enriched GO categories. The bioinformatic analyses of gene distribution, enriched categories and the involvement of specific pathways in the SAGE libraries have provided information on regulatory networks of the EAR phase of asthma. Analyses of the regulated genes of interest may inform new hypotheses, increase our understanding of the disease and provide a foundation for future research.

  2. Forensic massively parallel sequencing data analysis tool: Implementation of MyFLq as a standalone web- and Illumina BaseSpace(®)-application.

    PubMed

    Van Neste, Christophe; Gansemans, Yannick; De Coninck, Dieter; Van Hoofstat, David; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2015-03-01

    Routine use of massively parallel sequencing (MPS) for forensic genomics is on the horizon. The last few years, several algorithms and workflows have been developed to analyze forensic MPS data. However, none have yet been tailored to the needs of the forensic analyst who does not possess an extensive bioinformatics background. We developed our previously published forensic MPS data analysis framework MyFLq (My-Forensic-Loci-queries) into an open-source, user-friendly, web-based application. It can be installed as a standalone web application, or run directly from the Illumina BaseSpace environment. In the former, laboratories can keep their data on-site, while in the latter, data from forensic samples that are sequenced on an Illumina sequencer can be uploaded to Basespace during acquisition, and can subsequently be analyzed using the published MyFLq BaseSpace application. Additional features were implemented such as an interactive graphical report of the results, an interactive threshold selection bar, and an allele length-based analysis in addition to the sequenced-based analysis. Practical use of the application is demonstrated through the analysis of four 16-plex short tandem repeat (STR) samples, showing the complementarity between the sequence- and length-based analysis of the same MPS data. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.

  3. Analysis of mass spectrometry data from the secretome of an explant model of articular cartilage exposed to pro-inflammatory and anti-inflammatory stimuli using machine learning

    PubMed Central

    2013-01-01

    Background Osteoarthritis (OA) is an inflammatory disease of synovial joints involving the loss and degeneration of articular cartilage. The gold standard for evaluating cartilage loss in OA is the measurement of joint space width on standard radiographs. However, in most cases the diagnosis is made well after the onset of the disease, when the symptoms are well established. Identification of early biomarkers of OA can facilitate earlier diagnosis, improve disease monitoring and predict responses to therapeutic interventions. Methods This study describes the bioinformatic analysis of data generated from high throughput proteomics for identification of potential biomarkers of OA. The mass spectrometry data was generated using a canine explant model of articular cartilage treated with the pro-inflammatory cytokine interleukin 1 β (IL-1β). The bioinformatics analysis involved the application of machine learning and network analysis to the proteomic mass spectrometry data. A rule based machine learning technique, BioHEL, was used to create a model that classified the samples into their relevant treatment groups by identifying those proteins that separated samples into their respective groups. The proteins identified were considered to be potential biomarkers. Protein networks were also generated; from these networks, proteins pivotal to the classification were identified. Results BioHEL correctly classified eighteen out of twenty-three samples, giving a classification accuracy of 78.3% for the dataset. The dataset included the four classes of control, IL-1β, carprofen, and IL-1β and carprofen together. This exceeded the other machine learners that were used for a comparison, on the same dataset, with the exception of another rule-based method, JRip, which performed equally well. The proteins that were most frequently used in rules generated by BioHEL were found to include a number of relevant proteins including matrix metalloproteinase 3, interleukin 8 and matrix gla protein. Conclusions Using this protocol, combining an in vitro model of OA with bioinformatics analysis, a number of relevant extracellular matrix proteins were identified, thereby supporting the application of these bioinformatics tools for analysis of proteomic data from in vitro models of cartilage degradation. PMID:24330474

  4. Faster sequence homology searches by clustering subsequences.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2015-04-15

    Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When we measured with metagenomic data, GHOSTZ is ∼2.2-2.8 times faster than RAPSearch and is ∼185-261 times faster than BLASTX. The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ akiyama@cs.titech.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  5. The Firegoose: two-way integration of diverse data from different bioinformatics web resources with desktop applications

    PubMed Central

    Bare, J Christopher; Shannon, Paul T; Schmid, Amy K; Baliga, Nitin S

    2007-01-01

    Background Information resources on the World Wide Web play an indispensable role in modern biology. But integrating data from multiple sources is often encumbered by the need to reformat data files, convert between naming systems, or perform ongoing maintenance of local copies of public databases. Opportunities for new ways of combining and re-using data are arising as a result of the increasing use of web protocols to transmit structured data. Results The Firegoose, an extension to the Mozilla Firefox web browser, enables data transfer between web sites and desktop tools. As a component of the Gaggle integration framework, Firegoose can also exchange data with Cytoscape, the R statistical package, Multiexperiment Viewer (MeV), and several other popular desktop software tools. Firegoose adds the capability to easily use local data to query KEGG, EMBL STRING, DAVID, and other widely-used bioinformatics web sites. Query results from these web sites can be transferred to desktop tools for further analysis with a few clicks. Firegoose acquires data from the web by screen scraping, microformats, embedded XML, or web services. We define a microformat, which allows structured information compatible with the Gaggle to be embedded in HTML documents. We demonstrate the capabilities of this software by performing an analysis of the genes activated in the microbe Halobacterium salinarum NRC-1 in response to anaerobic environments. Starting with microarray data, we explore functions of differentially expressed genes by combining data from several public web resources and construct an integrated view of the cellular processes involved. Conclusion The Firegoose incorporates Mozilla Firefox into the Gaggle environment and enables interactive sharing of data between diverse web resources and desktop software tools without maintaining local copies. Additional web sites can be incorporated easily into the framework using the scripting platform of the Firefox browser. Performing data integration in the browser allows the excellent search and navigation capabilities of the browser to be used in combination with powerful desktop tools. PMID:18021453

  6. Cluster ensemble based on Random Forests for genetic data.

    PubMed

    Alhusain, Luluah; Hafez, Alaaeldin M

    2017-01-01

    Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes, RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes, RFcluE, a cluster ensemble approach based on RF clustering to address the problem of population structure analysis and demonstrate the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.

  7. Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support.

    PubMed

    Abouelhoda, Mohamed; Issa, Shadi Alaa; Ghanem, Moustafa

    2012-05-04

    Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis.The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.

  8. The Firegoose: two-way integration of diverse data from different bioinformatics web resources with desktop applications.

    PubMed

    Bare, J Christopher; Shannon, Paul T; Schmid, Amy K; Baliga, Nitin S

    2007-11-19

    Information resources on the World Wide Web play an indispensable role in modern biology. But integrating data from multiple sources is often encumbered by the need to reformat data files, convert between naming systems, or perform ongoing maintenance of local copies of public databases. Opportunities for new ways of combining and re-using data are arising as a result of the increasing use of web protocols to transmit structured data. The Firegoose, an extension to the Mozilla Firefox web browser, enables data transfer between web sites and desktop tools. As a component of the Gaggle integration framework, Firegoose can also exchange data with Cytoscape, the R statistical package, Multiexperiment Viewer (MeV), and several other popular desktop software tools. Firegoose adds the capability to easily use local data to query KEGG, EMBL STRING, DAVID, and other widely-used bioinformatics web sites. Query results from these web sites can be transferred to desktop tools for further analysis with a few clicks. Firegoose acquires data from the web by screen scraping, microformats, embedded XML, or web services. We define a microformat, which allows structured information compatible with the Gaggle to be embedded in HTML documents. We demonstrate the capabilities of this software by performing an analysis of the genes activated in the microbe Halobacterium salinarum NRC-1 in response to anaerobic environments. Starting with microarray data, we explore functions of differentially expressed genes by combining data from several public web resources and construct an integrated view of the cellular processes involved. The Firegoose incorporates Mozilla Firefox into the Gaggle environment and enables interactive sharing of data between diverse web resources and desktop software tools without maintaining local copies. Additional web sites can be incorporated easily into the framework using the scripting platform of the Firefox browser. Performing data integration in the browser allows the excellent search and navigation capabilities of the browser to be used in combination with powerful desktop tools.

  9. A review of bioinformatics training applied to research in molecular medicine, agriculture and biodiversity in Costa Rica and Central America.

    PubMed

    Orozco, Allan; Morera, Jessica; Jiménez, Sergio; Boza, Ricardo

    2013-09-01

    Today, Bioinformatics has become a scientific discipline with great relevance for the Molecular Biosciences and for the Omics sciences in general. Although developed countries have progressed with large strides in Bioinformatics education and research, in other regions, such as Central America, the advances have occurred in a gradual way and with little support from the Academia, either at the undergraduate or graduate level. To address this problem, the University of Costa Rica's Medical School, a regional leader in Bioinformatics in Central America, has been conducting a series of Bioinformatics workshops, seminars and courses, leading to the creation of the region's first Bioinformatics Master's Degree. The recent creation of the Central American Bioinformatics Network (BioCANET), associated to the deployment of a supporting computational infrastructure (HPC Cluster) devoted to provide computing support for Molecular Biology in the region, is providing a foundational stone for the development of Bioinformatics in the area. Central American bioinformaticians have participated in the creation of as well as co-founded the Iberoamerican Bioinformatics Society (SOIBIO). In this article, we review the most recent activities in education and research in Bioinformatics from several regional institutions. These activities have resulted in further advances for Molecular Medicine, Agriculture and Biodiversity research in Costa Rica and the rest of the Central American countries. Finally, we provide summary information on the first Central America Bioinformatics International Congress, as well as the creation of the first Bioinformatics company (Indromics Bioinformatics), spin-off the Academy in Central America and the Caribbean.

  10. Bioinformatics and systems biology research update from the 15th International Conference on Bioinformatics (InCoB2016).

    PubMed

    Schönbach, Christian; Verma, Chandra; Bond, Peter J; Ranganathan, Shoba

    2016-12-22

    The International Conference on Bioinformatics (InCoB) has been publishing peer-reviewed conference papers in BMC Bioinformatics since 2006. Of the 44 articles accepted for publication in supplement issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics and BMC Systems Biology, 24 articles with a bioinformatics or systems biology focus are reviewed in this editorial. InCoB2017 is scheduled to be held in Shenzen, China, September 20-22, 2017.

  11. Expanding the horizons of microRNA bioinformatics.

    PubMed

    Huntley, Rachael P; Kramarz, Barbara; Sawford, Tony; Umrao, Zara; Kalea, Anastasia Z; Acquaah, Vanessa; Martin, Maria-Jesus; Mayr, Manuel; Lovering, Ruth C

    2018-06-05

    MicroRNA regulation of key biological and developmental pathways is a rapidly expanding area of research, accompanied by vast amounts of experimental data. This data, however, is not widely available in bioinformatic resources, making it difficult for researchers to find and analyse microRNA-related experimental data and define further research projects. We are addressing this problem by providing two new bioinformatics datasets that contain experimentally verified functional information for mammalian microRNAs involved in cardiovascular-relevant, and other, processes. To date, our resource provides over 3,900 Gene Ontology annotations associated with almost 500 miRNAs from human, mouse and rat and over 2,200 experimentally validated miRNA:target interactions. We illustrate how this resource can be used to create miRNA-focused interaction networks with a biological context using the known biological role of miRNAs and the mRNAs they regulate, enabling discovery of associations between gene products, biological pathways and, ultimately, diseases. This data will be crucial in advancing the field of microRNA bioinformatics and will establish consistent datasets for reproducible functional analysis of microRNAs across all biological research areas. Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  12. Better bioinformatics through usability analysis.

    PubMed

    Bolchini, Davide; Finkelstein, Anthony; Perrone, Vito; Nagl, Sylvia

    2009-02-01

    Improving the usability of bioinformatics resources enables researchers to find, interact with, share, compare and manipulate important information more effectively and efficiently. It thus enables researchers to gain improved insights into biological processes with the potential, ultimately, of yielding new scientific results. Usability 'barriers' can pose significant obstacles to a satisfactory user experience and force researchers to spend unnecessary time and effort to complete their tasks. The number of online biological databases available is growing and there is an expanding community of diverse users. In this context there is an increasing need to ensure the highest standards of usability. Using 'state-of-the-art' usability evaluation methods, we have identified and characterized a sample of usability issues potentially relevant to web bioinformatics resources, in general. These specifically concern the design of the navigation and search mechanisms available to the user. The usability issues we have discovered in our substantial case studies are undermining the ability of users to find the information they need in their daily research activities. In addition to characterizing these issues, specific recommendations for improvements are proposed leveraging proven practices from web and usability engineering. The methods and approach we exemplify can be readily adopted by the developers of bioinformatics resources.

  13. Using EMBL-EBI services via Web interface and programmatically via Web Services

    PubMed Central

    Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish

    2015-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. PMID:25501941

  14. Cancer Bioinformatics for Updating Anticancer Drug Developments and Personalized Therapeutics.

    PubMed

    Lu, Da-Yong; Qu, Rong-Xin; Lu, Ting-Ren; Wu, Hong-Ying

    2017-01-01

    Last two to three decades, this world witnesses a rapid progress of biomarkers and bioinformatics technologies. Cancer bioinformatics is one of such important omics branches for experimental/clinical studies and applications. Same as other biological techniques or systems, bioinformatics techniques will be widely used. But they are presently not omni-potent. Despite great popularity and improvements, cancer bioinformatics has its own limitations and shortcomings at this stage of technical advancements. This article will offer a panorama of bioinformatics in cancer researches and clinical therapeutic applications-possible advantages and limitations relating to cancer therapeutics. A lot of beneficial capabilities and outcomes have been described. As a result, a successful new era for cancer bioinformatics is waiting for us if we can adhere on scientific studies of cancer bioinformatics in malignant- origin mining, medical verifications and clinical diagnostic applications. Cancer bioinformatics gave a great significance in disease diagnosis and therapeutic predictions. Many creative ideas and future perspectives are highlighted. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  15. R-Based Software for the Integration of Pathway Data into Bioinformatic Algorithms

    PubMed Central

    Kramer, Frank; Bayerlová, Michaela; Beißbarth, Tim

    2014-01-01

    Putting new findings into the context of available literature knowledge is one approach to deal with the surge of high-throughput data results. Furthermore, prior knowledge can increase the performance and stability of bioinformatic algorithms, for example, methods for network reconstruction. In this review, we examine software packages for the statistical computing framework R, which enable the integration of pathway data for further bioinformatic analyses. Different approaches to integrate and visualize pathway data are identified and packages are stratified concerning their features according to a number of different aspects: data import strategies, the extent of available data, dependencies on external tools, integration with further analysis steps and visualization options are considered. A total of 12 packages integrating pathway data are reviewed in this manuscript. These are supplemented by five R-specific packages for visualization and six connector packages, which provide access to external tools. PMID:24833336

  16. Robust enzyme design: bioinformatic tools for improved protein stability.

    PubMed

    Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas

    2015-03-01

    The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Developing library bioinformatics services in context: the Purdue University Libraries bioinformationist program

    PubMed Central

    Rein, Diane C.

    2006-01-01

    Setting: Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. Program Components: After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. Evaluation Mechanisms: The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Next Steps/Future Directions: Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making. PMID:16888666

  18. Developing library bioinformatics services in context: the Purdue University Libraries bioinformationist program.

    PubMed

    Rein, Diane C

    2006-07-01

    Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making.

  19. Harnessing pain heterogeneity and RNA transcriptome to identify blood–based pain biomarkers: a novel correlational study design and bioinformatics approach in a graded chronic constriction injury model

    PubMed Central

    Grace, Peter M.; Hurley, Daniel; Barratt, Daniel T.; Tsykin, Anna; Watkins, Linda R.; Rolan, Paul E.; Hutchinson, Mark R.

    2017-01-01

    A quantitative, peripherally accessible biomarker for neuropathic pain has great potential to improve clinical outcomes. Based on the premise that peripheral and central immunity contribute to neuropathic pain mechanisms, we hypothesized that biomarkers could be identified from the whole blood of adult male rats, by integrating graded chronic constriction injury (CCI), ipsilateral lumbar dorsal quadrant (iLDQ) and whole blood transcriptomes, and pathway analysis with pain behavior. Correlational bioinformatics identified a range of putative biomarker genes for allodynia intensity, many encoding for proteins with a recognized role in immune/nociceptive mechanisms. A selection of these genes was validated in a separate replication study. Pathway analysis of the iLDQ transcriptome identified Fcγ and Fcε signaling pathways, among others. This study is the first to employ the whole blood transcriptome to identify pain biomarker panels. The novel correlational bioinformatics, developed here, selected such putative biomarkers based on a correlation with pain behavior and formation of signaling pathways with iLDQ genes. Future studies may demonstrate the predictive ability of these biomarker genes across other models and additional variables. PMID:22697386

  20. Influenza Research Database: An integrated bioinformatics resource for influenza virus research.

    PubMed

    Zhang, Yun; Aevermann, Brian D; Anderson, Tavis K; Burke, David F; Dauphin, Gwenaelle; Gu, Zhiping; He, Sherry; Kumar, Sanjeev; Larsen, Christopher N; Lee, Alexandra J; Li, Xiaomei; Macken, Catherine; Mahaffey, Colin; Pickett, Brett E; Reardon, Brian; Smith, Thomas; Stewart, Lucy; Suloway, Christian; Sun, Guangyu; Tong, Lei; Vincent, Amy L; Walters, Bryan; Zaremba, Sam; Zhao, Hongtao; Zhou, Liwei; Zmasek, Christian; Klem, Edward B; Scheuermann, Richard H

    2017-01-04

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics and therapeutics against influenza virus by providing a comprehensive collection of influenza-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, personal workbench spaces for data storage and sharing, and active user community support. Here, we describe the recent improvements in IRD including the use of cloud and high performance computing resources, analysis and visualization of user-provided sequence data with associated metadata, predictions of novel variant proteins, annotations of phenotype-associated sequence markers and their predicted phenotypic effects, hemagglutinin (HA) clade classifications, an automated tool for HA subtype numbering conversion, linkouts to disease event data and the addition of host factor and antiviral drug components. All data and tools are freely available without restriction from the IRD website at https://www.fludb.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data.

    PubMed

    Marco-Ramell, Anna; Palau-Rodriguez, Magali; Alay, Ania; Tulipani, Sara; Urpi-Sarda, Mireia; Sanchez-Pla, Alex; Andres-Lacueva, Cristina

    2018-01-02

    Bioinformatic tools for the enrichment of 'omics' datasets facilitate interpretation and understanding of data. To date few are suitable for metabolomics datasets. The main objective of this work is to give a critical overview, for the first time, of the performance of these tools. To that aim, datasets from metabolomic repositories were selected and enriched data were created. Both types of data were analysed with these tools and outputs were thoroughly examined. An exploratory multivariate analysis of the most used tools for the enrichment of metabolite sets, based on a non-metric multidimensional scaling (NMDS) of Jaccard's distances, was performed and mirrored their diversity. Codes (identifiers) of the metabolites of the datasets were searched in different metabolite databases (HMDB, KEGG, PubChem, ChEBI, BioCyc/HumanCyc, LipidMAPS, ChemSpider, METLIN and Recon2). The databases that presented more identifiers of the metabolites of the dataset were PubChem, followed by METLIN and ChEBI. However, these databases had duplicated entries and might present false positives. The performance of over-representation analysis (ORA) tools, including BioCyc/HumanCyc, ConsensusPathDB, IMPaLA, MBRole, MetaboAnalyst, Metabox, MetExplore, MPEA, PathVisio and Reactome and the mapping tool KEGGREST, was examined. Results were mostly consistent among tools and between real and enriched data despite the variability of the tools. Nevertheless, a few controversial results such as differences in the total number of metabolites were also found. Disease-based enrichment analyses were also assessed, but they were not found to be accurate probably due to the fact that metabolite disease sets are not up-to-date and the difficulty of predicting diseases from a list of metabolites. We have extensively reviewed the state-of-the-art of the available range of tools for metabolomic datasets, the completeness of metabolite databases, the performance of ORA methods and disease-based analyses. Despite the variability of the tools, they provided consistent results independent of their analytic approach. However, more work on the completeness of metabolite and pathway databases is required, which strongly affects the accuracy of enrichment analyses. Improvements will be translated into more accurate and global insights of the metabolome.

  2. Bioinformatics analysis identifies several intrinsically disordered human E3 ubiquitin-protein ligases.

    PubMed

    Boomsma, Wouter; Nielsen, Sofie V; Lindorff-Larsen, Kresten; Hartmann-Petersen, Rasmus; Ellgaard, Lars

    2016-01-01

    The ubiquitin-proteasome system targets misfolded proteins for degradation. Since the accumulation of such proteins is potentially harmful for the cell, their prompt removal is important. E3 ubiquitin-protein ligases mediate substrate ubiquitination by bringing together the substrate with an E2 ubiquitin-conjugating enzyme, which transfers ubiquitin to the substrate. For misfolded proteins, substrate recognition is generally delegated to molecular chaperones that subsequently interact with specific E3 ligases. An important exception is San1, a yeast E3 ligase. San1 harbors extensive regions of intrinsic disorder, which provide both conformational flexibility and sites for direct recognition of misfolded targets of vastly different conformations. So far, no mammalian ortholog of San1 is known, nor is it clear whether other E3 ligases utilize disordered regions for substrate recognition. Here, we conduct a bioinformatics analysis to examine >600 human and S. cerevisiae E3 ligases to identify enzymes that are similar to San1 in terms of function and/or mechanism of substrate recognition. An initial sequence-based database search was found to detect candidates primarily based on the homology of their ordered regions, and did not capture the unique disorder patterns that encode the functional mechanism of San1. However, by searching specifically for key features of the San1 sequence, such as long regions of intrinsic disorder embedded with short stretches predicted to be suitable for substrate interaction, we identified several E3 ligases with these characteristics. Our initial analysis revealed that another remarkable trait of San1 is shared with several candidate E3 ligases: long stretches of complete lysine suppression, which in San1 limits auto-ubiquitination. We encode these characteristic features into a San1 similarity-score, and present a set of proteins that are plausible candidates as San1 counterparts in humans. In conclusion, our work indicates that San1 is not a unique case, and that several other yeast and human E3 ligases have sequence properties that may allow them to recognize substrates by a similar mechanism as San1.

  3. Cloning, characterisation and comparative analysis of a starch synthase IV gene in wheat: functional and evolutionary implications

    PubMed Central

    Leterrier, Marina; Holappa, Lynn D; Broglie, Karen E; Beckles, Diane M

    2008-01-01

    Background Starch is of great importance to humans as a food and biomaterial, and the amount and structure of starch made in plants is determined in part by starch synthase (SS) activity. Five SS isoforms, SSI, II, III, IV and Granule Bound SSI, have been identified, each with a unique catalytic role in starch synthesis. The basic mode of action of SSs is known; however our knowledge of several aspects of SS enzymology at the structural and mechanistic level is incomplete. To gain a better understanding of the differences in SS sequences that underscore their specificity, the previously uncharacterised SSIVb from wheat was cloned and extensive bioinformatics analyses of this and other SSs sequences were done. Results The wheat SSIV cDNA is most similar to rice SSIVb with which it shows synteny and shares a similar exon-intron arrangement. The wheat SSIVb gene was preferentially expressed in leaf and was not regulated by a circadian clock. Phylogenetic analysis showed that in plants, SSIV is closely related to SSIII, while SSI, SSII and Granule Bound SSI clustered together and distinctions between the two groups can be made at the genetic level and included chromosomal location and intron conservation. Further, identified differences at the amino acid level in their glycosyltransferase domains, predicted secondary structures, global conformations and conserved residues might be indicative of intragroup functional associations. Conclusion Based on bioinformatics analysis of the catalytic region of 36 SSs and 3 glycogen synthases (GSs), it is suggested that the valine residue in the highly conserved K-X-G-G-L motif in SSIII and SSIV may be a determining feature of primer specificity of these SSs as compared to GBSSI, SSI and SSII. In GBSSI, the Ile485 residue may partially explain that enzyme's unique catalytic features. The flexible 380s Loop in the starch catalytic domain may be important in defining the specificity of action for each different SS and the G-X-G in motif VI could define SSIV and SSIII action particularly. PMID:18826586

  4. Integration of bioinformatics into an undergraduate biology curriculum and the impact on development of mathematical skills.

    PubMed

    Wightman, Bruce; Hark, Amy T

    2012-01-01

    The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this study, we deliberately integrated bioinformatics instruction at multiple course levels into an existing biology curriculum. Students in an introductory biology course, intermediate lab courses, and advanced project-oriented courses all participated in new course components designed to sequentially introduce bioinformatics skills and knowledge, as well as computational approaches that are common to many bioinformatics applications. In each course, bioinformatics learning was embedded in an existing disciplinary instructional sequence, as opposed to having a single course where all bioinformatics learning occurs. We designed direct and indirect assessment tools to follow student progress through the course sequence. Our data show significant gains in both student confidence and ability in bioinformatics during individual courses and as course level increases. Despite evidence of substantial student learning in both bioinformatics and mathematics, students were skeptical about the link between learning bioinformatics and learning mathematics. While our approach resulted in substantial learning gains, student "buy-in" and engagement might be better in longer project-based activities that demand application of skills to research problems. Nevertheless, in situations where a concentrated focus on project-oriented bioinformatics is not possible or desirable, our approach of integrating multiple smaller components into an existing curriculum provides an alternative. Copyright © 2012 Wiley Periodicals, Inc.

  5. Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad

    Undergraduate life sciences education needs an overhaul, as clearly described in the National Research Council of the National Academies publication BIO 2010: Transforming Undergraduate Education for Future Research Biologists. Among BIO 2010's top recommendations is the need to involve students in working with real data and tools that reflect the nature of life sciences research in the 21st century. Education research studies support the importance of utilizing primary literature, designing and implementing experiments, and analyzing results in the context of a bona fide scientific question in cultivating the analytical skills necessary to become a scientist. Incorporating these basic scientific methodologiesmore » in undergraduate education leads to increased undergraduate and post-graduate retention in the sciences. Toward this end, many undergraduate teaching organizations offer training and suggestions for faculty to update and improve their teaching approaches to help students learn as scientists, through design and discovery (e.g., Council of Undergraduate Research [www.cur.org] and Project Kaleidoscope [www.pkal.org]). With the advent of genome sequencing and bioinformatics, many scientists now formulate biological questions and interpret research results in the context of genomic information. Just as the use of bioinformatic tools and databases changed the way scientists investigate problems, it must change how scientists teach to create new opportunities for students to gain experiences reflecting the influence of genomics, proteomics, and bioinformatics on modern life sciences research. Educators have responded by incorporating bioinformatics into diverse life science curricula. While these published exercises in, and guidelines for, bioinformatics curricula are helpful and inspirational, faculty new to the area of bioinformatics inevitably need training in the theoretical underpinnings of the algorithms. Moreover, effectively integrating bioinformatics into courses or independent research projects requires infrastructure for organizing and assessing student work. Here, we present a new platform for faculty to keep current with the rapidly changing field of bioinformatics, the Integrated Microbial Genomes Annotation Collaboration Toolkit (IMG-ACT). It was developed by instructors from both research-intensive and predominately undergraduate institutions in collaboration with the Department of Energy-Joint Genome Institute (DOE-JGI) as a means to innovate and update undergraduate education and faculty development. The IMG-ACT program provides a cadre of tools, including access to a clearinghouse of genome sequences, bioinformatics databases, data storage, instructor course management, and student notebooks for organizing the results of their bioinformatic investigations. In the process, IMG-ACT makes it feasible to provide undergraduate research opportunities to a greater number and diversity of students, in contrast to the traditional mentor-to-student apprenticeship model for undergraduate research, which can be too expensive and time-consuming to provide for every undergraduate. The IMG-ACT serves as the hub for the network of faculty and students that use the system for microbial genome analysis. Open access of the IMG-ACT infrastructure to participating schools ensures that all types of higher education institutions can utilize it. With the infrastructure in place, faculty can focus their efforts on the pedagogy of bioinformatics, involvement of students in research, and use of this tool for their own research agenda. What the original faculty members of the IMG-ACT development team present here is an overview of how the IMG-ACT program has affected our development in terms of teaching and research with the hopes that it will inspire more faculty to get involved.« less

  6. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

    PubMed

    Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas

    2016-04-01

    Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes due to systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station, however, due to multiple invocations for a large number of subtasks the full task requires a significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods as well as a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .

  7. Global analysis of lysine acetylation in strawberry leaves.

    PubMed

    Fang, Xianping; Chen, Wenyue; Zhao, Yun; Ruan, Songlin; Zhang, Hengmu; Yan, Chengqi; Jin, Liang; Cao, Lingling; Zhu, Jun; Ma, Huasheng; Cheng, Zhongyi

    2015-01-01

    Protein lysine acetylation is a reversible and dynamic post-translational modification. It plays an important role in regulating diverse cellular processes including chromatin dynamic, metabolic pathways, and transcription in both prokaryotes and eukaryotes. Although studies of lysine acetylome in plants have been reported, the throughput was not high enough, hindering the deep understanding of lysine acetylation in plant physiology and pathology. In this study, taking advantages of anti-acetyllysine-based enrichment and high-sensitive-mass spectrometer, we applied an integrated proteomic approach to comprehensively investigate lysine acetylome in strawberry. In total, we identified 1392 acetylation sites in 684 proteins, representing the largest dataset of acetylome in plants to date. To reveal the functional impacts of lysine acetylation in strawberry, intensive bioinformatic analysis was performed. The results significantly expanded our current understanding of plant acetylome and demonstrated that lysine acetylation is involved in multiple cellular metabolism and cellular processes. More interestingly, nearly 50% of all acetylated proteins identified in this work were localized in chloroplast and the vital role of lysine acetylation in photosynthesis was also revealed. Taken together, this study not only established the most extensive lysine acetylome in plants to date, but also systematically suggests the significant and unique roles of lysine acetylation in plants.

  8. Proteomic Analysis of Pachytene Spermatocytes of Sterile Hybrid Male Mice.

    PubMed

    Wang, Lu; Guo, Yueshuai; Liu, Wenjing; Zhao, Weidong; Song, Gendi; Zhou, Tao; Huang, Hefeng; Guo, Xuejiang; Sun, Fei

    2016-09-01

    Incompatibilities in interspecific hybrids, such as reduced hybrid fertility and lethality, are common features resulting from reproductive isolation that lead to speciation. Subspecies crosses of house mice produce offspring in which one sex is infertile or absent, yet the molecular mechanisms of hybrid sterility are poorly understood. In this study, we observed extensive asynapsis of chromosomes and disturbance of the sex body in pachytene spermatocytes of sterile F1 males (PWK/Ph female × C57BL/6J male). We report the high-confidence identification of 4005 proteins in the pachytene spermatocytes of fertile F1 males (PWK/Ph male × C57BL/6J female) and sterile F1 males (PWK/Ph female × C57BL/6J male), of which 215 were upregulated and 381 were downregulated. Bioinformatics analysis of the proteome led to the identification of 43 and 59 proteins known to be essential for male meiosis and spermatogenesis in mice, respectively. Characterization of the proteome of pachytene spermatocytes associated with hybrid male sterility provides an inventory of proteins that is useful for understanding meiosis and the mechanisms of hybrid male infertility. © 2016 by the Society for the Study of Reproduction, Inc.

  9. Curation of microarray oligonucleotides and corresponding ESTs/cDNAs used for gene expression analysis in zebra finches.

    PubMed

    Lovell, Peter V; Huizinga, Nicole A; Getachew, Abel; Mees, Brianna; Friedrich, Samantha R; Wirthlin, Morgan; Mello, Claudio V

    2018-05-18

    Zebra finches are a major model organism for investigating mechanisms of vocal learning, a trait that enables spoken language in humans. The development of cDNA collections with expressed sequence tags (ESTs) and microarrays has allowed for extensive molecular characterizations of circuitry underlying vocal learning and production. However, poor database curation can lead to errors in transcriptome and bioinformatics analyses, limiting the impact of these resources. Here we used genomic alignments and synteny analysis for orthology verification to curate and reannotate ~ 35% of the oligonucleotides and corresponding ESTs/cDNAs that make-up Agilent microarrays for gene expression analysis in finches. We found that: (1) 5475 out of 43,084 oligos (a) failed to align to the zebra finch genome, (b) aligned to multiple loci, or (c) aligned to Chr_un only, and thus need to be flagged until a better genome assembly is available, or (d) reflect cloning artifacts; (2) Out of 9635 valid oligos examined further, 3120 were incorrectly named, including 1533 with no known orthologs; and (3) 2635 oligos required name update. The resulting curated dataset provides a reference for correcting gene identification errors in previous finch microarrays studies, and avoiding such errors in future studies.

  10. libRoadRunner: a high performance SBML simulation and analysis library

    PubMed Central

    Somogyi, Endre T.; Bouteiller, Jean-Marie; Glazier, James A.; König, Matthias; Medley, J. Kyle; Swat, Maciej H.; Sauro, Herbert M.

    2015-01-01

    Motivation: This article presents libRoadRunner, an extensible, high-performance, cross-platform, open-source software library for the simulation and analysis of models expressed using Systems Biology Markup Language (SBML). SBML is the most widely used standard for representing dynamic networks, especially biochemical networks. libRoadRunner is fast enough to support large-scale problems such as tissue models, studies that require large numbers of repeated runs and interactive simulations. Results: libRoadRunner is a self-contained library, able to run both as a component inside other tools via its C++ and C bindings, and interactively through its Python interface. Its Python Application Programming Interface (API) is similar to the APIs of MATLAB (www.mathworks.com) and SciPy (http://www.scipy.org/), making it fast and easy to learn. libRoadRunner uses a custom Just-In-Time (JIT) compiler built on the widely used LLVM JIT compiler framework. It compiles SBML-specified models directly into native machine code for a variety of processors, making it appropriate for solving extremely large models or repeated runs. libRoadRunner is flexible, supporting the bulk of the SBML specification (except for delay and non-linear algebraic equations) including several SBML extensions (composition and distributions). It offers multiple deterministic and stochastic integrators, as well as tools for steady-state analysis, stability analysis and structural analysis of the stoichiometric matrix. Availability and implementation: libRoadRunner binary distributions are available for Mac OS X, Linux and Windows. The library is licensed under Apache License Version 2.0. libRoadRunner is also available for ARM-based computers such as the Raspberry Pi. http://www.libroadrunner.org provides online documentation, full build instructions, binaries and a git source repository. Contacts: hsauro@u.washington.edu or somogyie@indiana.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26085503

  11. Enabling a systems biology knowledgebase with gaggle and firegoose

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baliga, Nitin S.

    The overall goal of this project was to extend the existing Gaggle and Firegoose systems to develop an open-source technology that runs over the web and links desktop applications with many databases and software applications. This technology would enable researchers to incorporate workflows for data analysis that can be executed from this interface to other online applications. The four specific aims were to (1) provide one-click mapping of genes, proteins, and complexes across databases and species; (2) enable multiple simultaneous workflows; (3) expand sophisticated data analysis for online resources; and enhance open-source development of the Gaggle-Firegoose infrastructure. Gaggle is anmore » open-source Java software system that integrates existing bioinformatics programs and data sources into a user-friendly, extensible environment to allow interactive exploration, visualization, and analysis of systems biology data. Firegoose is an extension to the Mozilla Firefox web browser that enables data transfer between websites and desktop tools including Gaggle. In the last phase of this funding period, we have made substantial progress on development and application of the Gaggle integration framework. We implemented the workspace to the Network Portal. Users can capture data from Firegoose and save them to the workspace. Users can create workflows to start multiple software components programmatically and pass data between them. Results of analysis can be saved to the cloud so that they can be easily restored on any machine. We also developed the Gaggle Chrome Goose, a plugin for the Google Chrome browser in tandem with an opencpu server in the Amazon EC2 cloud. This allows users to interactively perform data analysis on a single web page using the R packages deployed on the opencpu server. The cloud-based framework facilitates collaboration between researchers from multiple organizations. We have made a number of enhancements to the cmonkey2 application to enable and improve the integration within different environments, and we have created a new tools pipeline for generating EGRIN2 models in a largely automated way.« less

  12. libRoadRunner: a high performance SBML simulation and analysis library.

    PubMed

    Somogyi, Endre T; Bouteiller, Jean-Marie; Glazier, James A; König, Matthias; Medley, J Kyle; Swat, Maciej H; Sauro, Herbert M

    2015-10-15

    This article presents libRoadRunner, an extensible, high-performance, cross-platform, open-source software library for the simulation and analysis of models expressed using Systems Biology Markup Language (SBML). SBML is the most widely used standard for representing dynamic networks, especially biochemical networks. libRoadRunner is fast enough to support large-scale problems such as tissue models, studies that require large numbers of repeated runs and interactive simulations. libRoadRunner is a self-contained library, able to run both as a component inside other tools via its C++ and C bindings, and interactively through its Python interface. Its Python Application Programming Interface (API) is similar to the APIs of MATLAB ( WWWMATHWORKSCOM: ) and SciPy ( HTTP//WWWSCIPYORG/: ), making it fast and easy to learn. libRoadRunner uses a custom Just-In-Time (JIT) compiler built on the widely used LLVM JIT compiler framework. It compiles SBML-specified models directly into native machine code for a variety of processors, making it appropriate for solving extremely large models or repeated runs. libRoadRunner is flexible, supporting the bulk of the SBML specification (except for delay and non-linear algebraic equations) including several SBML extensions (composition and distributions). It offers multiple deterministic and stochastic integrators, as well as tools for steady-state analysis, stability analysis and structural analysis of the stoichiometric matrix. libRoadRunner binary distributions are available for Mac OS X, Linux and Windows. The library is licensed under Apache License Version 2.0. libRoadRunner is also available for ARM-based computers such as the Raspberry Pi. http://www.libroadrunner.org provides online documentation, full build instructions, binaries and a git source repository. hsauro@u.washington.edu or somogyie@indiana.edu Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.

  13. pymzML--Python module for high-throughput bioinformatics on mass spectrometry data.

    PubMed

    Bald, Till; Barth, Johannes; Niehues, Anna; Specht, Michael; Hippler, Michael; Fufezan, Christian

    2012-04-01

    pymzML is an extension to Python that offers (i) an easy access to mass spectrometry (MS) data that allows the rapid development of tools, (ii) a very fast parser for mzML data, the standard data format in MS and (iii) a set of functions to compare or handle spectra. pymzML requires Python2.6.5+ and is fully compatible with Python3. The module is freely available on http://pymzml.github.com or pypi, is published under LGPL license and requires no additional modules to be installed. christian@fufezan.net.

  14. DifA, a methyl-accepting chemoreceptor protein-like sensory protein, uses a novel signaling mechanism to regulate exopolysaccharide production in Myxococcus xanthus.

    PubMed

    Xu, Qian; Black, Wesley P; Nascimi, Heidi M; Yang, Zhaomin

    2011-02-01

    DifA is a methyl-accepting chemotaxis protein (MCP)-like sensory transducer that regulates exopolysaccharide (EPS) production in Myxococcus xanthus. Here mutational analysis and molecular biology were used to probe the signaling mechanisms of DifA in EPS regulation. We first identified the start codon of DifA experimentally; this identification extended the N terminus of DifA for 45 amino acids (aa) from the previous bioinformatics prediction. This extension helped to address the outstanding question of how DifA receives input signals from type 4 pili without a prominent periplasmic domain. The results suggest that DifA uses its N-terminus extension to sense an upstream signal in EPS regulation. We suggest that the perception of the input signal by DifA is mediated by protein-protein interactions with upstream components. Subsequent signal transmission likely involves transmembrane signaling instead of direct intramolecular interactions between the input and the output modules in the cytoplasm. The basic functional unit of DifA for signal transduction is likely dimeric as mutational alteration of the predicted dimeric interface of DifA significantly affected EPS production. Deletions of 14-aa segments in the C terminus suggest that the newly defined flexible bundle subdomain in MCPs is likely critical for DifA function because shortening of this bundle can lead to constitutively active mutations.

  15. High-Content Microscopy Analysis of Subcellular Structures: Assay Development and Application to Focal Adhesion Quantification.

    PubMed

    Kroll, Torsten; Schmidt, David; Schwanitz, Georg; Ahmad, Mubashir; Hamann, Jana; Schlosser, Corinne; Lin, Yu-Chieh; Böhm, Konrad J; Tuckermann, Jan; Ploubidou, Aspasia

    2016-07-01

    High-content analysis (HCA) converts raw light microscopy images to quantitative data through the automated extraction, multiparametric analysis, and classification of the relevant information content. Combined with automated high-throughput image acquisition, HCA applied to the screening of chemicals or RNAi-reagents is termed high-content screening (HCS). Its power in quantifying cell phenotypes makes HCA applicable also to routine microscopy. However, developing effective HCA and bioinformatic analysis pipelines for acquisition of biologically meaningful data in HCS is challenging. Here, the step-by-step development of an HCA assay protocol and an HCS bioinformatics analysis pipeline are described. The protocol's power is demonstrated by application to focal adhesion (FA) detection, quantitative analysis of multiple FA features, and functional annotation of signaling pathways regulating FA size, using primary data of a published RNAi screen. The assay and the underlying strategy are aimed at researchers performing microscopy-based quantitative analysis of subcellular features, on a small scale or in large HCS experiments. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.

  16. Functional and bioinformatics analysis of an exopolysaccharide-related gene (epsN) from Lactobacillus kefiranofaciens ZW3.

    PubMed

    Wang, Jingrui; Tang, Wei; Zheng, Yongna; Xing, Zhuqing; Wang, Yanping

    2016-09-01

    A novel lactic acid bacteria strain Lactobacillus kefiranofaciens ZW3 exhibited the characteristics of high production of exopolysaccharide (EPS). The epsN gene, located in the eps gene cluster of this strain, is associated with EPS biosynthesis. Bioinformatics analysis of this gene was performed. The conserved domain analysis showed that the EpsN protein contained MATE-Wzx-like domains. Then the epsN gene was amplified to construct the recombinant expression vector pMG36e-epsN. The results showed that the EPS yields of the recombinants were significantly improved. By determining the yields of EPS and intracellular polysaccharide, it was considered that epsN gene could play its Wzx flippase role in the EPS biosynthesis. This is the first time to prove the effect of EpsN on L. kefiranofaciens EPS biosynthesis and further prove its functional property.

  17. IonGAP: integrative bacterial genome analysis for Ion Torrent sequence data.

    PubMed

    Baez-Ortega, Adrian; Lorenzo-Diaz, Fabian; Hernandez, Mariano; Gonzalez-Vila, Carlos Ignacio; Roda-Garcia, Jose Luis; Colebrook, Marcos; Flores, Carlos

    2015-09-01

    We introduce IonGAP, a publicly available Web platform designed for the analysis of whole bacterial genomes using Ion Torrent sequence data. Besides assembly, it integrates a variety of comparative genomics, annotation and bacterial classification routines, based on the widely used FASTQ, BAM and SRA file formats. Benchmarking with different datasets evidenced that IonGAP is a fast, powerful and simple-to-use bioinformatics tool. By releasing this platform, we aim to translate low-cost bacterial genome analysis for microbiological prevention and control in healthcare, agroalimentary and pharmaceutical industry applications. IonGAP is hosted by the ITER's Teide-HPC supercomputer and is freely available on the Web for non-commercial use at http://iongap.hpc.iter.es. mcolesan@ull.edu.es or cflores@ull.edu.es Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  18. Bioinformatics analysis of the phytoene synthase gene in cabbage (Brassica oleracea var. capitata)

    NASA Astrophysics Data System (ADS)

    Sun, Bo; Jiang, Min; Xue, Shengling; Zheng, Aihong; Zhang, Fen; Tang, Haoru

    2018-04-01

    Phytoene Synthase (PSY) is an important enzyme in carotenoid biosynthesis. Here, the Brassica oleracea var. capitata PSY (BocPSY) gene sequences were obtained from Brassica database (BRAD), and preformed for bioinformatics analysis. The BocPSY1, BocPSY2 and BocPSY3 genes mapped to chromosomes 2,3 and 9, and contains an open reading frame of 1,248 bp, 1,266 bp and 1,275 bp that encodes a 415, 421, 424 amino acid protein, respectively. Subcellular localization predicted all BocPSY genes were in the chloroplast. The conserved domain of the BocPSY protein is PLN02632. Homology analysis indicates that the levels of identity among BocPSYs were all more than 85%, and the PSY protein is apparently conserved during plant evolution. The findings of the present study provide a molecular basis for the elucidation of PSY gene function in cabbage.

  19. BioRuby: bioinformatics software for the Ruby programming language.

    PubMed

    Goto, Naohisa; Prins, Pjotr; Nakao, Mitsuteru; Bonnal, Raoul; Aerts, Jan; Katayama, Toshiaki

    2010-10-15

    The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser. BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. katayama@bioruby.org

  20. GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions.

    PubMed

    Gundersen, Gregory W; Jones, Matthew R; Rouillard, Andrew D; Kou, Yan; Monteiro, Caroline D; Feldmann, Axel S; Hu, Kevin S; Ma'ayan, Avi

    2015-09-15

    Identification of differentially expressed genes is an important step in extracting knowledge from gene expression profiling studies. The raw expression data from microarray and other high-throughput technologies is deposited into the Gene Expression Omnibus (GEO) and served as Simple Omnibus Format in Text (SOFT) files. However, to extract and analyze differentially expressed genes from GEO requires significant computational skills. Here we introduce GEO2Enrichr, a browser extension for extracting differentially expressed gene sets from GEO and analyzing those sets with Enrichr, an independent gene set enrichment analysis tool containing over 70 000 annotated gene sets organized into 75 gene-set libraries. GEO2Enrichr adds JavaScript code to GEO web-pages; this code scrapes user selected accession numbers and metadata, and then, with one click, users can submit this information to a web-server application that downloads the SOFT files, parses, cleans and normalizes the data, identifies the differentially expressed genes, and then pipes the resulting gene lists to Enrichr for downstream functional analysis. GEO2Enrichr opens a new avenue for adding functionality to major bioinformatics resources such GEO by integrating tools and resources without the need for a plug-in architecture. Importantly, GEO2Enrichr helps researchers to quickly explore hypotheses with little technical overhead, lowering the barrier of entry for biologists by automating data processing steps needed for knowledge extraction from the major repository GEO. GEO2Enrichr is an open source tool, freely available for installation as browser extensions at the Chrome Web Store and FireFox Add-ons. Documentation and a browser independent web application can be found at http://amp.pharm.mssm.edu/g2e/. avi.maayan@mssm.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers

    PubMed Central

    Brazas, Michelle D.; Ouellette, B. F. Francis

    2016-01-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression. PMID:27281025

  2. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers.

    PubMed

    Brazas, Michelle D; Ouellette, B F Francis

    2016-06-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression.

  3. Bioinformatics research in the Asia Pacific: a 2007 update.

    PubMed

    Ranganathan, Shoba; Gribskov, Michael; Tan, Tin Wee

    2008-01-01

    We provide a 2007 update on the bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB) bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2007 Conference was organized as the 6th annual conference of the Asia-Pacific Bioinformatics Network, on Aug. 27-30, 2007 at Hong Kong, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea) and New Delhi (India). Besides a scientific meeting at Hong Kong, satellite events organized are a pre-conference training workshop at Hanoi, Vietnam and a post-conference workshop at Nansha, China. This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. We have organized the papers into thematic areas, highlighting the growing contribution of research excellence from this region, to global bioinformatics endeavours.

  4. Using EMBL-EBI Services via Web Interface and Programmatically via Web Services.

    PubMed

    Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish

    2014-12-12

    The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. Copyright © 2014 John Wiley & Sons, Inc.

  5. Bioinformatics in proteomics: application, terminology, and pitfalls.

    PubMed

    Wiemer, Jan C; Prokudin, Alexander

    2004-01-01

    Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages on machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge data amounts. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating too complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.

  6. Biological Databases for Behavioral Neurobiology

    PubMed Central

    Baker, Erich J.

    2014-01-01

    Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases, how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119

  7. MLH1 Promoter Methylation and Prediction/Prognosis of Gastric Cancer: A Systematic Review and Meta and Bioinformatic Analysis.

    PubMed

    Shen, Shixuan; Chen, Xiaohui; Li, Hao; Sun, Liping; Yuan, Yuan

    2018-01-01

    Background: The promoter methylation of MLH1 gene and gastric cancer (GC)has been investigated previously. To get a more credible conclusion, we performed a systematic review and meta and bioinformatic analysis to clarify the role of MLH1 methylation in the prediction and prognosis of GC. Methods: Eligible studies were targeted after searching the PubMed, Web of Science, Embase, BIOSIS, CNKI and Wanfang Data to collect the information of MLH1 methylation and GC. The link strength between the two was estimated by odds ratio with its 95% confidence interval. The Newcastle-Ottawa scale was used for quantity assessment . Subgroup and sensitivity analysis were conducted to explore sources of heterogeneity. The Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) were employed for bioinformatics analysis on the correlation between MLH1 methylation and GC risk, clinicopathological behavior as well as prognosis. Results: 2365 GC and 1563 controls were included in the meta-analysis. The pooled OR of MLH1 methylation in GC was 4.895 (95% CI: 3.149-7.611, P<0.001), which considerably associated with increased GC risk. No significant difference was found in relation to Lauren classification, tumor invasion, lymph node/distant metastasis and tumor stage in GC. Analysis based on GEO and TCGA showed that high MLH1 methylation enhanced GC risk but might not related with GC clinicopathological features and prognosis. Conclusion: MLH1 methylation is an alive biomarker for the prediction of GC and it might not affect GC behavior. Further study could be conducted to verify the impact of MLH1 methylation on GC prognosis.

  8. MLH1 Promoter Methylation and Prediction/Prognosis of Gastric Cancer: A Systematic Review and Meta and Bioinformatic Analysis

    PubMed Central

    Shen, Shixuan; Chen, Xiaohui; Li, Hao; Sun, Liping; Yuan, Yuan

    2018-01-01

    Background: The promoter methylation of MLH1 gene and gastric cancer (GC)has been investigated previously. To get a more credible conclusion, we performed a systematic review and meta and bioinformatic analysis to clarify the role of MLH1 methylation in the prediction and prognosis of GC. Methods: Eligible studies were targeted after searching the PubMed, Web of Science, Embase, BIOSIS, CNKI and Wanfang Data to collect the information of MLH1 methylation and GC. The link strength between the two was estimated by odds ratio with its 95% confidence interval. The Newcastle-Ottawa scale was used for quantity assessment. Subgroup and sensitivity analysis were conducted to explore sources of heterogeneity. The Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) were employed for bioinformatics analysis on the correlation between MLH1 methylation and GC risk, clinicopathological behavior as well as prognosis. Results: 2365 GC and 1563 controls were included in the meta-analysis. The pooled OR of MLH1 methylation in GC was 4.895 (95% CI: 3.149-7.611, P<0.001), which considerably associated with increased GC risk. No significant difference was found in relation to Lauren classification, tumor invasion, lymph node/distant metastasis and tumor stage in GC. Analysis based on GEO and TCGA showed that high MLH1 methylation enhanced GC risk but might not related with GC clinicopathological features and prognosis. Conclusion: MLH1 methylation is an alive biomarker for the prediction of GC and it might not affect GC behavior. Further study could be conducted to verify the impact of MLH1 methylation on GC prognosis. PMID:29896277

  9. Bioinformatics approaches to single-cell analysis in developmental biology.

    PubMed

    Yalcin, Dicle; Hakguder, Zeynep M; Otu, Hasan H

    2016-03-01

    Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to a robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single-cells and amplification of related macromolecules, which have enabled the use of imaging and omics techniques on single cells. There have been improvements in computational single-cell image analysis in developmental biology regarding feature extraction, segmentation, image enhancement and machine learning, handling limitations of optical resolution to gain new perspectives from the raw microscopy images. Omics approaches, such as transcriptomics, genomics and epigenomics, targeting gene and small RNA expression, single nucleotide and structural variations and methylation and histone modifications, rely heavily on high-throughput sequencing technologies. Although there are well-established bioinformatics methods for analysis of sequence data, there are limited bioinformatics approaches which address experimental design, sample size considerations, amplification bias, normalization, differential expression, coverage, clustering and classification issues, specifically applied at the single-cell level. In this review, we summarize biological and technological advancements, discuss challenges faced in the aforementioned data acquisition and analysis issues and present future prospects for application of single-cell analyses to developmental biology. © The Author 2015. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  10. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.

    PubMed

    D'Antonio, Mattia; D'Onorio De Meo, Paolo; Pallocca, Matteo; Picardi, Ernesto; D'Erchia, Anna Maria; Calogero, Raffaele A; Castrignanò, Tiziana; Pesole, Graziano

    2015-01-01

    The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulative pathways make RNA-Seq one of most complex field of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer to the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call for statistically significant differences in genes and transcripts expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Through a user friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy allows to access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user needs.

  11. Next-generation sequencing: a challenge to meet the increasing demand for training workshops in Australia.

    PubMed

    Watson-Haigh, Nathan S; Shang, Catherine A; Haimel, Matthias; Kostadima, Myrto; Loos, Remco; Deshpande, Nandan; Duesing, Konsta; Li, Xi; McGrath, Annette; McWilliam, Sean; Michnowicz, Simon; Moolhuijzen, Paula; Quenette, Steve; Revote, Jerico Nico De Leon; Tyagi, Sonika; Schneider, Maria V

    2013-09-01

    The widespread adoption of high-throughput next-generation sequencing (NGS) technology among the Australian life science research community is highlighting an urgent need to up-skill biologists in tools required for handling and analysing their NGS data. There is currently a shortage of cutting-edge bioinformatics training courses in Australia as a consequence of a scarcity of skilled trainers with time and funding to develop and deliver training courses. To address this, a consortium of Australian research organizations, including Bioplatforms Australia, the Commonwealth Scientific and Industrial Research Organisation and the Australian Bioinformatics Network, have been collaborating with EMBL-EBI training team. A group of Australian bioinformaticians attended the train-the-trainer workshop to improve training skills in developing and delivering bioinformatics workshop curriculum. A 2-day NGS workshop was jointly developed to provide hands-on knowledge and understanding of typical NGS data analysis workflows. The road show-style workshop was successfully delivered at five geographically distant venues in Australia using the newly established Australian NeCTAR Research Cloud. We highlight the challenges we had to overcome at different stages from design to delivery, including the establishment of an Australian bioinformatics training network and the computing infrastructure and resource development. A virtual machine image, workshop materials and scripts for configuring a machine with workshop contents have all been made available under a Creative Commons Attribution 3.0 Unported License. This means participants continue to have convenient access to an environment they had become familiar and bioinformatics trainers are able to access and reuse these resources.

  12. Next-generation sequencing: a challenge to meet the increasing demand for training workshops in Australia

    PubMed Central

    Watson-Haigh, Nathan S.; Shang, Catherine A.; Haimel, Matthias; Kostadima, Myrto; Loos, Remco; Deshpande, Nandan; Duesing, Konsta; Li, Xi; McGrath, Annette; McWilliam, Sean; Michnowicz, Simon; Moolhuijzen, Paula; Quenette, Steve; Revote, Jerico Nico De Leon; Tyagi, Sonika; Schneider, Maria V.

    2013-01-01

    The widespread adoption of high-throughput next-generation sequencing (NGS) technology among the Australian life science research community is highlighting an urgent need to up-skill biologists in tools required for handling and analysing their NGS data. There is currently a shortage of cutting-edge bioinformatics training courses in Australia as a consequence of a scarcity of skilled trainers with time and funding to develop and deliver training courses. To address this, a consortium of Australian research organizations, including Bioplatforms Australia, the Commonwealth Scientific and Industrial Research Organisation and the Australian Bioinformatics Network, have been collaborating with EMBL-EBI training team. A group of Australian bioinformaticians attended the train-the-trainer workshop to improve training skills in developing and delivering bioinformatics workshop curriculum. A 2-day NGS workshop was jointly developed to provide hands-on knowledge and understanding of typical NGS data analysis workflows. The road show–style workshop was successfully delivered at five geographically distant venues in Australia using the newly established Australian NeCTAR Research Cloud. We highlight the challenges we had to overcome at different stages from design to delivery, including the establishment of an Australian bioinformatics training network and the computing infrastructure and resource development. A virtual machine image, workshop materials and scripts for configuring a machine with workshop contents have all been made available under a Creative Commons Attribution 3.0 Unported License. This means participants continue to have convenient access to an environment they had become familiar and bioinformatics trainers are able to access and reuse these resources. PMID:23543352

  13. Cloning and bioinformatic analysis of lovastatin biosynthesis regulatory gene lovE.

    PubMed

    Huang, Xin; Li, Hao-ming

    2009-08-05

    Lovastatin is an effective drug for treatment of hyperlipidemia. This study aimed to clone lovastatin biosynthesis regulatory gene lovE and analyze the structure and function of its encoding protein. According to the lovastatin synthase gene sequence from genebank, primers were designed to amplify and clone the lovastatin biosynthesis regulatory gene lovE from Aspergillus terrus genomic DNA. Bioinformatic analysis of lovE and its encoding animo acid sequence was performed through internet resources and software like DNAMAN. Target fragment lovE, almost 1500 bp in length, was amplified from Aspergillus terrus genomic DNA and the secondary and three-dimensional structures of LovE protein were predicted. In the lovastatin biosynthesis process lovE is a regulatory gene and LovE protein is a GAL4-like transcriptional factor.

  14. SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts.

    PubMed

    Ryan, Michael C; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N

    2012-09-15

    SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. mryan@insilico.us.com Supplementary data are available at Bioinformatics online.

  15. Bioinformatic Analysis of Strawberry GSTF12 Gene

    NASA Astrophysics Data System (ADS)

    Wang, Xiran; Jiang, Leiyu; Tang, Haoru

    2018-01-01

    GSTF12 has always been known as a key factor of proanthocyanins accumulate in plant testa. Through bioinformatics analysis of the nucleotide and encoded protein sequence of GSTF12, it is more advantageous to the study of genes related to anthocyanin biosynthesis accumulation pathway. Therefore, we chosen GSTF12 gene of 11 kinds species, downloaded their nucleotide and protein sequence from NCBI as the research object, found strawberry GSTF12 gene via bioinformation analyse, constructed phylogenetic tree. At the same time, we analysed the strawberry GSTF12 gene of physical and chemical properties and its protein structure and so on. The phylogenetic tree showed that Strawberry and petunia were closest relative. By the protein prediction, we found that the protein owed one proper signal peptide without obvious transmembrane regions.

  16. A Bioinformatic Pipeline for Monitoring of the Mutational Stability of Viral Drug Targets with Deep-Sequencing Technology.

    PubMed

    Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai

    2017-11-23

    The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines the available programs and the ad hoc scripts based on an original algorithm of the search for the conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.

  17. Bioinformatic Workflows for Generating Complete Plastid Genome Sequences-An Example from Cabomba (Cabombaceae) in the Context of the Phylogenomic Analysis of the Water-Lily Clade.

    PubMed

    Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas

    2018-06-21

    The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.

  18. Emerging strengths in Asia Pacific bioinformatics.

    PubMed

    Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee

    2008-12-12

    The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20-23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts.

  19. Emerging strengths in Asia Pacific bioinformatics

    PubMed Central

    Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee

    2008-01-01

    The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20–23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts. PMID:19091008

  20. MeDICi Software Superglue for Data Analysis Pipelines

    ScienceCinema

    Ian Gorton

    2017-12-09

    The Middleware for Data-Intensive Computing (MeDICi) Integration Framework is an integrated middleware platform developed to solve data analysis and processing needs of scientists across many domains. MeDICi is scalable, easily modified, and robust to multiple languages, protocols, and hardware platforms, and in use today by PNNL scientists for bioinformatics, power grid failure analysis, and text analysis.

  1. Extending Asia Pacific bioinformatics into new realms in the "-omics" era.

    PubMed

    Ranganathan, Shoba; Eisenhaber, Frank; Tong, Joo Chuan; Tan, Tin Wee

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation dating back to 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 7-11, 2009 at Biopolis, Singapore. Besides bringing together scientists from the field of bioinformatics in this region, InCoB has actively engaged clinicians and researchers from the area of systems biology, to facilitate greater synergy between these two groups. InCoB2009 followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India), Hong Kong and Taipei (Taiwan), with InCoB2010 scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. The Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and symposia on Clinical Bioinformatics (CBAS), the Singapore Symposium on Computational Biology (SYMBIO) and training tutorials were scheduled prior to the scientific meeting, and provided ample opportunity for in-depth learning and special interest meetings for educators, clinicians and students. We provide a brief overview of the peer-reviewed bioinformatics manuscripts accepted for publication in this supplement, grouped into thematic areas. In order to facilitate scientific reproducibility and accountability, we have, for the first time, introduced minimum information criteria for our pubilcations, including compliance to a Minimum Information about a Bioinformatics Investigation (MIABi). As the regional research expertise in bioinformatics matures, we have delineated a minimum set of bioinformatics skills required for addressing the computational challenges of the "-omics" era.

  2. G-DOC Plus - an integrative bioinformatics platform for precision medicine.

    PubMed

    Bhuvaneshwar, Krithika; Belouali, Anas; Singh, Varun; Johnson, Robert M; Song, Lei; Alaoui, Adil; Harris, Michael A; Clarke, Robert; Weiner, Louis M; Gusev, Yuriy; Madhavan, Subha

    2016-04-30

    G-DOC Plus is a data integration and bioinformatics platform that uses cloud computing and other advanced computational tools to handle a variety of biomedical BIG DATA including gene expression arrays, NGS and medical images so that they can be analyzed in the full context of other omics and clinical information. G-DOC Plus currently holds data from over 10,000 patients selected from private and public resources including Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and the recently added datasets from REpository for Molecular BRAin Neoplasia DaTa (REMBRANDT), caArray studies of lung and colon cancer, ImmPort and the 1000 genomes data sets. The system allows researchers to explore clinical-omic data one sample at a time, as a cohort of samples; or at the level of population, providing the user with a comprehensive view of the data. G-DOC Plus tools have been leveraged in cancer and non-cancer studies for hypothesis generation and validation; biomarker discovery and multi-omics analysis, to explore somatic mutations and cancer MRI images; as well as for training and graduate education in bioinformatics, data and computational sciences. Several of these use cases are described in this paper to demonstrate its multifaceted usability. G-DOC Plus can be used to support a variety of user groups in multiple domains to enable hypothesis generation for precision medicine research. The long-term vision of G-DOC Plus is to extend this translational bioinformatics platform to stay current with emerging omics technologies and analysis methods to continue supporting novel hypothesis generation, analysis and validation for integrative biomedical research. By integrating several aspects of the disease and exposing various data elements, such as outpatient lab workup, pathology, radiology, current treatments, molecular signatures and expected outcomes over a web interface, G-DOC Plus will continue to strengthen precision medicine research. G-DOC Plus is available at: https://gdoc.georgetown.edu .

  3. Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms.

    PubMed

    Ferraro Petrillo, Umberto; Roscigno, Gianluca; Cattaneo, Giuseppe; Giancarlo, Raffaele

    2018-06-01

    Information theoretic and compositional/linguistic analysis of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data available now in applications demands to resort to parallel and distributed computing. Indeed, those type of algorithms have been developed to collect k-mer statistics in the realm of genome assembly. However, they are so specialized to this domain that they do not extend easily to the computation of informational and linguistic indices, concurrently on sets of genomes. Following the well-established approach in many disciplines, and with a growing success also in bioinformatics, to resort to MapReduce and Hadoop to deal with 'Big Data' problems, we present KCH, the first set of MapReduce algorithms able to perform concurrently informational and linguistic analysis of large collections of genomic sequences on a Hadoop cluster. The benchmarking of KCH that we provide indicates that it is quite effective and versatile. It is also competitive with respect to the parallel and distributed algorithms highly specialized to k-mer statistics collection for genome assembly problems. In conclusion, KCH is a much needed addition to the growing number of algorithms and tools that use MapReduce for bioinformatics core applications. The software, including instructions for running it over Amazon AWS, as well as the datasets are available at http://www.di-srv.unisa.it/KCH. umberto.ferraro@uniroma1.it. Supplementary data are available at Bioinformatics online.

  4. Development of a Bioinformatics Framework for the Detection of Gene Conversion and the Analysis of Combinatorial Diversity in Immunoglobulin Heavy Chains in Four Cattle Breeds.

    PubMed

    Walther, Stefanie; Tietze, Manfred; Czerny, Claus-Peter; König, Sven; Diesterbeck, Ulrike S

    2016-01-01

    We have developed a new bioinformatics framework for the analysis of rearranged bovine heavy chain immunoglobulin (Ig) variable regions by combining and refining widely used alignment algorithms. This bioinformatics framework allowed us to investigate alignments of heavy chain framework regions (FRHs) and the separate alignments of FRHs and heavy chain complementarity determining regions (CDRHs) to determine their germline origin in the four cattle breeds Aubrac, German Black Pied, German Simmental, and Holstein Friesian. Now it is also possible to specifically analyze Ig heavy chains possessing exceptionally long CDR3Hs. In order to gain more insight into breed specific differences in Ig combinatorial diversity, somatic hypermutations and putative gene conversions of IgG, we compared the dominantly transcribed variable (IGHV), diversity (IGHD), and joining (IGHJ) segments and their recombination in the four cattle breeds. The analysis revealed the use of 15 different IGHV segments, 21 IGHD segments, and two IGHJ segments with significant different transcription levels within the breeds. Furthermore, there are preferred rearrangements within the three groups of CDR3H lengths. In the sequences of group 2 (CDR3H lengths (L) of 11-47 amino acid residues (aa)) a higher number of recombination was observed than in sequences of group 1 (L≤10 aa) and 3 (L≥48 aa). The combinatorial diversity of germline IGHV, IGHD, and IGHJ-segments revealed 162 rearrangements that were significantly different. The few preferably rearranged gene segments within group 3 CDR3H regions may indicate specialized antibodies because this length is unique in cattle. The most important finding of this study, which was enabled by using the bioinformatics framework, is the discovery of strong evidence for gene conversion as a rare event using pseudogenes fulfilling all definitions for this particular diversification mechanism.

  5. Development of a Bioinformatics Framework for the Detection of Gene Conversion and the Analysis of Combinatorial Diversity in Immunoglobulin Heavy Chains in Four Cattle Breeds

    PubMed Central

    Czerny, Claus-Peter; König, Sven; Diesterbeck, Ulrike S.

    2016-01-01

    We have developed a new bioinformatics framework for the analysis of rearranged bovine heavy chain immunoglobulin (Ig) variable regions by combining and refining widely used alignment algorithms. This bioinformatics framework allowed us to investigate alignments of heavy chain framework regions (FRHs) and the separate alignments of FRHs and heavy chain complementarity determining regions (CDRHs) to determine their germline origin in the four cattle breeds Aubrac, German Black Pied, German Simmental, and Holstein Friesian. Now it is also possible to specifically analyze Ig heavy chains possessing exceptionally long CDR3Hs. In order to gain more insight into breed specific differences in Ig combinatorial diversity, somatic hypermutations and putative gene conversions of IgG, we compared the dominantly transcribed variable (IGHV), diversity (IGHD), and joining (IGHJ) segments and their recombination in the four cattle breeds. The analysis revealed the use of 15 different IGHV segments, 21 IGHD segments, and two IGHJ segments with significant different transcription levels within the breeds. Furthermore, there are preferred rearrangements within the three groups of CDR3H lengths. In the sequences of group 2 (CDR3H lengths (L) of 11–47 amino acid residues (aa)) a higher number of recombination was observed than in sequences of group 1 (L≤10 aa) and 3 (L≥48 aa). The combinatorial diversity of germline IGHV, IGHD, and IGHJ-segments revealed 162 rearrangements that were significantly different. The few preferably rearranged gene segments within group 3 CDR3H regions may indicate specialized antibodies because this length is unique in cattle. The most important finding of this study, which was enabled by using the bioinformatics framework, is the discovery of strong evidence for gene conversion as a rare event using pseudogenes fulfilling all definitions for this particular diversification mechanism. PMID:27828971

  6. Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data

    PubMed Central

    2013-01-01

    Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX and include large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, allocating resources, and illustrate major challenges such as managing data growth. We conclude with summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made. PMID:23800020

  7. PanWeb: A web interface for pan-genomic analysis.

    PubMed

    Pantoja, Yan; Pinheiro, Kenny; Veras, Allan; Araújo, Fabrício; Lopes de Sousa, Ailton; Guimarães, Luis Carlos; Silva, Artur; Ramos, Rommel T J

    2017-01-01

    With increased production of genomic data since the advent of next-generation sequencing (NGS), there has been a need to develop new bioinformatics tools and areas, such as comparative genomics. In comparative genomics, the genetic material of an organism is directly compared to that of another organism to better understand biological species. Moreover, the exponentially growing number of deposited prokaryote genomes has enabled the investigation of several genomic characteristics that are intrinsic to certain species. Thus, a new approach to comparative genomics, termed pan-genomics, was developed. In pan-genomics, various organisms of the same species or genus are compared. Currently, there are many tools that can perform pan-genomic analyses, such as PGAP (Pan-Genome Analysis Pipeline), Panseq (Pan-Genome Sequence Analysis Program) and PGAT (Prokaryotic Genome Analysis Tool). Among these software tools, PGAP was developed in the Perl scripting language and its reliance on UNIX platform terminals and its requirement for an extensive parameterized command line can become a problem for users without previous computational knowledge. Thus, the aim of this study was to develop a web application, known as PanWeb, that serves as a graphical interface for PGAP. In addition, using the output files of the PGAP pipeline, the application generates graphics using custom-developed scripts in the R programming language. PanWeb is freely available at http://www.computationalbiology.ufpa.br/panweb.

  8. SeqBox: RNAseq/ChIPseq reproducible analysis on a consumer game computer.

    PubMed

    Beccuti, Marco; Cordero, Francesca; Arigoni, Maddalena; Panero, Riccardo; Amparore, Elvio G; Donatelli, Susanna; Calogero, Raffaele A

    2018-03-01

    Short reads sequencing technology has been used for more than a decade now. However, the analysis of RNAseq and ChIPseq data is still computational demanding and the simple access to raw data does not guarantee results reproducibility between laboratories. To address these two aspects, we developed SeqBox, a cheap, efficient and reproducible RNAseq/ChIPseq hardware/software solution based on NUC6I7KYK mini-PC (an Intel consumer game computer with a fast processor and a high performance SSD disk), and Docker container platform. In SeqBox the analysis of RNAseq and ChIPseq data is supported by a friendly GUI. This allows access to fast and reproducible analysis also to scientists with/without scripting experience. Docker container images, docker4seq package and the GUI are available at http://www.bioinformatica.unito.it/reproducibile.bioinformatics.html. beccuti@di.unito.it. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  9. A bioinformatics approach for precision medicine off-label drug drug selection among triple negative breast cancer patients.

    PubMed

    Cheng, Lijun; Schneider, Bryan P; Li, Lang

    2016-07-01

    Cancer has been extensively characterized on the basis of genomics. The integration of genetic information about cancers with data on how the cancers respond to target based therapy to help to optimum cancer treatment. The increasing usage of sequencing technology in cancer research and clinical practice has enormously advanced our understanding of cancer mechanisms. The cancer precision medicine is becoming a reality. Although off-label drug usage is a common practice in treating cancer, it suffers from the lack of knowledge base for proper cancer drug selections. This eminent need has become even more apparent considering the upcoming genomics data. In this paper, a personalized medicine knowledge base is constructed by integrating various cancer drugs, drug-target database, and knowledge sources for the proper cancer drugs and their target selections. Based on the knowledge base, a bioinformatics approach for cancer drugs selection in precision medicine is developed. It integrates personal molecular profile data, including copy number variation, mutation, and gene expression. By analyzing the 85 triple negative breast cancer (TNBC) patient data in the Cancer Genome Altar, we have shown that 71.7% of the TNBC patients have FDA approved drug targets, and 51.7% of the patients have more than one drug target. Sixty-five drug targets are identified as TNBC treatment targets and 85 candidate drugs are recommended. Many existing TNBC candidate targets, such as Poly (ADP-Ribose) Polymerase 1 (PARP1), Cell division protein kinase 6 (CDK6), epidermal growth factor receptor, etc., were identified. On the other hand, we found some additional targets that are not yet fully investigated in the TNBC, such as Gamma-Glutamyl Hydrolase (GGH), Thymidylate Synthetase (TYMS), Protein Tyrosine Kinase 6 (PTK6), Topoisomerase (DNA) I, Mitochondrial (TOP1MT), Smoothened, Frizzled Class Receptor (SMO), etc. Our additional analysis of target and drug selection strategy is also fully supported by the drug screening data on TNBC cell lines in the Cancer Cell Line Encyclopedia. The proposed bioinformatics approach lays a foundation for cancer precision medicine. It supplies much needed knowledge base for the off-label cancer drug usage in clinics. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  10. Characterizing the “POAGome”: A bioinformatics-driven approach to primary open-angle glaucoma

    PubMed Central

    Danford, Ian D.; Verkuil, Lana D.; Choi, Daniel J.; Collins, David W.; Gudiseva, Harini V.; Uyhazi, Katherine E.; Lau, Marisa K.; Kanu, Levi N.; Grant, Gregory R.; Chavali, Venkata R.M.; O’Brien, Joan M.

    2017-01-01

    Primary open-angle glaucoma (POAG) is a genetically, physiologically, and phenotypically complex neurodegenerative disorder. This study addressed the expanding collection of genes associated with POAG, referred to as the “POAGome.” We used bioinformatics tools to perform an extensive, systematic literature search and compiled 542 genes with confirmed associations with POAG and its related phenotypes (normal tension glaucoma, ocular hypertension, juvenile open-angle glaucoma, and primary congenital glaucoma). The genes were classified according to their associated ocular tissues and phenotypes, and functional annotation and pathway analyses were subsequently performed. Our study reveals that no single molecular pathway can encompass the pathophysiology of POAG. The analyses suggested that inflammation and senescence may play pivotal roles in both the development and perpetuation of the retinal ganglion cell degeneration seen in POAG. The TGF-β signaling pathway was repeatedly implicated in our analyses, suggesting that it may be an important contributor to the manifestation of POAG in the anterior and posterior segments of the globe. We propose a molecular model of POAG revolving around TGF-β signaling, which incorporates the roles of inflammation and senescence in this disease. Finally, we highlight emerging molecular therapies that show promise for treating POAG. PMID:28223208

  11. Language workbench user interfaces for data analysis

    PubMed Central

    Benson, Victoria M.

    2015-01-01

    Biological data analysis is frequently performed with command line software. While this practice provides considerable flexibility for computationally savy individuals, such as investigators trained in bioinformatics, this also creates a barrier to the widespread use of data analysis software by investigators trained as biologists and/or clinicians. Workflow systems such as Galaxy and Taverna have been developed to try and provide generic user interfaces that can wrap command line analysis software. These solutions are useful for problems that can be solved with workflows, and that do not require specialized user interfaces. However, some types of analyses can benefit from custom user interfaces. For instance, developing biomarker models from high-throughput data is a type of analysis that can be expressed more succinctly with specialized user interfaces. Here, we show how Language Workbench (LW) technology can be used to model the biomarker development and validation process. We developed a language that models the concepts of Dataset, Endpoint, Feature Selection Method and Classifier. These high-level language concepts map directly to abstractions that analysts who develop biomarker models are familiar with. We found that user interfaces developed in the Meta-Programming System (MPS) LW provide convenient means to configure a biomarker development project, to train models and view the validation statistics. We discuss several advantages of developing user interfaces for data analysis with a LW, including increased interface consistency, portability and extension by language composition. The language developed during this experiment is distributed as an MPS plugin (available at http://campagnelab.org/software/bdval-for-mps/). PMID:25755929

  12. GeneFisher-P: variations of GeneFisher as processes in Bio-jETI

    PubMed Central

    Lamprecht, Anna-Lena; Margaria, Tiziana; Steffen, Bernhard; Sczyrba, Alexander; Hartmeier, Sven; Giegerich, Robert

    2008-01-01

    Background PCR primer design is an everyday, but not trivial task requiring state-of-the-art software. We describe the popular tool GeneFisher and explain its recent restructuring using workflow techniques. We apply a service-oriented approach to model and implement GeneFisher-P, a process-based version of the GeneFisher web application, as a part of the Bio-jETI platform for service modeling and execution. We show how to introduce a flexible process layer to meet the growing demand for improved user-friendliness and flexibility. Results Within Bio-jETI, we model the process using the jABC framework, a mature model-driven, service-oriented process definition platform. We encapsulate remote legacy tools and integrate web services using jETI, an extension of the jABC for seamless integration of remote resources as basic services, ready to be used in the process. Some of the basic services used by GeneFisher are in fact already provided as individual web services at BiBiServ and can be directly accessed. Others are legacy programs, and are made available to Bio-jETI via the jETI technology. The full power of service-based process orientation is required when more bioinformatics tools, available as web services or via jETI, lead to easy extensions or variations of the basic process. This concerns for instance variations of data retrieval or alignment tools as provided by the European Bioinformatics Institute (EBI). Conclusions The resulting service- and process-oriented GeneFisher-P demonstrates how basic services from heterogeneous sources can be easily orchestrated in the Bio-jETI platform and lead to a flexible family of specialized processes tailored to specific tasks. PMID:18460174

  13. The Escherichia coli Peripheral Inner Membrane Proteome*

    PubMed Central

    Papanastasiou, Malvina; Orfanoudaki, Georgia; Koukaki, Marina; Kountourakis, Nikos; Sardis, Marios Frantzeskos; Aivaliotis, Michalis; Karamanou, Spyridoula; Economou, Anastassios

    2013-01-01

    Biological membranes are essential for cell viability. Their functional characteristics strongly depend on their protein content, which consists of transmembrane (integral) and peripherally associated membrane proteins. Both integral and peripheral inner membrane proteins mediate a plethora of biological processes. Whereas transmembrane proteins have characteristic hydrophobic stretches and can be predicted using bioinformatics approaches, peripheral inner membrane proteins are hydrophilic, exist in equilibria with soluble pools, and carry no discernible membrane targeting signals. We experimentally determined the cytoplasmic peripheral inner membrane proteome of the model organism Escherichia coli using a multidisciplinary approach. Initially, we extensively re-annotated the theoretical proteome regarding subcellular localization using literature searches, manual curation, and multi-combinatorial bioinformatics searches of the available databases. Next we used sequential biochemical fractionations coupled to direct identification of individual proteins and protein complexes using high resolution mass spectrometry. We determined that the proposed cytoplasmic peripheral inner membrane proteome occupies a previously unsuspected ∼19% of the basic E. coli BL21(DE3) proteome, and the detected peripheral inner membrane proteome occupies ∼25% of the estimated expressed proteome of this cell grown in LB medium to mid-log phase. This value might increase when fleeting interactions, not studied here, are taken into account. Several proteins previously regarded as exclusively cytoplasmic bind membranes avidly. Many of these proteins are organized in functional or/and structural oligomeric complexes that bind to the membrane with multiple interactions. Identified proteins cover the full spectrum of biological activities, and more than half of them are essential. Our data suggest that the cytoplasmic proteome displays remarkably dynamic and extensive communication with biological membrane surfaces that we are only beginning to decipher. PMID:23230279

  14. Discovering the secondary metabolite potential encoded within Entomopathogenic Fungi

    USDA-ARS?s Scientific Manuscript database

    This article discusses the secondary metabolite potential of the insect pathogens Metarhizium and Beauveria, including a bioinformatics analysis of secondary metabolite genes for which no products are yet identified....

  15. Bioinformatics goes back to the future.

    PubMed

    Miller, Crispin J; Attwood, Teresa K

    2003-02-01

    The need to turn raw data into knowledge has led the bioinformatics field to focus increasingly on the manipulation of information. By drawing parallels with both cryptography and artificial intelligence, we can develop an understanding of the changes that are occurring in bioinformatics, and how these changes are likely to influence the bioinformatics job market.

  16. Introductory Bioinformatics Exercises Utilizing Hemoglobin and Chymotrypsin to Reinforce the Protein Sequence-Structure-Function Relationship

    ERIC Educational Resources Information Center

    Inlow, Jennifer K.; Miller, Paige; Pittman, Bethany

    2007-01-01

    We describe two bioinformatics exercises intended for use in a computer laboratory setting in an upper-level undergraduate biochemistry course. To introduce students to bioinformatics, the exercises incorporate several commonly used bioinformatics tools, including BLAST, that are freely available online. The exercises build upon the students'…

  17. Design and Implementation of an Interdepartmental Bioinformatics Program across Life Science Curricula

    ERIC Educational Resources Information Center

    Miskowski, Jennifer A.; Howard, David R.; Abler, Michael L.; Grunwald, Sandra K.

    2007-01-01

    Over the past 10 years, there has been a technical revolution in the life sciences leading to the emergence of a new discipline called bioinformatics. In response, bioinformatics-related topics have been incorporated into various undergraduate courses along with the development of new courses solely focused on bioinformatics. This report describes…

  18. Component-Based Approach for Educating Students in Bioinformatics

    ERIC Educational Resources Information Center

    Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.

    2009-01-01

    There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…

  19. Vertical and Horizontal Integration of Bioinformatics Education: A Modular, Interdisciplinary Approach

    ERIC Educational Resources Information Center

    Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D. Blaine; Langeland, James A.

    2009-01-01

    Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option.…

  20. Building international genomics collaboration for global health security

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cui, Helen H.; Erkkila, Tracy; Chain, Patrick S. G.

    Genome science and technologies are transforming life sciences globally in many ways and becoming a highly desirable area for international collaboration to strengthen global health. The Genome Science Program at the Los Alamos National Laboratory is leveraging a long history of expertise in genomics research to assist multiple partner nations in advancing their genomics and bioinformatics capabilities. The capability development objectives focus on providing a molecular genomics-based scientific approach for pathogen detection, characterization, and biosurveillance applications. The general approaches include introduction of basic principles in genomics technologies, training on laboratory methodologies and bioinformatic analysis of resulting data, procurement, and installationmore » of next-generation sequencing instruments, establishing bioinformatics software capabilities, and exploring collaborative applications of the genomics capabilities in public health. Genome centers have been established with public health and research institutions in the Republic of Georgia, Kingdom of Jordan, Uganda, and Gabon; broader collaborations in genomics applications have also been developed with research institutions in many other countries.« less

  1. Building international genomics collaboration for global health security

    DOE PAGES

    Cui, Helen H.; Erkkila, Tracy; Chain, Patrick S. G.; ...

    2015-12-07

    Genome science and technologies are transforming life sciences globally in many ways and becoming a highly desirable area for international collaboration to strengthen global health. The Genome Science Program at the Los Alamos National Laboratory is leveraging a long history of expertise in genomics research to assist multiple partner nations in advancing their genomics and bioinformatics capabilities. The capability development objectives focus on providing a molecular genomics-based scientific approach for pathogen detection, characterization, and biosurveillance applications. The general approaches include introduction of basic principles in genomics technologies, training on laboratory methodologies and bioinformatic analysis of resulting data, procurement, and installationmore » of next-generation sequencing instruments, establishing bioinformatics software capabilities, and exploring collaborative applications of the genomics capabilities in public health. Genome centers have been established with public health and research institutions in the Republic of Georgia, Kingdom of Jordan, Uganda, and Gabon; broader collaborations in genomics applications have also been developed with research institutions in many other countries.« less

  2. [Bioinformatics Analysis of Clustered Regularly Interspaced Short Palindromic Repeats in the Genomes of Shigella].

    PubMed

    Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin

    2015-04-01

    This study was aimed to explore the features of clustered regularly interspaced short palindromic repeats (CRISPR) structures in Shigella by using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that the CRISPRs existed in the four groups of Shigella, and the flanking sequences of upstream CRISPRs could be classified into the same group with those of the downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences had the same group with corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain "stem" and "ring". Some spacers were found to homologize with part sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.

  3. Treetrimmer: a method for phylogenetic dataset size reduction.

    PubMed

    Maruyama, Shinichiro; Eveleigh, Robert J M; Archibald, John M

    2013-04-12

    With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, use of rigorous tree-building methods on such large datasets is prohibitive and manual 'pruning' of sequence alignments is time consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to objectively carry out such pruning procedures. Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis. TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.

  4. Bioinformatics in protein kinases regulatory network and drug discovery.

    PubMed

    Chen, Qingfeng; Luo, Haiqiong; Zhang, Chengqi; Chen, Yi-Ping Phoebe

    2015-04-01

    Protein kinases have been implicated in a number of diseases, where kinases participate many aspects that control cell growth, movement and death. The deregulated kinase activities and the knowledge of these disorders are of great clinical interest of drug discovery. The most critical issue is the development of safe and efficient disease diagnosis and treatment for less cost and in less time. It is critical to develop innovative approaches that aim at the root cause of a disease, not just its symptoms. Bioinformatics including genetic, genomic, mathematics and computational technologies, has become the most promising option for effective drug discovery, and has showed its potential in early stage of drug-target identification and target validation. It is essential that these aspects are understood and integrated into new methods used in drug discovery for diseases arisen from deregulated kinase activity. This article reviews bioinformatics techniques for protein kinase data management and analysis, kinase pathways and drug targets and describes their potential application in pharma ceutical industry. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. Role of remote sensing, geographical information system (GIS) and bioinformatics in kala-azar epidemiology

    PubMed Central

    Bhunia, Gouri Sankar; Dikhit, Manas Ranjan; Kesari, Shreekant; Sahoo, Ganesh Chandra; Das, Pradeep

    2011-01-01

    Visceral leishmaniasis or kala-azar is a potent parasitic infection causing death of thousands of people each year. Medicinal compounds currently available for the treatment of kala-azar have serious side effects and decreased efficacy owing to the emergence of resistant strains. The type of immune reaction is also to be considered in patients infected with Leishmania donovani (L. donovani). For complete eradication of this disease, a high level modern research is currently being applied both at the molecular level as well as at the field level. The computational approaches like remote sensing, geographical information system (GIS) and bioinformatics are the key resources for the detection and distribution of vectors, patterns, ecological and environmental factors and genomic and proteomic analysis. Novel approaches like GIS and bioinformatics have been more appropriately utilized in determining the cause of visearal leishmaniasis and in designing strategies for preventing the disease from spreading from one region to another. PMID:23554714

  6. Prospects and limitations of full-text index structures in genome analysis

    PubMed Central

    Vyverman, Michaël; De Baets, Bernard; Fack, Veerle; Dawyndt, Peter

    2012-01-01

    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared. PMID:22584621

  7. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

    PubMed

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians.

  8. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software

    PubMed Central

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054

  9. Computational biology and bioinformatics in Nigeria.

    PubMed

    Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-04-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  10. Computational Biology and Bioinformatics in Nigeria

    PubMed Central

    Fatumo, Segun A.; Adoga, Moses P.; Ojo, Opeolu O.; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-01-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries. PMID:24763310

  11. Multifunctionality and diversity of GDSL esterase/lipase gene family in rice (Oryza sativa L. japonica) genome: new insights from bioinformatics analysis

    PubMed Central

    2012-01-01

    Background GDSL esterases/lipases are a newly discovered subclass of lipolytic enzymes that are very important and attractive research subjects because of their multifunctional properties, such as broad substrate specificity and regiospecificity. Compared with the current knowledge regarding these enzymes in bacteria, our understanding of the plant GDSL enzymes is very limited, although the GDSL gene family in plant species include numerous members in many fully sequenced plant genomes. Only two genes from a large rice GDSL esterase/lipase gene family were previously characterised, and the majority of the members remain unknown. In the present study, we describe the rice OsGELP (Oryza sativa GDSL esterase/lipase protein) gene family at the genomic and proteomic levels, and use this knowledge to provide insights into the multifunctionality of the rice OsGELP enzymes. Results In this study, an extensive bioinformatics analysis identified 114 genes in the rice OsGELP gene family. A complete overview of this family in rice is presented, including the chromosome locations, gene structures, phylogeny, and protein motifs. Among the OsGELPs and the plant GDSL esterase/lipase proteins of known functions, 41 motifs were found that represent the core secondary structure elements or appear specifically in different phylogenetic subclades. The specification and distribution of identified putative conserved clade-common and -specific peptide motifs, and their location on the predicted protein three dimensional structure may possibly signify their functional roles. Potentially important regions for substrate specificity are highlighted, in accordance with protein three-dimensional model and location of the phylogenetic specific conserved motifs. The differential expression of some representative genes were confirmed by quantitative real-time PCR. The phylogenetic analysis, together with protein motif architectures, and the expression profiling were analysed to predict the possible biological functions of the rice OsGELP genes. Conclusions Our current genomic analysis, for the first time, presents fundamental information on the organization of the rice OsGELP gene family. With combination of the genomic, phylogenetic, microarray expression, protein motif distribution, and protein structure analyses, we were able to create supported basis for the functional prediction of many members in the rice GDSL esterase/lipase family. The present study provides a platform for the selection of candidate genes for further detailed functional study. PMID:22793791

  12. Bioinformatics prediction and experimental validation of microRNA-20a targeting Cyclin D1 in hepatocellular carcinoma.

    PubMed

    Karimkhanloo, Hamzeh; Mohammadi-Yeganeh, Samira; Ahsani, Zeinab; Paryan, Mahdi

    2017-04-01

    Hepatocellular carcinoma is the major form of primary liver cancer, which is the second and sixth leading cause of cancer-related death in men and women, respectively. Extensive research indicates that Wnt/β-catenin signaling pathway, which plays a pivotal role in growth, development, and differentiation of hepatocellular carcinoma, is one of the major signaling pathways that is dysregulated in hepatocellular carcinoma. Cyclin D1 is a proto-oncogene and is one of the major regulators of Wnt signaling pathway, and its overexpression has been detected in various types of cancers including hepatocellular carcinoma. Using several validated bioinformatic databases, we predicted that the microRNAs are capable of targeting 3'-untranslated region of Cyclin D1 messenger RNA. According to the results, miR-20a was selected as the highest ranking microRNA targeting Cyclin D1 messenger RNA. Luciferase assay was recruited to confirm bioinformatic prediction results. Cyclin D1 expression was first assessed by quantitative real-time polymerase chain reaction in HepG2 cell line. Afterward, HepG2 cells were transduced by lentiviruses containing miR-20a. Then, the expression of miR-20a and Cyclin D1 was evaluated. The results of luciferase assay demonstrated targeting of 3'-untranslated region of Cyclin D1 messenger RNA by miR-20a. Furthermore, 238-fold decline in Cyclin D1 expression was observed after lentiviral induction of miR-20a in HepG2 cells. The results highlighted a considerable effect of miRNA-20a induction on the down-regulation of Cyclin D1 gene. Our results suggest that miR-20a can be used as a novel candidate for therapeutic purposes and a biomarker for hepatocellular carcinoma diagnosis.

  13. Opportunities and challenges provided by cloud repositories for bioinformatics-enabled drug discovery.

    PubMed

    Dalpé, Gratien; Joly, Yann

    2014-09-01

    Healthcare-related bioinformatics databases are increasingly offering the possibility to maintain, organize, and distribute DNA sequencing data. Different national and international institutions are currently hosting such databases that offer researchers website platforms where they can obtain sequencing data on which they can perform different types of analysis. Until recently, this process remained mostly one-dimensional, with most analysis concentrated on a limited amount of data. However, newer genome sequencing technology is producing a huge amount of data that current computer facilities are unable to handle. An alternative approach has been to start adopting cloud computing services for combining the information embedded in genomic and model system biology data, patient healthcare records, and clinical trials' data. In this new technological paradigm, researchers use virtual space and computing power from existing commercial or not-for-profit cloud service providers to access, store, and analyze data via different application programming interfaces. Cloud services are an alternative to the need of larger data storage; however, they raise different ethical, legal, and social issues. The purpose of this Commentary is to summarize how cloud computing can contribute to bioinformatics-based drug discovery and to highlight some of the outstanding legal, ethical, and social issues that are inherent in the use of cloud services. © 2014 Wiley Periodicals, Inc.

  14. Harnessing pain heterogeneity and RNA transcriptome to identify blood-based pain biomarkers: a novel correlational study design and bioinformatics approach in a graded chronic constriction injury model.

    PubMed

    Grace, Peter M; Hurley, Daniel; Barratt, Daniel T; Tsykin, Anna; Watkins, Linda R; Rolan, Paul E; Hutchinson, Mark R

    2012-09-01

    A quantitative, peripherally accessible biomarker for neuropathic pain has great potential to improve clinical outcomes. Based on the premise that peripheral and central immunity contribute to neuropathic pain mechanisms, we hypothesized that biomarkers could be identified from the whole blood of adult male rats, by integrating graded chronic constriction injury (CCI), ipsilateral lumbar dorsal quadrant (iLDQ) and whole blood transcriptomes, and pathway analysis with pain behavior. Correlational bioinformatics identified a range of putative biomarker genes for allodynia intensity, many encoding for proteins with a recognized role in immune/nociceptive mechanisms. A selection of these genes was validated in a separate replication study. Pathway analysis of the iLDQ transcriptome identified Fcγ and Fcε signaling pathways, among others. This study is the first to employ the whole blood transcriptome to identify pain biomarker panels. The novel correlational bioinformatics, developed here, selected such putative biomarkers based on a correlation with pain behavior and formation of signaling pathways with iLDQ genes. Future studies may demonstrate the predictive ability of these biomarker genes across other models and additional variables. © 2012 The Authors. Journal of Neurochemistry © 2012 International Society for Neurochemistry.

  15. Persistent human Borna disease virus infection modifies the acetylome of human oligodendroglia cells towards higher energy and transporter levels

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Xia; Liu, Siwen; Chongqing Key Laboratory of Neurobiology, Chongqing Medical University, Chongqing 400016

    2015-11-15

    Background: Borna disease virus (BDV) is a neurotropic RNA virus persistently infecting mammalian hosts including humans. Lysine acetylation (Kac) is a key protein post-translational modification (PTM). The unexpectedly broad regulatory scope of Kac let us to profile the entire acetylome upon BDV infection. Methods: The acetylome was profiled through stable isotope labeling for cell culture (SILAC)-based quantitative proteomics. The quantifiable proteome was annotated using bioinformatics. Results: We identified and quantified 791 Kac sites in 473 Kac proteins in human BDV Hu-H1-infected and non-infected oligodendroglial (OL) cells. Bioinformatic analysis revealed that BDV infection alters the acetylation of metabolic proteins, membrane-associated proteinsmore » and transmembrane transporter activity, and affects the acetylation of several lysine acetyltransferases (KAT). Conclusions: Upon BDV persistence the OL acetylome is manipulated towards higher energy and transporter levels necessary for shuttling BDV proteins to and from nuclear replication sites. - Highlights: • We used SILAC-based proteomics to analyze the acetylome of BDV infected OL cells. • We quantified 791Kac sites in 473 proteins. • Bioinformatic analysis revealed altered acetylation of metabolic proteins et al. • BDV manipulates the OL acetylome towards higher energy and transporter levels. • BDV infection is associated with enriched phosphate-associated metabolic processes.« less

  16. pocketZebra: a web-server for automated selection and classification of subfamily-specific binding sites by bioinformatic analysis of diverse protein families

    PubMed Central

    Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Švedas, Vytas

    2014-01-01

    The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure–function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. PMID:24852248

  17. Application of bioinformatics tools and databases in microbial dehalogenation research (a review).

    PubMed

    Satpathy, R; Konkimalla, V B; Ratha, J

    2015-01-01

    Microbial dehalogenation is a biochemical process in which the halogenated substances are catalyzed enzymatically in to their non-halogenated form. The microorganisms have a wide range of organohalogen degradation ability both explicit and non-specific in nature. Most of these halogenated organic compounds being pollutants need to be remediated; therefore, the current approaches are to explore the potential of microbes at a molecular level for effective biodegradation of these substances. Several microorganisms with dehalogenation activity have been identified and characterized. In this aspect, the bioinformatics plays a key role to gain deeper knowledge in this field of dehalogenation. To facilitate the data mining, many tools have been developed to annotate these data from databases. Therefore, with the discovery of a microorganism one can predict a gene/protein, sequence analysis, can perform structural modelling, metabolic pathway analysis, biodegradation study and so on. This review highlights various methods of bioinformatics approach that describes the application of various databases and specific tools in the microbial dehalogenation fields with special focus on dehalogenase enzymes. Attempts have also been made to decipher some recent applications of in silico modeling methods that comprise of gene finding, protein modelling, Quantitative Structure Biodegradibility Relationship (QSBR) study and reconstruction of metabolic pathways employed in dehalogenation research area.

  18. Managing, Analysing, and Integrating Big Data in Medical Bioinformatics: Open Problems and Future Perspectives

    PubMed Central

    Merelli, Ivan; Pérez-Sánchez, Horacio; Gesing, Sandra; D'Agostino, Daniele

    2014-01-01

    The explosion of the data both in the biomedical research and in the healthcare systems demands urgent solutions. In particular, the research in omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare is additionally always asking for a tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to better understanding of diseases and development of better and personalized diagnostics and therapeutics. However, such progresses are directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, for its annotation and integration and finally for inferring knowledge and making it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represent the starting point to win this challenge. PMID:25254202

  19. Genetic and bioinformatics analysis of four novel GCK missense variants detected in Caucasian families with GCK-MODY phenotype.

    PubMed

    Costantini, S; Malerba, G; Contreas, G; Corradi, M; Marin Vargas, S P; Giorgetti, A; Maffeis, C

    2015-05-01

    Heterozygous loss-of-function mutations in the glucokinase (GCK) gene cause maturity-onset diabetes of the young (MODY) subtype GCK (GCK-MODY/MODY2). GCK sequencing revealed 16 distinct mutations (13 missense, 1 nonsense, 1 splice site, and 1 frameshift-deletion) co-segregating with hyperglycaemia in 23 GCK-MODY families. Four missense substitutions (c.718A>G/p.Asn240Asp, c.757G>T/p.Val253Phe, c.872A>C/p.Lys291Thr, and c.1151C>T/p.Ala384Val) were novel and a founder effect for the nonsense mutation (c.76C>T/p.Gln26*) was supposed. We tested whether an accurate bioinformatics approach could strengthen family-genetic evidence for missense variant pathogenicity in routine diagnostics, where wet-lab functional assays are generally unviable. In silico analyses of the novel missense variants, including orthologous sequence conservation, amino acid substitution (AAS)-pathogenicity predictors, structural modeling and splicing predictors, suggested that the AASs and/or the underlying nucleotide changes are likely to be pathogenic. This study shows how a careful bioinformatics analysis could provide effective suggestions to help molecular-genetic diagnosis in absence of wet-lab validations. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  20. Integrated Bioinformatics, Environmental Epidemiologic and Genomic Approaches to Identify Environmental and Molecular Links between Endometriosis and Breast Cancer

    PubMed Central

    Roy, Deodutta; Morgan, Marisa; Yoo, Changwon; Deoraj, Alok; Roy, Sandhya; Yadav, Vijay Kumar; Garoub, Mohannad; Assaggaf, Hamza; Doke, Mayur

    2015-01-01

    We present a combined environmental epidemiologic, genomic, and bioinformatics approach to identify: exposure of environmental chemicals with estrogenic activity; epidemiologic association between endocrine disrupting chemical (EDC) and health effects, such as, breast cancer or endometriosis; and gene-EDC interactions and disease associations. Human exposure measurement and modeling confirmed estrogenic activity of three selected class of environmental chemicals, polychlorinated biphenyls (PCBs), bisphenols (BPs), and phthalates. Meta-analysis showed that PCBs exposure, not Bisphenol A (BPA) and phthalates, increased the summary odds ratio for breast cancer and endometriosis. Bioinformatics analysis of gene-EDC interactions and disease associations identified several hundred genes that were altered by exposure to PCBs, phthalate or BPA. EDCs-modified genes in breast neoplasms and endometriosis are part of steroid hormone signaling and inflammation pathways. All three EDCs–PCB 153, phthalates, and BPA influenced five common genes—CYP19A1, EGFR, ESR2, FOS, and IGF1—in breast cancer as well as in endometriosis. These genes are environmentally and estrogen responsive, altered in human breast and uterine tumors and endometriosis lesions, and part of Mitogen Activated Protein Kinase (MAPK) signaling pathways in cancer. Our findings suggest that breast cancer and endometriosis share some common environmental and molecular risk factors. PMID:26512648

  1. Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance

    PubMed Central

    Squires, R. Burke; Noronha, Jyothi; Hunt, Victoria; García‐Sastre, Adolfo; Macken, Catherine; Baumgarth, Nicole; Suarez, David; Pickett, Brett E.; Zhang, Yun; Larsen, Christopher N.; Ramsey, Alvin; Zhou, Liwei; Zaremba, Sam; Kumar, Sanjeev; Deitrich, Jon; Klem, Edward; Scheuermann, Richard H.

    2012-01-01

    Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416. Background  The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design  The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly‐accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user‐friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in‐protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results  To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross‐protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. Conclusions  The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics. PMID:22260278

  2. Nano-LC-ESI MS/MS analysis of proteins in dried sea dragon Solenognathus hardwickii and bioinformatic analysis of its protein expression profiling.

    PubMed

    Zhang, Dong-Mei; Feng, Li-Xing; Li, Lu; Liu, Miao; Jiang, Bao-Hong; Yang, Min; Li, Guo-Qiang; Wu, Wan-Ying; Guo, De-An; Liu, Xuan

    2016-09-01

    The sea dragon Solenognathus hardwickii has long been used as a traditional Chinese medicine for the treatment of various diseases, such as male impotency. To gain a comprehensive insight into the protein components of the sea dragon, shotgun proteomic analysis of its protein expression profiling was conducted in the present study. Proteins were extracted from dried sea dragon using a trichloroacetic acid/acetone precipitation method and then separated by SDS-PAGE. The protein bands were cut from the gel and digested by trypsin to generate peptide mixture. The peptide fragments were then analyzed using nano liquid chromatography tandem mass spectrometry (nano-LC-ESI MS/MS). 810 proteins and 1 577 peptides were identified in the dried sea dragon. The identified proteins exhibited molecular weight values ranging from 1 900 to 3 516 900 Da and pI values from 3.8 to 12.18. Bioinformatic analysis was conducted using the DAVID Bioinformatics Resources 6.7 Gene Ontology (GO) analysis tool to explore possible functions of the identified proteins. Ascribed functions of the proteins mainly included intracellular non-membrane-bound organelle, non-membrane-bounded organelle, cytoskeleton, structural molecule activity, calcium ion binding and etc. Furthermore, possible signal networks of the identified proteins were predicted using STRING (Search Tool for the Retrieval of Interacting Genes) database. Ribosomal protein synthesis was found to play an important role in the signal network. The results of this study, to best of our knowledge, were the first to provide a reference proteome profile for the sea dragon, and would aid in the understanding of the expression and functions of the identified proteins. Copyright © 2016 China Pharmaceutical University. Published by Elsevier B.V. All rights reserved.

  3. The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study.

    PubMed

    Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

    2016-01-01

    The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.

  4. Report on the EMBER Project--A European Multimedia Bioinformatics Educational Resource

    ERIC Educational Resources Information Center

    Attwood, Terri K.; Selimas, Ioannis; Buis, Rob; Altenburg, Ruud; Herzog, Robert; Ledent, Valerie; Ghita, Viorica; Fernandes, Pedro; Marques, Isabel; Brugman, Marc

    2005-01-01

    EMBER was a European project aiming to develop bioinformatics teaching materials on the Web and CD-ROM to help address the recognised skills shortage in bioinformatics. The project grew out of pilot work on the development of an interactive web-based bioinformatics tutorial and the desire to repackage that resource with the help of a professional…

  5. The 2017 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather

    2017-01-01

    The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year’s theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest. PMID:29118973

  6. The 2017 Bioinformatics Open Source Conference (BOSC).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather

    2017-01-01

    The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year's theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest.

  7. Rising Strengths Hong Kong SAR in Bioinformatics.

    PubMed

    Chakraborty, Chiranjib; George Priya Doss, C; Zhu, Hailong; Agoramoorthy, Govindasamy

    2017-06-01

    Hong Kong's bioinformatics sector is attaining new heights in combination with its economic boom and the predominance of the working-age group in its population. Factors such as a knowledge-based and free-market economy have contributed towards a prominent position on the world map of bioinformatics. In this review, we have considered the educational measures, landmark research activities and the achievements of bioinformatics companies and the role of the Hong Kong government in the establishment of bioinformatics as strength. However, several hurdles remain. New government policies will assist computational biologists to overcome these hurdles and further raise the profile of the field. There is a high expectation that bioinformatics in Hong Kong will be a promising area for the next generation.

  8. Iterative refinement of structure-based sequence alignments by Seed Extension

    PubMed Central

    Kim, Changhoon; Tai, Chin-Hsien; Lee, Byungkook

    2009-01-01

    Background Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. Results RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. Conclusion RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs. PMID:19589133

  9. GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics

    PubMed Central

    Samson, Delphine; Legeai, Fabrice; Karsenty, Emmanuelle; Reboux, Sébastien; Veyrieras, Jean-Baptiste; Just, Jeremy; Barillot, Emmanuel

    2003-01-01

    Génoplante is a partnership program between public French institutes (INRA, CIRAD, IRD and CNRS) and private companies (Biogemma, Bayer CropScience and Bioplante) that aims at developing genome analysis programs for crop species (corn, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis and rice). The outputs of these programs form a wealth of information (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, and mutation data) and tools (databases, interfaces, analysis software), that are being integrated and made public at the public bioinformatics resource centre of Génoplante: GénoPlante-Info (GPI). This continuous flood of data and tools is regularly updated and will grow continuously during the coming two years. Access to the GPI databases and tools is available at http://genoplante-info.infobiogen.fr/. PMID:12519976

  10. The International Conference on Intelligent Biology and Medicine (ICIBM) 2016: summary and innovation in genomics.

    PubMed

    Zhao, Zhongming; Liu, Zhandong; Chen, Ken; Guo, Yan; Allen, Genevera I; Zhang, Jiajie; Jim Zheng, W; Ruan, Jianhua

    2017-10-03

    In this editorial, we first summarize the 2016 International Conference on Intelligent Biology and Medicine (ICIBM 2016) that was held on December 8-10, 2016 in Houston, Texas, USA, and then briefly introduce the ten research articles included in this supplement issue. ICIBM 2016 included four workshops or tutorials, four keynote lectures, four conference invited talks, eight concurrent scientific sessions and a poster session for 53 accepted abstracts, covering current topics in bioinformatics, systems biology, intelligent computing, and biomedical informatics. Through our call for papers, a total of 77 original manuscripts were submitted to ICIBM 2016. After peer review, 11 articles were selected in this special issue, covering topics such as single cell RNA-seq analysis method, genome sequence and variation analysis, bioinformatics method for vaccine development, and cancer genomics.

  11. DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis.

    PubMed

    Mohammed, Monzoorul Haque; Dutta, Anirban; Bose, Tungadri; Chadaram, Sudha; Mande, Sharmila S

    2012-10-01

    An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma. Linux, Windows and Mac implementations (both 32 and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. sharmila@atc.tcs.com Supplementary data are available at Bioinformatics online.

  12. SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts

    PubMed Central

    Ryan, Michael C.; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N.

    2012-01-01

    Summary: SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. Availability and implementation: SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. Contact: mryan@insilico.us.com Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22820202

  13. Gene expression patterns combined with bioinformatics analysis identify genes associated with cholangiocarcinoma.

    PubMed

    Li, Chen; Shen, Weixing; Shen, Sheng; Ai, Zhilong

    2013-12-01

    To explore the molecular mechanisms of cholangiocarcinoma (CC), microarray technology was used to find biomarkers for early detection and diagnosis. The gene expression profiles from 6 patients with CC and 5 normal controls were downloaded from Gene Expression Omnibus and compared. As a result, 204 differentially co-expressed genes (DCGs) in CC patients compared to normal controls were identified using a computational bioinformatics analysis. These genes were mainly involved in coenzyme metabolic process, peptidase activity and oxidation reduction. A regulatory network was constructed by mapping the DCGs to known regulation data. Four transcription factors, FOXC1, ZIC2, NKX2-2 and GCGR, were hub nodes in the network. In conclusion, this study provides a set of targets useful for future investigations into molecular biomarker studies. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Epigenetic regulation of gene expression in cancer: techniques, resources and analysis

    PubMed Central

    Kagohara, Luciane T; Stein-O’Brien, Genevieve L; Kelley, Dylan; Flam, Emily; Wick, Heather C; Danilova, Ludmila V; Easwaran, Hariharan; Favorov, Alexander V; Qian, Jiang; Gaykalova, Daria A; Fertig, Elana J

    2018-01-01

    Abstract Cancer is a complex disease, driven by aberrant activity in numerous signaling pathways in even individual malignant cells. Epigenetic changes are critical mediators of these functional changes that drive and maintain the malignant phenotype. Changes in DNA methylation, histone acetylation and methylation, noncoding RNAs, posttranslational modifications are all epigenetic drivers in cancer, independent of changes in the DNA sequence. These epigenetic alterations were once thought to be crucial only for the malignant phenotype maintenance. Now, epigenetic alterations are also recognized as critical for disrupting essential pathways that protect the cells from uncontrolled growth, longer survival and establishment in distant sites from the original tissue. In this review, we focus on DNA methylation and chromatin structure in cancer. The precise functional role of these alterations is an area of active research using emerging high-throughput approaches and bioinformatics analysis tools. Therefore, this review also describes these high-throughput measurement technologies, public domain databases for high-throughput epigenetic data in tumors and model systems and bioinformatics algorithms for their analysis. Advances in bioinformatics data that combine these epigenetic data with genomics data are essential to infer the function of specific epigenetic alterations in cancer. These integrative algorithms are also a focus of this review. Future studies using these emerging technologies will elucidate how alterations in the cancer epigenome cooperate with genetic aberrations during tumor initiation and progression. This deeper understanding is essential to future studies with epigenetics biomarkers and precision medicine using emerging epigenetic therapies. PMID:28968850

  15. [Study of a family with epidermolysis bullosa simplex resulting from a novel mutation of KRT14 gene].

    PubMed

    Meng, Lanlan; Du, Juan; Li, Wen; Lu, Guangxiu; Tan, Yueqiu

    2017-08-10

    To determine the molecular etiology for a Chinese pedigree affected with epidermolysis bullosa simplex (EBS). Target region sequencing using a hereditary epidermolysis bullosa capture array combined with Sanger sequencing and bioinformatics analysis were used. Mutation taster, PolyPhen-2, Provean, and SIFT software and NCBI online were employed to assess the pathogenicity and conservation of detected mutations. One hundred healthy unrelated individuals were used as controls. Target region sequencing showed that the proband has carried a unreported heterozygous c.1234A>G (p.Ile412Val) mutation of the KRT14 gene, which was confirmed by Sanger sequencing in other 8 affected individuals but not among healthy members of the pedigree. Bioinformatics analysis indicated that the mutation is highly pathogenic. Remarkably, 3 members of the family (2 affected and 1 unaffected) have carried a heterozygous c.1237G>A (p.Ala413Thr) mutation of the KRT14 gene, which was collected in Human Gene Mutation Database (HGMD). Bioinformatics analysis indicated that the mutation may not be pathogenic. Both mutations were not detected among the 100 healthy controls. The novel c.1234A>G(p.Ile412Val) mutation of the KRT14 gene is probably responsible for the disease, while c.1237G>A (p.Ala413Thr) mutation of KRT14 gene may be a polymorphism. Compared with Sanger sequencing, target region capture sequencing is more efficient and can significantly reduce the cost of genetic testing for EBS.

  16. The growing need for microservices in bioinformatics.

    PubMed

    Williams, Christopher L; Sica, Jeffrey C; Killen, Robert T; Balis, Ulysses G J

    2016-01-01

    Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Bioinformatics relies on nimble IT framework which can adapt to changing requirements. To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting.

  17. The growing need for microservices in bioinformatics

    PubMed Central

    Williams, Christopher L.; Sica, Jeffrey C.; Killen, Robert T.; Balis, Ulysses G. J.

    2016-01-01

    Objective: Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Context: Bioinformatics relies on nimble IT framework which can adapt to changing requirements. Aims: To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics Conclusions: Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting. PMID:27994937

  18. FY09 Final Report for LDRD Project: Understanding Viral Quasispecies Evolution through Computation and Experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, C

    2009-11-12

    In FY09 they will (1) complete the implementation, verification, calibration, and sensitivity and scalability analysis of the in-cell virus replication model; (2) complete the design of the cell culture (cell-to-cell infection) model; (3) continue the research, design, and development of their bioinformatics tools: the Web-based structure-alignment-based sequence variability tool and the functional annotation of the genome database; (4) collaborate with the University of California at San Francisco on areas of common interest; and (5) submit journal articles that describe the in-cell model with simulations and the bioinformatics approaches to evaluation of genome variability and fitness.

  19. A Web-Hosted R Workflow to Simplify and Automate the Analysis of 16S NGS Data

    EPA Science Inventory

    Next-Generation Sequencing (NGS) produces large data sets that include tens-of-thousands of sequence reads per sample. For analysis of bacterial diversity, 16S NGS sequences are typically analyzed in a workflow that containing best-of-breed bioinformatics packages that may levera...

  20. Evolving from bioinformatics in-the-small to bioinformatics in-the-large.

    PubMed

    Parker, D Stott; Gorlick, Michael M; Lee, Christopher J

    2003-01-01

    We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small-constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples, and extract lessons from them showing how it has helped. These examples both suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.

  1. Bioinformatics education dissemination with an evolutionary problem solving perspective.

    PubMed

    Jungck, John R; Donovan, Samuel S; Weisstein, Anton E; Khiripet, Noppadon; Everse, Stephen J

    2010-11-01

    Bioinformatics is central to biology education in the 21st century. With the generation of terabytes of data per day, the application of computer-based tools to stored and distributed data is fundamentally changing research and its application to problems in medicine, agriculture, conservation and forensics. In light of this 'information revolution,' undergraduate biology curricula must be redesigned to prepare the next generation of informed citizens as well as those who will pursue careers in the life sciences. The BEDROCK initiative (Bioinformatics Education Dissemination: Reaching Out, Connecting and Knitting together) has fostered an international community of bioinformatics educators. The initiative's goals are to: (i) Identify and support faculty who can take leadership roles in bioinformatics education; (ii) Highlight and distribute innovative approaches to incorporating evolutionary bioinformatics data and techniques throughout undergraduate education; (iii) Establish mechanisms for the broad dissemination of bioinformatics resource materials and teaching models; (iv) Emphasize phylogenetic thinking and problem solving; and (v) Develop and publish new software tools to help students develop and test evolutionary hypotheses. Since 2002, BEDROCK has offered more than 50 faculty workshops around the world, published many resources and supported an environment for developing and sharing bioinformatics education approaches. The BEDROCK initiative builds on the established pedagogical philosophy and academic community of the BioQUEST Curriculum Consortium to assemble the diverse intellectual and human resources required to sustain an international reform effort in undergraduate bioinformatics education.

  2. Bioinformatics education in India.

    PubMed

    Kulkarni-Kale, Urmila; Sawant, Sangeeta; Chavan, Vishwas

    2010-11-01

    An account of bioinformatics education in India is presented along with future prospects. Establishment of BTIS network by Department of Biotechnology (DBT), Government of India in the 1980s had been a systematic effort in the development of bioinformatics infrastructure in India to provide services to scientific community. Advances in the field of bioinformatics underpinned the need for well-trained professionals with skills in information technology and biotechnology. As a result, programmes for capacity building in terms of human resource development were initiated. Educational programmes gradually evolved from the organisation of short-term workshops to the institution of formal diploma/degree programmes. A case study of the Master's degree course offered at the Bioinformatics Centre, University of Pune is discussed. Currently, many universities and institutes are offering bioinformatics courses at different levels with variations in the course contents and degree of detailing. BioInformatics National Certification (BINC) examination initiated in 2005 by DBT provides a common yardstick to assess the knowledge and skill sets of students passing out of various institutions. The potential for broadening the scope of bioinformatics to transform it into a data intensive discovery discipline is discussed. This necessitates introduction of amendments in the existing curricula to accommodate the upcoming developments.

  3. Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

    PubMed

    Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A

    2017-11-01

    Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome.

    PubMed

    Allali, Imane; Arnold, Jason W; Roach, Jeffrey; Cadenas, Maria Belen; Butz, Natasha; Hassan, Hosni M; Koci, Matthew; Ballou, Anne; Mendoza, Mary; Ali, Rizwana; Azcarate-Peril, M Andrea

    2017-09-13

    Advancements in Next Generation Sequencing (NGS) technologies regarding throughput, read length and accuracy had a major impact on microbiome research by significantly improving 16S rRNA amplicon sequencing. As rapid improvements in sequencing platforms and new data analysis pipelines are introduced, it is essential to evaluate their capabilities in specific applications. The aim of this study was to assess whether the same project-specific biological conclusions regarding microbiome composition could be reached using different sequencing platforms and bioinformatics pipelines. Chicken cecum microbiome was analyzed by 16S rRNA amplicon sequencing using Illumina MiSeq, Ion Torrent PGM, and Roche 454 GS FLX Titanium platforms, with standard and modified protocols for library preparation. We labeled the bioinformatics pipelines included in our analysis QIIME1 and QIIME2 (de novo OTU picking [not to be confused with QIIME version 2 commonly referred to as QIIME2]), QIIME3 and QIIME4 (open reference OTU picking), UPARSE1 and UPARSE2 (each pair differs only in the use of chimera depletion methods), and DADA2 (for Illumina data only). GS FLX+ yielded the longest reads and highest quality scores, while MiSeq generated the largest number of reads after quality filtering. Declines in quality scores were observed starting at bases 150-199 for GS FLX+ and bases 90-99 for MiSeq. Scores were stable for PGM-generated data. Overall microbiome compositional profiles were comparable between platforms; however, average relative abundance of specific taxa varied depending on sequencing platform, library preparation method, and bioinformatics analysis. Specifically, QIIME with de novo OTU picking yielded the highest number of unique species and alpha diversity was reduced with UPARSE and DADA2 compared to QIIME. The three platforms compared in this study were capable of discriminating samples by treatment, despite differences in diversity and abundance, leading to similar biological conclusions. Our results demonstrate that while there were differences in depth of coverage and phylogenetic diversity, all workflows revealed comparable treatment effects on microbial diversity. To increase reproducibility and reliability and to retain consistency between similar studies, it is important to consider the impact on data quality and relative abundance of taxa when selecting NGS platforms and analysis tools for microbiome studies.

  5. Bioinformatics Goes to School—New Avenues for Teaching Contemporary Biology

    PubMed Central

    Wood, Louisa; Gebhardt, Philipp

    2013-01-01

    Since 2010, the European Molecular Biology Laboratory's (EMBL) Heidelberg laboratory and the European Bioinformatics Institute (EMBL-EBI) have jointly run bioinformatics training courses developed specifically for secondary school science teachers within Europe and EMBL member states. These courses focus on introducing bioinformatics, databases, and data-intensive biology, allowing participants to explore resources and providing classroom-ready materials to support them in sharing this new knowledge with their students. In this article, we chart our progress made in creating and running three bioinformatics training courses, including how the course resources are received by participants and how these, and bioinformatics in general, are subsequently used in the classroom. We assess the strengths and challenges of our approach, and share what we have learned through our interactions with European science teachers. PMID:23785266

  6. The 2016 Bioinformatics Open Source Conference (BOSC).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science.

  7. ConKit: a python interface to contact predictions.

    PubMed

    Simkovic, Felix; Thomas, Jens M H; Rigden, Daniel J

    2017-07-15

    Recent advances in protein residue contact prediction algorithms have led to the emergence of many new methods and a variety of file formats. We present ConKit , an open source, modular and extensible Python interface which allows facile conversion between formats and provides an interface to analyses of sequence alignments and sets of contact predictions. ConKit is available via the Python Package Index. The documentation can be found at http://www.conkit.org . ConKit is licensed under the BSD 3-Clause. hlfsimko@liverpool.ac.uk or drigden@liverpool.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  8. Tentacle: distributed quantification of genes in metagenomes.

    PubMed

    Boulund, Fredrik; Sjögren, Anders; Kristiansson, Erik

    2015-01-01

    In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed via a network and processed in parallel on worker nodes. Tentacle is modular, extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt Tentacle to different applications in metagenomics and easy to integrate into existing workflows. Evaluations show that Tentacle scales very well with increasing computing resources. We illustrate the versatility of Tentacle on three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.

  9. Mechanism of 150-cavity formation in influenza neuraminidase

    PubMed Central

    Amaro, Rommie E.; Swift, Robert V.; Votapka, Lane; Li, Wilfred W.; Walker, Ross C.; Bush, Robin M.

    2011-01-01

    The recently discovered 150-cavity in the active site of group-1 influenza A neuraminidase (NA) proteins provides a target for rational structure-based drug development to counter the increasing frequency of antiviral resistance in influenza. Surprisingly, the 2009 H1N1 pandemic virus (09N1) neuramidase was crystalized without the 150-cavity characteristic of group-1 NAs. Here we demonstrate, through a total sum of 1.6 μs of biophysical simulations, that 09N1 NA exists in solution preferentially with an open 150-cavity. Comparison with simulations using avian N1, human N2 and 09N1 with a I149V mutation and an extensive bioinformatics analysis suggests that the conservation of a key salt bridge is crucial in the stabilization of the 150-cavity across both subtypes. This result provides an atomic-level structural understanding of the recent finding that antiviral compounds designed to take advantage of contacts in the 150-cavity can inactivate both 2009 H1N1 pandemic and avian H5N1 viruses. PMID:21750542

  10. MiR-144-3p regulates osteogenic differentiation and proliferation of murine mesenchymal stem cells by specifically targeting Smad4.

    PubMed

    Huang, Cong; Geng, Junnan; Wei, Xiajie; Zhang, Ruirui; Jiang, Siwen

    2016-03-01

    Despite extensive research on osteoblast differentiation and proliferation in mesenchymal stem cells (MSCs), the accurate mechanism remains to be further elucidated. MicroRNAs have been reported to be key regulators of osteoblast differentiation and proliferation. Here, we found that miR-144-3p is down-regulated during osteoblast differentiation of C3H10T1/2 cells. Overexpression of miR-144-3p inhibited osteogenic differentiation, whereas inhibition of miR-144-3p reversed this process. Furthermore, miR-144-3p inhibited the proliferation of C3H10T1/2 cells by arresting cells at the G0/G1 phase. Results from bioinformatics analysis, luciferase assay and western blotting demonstrated that miR-144-3p directly targeted Smad4. Additionally, Smad4 knockdown blocks the effects of miR-144-3p inhibitor. Therefore, we conclude that miR-144-3p negatively regulates osteogenic differentiation and proliferation of C3H10T1/2 cells by targeting Smad4. © 2016 Federation of European Biochemical Societies.

  11. Integrative computational approach for genome-based study of microbial lipid-degrading enzymes.

    PubMed

    Vorapreeda, Tayvich; Thammarongtham, Chinae; Laoteng, Kobkul

    2016-07-01

    Lipid-degrading or lipolytic enzymes have gained enormous attention in academic and industrial sectors. Several efforts are underway to discover new lipase enzymes from a variety of microorganisms with particular catalytic properties to be used for extensive applications. In addition, various tools and strategies have been implemented to unravel the functional relevance of the versatile lipid-degrading enzymes for special purposes. This review highlights the study of microbial lipid-degrading enzymes through an integrative computational approach. The identification of putative lipase genes from microbial genomes and metagenomic libraries using homology-based mining is discussed, with an emphasis on sequence analysis of conserved motifs and enzyme topology. Molecular modelling of three-dimensional structure on the basis of sequence similarity is shown to be a potential approach for exploring the structural and functional relationships of candidate lipase enzymes. The perspectives on a discriminative framework of cutting-edge tools and technologies, including bioinformatics, computational biology, functional genomics and functional proteomics, intended to facilitate rapid progress in understanding lipolysis mechanism and to discover novel lipid-degrading enzymes of microorganisms are discussed.

  12. Combinatory annotation of cell membrane receptors and signalling pathways of Bombyx mori prothoracic glands

    PubMed Central

    Moulos, Panagiotis; Samiotaki, Martina; Panayotou, George; Dedos, Skarlatos G.

    2016-01-01

    The cells of prothoracic glands (PG) are the main site of synthesis and secretion of ecdysteroids, the biochemical products of cholesterol conversion to steroids that shape the morphogenic development of insects. Despite the availability of genome sequences from several insect species and the extensive knowledge of certain signalling pathways that underpin ecdysteroidogenesis, the spectrum of signalling molecules and ecdysteroidogenic cascades is still not fully comprehensive. To fill this gap and obtain the complete list of cell membrane receptors expressed in PG cells, we used combinatory bioinformatic, proteomic and transcriptomic analysis and quantitative PCR to annotate and determine the expression profiles of genes identified as putative cell membrane receptors of the model insect species, Bombyx mori, and subsequently enrich the repertoire of signalling pathways that are present in its PG cells. The genome annotation dataset we report here highlights modules and pathways that may be directly involved in ecdysteroidogenesis and aims to disseminate data and assist other researchers in the discovery of the role of such receptors and their ligands. PMID:27576083

  13. Discovery of putative salivary biomarkers for Sjögren's syndrome using high resolution mass spectrometry and bioinformatics.

    PubMed

    Zoukhri, Driss; Rawe, Ian; Singh, Mabi; Brown, Ashley; Kublin, Claire L; Dawson, Kevin; Haddon, William F; White, Earl L; Hanley, Kathleen M; Tusé, Daniel; Malyj, Wasyl; Papas, Athena

    2012-03-01

    The purpose of the current study was to determine if saliva contains biomarkers that can be used as diagnostic tools for Sjögren's syndrome (SjS). Twenty seven SjS patients and 27 age-matched healthy controls were recruited for these studies. Unstimulated glandular saliva was collected from the Wharton's duct using a suction device. Two µl of salvia were processed for mass spectrometry analyses on a prOTOF 2000 matrix-assisted laser desorption/ionization orthogonal time of flight (MALDI O-TOF) mass spectrometer. Raw data were analyzed using bioinformatic tools to identify biomarkers. MALDI O-TOF MS analyses of saliva samples were highly reproducible and the mass spectra generated were very rich in peptides and peptide fragments in the 750-7,500 Da range. Data analysis using bioinformatic tools resulted in several classification models being built and several biomarkers identified. One model based on 7 putative biomarkers yielded a sensitivity of 97.5%, specificity of 97.8% and an accuracy of 97.6%. One biomarker was present only in SjS samples and was identified as a proteolytic peptide originating from human basic salivary proline-rich protein 3 precursor. We conclude that salivary biomarkers detected by high-resolution mass spectrometry coupled with powerful bioinformatic tools offer the potential to serve as diagnostic/prognostic tools for SjS.

  14. Bringing Web 2.0 to bioinformatics.

    PubMed

    Zhang, Zhang; Cheung, Kei-Hoi; Townsend, Jeffrey P

    2009-01-01

    Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.

  15. Bioinformatics in the secondary science classroom: A study of state content standards and students' perceptions of, and performance in, bioinformatics lessons

    NASA Astrophysics Data System (ADS)

    Wefer, Stephen H.

    The proliferation of bioinformatics in modern Biology marks a new revolution in science, which promises to influence science education at all levels. This thesis examined state standards for content that articulated bioinformatics, and explored secondary students' affective and cognitive perceptions of, and performance in, a bioinformatics mini-unit. The results are presented as three studies. The first study analyzed secondary science standards of 49 U.S States (Iowa has no science framework) and the District of Columbia for content related to bioinformatics at the introductory high school biology level. The bionformatics content of each state's Biology standards were categorized into nine areas and the prevalence of each area documented. The nine areas were: The Human Genome Project, Forensics, Evolution, Classification, Nucleotide Variations, Medicine, Computer Use, Agriculture/Food Technology, and Science Technology and Society/Socioscientific Issues (STS/SSI). Findings indicated a generally low representation of bioinformatics related content, which varied substantially across the different areas. Recommendations are made for reworking existing standards to incorporate bioinformatics and to facilitate the goal of promoting science literacy in this emerging new field among secondary school students. The second study examined thirty-two students' affective responses to, and content mastery of, a two-week bioinformatics mini-unit. The findings indicate that the students generally were positive relative to their interest level, the usefulness of the lessons, the difficulty level of the lessons, likeliness to engage in additional bioinformatics, and were overall successful on the assessments. A discussion of the results and significance is followed by suggestions for future research and implementation for transferability. The third study presents a case study of individual differences among ten secondary school students, whose cognitive and affective percepts were analyzed in relation to their experience in learning a bioinformatics mini-unit. There were distinct individual differences among the participants, especially in the way they processed information and integrated procedural and analytical thought during bioinformatics learning. These differences may provide insights into some of the specific needs of students that educators and curriculum designers should consider when designing bioinformatics learning experiences. Implications for teacher education and curriculum design are presented in addition to some suggestions for further research.

  16. A bioinformatics transcriptome meta-analysis highlights the importance of trophoblast differentiation in the pathology of hydatidiform moles.

    PubMed

    Desterke, Christophe; Slim, Rima; Candelier, Jean-Jacques

    2018-05-01

    Hydatidiform mole (HM) is an aberrant human pregnancy with abnormal trophoblastic development, migration/invasion of the extravillous trophoblast in the decidua. These abnormalities are established in a hypoxic environment during the first trimester of gestation. Using text mining, we identified 72 unique genes that are linked to HM (HM-linked genes) that we studied by bioinformatic analysis in publicly available transcriptomes of primary chorionic villous cells (cytotrophoblast, syncytiotrophoblast, extravillous trophoblast, and arterial and venous endothelial) isolated from normal placentas or established trophoblastic cell lines cultured under different oxygen concentrations. We show that the majority of HM-linked genes (75%) are involved in normal trophoblastic differentiation, arranged in clusters, and some of them are implicated in chorionic villous invasion or regulated by oxygen concentrations. Our analysis integrates the various aspects of the pathophysiology of HM and highlights the importance of trophoblastic differentiation in this pathology. Copyright © 2018 Elsevier Ltd. All rights reserved.

  17. Bioinformatics analysis of the ς-carotene desaturase gene in cabbage (Brassica oleracea var. capitata)

    NASA Astrophysics Data System (ADS)

    Sun, Bo; Zheng, Aihong; Jiang, Min; Xue, Shengling; Zhang, Fen; Tang, Haoru

    2018-04-01

    ς-carotene desaturase (ZDS) is an important enzyme in carotenoid biosynthesis. Here, the Brassica oleracea var. capitata ZDS (BocZDS) gene sequences were obtained from Brassica database (BRAD), and preformed for bioinformatics analysis. The BocZDS gene mapped to Scaffold000363, and contains an open reading frame of 1,686 bp that encodes a 561-amino acid protein with a calculated molecular mass of 62.00 kD and an isoelectric point (pI) of 8.2. Subcellular localization predicted the BocZDS gene was in the chloroplast. The conserved domain of the BocZDS protein is PLN02487, indicating that it belongs the member of zeta-carotene desaturase. Homology analysis indicates that the ZDS protein is apparently conserved during plant evolution and is most closely related to B. oleracea var. oleracea, B. napus, and B. rapa. The findings of the present study provide a molecular basis for the elucidation of ZDS gene function in cabbage.

  18. The 2016 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science. PMID:27781083

  19. A bioinformatics potpourri.

    PubMed

    Schönbach, Christian; Li, Jinyan; Ma, Lan; Horton, Paul; Sjaugi, Muhammad Farhan; Ranganathan, Shoba

    2018-01-19

    The 16th International Conference on Bioinformatics (InCoB) was held at Tsinghua University, Shenzhen from September 20 to 22, 2017. The annual conference of the Asia-Pacific Bioinformatics Network featured six keynotes, two invited talks, a panel discussion on big data driven bioinformatics and precision medicine, and 66 oral presentations of accepted research articles or posters. Fifty-seven articles comprising a topic assortment of algorithms, biomolecular networks, cancer and disease informatics, drug-target interactions and drug efficacy, gene regulation and expression, imaging, immunoinformatics, metagenomics, next generation sequencing for genomics and transcriptomics, ontologies, post-translational modification, and structural bioinformatics are the subject of this editorial for the InCoB2017 supplement issues in BMC Genomics, BMC Bioinformatics, BMC Systems Biology and BMC Medical Genomics. New Delhi will be the location of InCoB2018, scheduled for September 26-28, 2018.

  20. The 2015 Bioinformatics Open Source Conference (BOSC 2015).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Lapp, Hilmar; Chapman, Brad; Davey, Rob; Fields, Christopher; Hokamp, Karsten; Munoz-Torres, Monica

    2016-02-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule.

Top