Sample records for comparative bioinformatics analyses

  1. A review of bioinformatics platforms for comparative genomics. Recent developments of the EDGAR 2.0 platform and its utility for taxonomic and phylogenetic studies.

    PubMed

    Yu, J; Blom, J; Glaeser, S P; Jaenicke, S; Juhre, T; Rupp, O; Schwengers, O; Spänig, S; Goesmann, A

    2017-11-10

    The rapid development of next generation sequencing technology has greatly increased the amount of available microbial genomes. As a result of this development, there is a rising demand for fast and automated approaches in analyzing these genomes in a comparative way. Whole genome sequencing also bears a huge potential for obtaining a higher resolution in phylogenetic and taxonomic classification. During the last decade, several software tools and platforms have been developed in the field of comparative genomics. In this manuscript, we review the most commonly used platforms and approaches for ortholog group analyses with a focus on their potential for phylogenetic and taxonomic research. Furthermore, we describe the latest improvements of the EDGAR platform for comparative genome analyses and present recent examples of its application for the phylogenomic analysis of different taxa. Finally, we illustrate the role of the EDGAR platform as part of the BiGi Center for Microbial Bioinformatics within the German network on Bioinformatics Infrastructure (de.NBI). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  2. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.

    PubMed

    Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru

    2016-09-29

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.

  3. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses

    PubMed Central

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi

    2018-01-01

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625

  4. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses.

    PubMed

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V; Ma'ayan, Avi

    2018-02-27

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.

  5. Trace Elements and Healthcare: A Bioinformatics Perspective.

    PubMed

    Zhang, Yan

    2017-01-01

    Biological trace elements are essential for human health. Imbalance in trace element metabolism and homeostasis may play an important role in a variety of diseases and disorders. While the majority of previous research has focused on experimental verification of genes involved in trace element metabolism and those encoding trace element-dependent proteins, bioinformatics studies of trace elements are relatively rare and still at an early stage. This chapter offers an overview of recent progress in bioinformatics analyses of trace element utilization, metabolism, and function, especially comparative genomics of several important metals. The relationship between individual elements and several diseases, based on recent large-scale systematic studies such as genome-wide association studies and case-control studies, is discussed. Lastly, developments in ionomics and its recent application in human health are also introduced.

  6. Navigating the changing learning landscape: perspective from bioinformatics.ca

    PubMed Central

    Ouellette, B. F. Francis

    2013-01-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468

  7. Analysing the performance of personal computers based on Intel microprocessors for sequence aligning bioinformatics applications.

    PubMed

    Nair, Pradeep S; John, Eugene B

    2007-01-01

    Aligning specific sequences against a very large number of other sequences is a central aspect of bioinformatics. With the widespread availability of personal computers in biology laboratories, sequence alignment is now often performed locally. This makes it necessary to analyse the performance of personal computers for sequence aligning bioinformatics benchmarks. In this paper, we analyse the performance of a personal computer for the popular BLAST and FASTA sequence alignment suites. Results indicate that these benchmarks have a large number of recurring operations and use memory operations extensively. It seems that the performance can be improved with a bigger L1-cache.

  8. Bioinformatics tools for quantitative and functional metagenome and metatranscriptome data analysis in microbes.

    PubMed

    Niu, Sheng-Yong; Yang, Jinyu; McDermaid, Adam; Zhao, Jing; Kang, Yu; Ma, Qin

    2017-05-08

    Metagenomic and metatranscriptomic sequencing approaches are more frequently being used to link microbiota to important diseases and ecological changes. Many analyses have been used to compare the taxonomic and functional profiles of microbiota across habitats or individuals. While a large portion of metagenomic analyses focus on species-level profiling, some studies use strain-level metagenomic analyses to investigate the relationship between specific strains and certain circumstances. Metatranscriptomic analysis provides another important insight into activities of genes by examining gene expression levels of microbiota. Hence, combining metagenomic and metatranscriptomic analyses will help understand the activity or enrichment of a given gene set, such as drug-resistant genes among microbiome samples. Here, we summarize existing bioinformatics tools of metagenomic and metatranscriptomic data analysis, the purpose of which is to assist researchers in deciding the appropriate tools for their microbiome studies. Additionally, we propose an Integrated Meta-Function mapping pipeline to incorporate various reference databases and accelerate functional gene mapping procedures for both metagenomic and metatranscriptomic analyses. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. A generally applicable lightweight method for calculating a value structure for tools and services in bioinformatics infrastructure projects.

    PubMed

    Mayer, Gerhard; Quast, Christian; Felden, Janine; Lange, Matthias; Prinz, Manuel; Pühler, Alfred; Lawerenz, Chris; Scholz, Uwe; Glöckner, Frank Oliver; Müller, Wolfgang; Marcus, Katrin; Eisenacher, Martin

    2017-10-30

    Sustainable noncommercial bioinformatics infrastructures are a prerequisite to use and take advantage of the potential of big data analysis for research and economy. Consequently, funders, universities and institutes as well as users ask for a transparent value model for the tools and services offered. In this article, a generally applicable lightweight method is described by which bioinformatics infrastructure projects can estimate the value of tools and services offered without determining exactly the total costs of ownership. Five representative scenarios for value estimation from a rough estimation to a detailed breakdown of costs are presented. To account for the diversity in bioinformatics applications and services, the notion of service-specific 'service provision units' is introduced together with the factors influencing them and the main underlying assumptions for these 'value influencing factors'. Special attention is given on how to handle personnel costs and indirect costs such as electricity. Four examples are presented for the calculation of the value of tools and services provided by the German Network for Bioinformatics Infrastructure (de.NBI): one for tool usage, one for (Web-based) database analyses, one for consulting services and one for bioinformatics training events. Finally, from the discussed values, the costs of direct funding and the costs of payment of services by funded projects are calculated and compared. © The Author 2017. Published by Oxford University Press.

  10. Whale song analyses using bioinformatics sequence analysis approaches

    NASA Astrophysics Data System (ADS)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, master's thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, and the standardized Euclidean distance and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
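
    The metrics named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring parameters (match, mismatch, gap), the word length k, and all function names are assumptions chosen for illustration, the "repeated procedure" is not reproduced, and the Euclidean distance here is the plain, unstandardized form.

```python
import math
from collections import Counter


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (linear gap penalty; values assumed)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def word_counts(seq, k=2):
    """Count overlapping words of length k (k=2 is an assumed choice)."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def euclidean_distance(c1, c2):
    """Plain (unstandardized) Euclidean distance between two word-count vectors."""
    words = set(c1) | set(c2)
    return math.sqrt(sum((c1[w] - c2[w]) ** 2 for w in words))


def angle(c1, c2):
    """Angle in radians between two word-count vectors."""
    dot = sum(c1[w] * c2[w] for w in set(c1) | set(c2))
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    if n1 == 0 or n2 == 0:
        raise ValueError("angle is undefined for an empty word-count vector")
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))


if __name__ == "__main__":
    theme_a = "ABCABD"  # hypothetical themes written as strings of unit labels
    theme_b = "ABCABC"
    print(smith_waterman_score(theme_a, theme_b))
    print(euclidean_distance(word_counts(theme_a), word_counts(theme_b)))
    print(angle(word_counts(theme_a), word_counts(theme_b)))
```

    A pairwise matrix of any of these values over all themes could then be passed to a hierarchical clustering routine to draw dendrograms of the kind the abstract describes.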

  11. Phylogenetic and Protein Sequence Analysis of Bacterial Chemoreceptors.

    PubMed

    Ortega, Davi R; Zhulin, Igor B

    2018-01-01

    Identifying chemoreceptors in sequenced bacterial genomes, revealing their domain architecture, inferring their evolutionary relationships, and comparing them to chemoreceptors of known function become important steps in genome annotation and chemotaxis research. Here, we describe bioinformatics procedures that enable such analyses, using two closely related bacterial genomes as examples.

  12. Planning bioinformatics workflows using an expert system.

    PubMed

    Chen, Xiaoling; Chang, Jeffrey T

    2017-04-15

    Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability: https://github.com/jefftc/changlab. Contact: jeffrey.t.chang@uth.tmc.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
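
    The idea of backward chaining from a desired result to the data on hand can be sketched in a few lines of Python. This is a toy illustration, not BETSY's data model or API: the rule table, the data type names, the tool names, and the plan function are all hypothetical, and each data type is assumed to have exactly one producing rule, whereas a real knowledge base would offer alternatives.

```python
# Hypothetical knowledge base: data type -> (tool that produces it, required inputs).
RULES = {
    "trimmed_reads": ("trim_adapters", ["raw_reads"]),
    "aligned_reads": ("align_reads", ["trimmed_reads", "reference_genome"]),
    "gene_counts": ("count_reads", ["aligned_reads", "gene_annotation"]),
}


def plan(goal, available, rules, _visiting=frozenset()):
    """Backward-chain from goal to available data.

    Returns the ordered list of tool names to run, or None if the goal
    cannot be derived from the available data with the given rules.
    """
    if goal in available:
        return []  # nothing to compute
    if goal in _visiting or goal not in rules:
        return None  # cyclic dependency, or no rule produces this data type
    tool, inputs = rules[goal]
    steps = []
    for needed in inputs:
        sub_plan = plan(needed, available, rules, _visiting | {goal})
        if sub_plan is None:
            return None
        # Keep each tool once, in first-needed order.
        steps.extend(s for s in sub_plan if s not in steps)
    steps.append(tool)
    return steps


if __name__ == "__main__":
    have = {"raw_reads", "reference_genome", "gene_annotation"}
    print(plan("gene_counts", have, RULES))
```

    Changing the goal or the set of available data yields a different workflow without editing any pipeline code, which is the property the abstract highlights for exploratory analyses.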

  13. Planning bioinformatics workflows using an expert system

    PubMed Central

    Chen, Xiaoling; Chang, Jeffrey T.

    2017-01-01

    Motivation: Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. Results: To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability and Implementation: https://github.com/jefftc/changlab Contact: jeffrey.t.chang@uth.tmc.edu PMID:28052928

  14. [Application of bioinformatics in researches of industrial biocatalysis].

    PubMed

    Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao

    2004-05-01

    Industrial biocatalysis is attracting much attention as a way to redesign or replace traditional production processes for chemicals and drugs. A key focus in industrial biocatalysis is the biocatalyst itself, which is usually a microbial enzyme. In recent years, new bioinformatics technologies have played, and will continue to play, an increasingly significant role in industrial biocatalysis research in response to the genomic revolution. One key application of bioinformatics in biocatalysis is the discovery and identification of new biocatalysts through advanced DNA and protein sequence search, comparison and analysis in Internet databases using different algorithms and software. Unknown genes of microbial enzymes can also be readily obtained by designing primers on the basis of bioinformatics analyses. The other key application of bioinformatics in biocatalysis is the modification and improvement of existing industrial biocatalysts. Here, bioinformatics is of great importance in both rational design and directed evolution of microbial enzymes. When the tertiary structures of enzymes have been successfully predicted using bioinformatics tools, subsequent experiments such as site-directed mutagenesis, fusion protein construction, DNA family shuffling and saturation mutagenesis are usually highly efficient. Overall, bioinformatics will be an essential tool for both biologists and biological engineers in future industrial biocatalysis research, because it guides and accelerates the discovery and improvement of novel biocatalysts.

  15. Revealing biological information using data structuring and automated learning.

    PubMed

    Mohorianu, Irina; Moulton, Vincent

    2010-11-01

    The intermediary steps between a biological hypothesis, concretized in the input data, and meaningful results, validated using biological experiments, commonly employ bioinformatics tools. Starting with storage of the data and ending with a statistical analysis of the significance of the results, every step in a bioinformatics analysis has been intensively studied and the resulting methods and models patented. This review summarizes the bioinformatics patents that have been developed mainly for the study of genes, and points out the universal applicability of bioinformatics methods to other related studies such as RNA interference. More specifically, we overview the steps undertaken in the majority of bioinformatics analyses, highlighting, for each, various approaches that have been developed to reveal details from different perspectives. First we consider data warehousing, the first task that has to be performed efficiently, optimizing the structure of the database, in order to facilitate both the subsequent steps and the retrieval of information. Next, we review data mining, which occupies the central part of most bioinformatics analyses, presenting patents concerning differential expression, unsupervised and supervised learning. Last, we discuss how networks of interactions of genes or other players in the cell may be created, which help draw biological conclusions and have been described in several patents.

  16. Bioinformatics for spermatogenesis: annotation of male reproduction based on proteomics

    PubMed Central

    Zhou, Tao; Zhou, Zuo-Min; Guo, Xue-Jiang

    2013-01-01

    Proteomics strategies have been widely used in the field of male reproduction, both in basic and clinical research. Bioinformatics methods are indispensable in proteomics-based studies and are used for data presentation, database construction and functional annotation. In the present review, we focus on the functional annotation of gene lists obtained through qualitative or quantitative methods, summarizing the common and male reproduction specialized proteomics databases. We introduce several integrated tools used to find the hidden biological significance from the data obtained. We further describe in detail the information on male reproduction derived from Gene Ontology analyses, pathway analyses and biomedical analyses. We provide an overview of bioinformatics annotations in spermatogenesis, from gene function to biological function and from biological function to clinical application. On the basis of recently published proteomics studies and associated data, we show that bioinformatics methods help us to discover drug targets for sperm motility and to scan for cancer-testis genes. In addition, we summarize the online resources relevant to male reproduction research for the exploration of the regulation of spermatogenesis. PMID:23852026

  17. SYMBIOmatics: synergies in Medical Informatics and Bioinformatics--exploring current scientific literature for emerging topics.

    PubMed

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-03-08

    The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to http://www.symbiomatics.org). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000-2005 ("recent") and 1990-1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science.

  18. SYMBIOmatics: Synergies in Medical Informatics and Bioinformatics – exploring current scientific literature for emerging topics

    PubMed Central

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-01-01

    Background The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to ). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. Results This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000–2005 ("recent") and 1990–1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. Conclusion We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science. PMID:17430562

  19. 5th HUPO BPP Bioinformatics Meeting at the European Bioinformatics Institute in Hinxton, UK--Setting the analysis frame.

    PubMed

    Stephan, Christian; Hamacher, Michael; Blüggel, Martin; Körting, Gerhard; Chamrad, Daniel; Scheer, Christian; Marcus, Katrin; Reidegeld, Kai A; Lohaus, Christiane; Schäfer, Heike; Martens, Lennart; Jones, Philip; Müller, Michael; Auyeung, Kevin; Taylor, Chris; Binz, Pierre-Alain; Thiele, Herbert; Parkinson, David; Meyer, Helmut E; Apweiler, Rolf

    2005-09-01

    The Bioinformatics Committee of the HUPO Brain Proteome Project (HUPO BPP) meets regularly to execute the post-lab analyses of the data produced in the HUPO BPP pilot studies. On July 7, 2005 the members came together for the 5th time at the European Bioinformatics Institute (EBI) in Hinxton, UK, hosted by Rolf Apweiler. As a main result, the parameter set of the semi-automated data re-analysis of MS/MS spectra has been elaborated and the subsequent work steps have been defined.

  20. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    PubMed

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.

  1. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

  2. Chondrocyte channel transcriptomics

    PubMed Central

    Lewis, Rebecca; May, Hannah; Mobasheri, Ali; Barrett-Jolley, Richard

    2013-01-01

    To date, a range of ion channels have been identified in chondrocytes using a number of different techniques, predominantly electrophysiological and/or biomolecular; each of these has its advantages and disadvantages. Here we aim to compare and contrast the data available from biophysical and microarray experiments. This letter analyses recent transcriptomics datasets from chondrocytes, accessible from the European Bioinformatics Institute (EBI). We discuss whether such bioinformatic analysis of microarray datasets can potentially accelerate identification and discovery of ion channels in chondrocytes. The ion channels which appear most frequently across these microarray datasets are discussed, along with their possible functions. We discuss whether functional or protein data exist which support the microarray data. A microarray experiment comparing gene expression in osteoarthritis and healthy cartilage is also discussed and we verify the differential expression of 2 of these genes, namely the genes encoding large calcium-activated potassium (BK) and aquaporin channels. PMID:23995703

  3. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

    PubMed Central

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-01-01

    Due to the upcoming deluge of genome data, the needs for storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval present significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases and a performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600

  4. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    PubMed

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    Due to the upcoming deluge of genome data, the needs for storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval present significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases and a performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data.

    PubMed

    Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho

    2017-11-01

    High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods provide vast amounts of data, which need to be analysed in several steps. Although bioinformatics analyses may be performed using several public tools, many analytical pipelines offer too few options for optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.

  6. Genetic Markers Analyses and Bioinformatic Approaches to Distinguish Between Olive Tree (Olea europaea L.) Cultivars.

    PubMed

    Ben Ayed, Rayda; Ben Hassen, Hanen; Ennouri, Karim; Rebai, Ahmed

    2016-12-01

    The genetic diversity of 22 olive tree cultivars (Olea europaea L.) sampled from different Mediterranean countries was assessed using 5 SNP markers (FAD2.1; FAD2.3; CALC; SOD and ANTHO3) located in four different genes. The genotyping analysis of the 22 cultivars with 5 SNP loci revealed 11 alleles (average 2.2 per locus). The dendrogram based on cultivar genotypes revealed three clusters consistent with the cultivar classification. In addition, the results obtained with the five SNPs were compared to those obtained with the SSR markers using bioinformatic analyses and by computing a cophenetic correlation coefficient, indicating the usefulness of the UPGMA method for clustering plant genotypes. Based on principal coordinate analysis using a similarity matrix, the first two coordinates explained 54.94% of the total variance. This work provides a more comprehensive explanation of the diversity available in Tunisian olive cultivars, and an important contribution to olive breeding and olive oil authenticity.
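
The average quoted in this abstract is a ratio of distinct alleles to loci (11 alleles over 5 SNP loci gives 2.2). The following is a minimal sketch of how such a figure could be derived from genotype calls. The function names, the `"A/G"` call format and the toy genotypes are all assumptions made for illustration; none of them come from the study.

```python
def alleles_per_locus(genotypes):
    """Map each locus to the set of distinct alleles seen across samples.

    `genotypes` is {sample: {locus: "A/G"}}; alleles are separated by "/".
    (This input layout is an assumption made for the sketch.)
    """
    seen = {}
    for calls in genotypes.values():
        for locus, call in calls.items():
            seen.setdefault(locus, set()).update(call.split("/"))
    return seen


def mean_alleles_per_locus(genotypes):
    """Total number of distinct alleles divided by the number of loci."""
    seen = alleles_per_locus(genotypes)
    if not seen:
        return 0.0
    return sum(len(alleles) for alleles in seen.values()) / len(seen)


# Toy data: hypothetical cultivars and calls, for illustration only.
toy = {
    "cultivar_1": {"FAD2.1": "A/G", "SOD": "C/C"},
    "cultivar_2": {"FAD2.1": "G/G", "SOD": "C/T"},
    "cultivar_3": {"FAD2.1": "A/T", "SOD": "C/C"},
}
print(mean_alleles_per_locus(toy))
```

In the toy data, FAD2.1 shows three alleles and SOD two, so the mean is five alleles over two loci.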

  7. Partnering for functional genomics research conference: Abstracts of poster presentations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1998-06-01

    This report contains abstracts of poster presentations presented at the Functional Genomics Research Conference held April 16-17, 1998 in Oak Ridge, Tennessee. Attention is focused on the following areas: mouse mutagenesis and genomics; phenotype screening; gene expression analysis; DNA analysis technology development; bioinformatics; comparative analyses of mouse, human, and yeast sequences; and pilot projects to evaluate methodologies.

  8. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

    PubMed

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians.

  9. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software

    PubMed Central

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054

  10. Online Tools for Bioinformatics Analyses in Nutrition Sciences

    PubMed Central

    Malkaram, Sridhar A.; Hassan, Yousef I.; Zempleni, Janos

    2012-01-01

    Recent advances in “omics” research have resulted in the creation of large datasets that were generated by consortiums and centers, small datasets that were generated by individual investigators, and bioinformatics tools for mining these datasets. It is important for nutrition laboratories to take full advantage of the analysis tools to interrogate datasets for information relevant to genomics, epigenomics, transcriptomics, proteomics, and metabolomics. This review provides guidance regarding bioinformatics resources that are currently available in the public domain, with the intent to provide a starting point for investigators who want to take advantage of the opportunities provided by the bioinformatics field. PMID:22983844

  11. Bioinformatics in translational drug discovery.

    PubMed

    Wooller, Sarah K; Benstead-Hume, Graeme; Chen, Xiangrong; Ali, Yusuf; Pearl, Frances M G

    2017-08-31

    Bioinformatics approaches are becoming ever more essential in translational drug discovery both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse 'big data' that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications. © 2017 The Author(s).

  12. Genomic and bioinformatics analyses of HAdV-4vac and HAdV-7vac, two human adenovirus (HAdV) strains that constituted original prophylaxis against HAdV-related acute respiratory disease, a reemerging epidemic disease.

    PubMed

    Purkayastha, Anjan; Su, Jing; McGraw, John; Ditty, Susan E; Hadfield, Ted L; Seto, Jason; Russell, Kevin L; Tibbetts, Clark; Seto, Donald

    2005-07-01

    Vaccine strains of human adenovirus serotypes 4 and 7 (HAdV-4vac and HAdV-7vac) have been used successfully to prevent adenovirus-related acute respiratory disease outbreaks. The genomes of these two vaccine strains have been sequenced, annotated, and compared with their prototype equivalents with the goals of understanding their genomes for molecular diagnostics applications, vaccine redevelopment, and HAdV pathoepidemiology. These reference genomes are archived in GenBank as HAdV-4vac (35,994 bp; AY594254) and HAdV-7vac (35,240 bp; AY594256). Bioinformatics and comparative whole-genome analyses with their recently reported and archived prototype genomes reveal six mismatches and four insertions-deletions (indels) between the HAdV-4 prototype and vaccine strains, in contrast to the 611 mismatches and 130 indels between the HAdV-7 prototype and vaccine strains. Annotation reveals that the HAdV-4vac and HAdV-7vac genomes contain 51 and 50 coding units, respectively. Neither vaccine strain appears to be attenuated for virulence based on bioinformatics analyses. There is evidence of genome recombination, as the inverted terminal repeat of HAdV-4vac is initially identical to that of species C whereas the prototype is identical to species B1. These vaccine reference sequences yield unique genome signatures for molecular diagnostics. As a molecular forensics application, these references identify the circulating and problematic 1950s era field strains as the original HAdV-4 prototype and the Greider prototype, from which the vaccines are derived. Thus, they are useful for genomic comparisons to current epidemic and reemerging field strains, as well as leading to an understanding of pathoepidemiology among the human adenoviruses.
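
Counts such as "six mismatches and four indels" come from comparing two genomes position by position once they have been aligned. The following is a minimal sketch of that bookkeeping, assuming the two sequences are already aligned as equal-length strings that use "-" for gaps. It is not the authors' pipeline; `compare_aligned` is a hypothetical name, and counting a run of consecutive gap columns as one indel is a convention chosen here.

```python
def compare_aligned(ref, query, gap="-"):
    """Count mismatches and indel events between two aligned sequences.

    A run of consecutive gap columns in the same sequence counts as a
    single indel event. Columns where both sequences have a gap are skipped.
    Returns (mismatches, indels).
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    mismatches = 0
    indels = 0
    prev_state = None  # "ref_gap", "query_gap", or None for an aligned pair
    for r, q in zip(ref, query):
        if r == gap and q == gap:
            continue  # uninformative column
        if r == gap:
            state = "ref_gap"
        elif q == gap:
            state = "query_gap"
        else:
            state = None
            if r.upper() != q.upper():
                mismatches += 1
        # A new indel event starts whenever we enter a gap run.
        if state is not None and state != prev_state:
            indels += 1
        prev_state = state
    return mismatches, indels


# Toy alignment (invented): one substitution, one 2-bp insertion, one 1-bp deletion.
print(compare_aligned("ACGT--ACGT", "ACCTGGAC-T"))
```

For whole genomes the alignment itself would come from a dedicated aligner; only the counting step is shown here.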

  13. Genomic and Bioinformatics Analyses of HAdV-4vac and HAdV-7vac, Two Human Adenovirus (HAdV) Strains That Constituted Original Prophylaxis against HAdV-Related Acute Respiratory Disease, a Reemerging Epidemic Disease

    PubMed Central

    Purkayastha, Anjan; Su, Jing; McGraw, John; Ditty, Susan E.; Hadfield, Ted L.; Seto, Jason; Russell, Kevin L.; Tibbetts, Clark; Seto, Donald

    2005-01-01

    Vaccine strains of human adenovirus serotypes 4 and 7 (HAdV-4vac and HAdV-7vac) have been used successfully to prevent adenovirus-related acute respiratory disease outbreaks. The genomes of these two vaccine strains have been sequenced, annotated, and compared with their prototype equivalents with the goals of understanding their genomes for molecular diagnostics applications, vaccine redevelopment, and HAdV pathoepidemiology. These reference genomes are archived in GenBank as HAdV-4vac (35,994 bp; AY594254) and HAdV-7vac (35,240 bp; AY594256). Bioinformatics and comparative whole-genome analyses with their recently reported and archived prototype genomes reveal six mismatches and four insertions-deletions (indels) between the HAdV-4 prototype and vaccine strains, in contrast to the 611 mismatches and 130 indels between the HAdV-7 prototype and vaccine strains. Annotation reveals that the HAdV-4vac and HAdV-7vac genomes contain 51 and 50 coding units, respectively. Neither vaccine strain appears to be attenuated for virulence based on bioinformatics analyses. There is evidence of genome recombination, as the inverted terminal repeat of HAdV-4vac is initially identical to that of species C whereas the prototype is identical to species B1. These vaccine reference sequences yield unique genome signatures for molecular diagnostics. As a molecular forensics application, these references identify the circulating and problematic 1950s era field strains as the original HAdV-4 prototype and the Greider prototype, from which the vaccines are derived. Thus, they are useful for genomic comparisons to current epidemic and reemerging field strains, as well as leading to an understanding of pathoepidemiology among the human adenoviruses. PMID:16000418

  14. A review of bioinformatic methods for forensic DNA analyses.

    PubMed

    Liu, Yao-Yuan; Harbison, SallyAnn

    2018-03-01

    Short tandem repeats, single nucleotide polymorphisms, and whole mitochondrial analyses are three classes of markers which will play an important role in the future of forensic DNA typing. The arrival of massively parallel sequencing platforms in forensic science reveals new information such as insights into the complexity and variability of the markers that were previously unseen, along with amounts of data too immense for analyses by manual means. Along with the sequencing chemistries employed, bioinformatic methods are required to process and interpret this new and extensive data. As more is learnt about the use of these new technologies for forensic applications, development and standardization of efficient, favourable tools for each stage of data processing is being carried out, and faster, more accurate methods that improve on the original approaches have been developed. As forensic laboratories search for the optimal pipeline of tools, sequencer manufacturers have incorporated pipelines into sequencer software to make analyses convenient. This review explores the current state of bioinformatic methods and tools used for the analyses of forensic markers sequenced on the massively parallel sequencing (MPS) platforms currently most widely used. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species

    PubMed Central

    Gillespie, Joseph J.; Wattam, Alice R.; Cammer, Stephen A.; Gabbard, Joseph L.; Shukla, Maulik P.; Dalay, Oral; Driscoll, Timothy; Hix, Deborah; Mane, Shrinivasrao P.; Mao, Chunhong; Nordberg, Eric K.; Scott, Mark; Schulman, Julie R.; Snyder, Eric E.; Sullivan, Daniel E.; Wang, Chunxia; Warren, Andrew; Williams, Kelly P.; Xue, Tian; Seung Yoo, Hyun; Zhang, Chengdong; Zhang, Yan; Will, Rebecca; Kenyon, Ronald W.; Sobral, Bruno W.

    2011-01-01

    Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided. PMID:21896772

  16. Advances in Omics and Bioinformatics Tools for Systems Analyses of Plant Functions

    PubMed Central

    Mochida, Keiichi; Shinozaki, Kazuo

    2011-01-01

    Omics and bioinformatics are essential to understanding the molecular systems that underlie various plant functions. Recent game-changing sequencing technologies have revitalized sequencing approaches in genomics and have produced opportunities for various emerging analytical applications. Driven by technological advances, several new omics layers such as the interactome, epigenome and hormonome have emerged. Furthermore, in several plant species, the development of omics resources has progressed to address particular biological properties of individual species. Integration of knowledge from omics-based research is an emerging issue as researchers seek to identify significance, gain biological insights and promote translational research. From these perspectives, we provide this review of the emerging aspects of plant systems research based on omics and bioinformatics analyses together with their associated resources and technological advances. PMID:22156726

  17. The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community

    PubMed Central

    D'Elia, Domenica; Gisel, Andreas; Eriksson, Nils-Einar; Kossida, Sophia; Mattila, Kimmo; Klucar, Lubos; Bongcam-Rudloff, Erik

    2009-01-01

    The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has been working to promote collaborative development of bioinformatics services and tools to serve the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses such as high-throughput sequencing and data managing, text and data-mining, ontologies and Grid technologies. Papers selected for publication, in this supplement to BMC Bioinformatics, cover a broad range of the topics treated, providing also an overview of the main bioinformatics research fields that the EMBnet community is involved in. PMID:19534734

  18. Why Choose This One? Factors in Scientists' Selection of Bioinformatics Tools

    ERIC Educational Resources Information Center

    Bartlett, Joan C.; Ishimura, Yusuke; Kloda, Lorie A.

    2011-01-01

    Purpose: The objective was to identify and understand the factors involved in scientists' selection of preferred bioinformatics tools, such as databases of gene or protein sequence information (e.g., GenBank) or programs that manipulate and analyse biological data (e.g., BLAST). Methods: Eight scientists maintained research diaries for a two-week…

  19. Bioinformatics on the cloud computing platform Azure.

    PubMed

    Shanahan, Hugh P; Owen, Anne M; Harrison, Andrew P

    2014-01-01

    We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development.

  20. p3d--Python module for structural bioinformatics.

    PubMed

    Fufezan, Christian; Specht, Michael

    2009-08-21

    High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge-based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this the Python scripting language is the optimal choice since its philosophy is to write understandable source code. p3d is an object-oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three-dimensional protein structure files (PDB files). p3d's strength arises from the combination of a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, b) set theory and c) functions that combine a and b and that use human-readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.

  1. Bioinformatics on the Cloud Computing Platform Azure

    PubMed Central

    Shanahan, Hugh P.; Owen, Anne M.; Harrison, Andrew P.

    2014-01-01

    We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. PMID:25050811

  2. Using Kepler for Tool Integration in Microarray Analysis Workflows.

    PubMed

    Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C

    Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which require complex analysis. More and more bioinformatics analysis tools are being developed by scientists to simplify these analyses. However, different pipelines have been developed using different software environments. This makes integration of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools including Bioconductor packages, AltAnalyze (a Python-based open source tool), and an R-based comparison tool to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.

  3. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis.

    PubMed

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented.

  4. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis

    PubMed Central

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented. PMID:18971242

  5. MaxAlign: maximizing usable data in an alignment.

    PubMed

    Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G

    2007-08-28

    The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. MaxAlign is a program that optimizes the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns - the alignment area - by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and bioinformatical analyses as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates including and excluding gapped columns from the analysis, with and without processing with MaxAlign. In this paper we also introduce a new simple measure of tree similarity, Normalized Symmetric Similarity (NSS) that we consider useful for comparing tree topologies. We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also show that it is not advisable to exclude gapped columns from phylogenetic analyses unless MaxAlign is used first. Finally, we find that the sequences removed by MaxAlign from an alignment tend to be those that would otherwise be associated with low phylogenetic accuracy, and that the presence of gaps in any given sequence does not seem to disturb the phylogenetic estimates of other sequences. The MaxAlign web-server is freely available online at http://www.cbs.dtu.dk/services/MaxAlign where supplementary information can also be found. The program is also freely available as a Perl stand-alone package.
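
    The "alignment area" described above can be illustrated with a short sketch. It assumes the area is the number of sequences multiplied by the number of gap-free columns, and it uses a simple greedy removal loop as a stand-in; the function names and the toy alignment are hypothetical, and this is not the heuristic implemented in MaxAlign itself.

```python
def alignment_area(seqs):
    """Symbols in gap-free columns: (number of sequences) x (gap-free columns)."""
    if not seqs:
        return 0
    gap_free_cols = sum(1 for col in zip(*seqs) if "-" not in col)
    return len(seqs) * gap_free_cols


def greedy_exclude(alignment):
    """Hypothetical greedy heuristic: drop one sequence at a time while the area grows.

    `alignment` maps sequence names to equal-length aligned strings.
    """
    kept = dict(alignment)
    while len(kept) > 1:
        best_name, best_area = None, alignment_area(list(kept.values()))
        for name in kept:
            rest = [s for n, s in kept.items() if n != name]
            area = alignment_area(rest)
            if area > best_area:
                best_name, best_area = name, area
        if best_name is None:  # no single removal improves the area
            break
        del kept[best_name]
    return kept


# Toy alignment (made up for illustration): sequence "c" is mostly gaps.
toy = {"a": "ACGTAC", "b": "ACGTAC", "c": "A----C"}
trimmed = greedy_exclude(toy)
print(sorted(trimmed), alignment_area(list(trimmed.values())))
```

    In this toy case, dropping the gappy sequence raises the area from 6 to 12, so it is the one excluded. A greedy loop can miss the optimal subset on real alignments, which is why it is offered only as an illustration of the objective.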

  6. Scalability and Validation of Big Data Bioinformatics Software.

    PubMed

    Yang, Andrian; Troup, Michael; Ho, Joshua W K

    2017-01-01

    This review examines two important aspects that are central to modern big data bioinformatics analysis - software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
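
    The multiple-executions idea behind metamorphic testing can be sketched as follows. The program under test (a GC-content function), the two relations, and all names here are assumptions chosen for illustration and are not taken from the review. The point is that no known-correct output is needed: the checker only compares outputs of related runs.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq):
    """Program under test: fraction of G or C bases in an uppercase DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def violated_relations(program, inputs, tol=1e-12):
    """Run `program` several times per input and report broken metamorphic relations."""
    failures = []
    for seq in inputs:
        base = program(seq)
        # Relation 1: the reverse complement has the same GC content.
        if abs(program(reverse_complement(seq)) - base) > tol:
            failures.append((seq, "reverse_complement"))
        # Relation 2: concatenating a sequence with itself leaves GC content unchanged.
        if abs(program(seq + seq) - base) > tol:
            failures.append((seq, "duplication"))
    return failures


def buggy_gc(seq):
    """Deliberately wrong variant that forgets to count C."""
    return seq.count("G") / len(seq)


inputs = ["GGAT", "ACGT", "TTTTC"]
print(violated_relations(gc_content, inputs))  # no violations expected
print(violated_relations(buggy_gc, inputs))    # reverse-complement relation breaks
```

    The buggy variant passes the duplication relation but fails the reverse-complement relation on inputs with unequal G and C counts, which shows why several relations are usually combined.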

  7. ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis

    PubMed Central

    Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas

    2016-01-01

    Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and are reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475

  8. Analyses of Brucella Pathogenesis, Host Immunity, and Vaccine Targets using Systems Biology and Bioinformatics

    PubMed Central

    He, Yongqun

    2011-01-01

    Brucella is a Gram-negative, facultative intracellular bacterium that causes zoonotic brucellosis in humans and various animals. Out of 10 classified Brucella species, B. melitensis, B. abortus, B. suis, and B. canis are pathogenic to humans. In the past decade, the mechanisms of Brucella pathogenesis and host immunity have been extensively investigated using the cutting edge systems biology and bioinformatics approaches. This article provides a comprehensive review of the applications of Omics (including genomics, transcriptomics, and proteomics) and bioinformatics technologies for the analysis of Brucella pathogenesis, host immune responses, and vaccine targets. Based on more than 30 sequenced Brucella genomes, comparative genomics is able to identify gene variations among Brucella strains that help to explain host specificity and virulence differences among Brucella species. Diverse transcriptomics and proteomics gene expression studies have been conducted to analyze gene expression profiles of wild type Brucella strains and mutants under different laboratory conditions. High throughput Omics analyses of host responses to infections with virulent or attenuated Brucella strains have been focused on responses by mouse and cattle macrophages, bovine trophoblastic cells, mouse and boar splenocytes, and ram buffy coat. Differential serum responses in humans and rams to Brucella infections have been analyzed using high throughput serum antibody screening technology. The Vaxign reverse vaccinology has been used to predict many Brucella vaccine targets. More than 180 Brucella virulence factors and their gene interaction networks have been identified using advanced literature mining methods. The recent development of community-based Vaccine Ontology and Brucellosis Ontology provides an efficient way for Brucella data integration, exchange, and computer-assisted automated reasoning. PMID:22919594

  9. Analyses of Brucella pathogenesis, host immunity, and vaccine targets using systems biology and bioinformatics.

    PubMed

    He, Yongqun

    2012-01-01

    Brucella is a Gram-negative, facultative intracellular bacterium that causes zoonotic brucellosis in humans and various animals. Out of 10 classified Brucella species, B. melitensis, B. abortus, B. suis, and B. canis are pathogenic to humans. In the past decade, the mechanisms of Brucella pathogenesis and host immunity have been extensively investigated using the cutting edge systems biology and bioinformatics approaches. This article provides a comprehensive review of the applications of Omics (including genomics, transcriptomics, and proteomics) and bioinformatics technologies for the analysis of Brucella pathogenesis, host immune responses, and vaccine targets. Based on more than 30 sequenced Brucella genomes, comparative genomics is able to identify gene variations among Brucella strains that help to explain host specificity and virulence differences among Brucella species. Diverse transcriptomics and proteomics gene expression studies have been conducted to analyze gene expression profiles of wild type Brucella strains and mutants under different laboratory conditions. High throughput Omics analyses of host responses to infections with virulent or attenuated Brucella strains have been focused on responses by mouse and cattle macrophages, bovine trophoblastic cells, mouse and boar splenocytes, and ram buffy coat. Differential serum responses in humans and rams to Brucella infections have been analyzed using high throughput serum antibody screening technology. The Vaxign reverse vaccinology has been used to predict many Brucella vaccine targets. More than 180 Brucella virulence factors and their gene interaction networks have been identified using advanced literature mining methods. The recent development of community-based Vaccine Ontology and Brucellosis Ontology provides an efficient way for Brucella data integration, exchange, and computer-assisted automated reasoning.

  10. Computational biology for ageing

    PubMed Central

    Wieser, Daniela; Papatheodorou, Irene; Ziehm, Matthias; Thornton, Janet M.

    2011-01-01

    High-throughput genomic and proteomic technologies have generated a wealth of publicly available data on ageing. Easy access to these data, and their computational analysis, is of great importance in order to pinpoint the causes and effects of ageing. Here, we provide a description of the existing databases and computational tools on ageing that are available for researchers. We also describe the computational approaches to data interpretation in the field of ageing including gene expression, comparative and pathway analyses, and highlight the challenges for future developments. We review recent biological insights gained from applying bioinformatics methods to analyse and interpret ageing data in different organisms, tissues and conditions. PMID:21115530

  11. G2LC: Resources Autoscaling for Real Time Bioinformatics Applications in IaaS.

    PubMed

    Hu, Rongdong; Liu, Guangming; Jiang, Jingfei; Wang, Lixin

    2015-01-01

    Cloud computing has started to change how bioinformatics research is carried out. Researchers who have taken advantage of this technology can process larger amounts of data and speed up scientific discovery. The variability in data volume results in variable computing requirements. Therefore, bioinformatics researchers are pursuing more reliable and efficient methods for conducting sequencing analyses. This paper proposes an automated resource provisioning method, G2LC, for bioinformatics applications in IaaS. It enables applications to output results in real time. Its main purpose is to guarantee application performance while improving resource utilization. Real BLAST sequence-search data are used to evaluate the effectiveness of G2LC. Experimental results show that G2LC guarantees application performance while saving up to 20.14% of resources.

  12. G2LC: Resources Autoscaling for Real Time Bioinformatics Applications in IaaS

    PubMed Central

    Hu, Rongdong; Liu, Guangming; Jiang, Jingfei; Wang, Lixin

    2015-01-01

    Cloud computing has started to change how bioinformatics research is carried out. Researchers who have taken advantage of this technology can process larger amounts of data and speed up scientific discovery. The variability in data volume results in variable computing requirements. Therefore, bioinformatics researchers are pursuing more reliable and efficient methods for conducting sequencing analyses. This paper proposes an automated resource provisioning method, G2LC, for bioinformatics applications in IaaS. It enables applications to output results in real time. Its main purpose is to guarantee application performance while improving resource utilization. Real BLAST sequence-search data are used to evaluate the effectiveness of G2LC. Experimental results show that G2LC guarantees application performance while saving up to 20.14% of resources. PMID:26504488

  13. Bioactive endophytes warrant intensified exploration and conservation.

    PubMed

    Smith, Stephen A; Tank, David C; Boulanger, Lori-Ann; Bascom-Slack, Carol A; Eisenman, Kaury; Kingery, David; Babbs, Beatrice; Fenn, Kathleen; Greene, Joshua S; Hann, Bradley D; Keehner, Jocelyn; Kelley-Swift, Elizabeth G; Kembaiyan, Vivek; Lee, Sun Jin; Li, Puyao; Light, David Y; Lin, Emily H; Ma, Cong; Moore, Emily; Schorn, Michelle A; Vekhter, Daniel; Nunez, Percy V; Strobel, Gary A; Donoghue, Michael J; Strobel, Scott A

    2008-08-25

    A key argument in favor of conserving biodiversity is that as yet undiscovered biodiversity will yield products of great use to humans. However, the link between undiscovered biodiversity and useful products is largely conjectural. Here we provide direct evidence from bioassays of endophytes isolated from tropical plants and bioinformatic analyses that novel biology will indeed yield novel chemistry of potential value. We isolated and cultured 135 endophytic fungi and bacteria from plants collected in Peru. nrDNAs were compared to samples deposited in GenBank to ascertain the genetic novelty of cultured specimens. Ten endophytes were found to be as much as 15-30% different than any sequence in GenBank. Phylogenetic trees, using the most similar sequences in GenBank, were constructed for each endophyte to measure phylogenetic distance. Assays were also conducted on each cultured endophyte to record bioactivity, of which 65 were found to be bioactive. The novelty of our contribution is that we have combined bioinformatic analyses that document the diversity found in environmental samples with culturing and bioassays. These results highlight the hidden hyperdiversity of endophytic fungi and the urgent need to explore and conserve hidden microbial diversity. This study also showcases how undergraduate students can obtain data of great scientific significance.

  14. Comparative bioinformatics analyses and profiling of lysosome-related organelle proteomes

    NASA Astrophysics Data System (ADS)

    Hu, Zhang-Zhi; Valencia, Julio C.; Huang, Hongzhan; Chi, An; Shabanowitz, Jeffrey; Hearing, Vincent J.; Appella, Ettore; Wu, Cathy

    2007-01-01

    Complete and accurate profiling of cellular organelle proteomes, while challenging, is important for the understanding of detailed cellular processes at the organelle level. Mass spectrometry technologies coupled with bioinformatics analysis provide an effective approach for protein identification and functional interpretation of organelle proteomes. In this study, we have compiled human organelle reference datasets from large-scale proteomic studies and protein databases for seven lysosome-related organelles (LROs), as well as the endoplasmic reticulum and mitochondria, for comparative organelle proteome analysis. Heterogeneous sources of human organelle proteins and rodent homologs are mapped to human UniProtKB protein entries based on ID and/or peptide mappings, followed by functional annotation and categorization using the iProXpress proteomic expression analysis system. Cataloging organelle proteomes allows close examination of both shared and unique proteins among various LROs and reveals their functional relevance. The proteomic comparisons show that LROs are a closely related family of organelles. The shared proteins indicate the dynamic and hybrid nature of LROs, while the unique transmembrane proteins may represent additional candidate marker proteins for LROs. This comparative analysis, therefore, provides a basis for hypothesis formulation and experimental validation of organelle proteins and their functional roles.

  15. Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

    PubMed

    Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

    2011-06-01

    Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will "become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 Codex Alimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops, are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. Bioinformatics Pipelines for Targeted Resequencing and Whole-Exome Sequencing of Human and Mouse Genomes: A Virtual Appliance Approach for Instant Deployment

    PubMed Central

    Saeed, Isaam; Wong, Stephen Q.; Mar, Victoria; Goode, David L.; Caramia, Franco; Doig, Ken; Ryland, Georgina L.; Thompson, Ella R.; Hunter, Sally M.; Halgamuge, Saman K.; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G.; Papenfuss, Anthony T.; McArthur, Grant A.; Tothill, Richard W.

    2014-01-01

    Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/. PMID:24752294

  17. Treetrimmer: a method for phylogenetic dataset size reduction.

    PubMed

    Maruyama, Shinichiro; Eveleigh, Robert J M; Archibald, John M

    2013-04-12

    With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, use of rigorous tree-building methods on such large datasets is prohibitive and manual 'pruning' of sequence alignments is time consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to objectively carry out such pruning procedures. Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis. TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.

  18. Bioinformatic Workflows for Generating Complete Plastid Genome Sequences-An Example from Cabomba (Cabombaceae) in the Context of the Phylogenomic Analysis of the Water-Lily Clade.

    PubMed

    Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas

    2018-06-21

    The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.

  19. Isolation, Characterization, and Bioinformatic Analyses of Lytic Salmonella Enteritidis Phages and Tests of Their Antibacterial Activity in Food.

    PubMed

    Han, Han; Wei, Xiaoting; Wei, Yi; Zhang, Xiufeng; Li, Xuemin; Jiang, Jinzhong; Wang, Ran

    2017-02-01

    Salmonella Enteritidis remains a major threat to food safety. To develop phage-based biocontrol of S. Enteritidis contamination in food, phages against S. Enteritidis were isolated from sewage samples, characterized by host range assays, DNA restriction enzyme pattern analyses, and transmission electron microscope observations, and tested for antibacterial activity in food; some potent phages were further characterized by bioinformatic analyses. Based on plaque quality and host range, seven lytic phages targeting S. Enteritidis were selected; they were judged to be seven distinct phages from their DNA physical maps and were classified into the Myoviridae or Siphoviridae family by morphologic observation. The combined use of these seven phages as a "food additive" could control artificial S. Enteritidis contamination in different physical forms of food at a range of temperatures. By bioinformatic analyses, the selected phages BPS 11 Q 3 and BPS 15 Q 2 both appeared to be newly found obligate lytic phage strains with no indications of potentially harmful genes in their genomes. In conclusion, our results show the potential of the isolated phages as food additives for controlling S. Enteritidis contamination in some salmonellosis outbreak-associated food vehicles, and suggest that the potential risk associated with using BPS 11 Q 3 and BPS 15 Q 2 in food could be minimal.

  20. Bioinformatic approaches to interrogating vitamin D receptor signaling.

    PubMed

    Campbell, Moray J

    2017-09-15

    Bioinformatics applies unbiased approaches to develop statistically-robust insight into health and disease. At the global, or "20,000 foot" view, bioinformatic analyses of vitamin D receptor (NR1I1/VDR) signaling can measure where the VDR gene or protein exerts a genome-wide significant impact on biology; VDR is significantly implicated in bone biology and immune systems, but not in cancer. With a more VDR-centric, or "2000 foot" view, bioinformatic approaches can interrogate events downstream of VDR activity. Integrative approaches can combine VDR ChIP-Seq in cell systems where significant volumes of publicly available data are available. For example, VDR ChIP-Seq studies can be combined with genome-wide association studies to reveal significant associations to immune phenotypes. Similarly, VDR ChIP-Seq can be combined with data from the Cancer Genome Atlas (TCGA) to infer the impact of VDR target genes in cancer progression. Therefore, bioinformatic approaches can reveal what aspects of VDR downstream networks are significantly related to disease or phenotype. Copyright © 2017 The Author. Published by Elsevier B.V. All rights reserved.

  1. Text mining meets workflow: linking U-Compare with Taverna

    PubMed Central

    Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

    2010-01-01

    Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690

  2. Teaching bioinformatics and neuroinformatics by using free web-based tools.

    PubMed

    Grisham, William; Schottler, Natalie A; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson

    2010-01-01

    This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with anatomy (Mouse Brain Library), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and gene expression analyses (University of California, Santa Cruz Genome Browser, National Center for Biotechnology Information's Entrez Gene, and the Allen Brain Atlas), and information resources (PubMed). Instructors can use these various websites in concert to teach genetics from the phenotypic level to the molecular level, aspects of neuroanatomy and histology, statistics, quantitative trait locus analysis, and molecular biology (including in situ hybridization and microarray analysis), and to introduce bioinformatic resources. Students use these resources to discover 1) the region(s) of chromosome(s) influencing the phenotypic trait, 2) a list of candidate genes-narrowed by expression data, 3) the in situ pattern of a given gene in the region of interest, 4) the nucleotide sequence of the candidate gene, and 5) articles describing the gene. Teaching materials such as a detailed student/instructor's manual, PowerPoints, sample exams, and links to free Web resources can be found at http://mdcune.psych.ucla.edu/modules/bioinformatics.

  3. Serial analysis of gene expression in a rat lung model of asthma.

    PubMed

    Yin, Lei-Miao; Jiang, Gong-Hao; Wang, Yu; Wang, Yan; Liu, Yan-Yan; Jin, Wei-Rong; Zhang, Zen; Xu, Yu-Dong; Yang, Yong-Qing

    2008-11-01

    The pathogenesis and molecular mechanism underlying asthma remain undetermined. The purpose of this study was to identify genes and pathways involved in the early airway response (EAR) phase of asthma by using serial analysis of gene expression (SAGE). Two SAGE tag libraries of lung tissues derived from a rat model of asthma and controls were generated. Bioinformatic analyses were carried out using the Database for Annotation, Visualization and Integrated Discovery Functional Annotation Tool, Gene Ontology (GO) TreeMachine and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. A total of 26 552 SAGE tags of asthmatic rat lung were obtained, of which 12 221 were unique tags. Of the unique tags, 55.5% were matched with known genes. By comparison of the two libraries, 186 differentially expressed tags (P < 0.05) were identified, of which 103 were upregulated and 83 were downregulated. Using the bioinformatic tools these genes were classified into 23 functional groups, 15 KEGG pathways and 37 enriched GO categories. The bioinformatic analyses of gene distribution, enriched categories and the involvement of specific pathways in the SAGE libraries have provided information on regulatory networks of the EAR phase of asthma. Analyses of the regulated genes of interest may inform new hypotheses, increase our understanding of the disease and provide a foundation for future research.

  4. Discovery of putative salivary biomarkers for Sjögren's syndrome using high resolution mass spectrometry and bioinformatics.

    PubMed

    Zoukhri, Driss; Rawe, Ian; Singh, Mabi; Brown, Ashley; Kublin, Claire L; Dawson, Kevin; Haddon, William F; White, Earl L; Hanley, Kathleen M; Tusé, Daniel; Malyj, Wasyl; Papas, Athena

    2012-03-01

    The purpose of the current study was to determine if saliva contains biomarkers that can be used as diagnostic tools for Sjögren's syndrome (SjS). Twenty-seven SjS patients and 27 age-matched healthy controls were recruited for these studies. Unstimulated glandular saliva was collected from the Wharton's duct using a suction device. Two µl of saliva were processed for mass spectrometry analyses on a prOTOF 2000 matrix-assisted laser desorption/ionization orthogonal time of flight (MALDI O-TOF) mass spectrometer. Raw data were analyzed using bioinformatic tools to identify biomarkers. MALDI O-TOF MS analyses of saliva samples were highly reproducible and the mass spectra generated were very rich in peptides and peptide fragments in the 750-7,500 Da range. Data analysis using bioinformatic tools resulted in several classification models being built and several biomarkers identified. One model based on 7 putative biomarkers yielded a sensitivity of 97.5%, specificity of 97.8% and an accuracy of 97.6%. One biomarker was present only in SjS samples and was identified as a proteolytic peptide originating from human basic salivary proline-rich protein 3 precursor. We conclude that salivary biomarkers detected by high-resolution mass spectrometry coupled with powerful bioinformatic tools offer the potential to serve as diagnostic/prognostic tools for SjS.
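
    For readers unfamiliar with the three figures quoted for the classification model, the standard definitions of sensitivity, specificity and accuracy can be written out as a minimal sketch. The function name and the confusion-matrix counts below are hypothetical; the abstract does not report the underlying counts, so the example does not reproduce the study's values.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true-positive rate among patients
    specificity = tn / (tn + fp)                # true-negative rate among controls
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # fraction of all samples classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, acc = diagnostic_metrics(tp=9, fn=1, tn=8, fp=2)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```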

  5. Sockeye: A 3D Environment for Comparative Genomics

    PubMed Central

    Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

    2004-01-01

    Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592

  6. Application of machine learning methods in bioinformatics

    NASA Astrophysics Data System (ADS)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    With the development of bioinformatics, high-throughput genomic technologies have enabled biology to enter the era of big data [1]. Bioinformatics is an interdisciplinary field that covers the acquisition, management, analysis, interpretation and application of biological information, and it derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets [2]. This paper analyzes and compares various machine learning algorithms and their applications in bioinformatics.

  7. Rapid Identification of Cell-Specific, Internalizing RNA Aptamers with Bioinformatics Analyses of a Cell-Based Aptamer Selection

    PubMed Central

    Thiel, William H.; Bair, Thomas; Peek, Andrew S.; Liu, Xiuying; Dassie, Justin; Stockdale, Katie R.; Behlke, Mark A.; Miller, Francis J.; Giangrande, Paloma H.

    2012-01-01

    Background The broad applicability of RNA aptamers as cell-specific delivery tools for therapeutic reagents depends on the ability to identify aptamer sequences that selectively access the cytoplasm of distinct cell types. Towards this end, we have developed a novel approach that combines a cell-based selection method (cell-internalization SELEX) with high-throughput sequencing (HTS) and bioinformatics analyses to rapidly identify cell-specific, internalization-competent RNA aptamers. Methodology/Principal Findings We demonstrate the utility of this approach by enriching for RNA aptamers capable of selective internalization into vascular smooth muscle cells (VSMCs). Several rounds of positive (VSMCs) and negative (endothelial cells; ECs) selection were performed to enrich for aptamer sequences that preferentially internalize into VSMCs. To identify candidate RNA aptamer sequences, HTS data from each round of selection were analyzed using bioinformatics methods: (1) metrics of selection enrichment; and (2) pairwise comparisons of sequence and structural similarity, termed edit and tree distance, respectively. Correlation analyses of experimentally validated aptamers or rounds revealed that the best cell-specific, internalizing aptamers are enriched as a result of the negative selection step performed against ECs. Conclusions and Significance We describe a novel approach that combines cell-internalization SELEX with HTS and bioinformatics analysis to identify cell-specific, cell-internalizing RNA aptamers. Our data highlight the importance of performing a pre-clear step against a non-target cell in order to select for cell-specific aptamers. We expect the extended use of this approach to enable the identification of aptamers to a multitude of different cell types, thereby facilitating the broad development of targeted cell therapies. PMID:22962591
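
    The pairwise sequence comparison mentioned above ("edit distance") can be illustrated with the classic Levenshtein distance. This is a minimal sketch under the assumption that unit-cost insertions, deletions and substitutions are meant; the abstract does not give the authors' exact definition, and the structural "tree distance" is not shown. The sequences are invented for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,            # delete ca
                            curr[j - 1] + 1,        # insert cb
                            prev[j - 1] + cost))    # match or substitute
        prev = curr
    return prev[-1]


def pairwise_distances(seqs):
    """Return {(i, j): distance} for every unordered pair of sequences."""
    return {(i, j): edit_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}


# Hypothetical RNA fragments, not taken from the study.
example = ["GGAUCC", "GGUUCA", "GGAUC"]
print(pairwise_distances(example))
```

    Sequences separated by a small distance could then be grouped into candidate families, which is one common way such a matrix is used.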

  8. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.

    Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.

  9. Bioinformatic perspectives on NRPS/PKS megasynthases: advances and challenges.

    PubMed

    Jenke-Kodama, Holger; Dittmann, Elke

    2009-07-01

    The increased understanding of both fundamental principles and mechanistic variations of NRPS/PKS megasynthases along with the unprecedented availability of microbial sequences has inspired a number of in silico studies of both enzyme families. The insights that can be extracted from these analyses go far beyond a rough classification of data and have turned bioinformatics into a frontier field of natural products research. As databases are flooded with NRPS/PKS gene sequences from microbial genomes and metagenomes, increasingly reliable structural prediction methods can help to uncover hidden treasures. Already, phylogenetic analyses have revealed that NRPS/PKS pathways should not simply be regarded as enzyme complexes specifically evolved to produce a selected natural product. Rather, they represent a collection of genetic options, allowing biosynthetic pathways to be shuffled in a process of perpetual chemical innovation, and pathway diversification in nature can give impulses for specificities, protein interactions and genetic engineering of libraries of novel peptides and polyketides. The successful translation of the knowledge obtained from bioinformatic dissection of NRPS/PKS megasynthases into new techniques for drug discovery and design remains a challenge for the future.

  10. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

    PubMed Central

    Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.; Davenport, Karen W.; Bishop-Lilly, Kimberly A.; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P.; Chain, Patrick S.G.

    2017-01-01

    Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. PMID:27899609

  11. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

    DOE PAGES

    Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.; ...

    2016-11-24

    Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.

  12. sRNAdb: A small non-coding RNA database for gram-positive bacteria

    PubMed Central

    2012-01-01

    Background The class of small non-coding RNA molecules (sRNA) regulates gene expression by different mechanisms and enables bacteria to mount a physiological response due to adaptation to the environment or infection. Over the last decades the number of sRNAs has been increasing rapidly. Several databases like Rfam or fRNAdb were extended to include sRNAs as a class of its own. Furthermore new specialized databases like sRNAMap (gram-negative bacteria only) and sRNATarBase (target prediction) were established. To the best of the authors’ knowledge no database focusing on sRNAs from gram-positive bacteria is publicly available so far. Description In order to understand sRNA’s functional and phylogenetic relationships we have developed sRNAdb and provide tools for data analysis and visualization. The data compiled in our database is assembled from experiments as well as from bioinformatics analyses. The software enables comparison and visualization of gene loci surrounding the sRNAs of interest. To accomplish this, we use a client–server based approach. Offline versions of the database including analyses and visualization tools can easily be installed locally on the user’s computer. This feature facilitates customized local addition of unpublished sRNA candidates and related information such as promoters or terminators using tab-delimited files. Conclusion sRNAdb allows a user-friendly and comprehensive comparative analysis of sRNAs from available sequenced gram-positive prokaryotic replicons. Offline versions including analysis and visualization tools facilitate complex user specific bioinformatics analyses. PMID:22883983

  13. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome.

    PubMed

    McDonald, Daniel; Clemente, Jose C; Kuczynski, Justin; Rideout, Jai Ram; Stombaugh, Jesse; Wendel, Doug; Wilke, Andreas; Huse, Susan; Hufnagle, John; Meyer, Folker; Knight, Rob; Caporaso, J Gregory

    2012-07-12

    We present the Biological Observation Matrix (BIOM, pronounced "biome") format: a JSON-based file format for representing arbitrary observation by sample contingency tables with associated sample and observation metadata. As the number of categories of comparative omics data types (collectively, the "ome-ome") grows rapidly, a general format to represent and archive this data will facilitate the interoperability of existing bioinformatics tools and future meta-analyses. The BIOM file format is supported by an independent open-source software project (the biom-format project), which initially contains Python objects that support the use and manipulation of BIOM data in Python programs, and is intended to be an open development effort where developers can submit implementations of these objects in other programming languages. The BIOM file format and the biom-format project are steps toward reducing the "bioinformatics bottleneck" that is currently being experienced in diverse areas of biological sciences, and will help us move toward the next phase of comparative omics where basic science is translated into clinical and environmental applications. The BIOM file format is currently recognized as an Earth Microbiome Project Standard, and as a Candidate Standard by the Genomic Standards Consortium.
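
    To make the idea of a JSON-based observation-by-sample table concrete, the sketch below builds a small sparse table and serializes it with Python's standard json module. It is loosely modelled on the general layout described above and is deliberately simplified: the helper name to_sparse_table, the identifiers and the chosen keys are illustrative assumptions, the result is not claimed to be a valid BIOM file, and the biom-format project's own Python objects are not used here. The format specification should be consulted for the required fields.

```python
import json


def to_sparse_table(counts, observation_ids, sample_ids):
    """Build a simplified, BIOM-like dictionary from a dense count matrix.

    counts[i][j] is the count of observation i in sample j. Only non-zero
    cells are stored, as [row_index, column_index, value] triples.
    """
    data = [[i, j, value]
            for i, row in enumerate(counts)
            for j, value in enumerate(row)
            if value != 0]
    return {
        "rows": [{"id": obs, "metadata": None} for obs in observation_ids],
        "columns": [{"id": smp, "metadata": None} for smp in sample_ids],
        "matrix_type": "sparse",
        "shape": [len(observation_ids), len(sample_ids)],
        "data": data,
    }


# Hypothetical 2 x 3 table: two observations (e.g. OTUs) across three samples.
counts = [[5, 0, 2],
          [0, 0, 1]]
table = to_sparse_table(counts, ["OTU_1", "OTU_2"], ["S1", "S2", "S3"])
text = json.dumps(table, indent=2)
print(text)
```

    Storing only the non-zero cells keeps files small for typical contingency tables, where most observations are absent from most samples.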

  14. Omics Metadata Management Software v. 1 (OMMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks, to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible, extensible and easily installed and run by operators with general system administration and scripting language literacy.

  15. Mathematics and evolutionary biology make bioinformatics education comprehensible.

    PubMed

    Jungck, John R; Weisstein, Anton E

    2013-09-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes-the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software-the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a 'two-culture' problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses.
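
    As an illustration of tree enumeration, the sketch below uses the standard counting result for labelled, strictly bifurcating trees: (2n-5)!! unrooted topologies for n >= 3 taxa and (2n-3)!! rooted topologies for n >= 2 taxa. The abstract does not say which formulas the authors present, so this is an assumption about what "tree enumeration" covers; the function names are illustrative.

```python
def odd_double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies on n labelled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return odd_double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa: int) -> int:
    """Number of rooted bifurcating topologies on n labelled taxa."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return odd_double_factorial(2 * n_taxa - 3)


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

    The counts grow super-exponentially with the number of taxa, which is why exhaustive search over topologies is only feasible for very small data sets and heuristic tree construction is used otherwise.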

  16. Mathematics and evolutionary biology make bioinformatics education comprehensible

    PubMed Central

    Weisstein, Anton E.

    2013-01-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes—the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software—the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a ‘two-culture’ problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses. PMID:23821621

  17. GOBLET: The Global Organisation for Bioinformatics Learning, Education and Training

    PubMed Central

    Atwood, Teresa K.; Bongcam-Rudloff, Erik; Brazas, Michelle E.; Corpas, Manuel; Gaudet, Pascale; Lewitter, Fran; Mulder, Nicola; Palagi, Patricia M.; Schneider, Maria Victoria; van Gelder, Celia W. G.

    2015-01-01

    In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy—paradoxically, many are actually closing “niche” bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all. PMID:25856076

  18. Metabolomics of Genetically Modified Crops

    PubMed Central

    Simó, Carolina; Ibáñez, Clara; Valdés, Alberto; Cifuentes, Alejandro; García-Cañas, Virginia

    2014-01-01

    Metabolomic-based approaches are increasingly applied to analyse genetically modified organisms (GMOs), making it possible to obtain broader and deeper information on the composition of GMOs than that obtained from traditional analytical approaches. The combination in metabolomics of advanced analytical methods and bioinformatics tools provides wide chemical compositional data that help to corroborate (or not) substantial equivalence and the occurrence of unintended changes resulting from genetic transformation. This review provides insight into recent progress in metabolomics studies on transgenic crops, focusing mainly on papers published in the last decade. PMID:25334064

  19. Teaching the bioinformatics of signaling networks: an integrated approach to facilitate multi-disciplinary learning.

    PubMed

    Korcsmaros, Tamas; Dunai, Zsuzsanna A; Vellai, Tibor; Csermely, Peter

    2013-09-01

    The number of bioinformatics tools and resources that support molecular and cell biology approaches is continuously expanding. Moreover, systems and network biology analyses are accompanied more and more by integrated bioinformatics methods. Traditional information-centered university teaching methods often fail, as (1) it is impossible to cover all existing approaches in the frame of a single course, and (2) a large segment of the current bioinformation can become obsolete in a few years. Signaling networks offer an excellent example for teaching bioinformatics resources and tools, as they are both focused and complex at the same time. Here, we present an outline of a university bioinformatics course with four sample practices to demonstrate how signaling network studies can integrate biochemistry, genetics, cell biology and network sciences. We show that several bioinformatics resources and tools, as well as important concepts and current trends, can also be integrated into signaling network studies. The research-type hands-on experiences we show enable the students to improve key competences such as teamworking, creative and critical thinking and problem solving. Our classroom course curriculum can be re-formulated as e-learning material or applied as a part of a specific training course. The multi-disciplinary approach and the mosaic setup of the course have the additional benefit of supporting the advanced teaching of talented students.

  20. Bioinformatics in the orphan crops.

    PubMed

    Armstead, Ian; Huang, Lin; Ravagnani, Adriana; Robson, Paul; Ougham, Helen

    2009-11-01

    Orphan crops are those which are grown as food, animal feed or other crops of some importance in agriculture, but which have not yet received the investment of research effort or funding required to develop significant public bioinformatics resources. Where an orphan crop is related to a well-characterised model plant species, comparative genomics and bioinformatics can often, though not always, be exploited to assist research and crop improvement. This review addresses some challenges and opportunities presented by bioinformatics in the orphan crops, using three examples: forage grasses from the genera Lolium and Festuca, forage legumes and the second generation energy crop Miscanthus.

  1. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology.

    PubMed

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-12-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.

  2. Improved genomic resources and new bioinformatic workflow for the carcinogenic parasite Clonorchis sinensis: Biotechnological implications.

    PubMed

    Wang, Daxi; Korhonen, Pasi K; Gasser, Robin B; Young, Neil D

    Clonorchis sinensis (family Opisthorchiidae) is an important foodborne parasite that has a major socioeconomic impact on ~35 million people predominantly in China, Vietnam, Korea and the Russian Far East. In humans, infection with C. sinensis causes clonorchiasis, a complex hepatobiliary disease that can induce cholangiocarcinoma (CCA), a malignant cancer of the bile ducts. Central to understanding the epidemiology of this disease is knowledge of genetic variation within and among populations of this parasite. Although most published molecular studies seem to suggest that C. sinensis represents a single species, evidence of karyotypic variation within C. sinensis and cryptic species within a related opisthorchiid fluke (Opisthorchis viverrini) emphasise the importance of studying and comparing the genes and genomes of geographically distinct isolates of C. sinensis. Recently, we sequenced, assembled and characterised a draft nuclear genome of a C. sinensis isolate from Korea and compared it with a published draft genome of a Chinese isolate of this species using a bioinformatic workflow established for comparing draft genome assemblies and their gene annotations. We identified that 50.6% and 51.3% of the Korean and Chinese C. sinensis genomic scaffolds were syntenic, respectively. Within aligned syntenic blocks, the genomes had a high level of nucleotide identity (99.1%) and encoded 15 variable proteins likely to be involved in diverse biological processes. Here, we review current technical challenges of using draft genome assemblies to undertake comparative genomic analyses to quantify genetic variation between isolates of the same species. Using a workflow that overcomes these challenges, we report on a high-quality draft genome for C. sinensis from Korea and comparative genomic analyses, as a basis for future investigations of the genetic structures of C. sinensis populations, and discuss the biotechnological implications of these explorations. 
    Copyright © 2018 Elsevier Inc. All rights reserved.

  3. Comparative analyses across cattle genders and breeds reveal the pitfalls caused by false positive and lineage-differential copy number variations.

    PubMed

    Zhou, Yang; Utsunomiya, Yuri T; Xu, Lingyang; Hay, El Hamidi Abdel; Bickhart, Derek M; Sonstegard, Tad S; Van Tassell, Curtis P; Garcia, Jose Fernando; Liu, George E

    2016-07-06

    We compared CNV region (CNVR) results derived from 1,682 Nellore cattle with equivalent results derived from our previous analysis of Bovine HapMap samples. By comparing CNV segment frequencies between different genders and groups, we identified 9 frequent, false positive CNVRs with a total length of 0.8 Mbp that were likely caused by assembly errors. Although there was a paucity of lineage specific events, we did find one 54 kb deletion on chr5 significantly enriched in Nellore cattle. A few highly frequent CNVRs present in both datasets were detected within genomic regions containing olfactory receptor, ATP-binding cassette, and major histocompatibility complex genes. We further evaluated their impacts on downstream bioinformatics and CNV association analyses. Our results revealed pitfalls caused by false positive and lineage-differential copy number variations and will increase the accuracy of future CNV studies in both taurine and indicine cattle.

  4. MutSpec: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes.

    PubMed

    Ardin, Maude; Cahais, Vincent; Castells, Xavier; Bouaoun, Liacine; Byrnes, Graham; Herceg, Zdenko; Zavadil, Jiri; Olivier, Magali

    2016-04-18

    The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed within the web-based user-friendly platform Galaxy a first-of-its-kind package called MutSpec. MutSpec includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the COSMIC database and other sources. MutSpec offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. MutSpec may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool. MutSpec offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. MutSpec can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.

  5. mySyntenyPortal: an application package to construct websites for synteny block analysis.

    PubMed

    Lee, Jongin; Lee, Daehwan; Sim, Mikang; Kwon, Daehong; Kim, Juyeon; Ko, Younhee; Kim, Jaebum

    2018-06-05

    Advances in sequencing technologies have facilitated large-scale comparative genomics based on whole genome sequencing. Constructing and investigating conserved genomic regions among multiple species (called synteny blocks) are essential in comparative genomics. However, they require significant amounts of computational resources and time in addition to bioinformatics skills. Many web interfaces have been developed to make such tasks easier. However, these web interfaces cannot be customized for users who want to use their own set of genome sequences or definition of synteny blocks. To resolve this limitation, we present mySyntenyPortal, a stand-alone application package to construct websites for synteny block analyses by using users' own genome data. mySyntenyPortal provides both command line and web-based interfaces to build and manage websites for large-scale comparative genomic analyses. The websites can also be easily published and accessed by other users. To demonstrate the usability of mySyntenyPortal, we present an example study for building websites to compare genomes of three mammalian species (human, mouse, and cow) and show how they can be easily utilized to identify potential genes affected by genome rearrangements. mySyntenyPortal will contribute to extended comparative genomic analyses based on large-scale whole genome sequences by providing unique functionality to support the easy creation of interactive websites for synteny block analyses from users' own genome data.

  6. Bioinformatic Analysis of Strawberry GSTF12 Gene

    NASA Astrophysics Data System (ADS)

    Wang, Xiran; Jiang, Leiyu; Tang, Haoru

    2018-01-01

    GSTF12 has long been known as a key factor in proanthocyanidin accumulation in the plant testa. Bioinformatic analysis of the nucleotide and encoded protein sequences of GSTF12 therefore aids the study of genes in the anthocyanin biosynthesis and accumulation pathway. We chose the GSTF12 gene from 11 species, downloaded the nucleotide and protein sequences from NCBI, identified the strawberry GSTF12 gene by bioinformatic analysis, and constructed a phylogenetic tree. We also analysed the physical and chemical properties of the strawberry GSTF12 gene and the structure of its encoded protein. The phylogenetic tree showed that strawberry and petunia were the closest relatives. Protein prediction indicated that the protein has one signal peptide and no obvious transmembrane regions.

  7. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform.

    PubMed

    Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J; Davenport, Karen W; Bishop-Lilly, Kimberly A; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P; Chain, Patrick S G

    2017-01-09

    Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: (1) collect statistics from biological data; (2) build a computational model; (3) solve a computational modeling problem; and (4) test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  9. AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis

    PubMed Central

    Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud

    2003-01-01

    Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565

  10. What is bioinformatics? A proposed definition and overview of the field.

    PubMed

    Luscombe, N M; Greenbaum, D; Gerstein, M

    2001-01-01

    The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.

  11. Bioinformatic training needs at a health sciences campus.

    PubMed

    Oliver, Jeffrey C

    2017-01-01

    Health sciences research is increasingly focusing on big data applications, such as genomic technologies and precision medicine, to address key issues in human health. These approaches rely on biological data repositories and bioinformatic analyses, both of which are growing rapidly in size and scope. Libraries play a key role in supporting researchers in navigating these and other information resources. With the goal of supporting bioinformatics research in the health sciences, the University of Arizona Health Sciences Library established a Bioinformation program. To shape the support provided by the library, I developed and administered a needs assessment survey to the University of Arizona Health Sciences campus in Tucson, Arizona. The survey was designed to identify the training topics of interest to health sciences researchers and the preferred modes of training. Survey respondents expressed an interest in a broad array of potential training topics, including "traditional" information seeking as well as interest in analytical training. Of particular interest were training in transcriptomic tools and the use of databases linking genotypes and phenotypes. Staff were most interested in bioinformatics training topics, while faculty were the least interested. Hands-on workshops were significantly preferred over any other mode of training. The University of Arizona Health Sciences Library is meeting those needs through internal programming and external partnerships. The results of the survey demonstrate a keen interest in a variety of bioinformatic resources; the challenge to the library is how to address those training needs. The mode of support depends largely on library staff expertise in the numerous subject-specific databases and tools. Librarian-led bioinformatic training sessions provide opportunities for engagement with researchers at multiple points of the research life cycle. When training needs exceed library capacity, partnering with intramural and extramural units will be crucial in library support of health sciences bioinformatic research.

  12. Bioconductor: open software development for computational biology and bioinformatics

    PubMed Central

    Gentleman, Robert C; Carey, Vincent J; Bates, Douglas M; Bolstad, Ben; Dettling, Marcel; Dudoit, Sandrine; Ellis, Byron; Gautier, Laurent; Ge, Yongchao; Gentry, Jeff; Hornik, Kurt; Hothorn, Torsten; Huber, Wolfgang; Iacus, Stefano; Irizarry, Rafael; Leisch, Friedrich; Li, Cheng; Maechler, Martin; Rossini, Anthony J; Sawitzki, Gunther; Smith, Colin; Smyth, Gordon; Tierney, Luke; Yang, Jean YH; Zhang, Jianhua

    2004-01-01

    The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples. PMID:15461798

  13. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution

    PubMed Central

    2012-01-01

    Background Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Results Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. Conclusions The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis. PMID:23020678

  14. Application of proteomics to ecology and population biology.

    PubMed

    Karr, T L

    2008-02-01

    Proteomics is a relatively new scientific discipline that merges protein biochemistry, genome biology and bioinformatics to determine the spatial and temporal expression of proteins in cells, tissues and whole organisms. There has been very little application of proteomics to the fields of behavioral genetics, evolution, ecology and population dynamics, and it has only recently been effectively applied to the closely allied fields of molecular evolution and genetics. However, there exists considerable potential for proteomics to have an impact in areas related to functional ecology; this review will introduce the general concepts and methodologies that define the field of proteomics and compare and contrast its advantages and disadvantages with those of other methods. Examples of how proteomics can aid, complement and indeed extend the study of functional ecology will be discussed, including the main tool of ecological studies, population genetics, with an emphasis on metapopulation structure analysis. Because proteomic analyses provide a direct measure of gene expression, they obviate some of the limitations associated with other genomic approaches, such as microarray and EST analyses. Likewise, in conjunction with associated bioinformatics and molecular evolutionary tools, proteomics can provide the foundation of a systems-level integration approach that can enhance ecological studies. It can be envisioned that proteomics will provide important new information on issues specific to metapopulation biology and adaptive processes in nature. A specific example of the application of proteomics to sperm ageing is provided to illustrate the potential utility of the approach.

  15. In the loop: promoter–enhancer interactions and bioinformatics

    PubMed Central

    Mora, Antonio; Sandve, Geir Kjetil; Gabrielsen, Odd Stokke

    2016-01-01

    Enhancer–promoter regulation is a fundamental mechanism underlying differential transcriptional regulation. Spatial chromatin organization brings remote enhancers in contact with target promoters in cis to regulate gene expression. There is considerable evidence for promoter–enhancer interactions (PEIs). In recent years, genome-wide analyses have identified signatures and mapped novel enhancers; however, being able to precisely identify their target gene(s) requires massive biological and bioinformatics efforts. In this review, we give a short overview of the chromatin landscape and transcriptional regulation. We discuss some key concepts and problems related to chromatin interaction detection technologies, and emerging knowledge from genome-wide chromatin interaction data sets. Then, we critically review different types of bioinformatics analysis methods and tools related to representation and visualization of PEI data, raw data processing and PEI prediction. Lastly, we provide specific examples of how PEIs have been used to elucidate a functional role of non-coding single-nucleotide polymorphisms. The topic is at the forefront of epigenetic research, and by highlighting some future bioinformatics challenges in the field, this review provides a comprehensive background for future PEI studies. PMID:26586731

  16. Best practices in bioinformatics training for life scientists.

    PubMed

    Via, Allegra; Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K

    2013-09-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.

  17. Best practices in bioinformatics training for life scientists

    PubMed Central

    Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D.; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L.; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C.; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K.

    2013-01-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists. PMID:23803301

  18. Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets.

    PubMed

    Rideout, Jai Ram; Chase, John H; Bolyen, Evan; Ackermann, Gail; González, Antonio; Knight, Rob; Caporaso, J Gregory

    2016-06-13

    Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.
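
    The kind of check described in this abstract can be illustrated with a minimal Python sketch. It is not Keemei's code and does not implement the QIIME or SRGD rules; the function name `validate_table` and the three rules it applies (non-empty unique column names, no empty cells, unique sample IDs in the first column) are assumptions chosen only to show what validating a human-edited tab-separated table might look like.

```python
import csv
import io


def validate_table(text, id_column=0):
    """Return a list of (row, column, message) problems found in a TSV table.

    The rules below are illustrative assumptions, not the actual rules of
    Keemei, QIIME, or SRGD.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return [(0, 0, "table is empty")]

    problems = []
    header = rows[0]
    seen_columns = set()
    for col, name in enumerate(header):
        if not name.strip():
            problems.append((0, col, "empty column name"))
        elif name in seen_columns:
            problems.append((0, col, f"duplicate column name {name!r}"))
        seen_columns.add(name)

    seen_ids = set()
    for r, row in enumerate(rows[1:], start=1):
        if len(row) != len(header):
            problems.append((r, 0, "wrong number of cells"))
            continue
        for col, cell in enumerate(row):
            if not cell.strip():
                problems.append((r, col, "empty cell"))
        sample_id = row[id_column]
        if sample_id in seen_ids:
            problems.append((r, id_column, f"duplicate sample ID {sample_id!r}"))
        seen_ids.add(sample_id)
    return problems
```

    Each problem carries a row and column index so that a spreadsheet front end could highlight the offending cell; that reporting style is a design choice of this sketch only.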

  19. A label distance maximum-based classifier for multi-label learning.

    PubMed

    Liu, Xiaoli; Bao, Hang; Zhao, Dazhe; Cao, Peng

    2015-01-01

    Multi-label classification is useful in many bioinformatics tasks such as gene function prediction and protein site localization. This paper presents an improved neural network algorithm, Max Label Distance Back Propagation Algorithm for Multi-Label Classification. The method was formulated by modifying the total error function of the standard BP by adding a penalty term, which was realized by maximizing the distance between the positive and negative labels. Extensive experiments were conducted to compare this method against state-of-the-art multi-label methods on three popular bioinformatic benchmark datasets. The results illustrated that this proposed method is more effective for bioinformatic multi-label classification compared to commonly used techniques.
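
    The abstract does not give the modified error function, so the following Python sketch only illustrates the general idea of adding a label-distance term to a squared-error loss. The function name `penalized_error`, the weight `lam`, and the choice of distance (mean output on positive labels minus mean output on negative labels) are all assumptions, not the authors' formulation.

```python
def penalized_error(outputs, targets, lam=0.1):
    """Squared error minus a reward for separating positive and negative labels.

    outputs: network outputs, one per label, each in [0, 1]
    targets: 1 for a positive label, 0 for a negative label
    lam:     hypothetical weight of the distance term
    """
    squared_error = 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, targets))

    positive = [o for o, t in zip(outputs, targets) if t == 1]
    negative = [o for o, t in zip(outputs, targets) if t == 0]
    if not positive or not negative:
        return squared_error  # only one label group, so no distance to measure

    # Assumed distance: gap between mean positive and mean negative output.
    distance = sum(positive) / len(positive) - sum(negative) / len(negative)

    # Subtracting the distance means minimizing the error maximizes the gap.
    return squared_error - lam * distance
```

    Under these assumptions, an output vector that separates the two label groups more widely receives a lower error than one with the same ranking but a smaller gap.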

  20. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome

    PubMed Central

    2012-01-01

    Background We present the Biological Observation Matrix (BIOM, pronounced “biome”) format: a JSON-based file format for representing arbitrary observation by sample contingency tables with associated sample and observation metadata. As the number of categories of comparative omics data types (collectively, the “ome-ome”) grows rapidly, a general format to represent and archive this data will facilitate the interoperability of existing bioinformatics tools and future meta-analyses. Findings The BIOM file format is supported by an independent open-source software project (the biom-format project), which initially contains Python objects that support the use and manipulation of BIOM data in Python programs, and is intended to be an open development effort where developers can submit implementations of these objects in other programming languages. Conclusions The BIOM file format and the biom-format project are steps toward reducing the “bioinformatics bottleneck” that is currently being experienced in diverse areas of biological sciences, and will help us move toward the next phase of comparative omics where basic science is translated into clinical and environmental applications. The BIOM file format is currently recognized as an Earth Microbiome Project Standard, and as a Candidate Standard by the Genomic Standards Consortium. PMID:23587224
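
    As a rough illustration of an observation-by-sample contingency table carried in JSON together with its metadata, the Python sketch below builds and round-trips a small sparse table. The key names (`rows`, `columns`, `shape`, `data`) and the helper functions are assumptions made for this example; the BIOM specification itself should be consulted for the actual required fields.

```python
import json


def make_table(observations, samples, counts):
    """Build a JSON-serializable observation-by-sample table.

    observations: list of (id, metadata) pairs, one per row
    samples:      list of (id, metadata) pairs, one per column
    counts:       dict mapping (row_index, column_index) to a nonzero count
    """
    return {
        "rows": [{"id": i, "metadata": m} for i, m in observations],
        "columns": [{"id": i, "metadata": m} for i, m in samples],
        "shape": [len(observations), len(samples)],
        # Sparse triples [row, column, value], sorted for a stable output.
        "data": [[r, c, v] for (r, c), v in sorted(counts.items())],
    }


def to_dense(table):
    """Expand the sparse triples back into a list-of-lists matrix."""
    n_rows, n_cols = table["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, v in table["data"]:
        dense[r][c] = v
    return dense


table = make_table(
    observations=[("OTU_1", {"taxonomy": "Bacteria"}), ("OTU_2", None)],
    samples=[("Sample_A", {"site": "gut"}), ("Sample_B", {"site": "skin"})],
    counts={(0, 0): 5, (1, 1): 3},
)
text = json.dumps(table)      # the string that would be written to a file
restored = json.loads(text)   # round trip through JSON
```

    Storing only the nonzero cells keeps the file small when most observations are absent from most samples, which is typical of such tables.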

  1. Hidden weapons of microbial destruction in plant genomes

    PubMed Central

    Manners, John M

    2007-01-01

    Recent bioinformatic analyses of sequenced plant genomes reveal a previously unrecognized abundance of genes encoding antimicrobial cysteine-rich peptides, representing a formidable and dynamic defense arsenal against plant pests and pathogens. PMID:17903311

  2. Establishing a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia.

    PubMed

    Schneider, Maria Victoria; Griffin, Philippa C; Tyagi, Sonika; Flannery, Madison; Dayalan, Saravanan; Gladman, Simon; Watson-Haigh, Nathan; Bayer, Philipp E; Charleston, Michael; Cooke, Ira; Cook, Rob; Edwards, Richard J; Edwards, David; Gorse, Dominique; McConville, Malcolm; Powell, David; Wilkins, Marc R; Lonie, Andrew

    2017-06-30

    EMBL Australia Bioinformatics Resource (EMBL-ABR) is a developing national research infrastructure, providing bioinformatics resources and support to life science and biomedical researchers in Australia. EMBL-ABR comprises 10 geographically distributed national nodes with one coordinating hub, with current funding provided through Bioplatforms Australia and the University of Melbourne for its initial 2-year development phase. The EMBL-ABR mission is to: (1) increase Australia's capacity in bioinformatics and data sciences; (2) contribute to the development of training in bioinformatics skills; (3) showcase Australian data sets at an international level and (4) enable engagement in international programs. The activities of EMBL-ABR are focussed in six key areas, aligning with comparable international initiatives such as ELIXIR, CyVerse and NIH Commons. These key areas (Tools, Data, Standards, Platforms, Compute and Training) are described in this article. © The Author 2017. Published by Oxford University Press.

  3. Dugong: a Docker image, based on Ubuntu Linux, focused on reproducibility and replicability for bioinformatics analyses.

    PubMed

    Menegidio, Fabiano B; Jabes, Daniela L; Costa de Oliveira, Regina; Nunes, Luiz R

    2018-02-01

    This manuscript introduces and describes Dugong, a Docker image based on Ubuntu 16.04, which automates installation of more than 3500 bioinformatics tools (along with their respective libraries and dependencies), in alternative computational environments. The software operates through a user-friendly XFCE4 graphic interface that allows software management and installation by users not fully familiarized with the Linux command line and provides the Jupyter Notebook to assist in the delivery and exchange of consistent and reproducible protocols and results across laboratories, assisting in the development of open science projects. Source code and instructions for local installation are available at https://github.com/DugongBioinformatics, under the MIT open source license. © The Author (2017). Published by Oxford University Press. All rights reserved.

  4. Freshwater Metaviromics and Bacteriophages: A Current Assessment of the State of the Art in Relation to Bioinformatic Challenges

    PubMed Central

    Bruder, Katherine; Malki, Kema; Cooper, Alexandria; Sible, Emily; Shapiro, Jason W.; Watkins, Siobhan C.; Putonti, Catherine

    2016-01-01

    Advances in bioinformatics and sequencing technologies have allowed for the analysis of complex microbial communities at an unprecedented rate. While much focus is often placed on the cellular members of these communities, viruses play a pivotal role, particularly bacteria-infecting viruses (bacteriophages); phages mediate global biogeochemical processes and drive microbial evolution through bacterial grazing and horizontal gene transfer. Despite their importance and ubiquity in nature, very little is known about the diversity and structure of viral communities. Though the need for culture-based methods for viral identification has been somewhat circumvented through metagenomic techniques, the analysis of metaviromic data is marred by many unique issues. In this review, we examine the current bioinformatic approaches for metavirome analyses and the inherent challenges facing the field as illustrated by the ongoing efforts in the exploration of freshwater phage populations. PMID:27375355

  5. R-Based Software for the Integration of Pathway Data into Bioinformatic Algorithms

    PubMed Central

    Kramer, Frank; Bayerlová, Michaela; Beißbarth, Tim

    2014-01-01

    Putting new findings into the context of available literature knowledge is one approach to deal with the surge of high-throughput data results. Furthermore, prior knowledge can increase the performance and stability of bioinformatic algorithms, for example, methods for network reconstruction. In this review, we examine software packages for the statistical computing framework R, which enable the integration of pathway data for further bioinformatic analyses. Different approaches to integrate and visualize pathway data are identified and packages are stratified concerning their features according to a number of different aspects: data import strategies, the extent of available data, dependencies on external tools, integration with further analysis steps and visualization options are considered. A total of 12 packages integrating pathway data are reviewed in this manuscript. These are supplemented by five R-specific packages for visualization and six connector packages, which provide access to external tools. PMID:24833336

  6. Managing, Analysing, and Integrating Big Data in Medical Bioinformatics: Open Problems and Future Perspectives

    PubMed Central

    Merelli, Ivan; Pérez-Sánchez, Horacio; Gesing, Sandra; D'Agostino, Daniele

    2014-01-01

    The explosion of data in both biomedical research and healthcare systems demands urgent solutions. In particular, research in the omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare, in addition, increasingly calls for tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to better understanding of diseases and development of better and personalized diagnostics and therapeutics. However, such progress is directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, for its annotation and integration and finally for inferring knowledge and making it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represent the starting point to win this challenge. PMID:25254202

  7. Exploring DNA Structure with Cn3D

    ERIC Educational Resources Information Center

    Porter, Sandra G.; Day, Joseph; McCarty, Richard E.; Shearn, Allen; Shingles, Richard; Fletcher, Linnea; Murphy, Stephanie; Pearlman, Rebecca

    2007-01-01

    Researchers in the field of bioinformatics have developed a number of analytical programs and databases that are increasingly important for advancing biological research. Because bioinformatics programs are used to analyze, visualize, and/or compare biological data, it is likely that the use of these programs will have a positive impact on biology…

  8. Neptune: a bioinformatics tool for rapid discovery of genomic variation in bacterial populations

    PubMed Central

    Marinier, Eric; Zaheer, Rahat; Berry, Chrystal; Weedmark, Kelly A.; Domaratzki, Michael; Mabon, Philip; Knox, Natalie C.; Reimer, Aleisha R.; Graham, Morag R.; Chui, Linda; Patterson-Fortin, Laura; Zhang, Jian; Pagotto, Franco; Farber, Jeff; Mahony, Jim; Seyer, Karine; Bekal, Sadjia; Tremblay, Cécile; Isaac-Renton, Judy; Prystajecky, Natalie; Chen, Jessica; Slade, Peter

    2017-01-01

    The ready availability of vast amounts of genomic sequence data has created the need to rethink comparative genomics algorithms using ‘big data’ approaches. Neptune is an efficient system for rapidly locating differentially abundant genomic content in bacterial populations using an exact k-mer matching strategy, while accommodating k-mer mismatches. Neptune’s loci discovery process identifies sequences that are sufficiently common to a group of target sequences and sufficiently absent from non-targets using probabilistic models. Neptune uses parallel computing to efficiently identify and extract these loci from draft genome assemblies without requiring multiple sequence alignments or other computationally expensive comparative sequence analyses. Tests on simulated and real datasets showed that Neptune rapidly identifies regions that are both sensitive and specific. We demonstrate that this system can identify trait-specific loci from different bacterial lineages. Neptune is broadly applicable for comparative bacterial analyses, yet will particularly benefit pathogenomic applications, owing to efficient and sensitive discovery of differentially abundant genomic loci. The software is available for download at: http://github.com/phac-nml/neptune. PMID:29048594
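
    The idea of k-mers that are "sufficiently common" in targets and "sufficiently absent" from non-targets can be illustrated with a minimal sketch. This is not Neptune's code or API: it uses exact matching only, replaces the probabilistic models and mismatch handling with two simple fraction thresholds, and every name below (`kmers`, `signature_kmers`, `min_target_frac`, `max_non_target_frac`) is a hypothetical choice made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(targets, non_targets, k,
                    min_target_frac=1.0, max_non_target_frac=0.0):
    """Return k-mers present in at least min_target_frac of the targets
    and in at most max_non_target_frac of the non-targets.

    The thresholds are illustrative stand-ins, not Neptune's models.
    """
    if not targets or not non_targets:
        raise ValueError("both sequence groups must be non-empty")

    # Count, for each k-mer, how many sequences in each group contain it.
    target_counts = Counter()
    for seq in targets:
        target_counts.update(kmers(seq, k))
    non_target_counts = Counter()
    for seq in non_targets:
        non_target_counts.update(kmers(seq, k))

    selected = set()
    for kmer, n in target_counts.items():
        in_targets = n / len(targets)
        in_non_targets = non_target_counts[kmer] / len(non_targets)
        if in_targets >= min_target_frac and in_non_targets <= max_non_target_frac:
            selected.add(kmer)
    return selected
```

    With the default thresholds the function keeps only k-mers found in every target and in no non-target; loosening the two fractions trades specificity for sensitivity.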

  9. Expanding the horizons of microRNA bioinformatics.

    PubMed

    Huntley, Rachael P; Kramarz, Barbara; Sawford, Tony; Umrao, Zara; Kalea, Anastasia Z; Acquaah, Vanessa; Martin, Maria-Jesus; Mayr, Manuel; Lovering, Ruth C

    2018-06-05

    MicroRNA regulation of key biological and developmental pathways is a rapidly expanding area of research, accompanied by vast amounts of experimental data. This data, however, is not widely available in bioinformatic resources, making it difficult for researchers to find and analyse microRNA-related experimental data and define further research projects. We are addressing this problem by providing two new bioinformatics datasets that contain experimentally verified functional information for mammalian microRNAs involved in cardiovascular-relevant, and other, processes. To date, our resource provides over 3,900 Gene Ontology annotations associated with almost 500 miRNAs from human, mouse and rat and over 2,200 experimentally validated miRNA:target interactions. We illustrate how this resource can be used to create miRNA-focused interaction networks with a biological context using the known biological role of miRNAs and the mRNAs they regulate, enabling discovery of associations between gene products, biological pathways and, ultimately, diseases. This data will be crucial in advancing the field of microRNA bioinformatics and will establish consistent datasets for reproducible functional analysis of microRNAs across all biological research areas. Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  10. KBWS: an EMBOSS associated package for accessing bioinformatics web services.

    PubMed

    Oshita, Kazuki; Arakawa, Kazuharu; Tomita, Masaru

    2011-04-29

    The availability of bioinformatics web-based services is rapidly proliferating, for their interoperability and ease of use. The next challenge is in the integration of these services in the form of workflows, and several projects are already underway, standardizing the syntax, semantics, and user interfaces. In order to deploy the advantages of web services with locally installed tools, here we describe a collection of proxy client tools for 42 major bioinformatics web services in the form of European Molecular Biology Open Software Suite (EMBOSS) UNIX command-line tools. EMBOSS provides sophisticated means for discoverability and interoperability for hundreds of tools, and our package, named the Keio Bioinformatics Web Service (KBWS), adds functionalities of local and multiple alignment of sequences, phylogenetic analyses, and prediction of cellular localization of proteins and RNA secondary structures. This software implemented in C is available under GPL from http://www.g-language.org/kbws/ and GitHub repository http://github.com/cory-ko/KBWS. Users can utilize the SOAP services implemented in Perl directly via WSDL file at http://soap.g-language.org/kbws.wsdl (RPC Encoded) and http://soap.g-language.org/kbws_dl.wsdl (Document/literal).

  11. Serum proteome profiling in canine idiopathic dilated cardiomyopathy using TMT-based quantitative proteomics approach.

    PubMed

    Bilić, Petra; Guillemin, Nicolas; Kovačević, Alan; Beer Ljubić, Blanka; Jović, Ines; Galan, Asier; Eckersall, Peter David; Burchmore, Richard; Mrljak, Vladimir

    2018-05-15

    Idiopathic dilated cardiomyopathy (iDCM) is a primary myocardial disorder with an unknown aetiology, characterized by reduced contractility and ventricular dilation of the left or both ventricles. Naturally occurring canine iDCM was used herein to identify the serum proteomic signature of the disease compared to the healthy state, providing an insight into underlying mechanisms and revealing proteins with biomarker potential. To achieve this, we used a high-throughput label-based quantitative LC-MS/MS proteomics approach and bioinformatics analysis of the in silico inferred interactome protein network created from the initial list of differential proteins. To complement the proteomic analysis, serum biochemical parameters and levels of known biomarkers of cardiac function were measured. Several proteins with biomarker potential were identified, such as inter-alpha-trypsin inhibitor heavy chain H4, microfibril-associated glycoprotein 4 and apolipoprotein A-IV, which were validated using an independent method (Western blotting) and showed high specificity and sensitivity according to the receiver operating characteristic curve analysis. Bioinformatics analysis revealed involvement of different pathways in iDCM, such as complement cascade activation, lipoprotein particles dynamics, elastic fibre formation, GPCR signalling and respiratory electron transport chain. Idiopathic dilated cardiomyopathy is a severe primary myocardial disease of unknown cause, affecting both humans and dogs. This study is a contribution to canine heart disease research by means of state-of-the-art proteomic and bioinformatic analyses, following a similar approach in human iDCM research. Importantly, we used serum as a non-invasive and easily accessible biological source of information and contributed to the scarce data on biofluid proteome research on this topic. Bioinformatics analysis revealed biological pathways modulated in canine iDCM with potential for further targeted research. Also, several proteins with biomarker potential have been identified and successfully validated. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Analytical performance of reciprocal isotope labeling of proteome digests for quantitative proteomics and its application for comparative studies of aerobic and anaerobic Escherichia coli proteomes.

    PubMed

    Lo, Andy; Weiner, Joel H; Li, Liang

    2013-09-17

    Due to limited sample amounts, instrument time considerations, and reagent costs, only a small number of replicate experiments are typically performed for quantitative proteome analyses. Generation of reproducible data that can be readily assessed for consistency within a small number of datasets is critical for accurate quantification. We report our investigation of a strategy using reciprocal isotope labeling of two comparative samples as a tool for determining proteome changes. Reciprocal labeling was evaluated to determine the internal consistency of quantified proteome changes from Escherichia coli grown under aerobic and anaerobic conditions. Qualitatively, the peptide overlap between replicate analyses of the same sample and reverse labeled samples was found to be within 8%. Quantitatively, reciprocal analyses showed only a slight increase in average overall inconsistency when compared with replicate analyses (1.29 vs. 1.24-fold difference). Most importantly, reverse labeling was successfully used to identify spurious values resulting from incorrect peptide identifications and poor peak fitting. After removal of 5% of the peptide data with low reproducibility, a total of 275 differentially expressed proteins (>1.50-fold difference) were consistently identified and were then subjected to bioinformatics analysis. General considerations and guidelines for reciprocal labeling experimental design and biological significance of obtained results are discussed. Copyright © 2013 Elsevier B.V. All rights reserved.
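
    The consistency check behind reciprocal labeling can be written down as simple arithmetic. The sketch below is an assumption about how such a check might look, not the authors' procedure: it supposes that each peptide has a heavy/light ratio from the forward experiment and one from the label-swapped experiment, so that inverting the reverse ratio should reproduce the forward ratio. The function names and the `max_fold` cutoff are hypothetical, and the abstract does not state how its fold-difference figures were computed.

```python
def fold_difference(forward_ratio, reverse_ratio):
    """Fold difference between a forward ratio and the inverted reverse ratio.

    A value of 1.0 means the two labeling directions agree exactly.
    """
    if forward_ratio <= 0 or reverse_ratio <= 0:
        raise ValueError("ratios must be positive")
    inverted = 1.0 / reverse_ratio  # swapping labels inverts the expected ratio
    larger = max(forward_ratio, inverted)
    smaller = min(forward_ratio, inverted)
    return larger / smaller


def flag_inconsistent(peptides, max_fold):
    """Return names of peptides whose two estimates differ by more than max_fold.

    peptides is an iterable of (name, forward_ratio, reverse_ratio) tuples;
    max_fold is an illustrative cutoff chosen by the caller.
    """
    return [name for name, fwd, rev in peptides
            if fold_difference(fwd, rev) > max_fold]
```

    A peptide measured at 2.0 in the forward run and 0.5 in the reverse run is perfectly consistent, whereas one measured at 2.0 and 1.0 disagrees twofold and would be a candidate for the kind of spurious value the abstract describes.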

  13. Extracting patterns of database and software usage from the bioinformatics literature

    PubMed Central

    Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L.; Stevens, Robert

    2014-01-01

    Motivation: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. Results: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. Availability and implementation: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/ Contact: robert.stevens@manchester.ac.uk PMID:25161253
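
    A network "based on resource co-occurrence" can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' scripts: it assumes that resource mentions have already been extracted per article, and it weights each edge by the number of articles in which two resources appear together. The function name and the placeholder resource names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Build weighted co-occurrence edges from per-article resource lists.

    documents is an iterable of iterables of resource names. The result maps
    an alphabetically ordered pair (a, b) to the number of articles that
    mention both resources.
    """
    edges = Counter()
    for resources in documents:
        # De-duplicate within an article so repeated mentions count once.
        for a, b in combinations(sorted(set(resources)), 2):
            edges[(a, b)] += 1
    return edges


# Toy input with placeholder names, for illustration only.
articles = [
    ["aligner", "tree_builder", "database"],
    ["aligner", "tree_builder"],
    ["database", "database"],
]
network = cooccurrence_edges(articles)
```

    Building one such edge list per publication year would give the time-sliced snapshots the abstract mentions, which could then be compared to see which pairings persist.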

  14. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    PubMed Central

    2011-01-01

    Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions: PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538
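
    The trade-off between parallelism and lazy evaluation described in this abstract can be illustrated with the Python standard library alone. The sketch below does not use PaPy and makes no claim about its API: it models only a linear chain of stages rather than a general directed acyclic graph, and the names `batched`, `parallel_stage` and `run_pipeline` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def parallel_stage(func, items, pool, batch_size):
    """Lazily map func over items, evaluating one batch at a time on the pool.

    Only one batch per stage is held in memory, so a small batch_size keeps
    evaluation lazy and a large one exposes more parallel work.
    """
    for batch in batched(items, batch_size):
        # pool.map submits the whole batch and yields results in input order.
        yield from pool.map(func, batch)


def run_pipeline(stages, items, batch_size=2, workers=2):
    """Chain stages into nested lazy maps and collect the final results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stream = items
        for func in stages:
            stream = parallel_stage(func, stream, pool, batch_size)
        # Consume inside the with-block so the pool is still running.
        return list(stream)


# Two toy stages: upper-case a sequence string, then reverse it.
results = run_pipeline([str.upper, lambda s: s[::-1]], ["acg", "tta", "ggc"])
```

    Each stage pulls batches from the stage before it, so nothing is computed until the final `list` call drains the chain; the stage functions run on worker threads while the generators themselves run in the calling thread.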

  15. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.

    PubMed

    Cieślik, Marcin; Mura, Cameron

    2011-02-25

    Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.

  16. A Bioinformatics Approach for Integrated Transcriptomic and Proteomic Comparative Analyses of Model and Non-sequenced Anopheline Vectors of Human Malaria Parasites*

    PubMed Central

    Mohien, Ceereena Ubaida; Colquhoun, David R.; Mathias, Derrick K.; Gibbons, John G.; Armistead, Jennifer S.; Rodriguez, Maria C.; Rodriguez, Mario Henry; Edwards, Nathan J.; Hartler, Jürgen; Thallinger, Gerhard G.; Graham, David R.; Martinez-Barnetche, Jesus; Rokas, Antonis; Dinglasan, Rhoel R.

    2013-01-01

    Malaria morbidity and mortality caused by both Plasmodium falciparum and Plasmodium vivax extend well beyond the African continent, and although P. vivax causes between 80 and 300 million severe cases each year, vivax transmission remains poorly understood. Plasmodium parasites are transmitted by Anopheles mosquitoes, and the critical site of interaction between parasite and host is at the mosquito's luminal midgut brush border. Although the genome of the “model” African P. falciparum vector, Anopheles gambiae, has been sequenced, evolutionary divergence limits its utility as a reference across anophelines, especially non-sequenced P. vivax vectors such as Anopheles albimanus. Clearly, technologies and platforms that bridge this substantial scientific gap are required in order to provide public health scientists with key transcriptomic and proteomic information that could spur the development of novel interventions to combat this disease. To our knowledge, no approaches have been published that address this issue. To bolster our understanding of P. vivax–An. albimanus midgut interactions, we developed an integrated bioinformatic-hybrid RNA-Seq-LC-MS/MS approach involving An. albimanus transcriptome (15,764 contigs) and luminal midgut subproteome (9,445 proteins) assembly, which, when used with our custom Diptera protein database (685,078 sequences), facilitated a comparative proteomic analysis of the midgut brush borders of two important malaria vectors, An. gambiae and An. albimanus. PMID:23082028

  17. A bioinformatics approach for integrated transcriptomic and proteomic comparative analyses of model and non-sequenced anopheline vectors of human malaria parasites.

    PubMed

    Ubaida Mohien, Ceereena; Colquhoun, David R; Mathias, Derrick K; Gibbons, John G; Armistead, Jennifer S; Rodriguez, Maria C; Rodriguez, Mario Henry; Edwards, Nathan J; Hartler, Jürgen; Thallinger, Gerhard G; Graham, David R; Martinez-Barnetche, Jesus; Rokas, Antonis; Dinglasan, Rhoel R

    2013-01-01

    Malaria morbidity and mortality caused by both Plasmodium falciparum and Plasmodium vivax extend well beyond the African continent, and although P. vivax causes between 80 and 300 million severe cases each year, vivax transmission remains poorly understood. Plasmodium parasites are transmitted by Anopheles mosquitoes, and the critical site of interaction between parasite and host is at the mosquito's luminal midgut brush border. Although the genome of the "model" African P. falciparum vector, Anopheles gambiae, has been sequenced, evolutionary divergence limits its utility as a reference across anophelines, especially non-sequenced P. vivax vectors such as Anopheles albimanus. Clearly, technologies and platforms that bridge this substantial scientific gap are required in order to provide public health scientists with key transcriptomic and proteomic information that could spur the development of novel interventions to combat this disease. To our knowledge, no approaches have been published that address this issue. To bolster our understanding of P. vivax-An. albimanus midgut interactions, we developed an integrated bioinformatic-hybrid RNA-Seq-LC-MS/MS approach involving An. albimanus transcriptome (15,764 contigs) and luminal midgut subproteome (9,445 proteins) assembly, which, when used with our custom Diptera protein database (685,078 sequences), facilitated a comparative proteomic analysis of the midgut brush borders of two important malaria vectors, An. gambiae and An. albimanus.

  18. Comparative Transcriptomic and Proteomic Analyses Reveal a FluG-Mediated Signaling Pathway Relating to Asexual Sporulation of Antrodia camphorata.

    PubMed

    Li, Hua-Xiang; Lu, Zhen-Ming; Zhu, Qing; Gong, Jin-Song; Geng, Yan; Shi, Jin-Song; Xu, Zheng-Hong; Ma, Yan-He

    2017-09-01

    The medicinal mushroom Antrodia camphorata produces large numbers of arthroconidia in submerged fermentation, which is rarely reported in basidiomycetous fungi. Nevertheless, the molecular mechanisms underlying this asexual sporulation (conidiation) remain unclear. Here, we used comparative transcriptomic and proteomic approaches to elucidate a possible signaling pathway relating to the asexual sporulation of A. camphorata. First, 104 differentially expressed proteins and 2586 differential cDNA sequences during the culture process of A. camphorata were identified by 2DE and RNA-seq, respectively. By applying bioinformatics analysis, a total of 67 genes which might play roles in the sporulation were obtained, and 18 of these genes, including fluG, sfgA, SfaD, flbA, flbB, flbC, flbD, nsdD, brlA, abaA, wetA, ganB, fadA, PkaA, veA, velB, vosA, and stuA, might be involved in a potential FluG-mediated signaling pathway. Furthermore, the mRNA expression levels of the 18 genes in the proposed FluG-mediated signaling pathway were analyzed by quantitative real-time PCR. In summary, our study helps elucidate the molecular mechanisms underlying the asexual sporulation of A. camphorata, and also provides useful transcripts and proteome data for further bioinformatics study of this valuable medicinal mushroom. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. COMAN: a web server for comprehensive metatranscriptomics analysis.

    PubMed

    Ni, Yueqiong; Li, Jun; Panagiotou, Gianni

    2016-08-11

    Microbiota-oriented studies based on metagenomic or metatranscriptomic sequencing have revolutionised our understanding of microbial ecology and the roles of both clinical and environmental microbes. The analysis of massive metatranscriptomic data requires extensive computational resources, a collection of bioinformatics tools and expertise in programming. We developed COMAN (Comprehensive Metatranscriptomics Analysis), a web-based tool dedicated to automatically and comprehensively analysing metatranscriptomic data. The COMAN pipeline includes quality control of raw reads, removal of reads derived from non-coding RNA, followed by functional annotation, comparative statistical analysis, pathway enrichment analysis, co-expression network analysis and high-quality visualisation. The essential data generated by COMAN are also provided in tabular format for additional analysis and integration with other software. The web server has an easy-to-use interface and detailed instructions, and is freely available at http://sbb.hku.hk/COMAN/. Conclusions: COMAN is an integrated web server dedicated to comprehensive functional analysis of metatranscriptomic data, translating massive amounts of reads to data tables and high-standard figures. It is expected to help researchers with less expertise in bioinformatics answer microbiota-related biological questions and to increase the accessibility and interpretation of microbiota RNA-Seq data.

  20. EDEN: evolutionary dynamics within environments

    PubMed Central

    Münch, Philipp C.; Stecher, Bärbel; McHardy, Alice C.

    2017-01-01

    Summary: Metagenomics revolutionized the field of microbial ecology, giving access to Gb-sized datasets of microbial communities under natural conditions. This enables fine-grained analyses of the functions of community members, studies of their association with phenotypes and environments, as well as of their microevolution and adaptation to changing environmental conditions. However, phylogenetic methods for studying adaptation and evolutionary dynamics are not able to cope with big data. EDEN is the first software for the rapid detection of protein families and regions under positive selection, as well as their associated biological processes, from meta- and pangenome data. It provides an interactive result visualization for detailed comparative analyses. Availability and implementation: EDEN is available as a Docker installation under the GPL 3.0 license, allowing its use on common operating systems, at http://www.github.com/hzi-bifo/eden. Contact: alice.mchardy@helmholtz-hzi.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28637301

  1. Customized workflow development and data modularization concepts for RNA-Sequencing and metatranscriptome experiments.

    PubMed

    Lott, Steffen C; Wolfien, Markus; Riege, Konstantin; Bagnacani, Andrea; Wolkenhauer, Olaf; Hoffmann, Steve; Hess, Wolfgang R

    2017-11-10

    RNA-Sequencing (RNA-Seq) has become a widely used approach to study quantitative and qualitative aspects of transcriptome data. The variety of RNA-Seq protocols, experimental study designs and the characteristic properties of the organisms under investigation greatly affect downstream and comparative analyses. In this review, we aim to explain the impact of structured pre-selection, classification and integration of best-performing tools within modularized data analysis workflows and ready-to-use computing infrastructures towards experimental data analyses. We highlight examples for workflows and use cases that are presented for pro-, eukaryotic and mixed dual RNA-Seq (meta-transcriptomics) experiments. In addition, we are summarizing the expertise of the laboratories participating in the project consortium "Structured Analysis and Integration of RNA-Seq experiments" (de.STAIR) and its integration with the Galaxy-workbench of the RNA Bioinformatics Center (RBC). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  2. Novel approaches for bioinformatic analysis of salivary RNA sequencing data for development.

    PubMed

    Kaczor-Urbanowicz, Karolina Elzbieta; Kim, Yong; Li, Feng; Galeev, Timur; Kitchen, Rob R; Gerstein, Mark; Koyano, Kikuye; Jeong, Sung-Hee; Wang, Xiaoyan; Elashoff, David; Kang, So Young; Kim, Su Mi; Kim, Kyoung; Kim, Sung; Chia, David; Xiao, Xinshu; Rozowsky, Joel; Wong, David T W

    2018-01-01

    Analysis of RNA sequencing (RNA-Seq) data in human saliva is challenging. Lack of standardization and unification of the bioinformatic procedures undermines saliva's diagnostic potential, which motivated us to perform this study. We applied principal pipelines for bioinformatic analysis of small RNA-Seq data of saliva of 98 healthy Korean volunteers including either direct or indirect mapping of the reads to the human genome using Bowtie1. Analysis of alignments to exogenous genomes by another pipeline revealed that almost all of the reads map to bacterial genomes. Thus, salivary exRNA has fundamental properties that warrant the design of unique additional steps while performing the bioinformatic analysis. Our pipelines can serve as potential guidelines for processing of RNA-Seq data of human saliva. Processing and analysis results of the experimental data generated by the exceRpt (v4.6.3) small RNA-seq pipeline (github.gersteinlab.org/exceRpt) are available from exRNA atlas (exrna-atlas.org). Alignment to exogenous genomes and their quantification results were used in this paper for the analyses of small RNAs of exogenous origin. Contact: dtww@ucla.edu. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  3. yStreX: yeast stress expression database

    PubMed Central

    Wanichthanarak, Kwanjeera; Nookaew, Intawat; Petranovic, Dina

    2014-01-01

    Over the past decade genome-wide expression analyses have often been used to study how expression of genes changes in response to various environmental stresses. Many of these studies (such as effects of oxygen concentration, temperature stress, low pH stress, osmotic stress, depletion or limitation of nutrients, addition of different chemical compounds, etc.) have been conducted in the unicellular Eukaryal model, yeast Saccharomyces cerevisiae. However, the lack of a unifying or integrated bioinformatics platform that would permit efficient and rapid use of all these existing data remains an important issue. To facilitate research by exploiting existing transcription data in the field of yeast physiology, we have developed the yStreX database. It is an online repository of analyzed gene expression data from curated data sets from different studies that capture genome-wide transcriptional changes in response to diverse environmental transitions. The first aim of this online database is to facilitate comparison of cross-platform and cross-laboratory gene expression data. Additionally, we performed different expression analyses, meta-analyses and gene set enrichment analyses; and the results are also deposited in this database. Lastly, we constructed a user-friendly Web interface with interactive visualization to provide intuitive access and to display the queried data for users with no background in bioinformatics. Database URL: http://www.ystrexdb.com PMID:25024351

  4. ASaiM: a Galaxy-based framework to analyze microbiota data.

    PubMed

    Batut, Bérénice; Gravouil, Kévin; Defois, Clémence; Hiltemann, Saskia; Brugère, Jean-François; Peyretaillade, Eric; Peyret, Pierre

    2018-05-22

    New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions out of microbiota studies. Modular and user-friendly tools would greatly improve such studies. We therefore developed ASaiM, an Open-Source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore and visualize microbiota information from raw metataxonomic, metagenomic or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets, but also can be used on a normal PC. The associated source code is available under Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io). Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation and training to scientists working on complex microorganism communities. It makes analysis and exploration of microbiota data easy, quick, transparent, reproducible and shareable.

  5. The bench scientist's guide to RNA-Seq analysis

    USDA-ARS's Scientific Manuscript database

    RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatic specialists. Here we outline a methods strategy desi...

  6. Role of nematode peptides and other small molecules in plant parasitism

    USDA-ARS's Scientific Manuscript database

    Molecular, genetic, and biochemical studies are demonstrating an increasingly important role of peptide signaling in nematode parasitism of plants. To date, the majority of nematode-secreted peptides identified share similarity with plant CLAVATA3/ESR (CLE) peptides, but bioinformatics analyses of n...

  7. Next-generation sequencing: a challenge to meet the increasing demand for training workshops in Australia.

    PubMed

    Watson-Haigh, Nathan S; Shang, Catherine A; Haimel, Matthias; Kostadima, Myrto; Loos, Remco; Deshpande, Nandan; Duesing, Konsta; Li, Xi; McGrath, Annette; McWilliam, Sean; Michnowicz, Simon; Moolhuijzen, Paula; Quenette, Steve; Revote, Jerico Nico De Leon; Tyagi, Sonika; Schneider, Maria V

    2013-09-01

    The widespread adoption of high-throughput next-generation sequencing (NGS) technology among the Australian life science research community is highlighting an urgent need to up-skill biologists in tools required for handling and analysing their NGS data. There is currently a shortage of cutting-edge bioinformatics training courses in Australia as a consequence of a scarcity of skilled trainers with time and funding to develop and deliver training courses. To address this, a consortium of Australian research organizations, including Bioplatforms Australia, the Commonwealth Scientific and Industrial Research Organisation and the Australian Bioinformatics Network, have been collaborating with the EMBL-EBI training team. A group of Australian bioinformaticians attended the train-the-trainer workshop to improve training skills in developing and delivering bioinformatics workshop curriculum. A 2-day NGS workshop was jointly developed to provide hands-on knowledge and understanding of typical NGS data analysis workflows. The road show-style workshop was successfully delivered at five geographically distant venues in Australia using the newly established Australian NeCTAR Research Cloud. We highlight the challenges we had to overcome at different stages from design to delivery, including the establishment of an Australian bioinformatics training network and the computing infrastructure and resource development. A virtual machine image, workshop materials and scripts for configuring a machine with workshop contents have all been made available under a Creative Commons Attribution 3.0 Unported License. This means participants continue to have convenient access to an environment with which they had become familiar, and bioinformatics trainers are able to access and reuse these resources.

  8. Next-generation sequencing: a challenge to meet the increasing demand for training workshops in Australia

    PubMed Central

    Watson-Haigh, Nathan S.; Shang, Catherine A.; Haimel, Matthias; Kostadima, Myrto; Loos, Remco; Deshpande, Nandan; Duesing, Konsta; Li, Xi; McGrath, Annette; McWilliam, Sean; Michnowicz, Simon; Moolhuijzen, Paula; Quenette, Steve; Revote, Jerico Nico De Leon; Tyagi, Sonika; Schneider, Maria V.

    2013-01-01

    The widespread adoption of high-throughput next-generation sequencing (NGS) technology among the Australian life science research community is highlighting an urgent need to up-skill biologists in tools required for handling and analysing their NGS data. There is currently a shortage of cutting-edge bioinformatics training courses in Australia as a consequence of a scarcity of skilled trainers with time and funding to develop and deliver training courses. To address this, a consortium of Australian research organizations, including Bioplatforms Australia, the Commonwealth Scientific and Industrial Research Organisation and the Australian Bioinformatics Network, have been collaborating with the EMBL-EBI training team. A group of Australian bioinformaticians attended the train-the-trainer workshop to improve training skills in developing and delivering bioinformatics workshop curriculum. A 2-day NGS workshop was jointly developed to provide hands-on knowledge and understanding of typical NGS data analysis workflows. The road show–style workshop was successfully delivered at five geographically distant venues in Australia using the newly established Australian NeCTAR Research Cloud. We highlight the challenges we had to overcome at different stages from design to delivery, including the establishment of an Australian bioinformatics training network and the computing infrastructure and resource development. A virtual machine image, workshop materials and scripts for configuring a machine with workshop contents have all been made available under a Creative Commons Attribution 3.0 Unported License. This means participants continue to have convenient access to an environment they had become familiar with, and bioinformatics trainers are able to access and reuse these resources. PMID:23543352

  9. Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

    PubMed

    Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S

    2017-01-01

    Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.

  10. An integrated bioinformatics approach to improve two-color microarray quality-control: impact on biological conclusions.

    PubMed

    van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A

    2009-06-01

    Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work pointed out the need for extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point to point standard analysis procedure, based on a combination of clustering and data visualization for the analysis of microarray data.

  11. A comprehensive framework for functional diversity patterns of marine chromophytic phytoplankton using rbcL phylogeny

    PubMed Central

    Samanta, Brajogopal; Bhadury, Punyasloke

    2016-01-01

    Marine chromophytes are a taxonomically diverse group of algae and contribute approximately half of the total oceanic primary production. To understand the global patterns of functional diversity of chromophytic phytoplankton, robust bioinformatics and statistical analyses including deep phylogeny based on 2476 form ID rbcL gene sequences representing seven ecologically significant oceanographic ecoregions were undertaken. In addition, 12 form ID rbcL clone libraries were generated and analyzed (148 sequences) from Sundarbans Biosphere Reserve representing the world’s largest mangrove ecosystem as part of this study. Global phylogenetic analyses recovered 11 major clades of chromophytic phytoplankton in varying proportions with several novel rbcL sequences in each of the seven targeted ecoregions. The majority of OTUs were found to be exclusive to each ecoregion, whereas some were shared by two or more ecoregions based on beta-diversity analysis. The present phylogenetic and bioinformatics analyses provide strong statistical support for the hypothesis that different oceanographic regimes harbor distinct and coherent groups of chromophytic phytoplankton. It has also been shown as part of this study that varying natural selection pressure on form ID rbcL gene under different environmental conditions could lead to functional differences and overall fitness of chromophytic phytoplankton populations. PMID:26861415

  12. Characterizing the “POAGome”: A bioinformatics-driven approach to primary open-angle glaucoma

    PubMed Central

    Danford, Ian D.; Verkuil, Lana D.; Choi, Daniel J.; Collins, David W.; Gudiseva, Harini V.; Uyhazi, Katherine E.; Lau, Marisa K.; Kanu, Levi N.; Grant, Gregory R.; Chavali, Venkata R.M.; O’Brien, Joan M.

    2017-01-01

    Primary open-angle glaucoma (POAG) is a genetically, physiologically, and phenotypically complex neurodegenerative disorder. This study addressed the expanding collection of genes associated with POAG, referred to as the “POAGome.” We used bioinformatics tools to perform an extensive, systematic literature search and compiled 542 genes with confirmed associations with POAG and its related phenotypes (normal tension glaucoma, ocular hypertension, juvenile open-angle glaucoma, and primary congenital glaucoma). The genes were classified according to their associated ocular tissues and phenotypes, and functional annotation and pathway analyses were subsequently performed. Our study reveals that no single molecular pathway can encompass the pathophysiology of POAG. The analyses suggested that inflammation and senescence may play pivotal roles in both the development and perpetuation of the retinal ganglion cell degeneration seen in POAG. The TGF-β signaling pathway was repeatedly implicated in our analyses, suggesting that it may be an important contributor to the manifestation of POAG in the anterior and posterior segments of the globe. We propose a molecular model of POAG revolving around TGF-β signaling, which incorporates the roles of inflammation and senescence in this disease. Finally, we highlight emerging molecular therapies that show promise for treating POAG. PMID:28223208

  13. Hui Wei | NREL

    Science.gov Websites

    , bioinformatics, and literature analyses. In total, 75 proteins were identified using the in-solution method, and 236 proteins were identified using the in-gel method, among which approximately 10% of proteins were Molecular Biology (2012) "Tracking Dynamics of Biomass Composting by Monitoring the Changes in

  14. HIPPI: highly accurate protein family classification with ensembles of HMMs.

    PubMed

    Nguyen, Nam-Phuong; Nute, Michael; Mirarab, Siavash; Warnow, Tandy

    2016-11-11

    Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .

  15. A Multicenter Study To Evaluate the Performance of High-Throughput Sequencing for Virus Detection

    PubMed Central

    Ng, Siemon H. S.; Vandeputte, Olivier; Aljanahi, Aisha; Deyati, Avisek; Cassart, Jean-Pol; Charlebois, Robert L.; Taliaferro, Lanyn P.

    2017-01-01

    ABSTRACT The capability of high-throughput sequencing (HTS) for detection of known and unknown viruses makes it a powerful tool for broad microbial investigations, such as evaluation of novel cell substrates that may be used for the development of new biological products. However, like any new assay, regulatory applications of HTS need method standardization. Therefore, our three laboratories initiated a study to evaluate performance of HTS for potential detection of viral adventitious agents by spiking model viruses in different cellular matrices to mimic putative materials for manufacturing of biologics. Four model viruses were selected based upon different physical and biochemical properties and commercial availability: human respiratory syncytial virus (RSV), Epstein-Barr virus (EBV), feline leukemia virus (FeLV), and human reovirus (REO). Additionally, porcine circovirus (PCV) was tested by one laboratory. Independent samples were prepared for HTS by spiking intact viruses or extracted viral nucleic acids, singly or mixed, into different HeLa cell matrices (resuspended whole cells, cell lysate, or total cellular RNA). Data were obtained using different sequencing platforms (Roche 454, Illumina HiSeq1500 or HiSeq2500). Bioinformatic analyses were performed independently by each laboratory using available tools, pipelines, and databases. The results showed that comparable virus detection was obtained in the three laboratories regardless of sample processing, library preparation, sequencing platform, and bioinformatic analysis: between 0.1 and 3 viral genome copies per cell were detected for all of the model viruses used. This study highlights the potential for using HTS for sensitive detection of adventitious viruses in complex biological samples containing cellular background. 
    IMPORTANCE Recent high-throughput sequencing (HTS) investigations have resulted in unexpected discoveries of known and novel viruses in a variety of sample types, including research materials, clinical materials, and biological products. Therefore, HTS can be a powerful tool for supplementing current methods for demonstrating the absence of adventitious or unwanted viruses in biological products, particularly when using a new cell line. However, HTS is a complex technology with different platforms, which needs standardization for evaluation of biologics. This collaborative study was undertaken to investigate detection of different virus types using two different HTS platforms. The results of the independently performed studies demonstrated a similar sensitivity of virus detection, regardless of the different sample preparation and processing procedures and bioinformatic analyses done in the three laboratories. Comparable HTS detection of different virus types supports future development of reference virus materials for standardization and validation of different HTS platforms. PMID:28932815

  16. In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

    PubMed

    Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J

    2015-01-01

    The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.

  17. Sequence Search and Comparative Genomic Analysis of SUMO-Activating Enzymes Using CoGe.

    PubMed

    Carretero-Paulet, Lorenzo; Albert, Victor A

    2016-01-01

    The growing number of genome sequences completed during the last few years has made necessary the development of bioinformatics tools for the easy access and retrieval of sequence data, as well as for downstream comparative genomic analyses. Some of these are implemented as online platforms that integrate genomic data produced by different genome sequencing initiatives with data mining tools as well as various comparative genomic and evolutionary analysis possibilities. Here, we use the online comparative genomics platform CoGe ( http://www.genomevolution.org/coge/ ) (Lyons and Freeling. Plant J 53:661-673, 2008; Tang and Lyons. Front Plant Sci 3:172, 2012) (1) to retrieve the entire complement of orthologous and paralogous genes belonging to the SUMO-Activating Enzymes 1 (SAE1) gene family from a set of species representative of the Brassicaceae plant eudicot family with genomes fully sequenced, and (2) to investigate the history, timing, and molecular mechanisms of the gene duplications driving the evolutionary expansion and functional diversification of the SAE1 family in Brassicaceae.

  18. Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment.

    PubMed

    Severgnini, Marco; Bicciato, Silvio; Mangano, Eleonora; Scarlatti, Francesca; Mezzelani, Alessandra; Mattioli, Michela; Ghidoni, Riccardo; Peano, Clelia; Bonnal, Raoul; Viti, Federica; Milanesi, Luciano; De Bellis, Gianluca; Battaglia, Cristina

    2006-06-01

    Meta-analysis of microarray data is increasingly important, considering both the availability of multiple platforms using disparate technologies and the accumulation in public repositories of data sets from different laboratories. We addressed the issue of comparing gene expression profiles from two microarray platforms by devising a standardized investigative strategy. We tested this procedure by studying MDA-MB-231 cells, which undergo apoptosis on treatment with resveratrol. Gene expression profiles were obtained using high-density, short-oligonucleotide, single-color microarray platforms: GeneChip (Affymetrix) and CodeLink (Amersham). Interplatform analyses were carried out on 8414 common transcripts represented on both platforms, as identified by LocusLink ID, representing 70.8% and 88.6% of annotated GeneChip and CodeLink features, respectively. We identified 105 differentially expressed genes (DEGs) on CodeLink and 42 DEGs on GeneChip. Among them, only 9 DEGs were commonly identified by both platforms. Multiple analyses (BLAST alignment of probes with target sequences, gene ontology, literature mining, and quantitative real-time PCR) permitted us to investigate the factors contributing to the generation of platform-dependent results in single-color microarray experiments. An effective approach to cross-platform comparison involves microarrays of similar technologies, samples prepared by identical methods, and a standardized battery of bioinformatic and statistical analyses.
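
    The agreement between the two platforms reported above can be put into numbers with a few lines of arithmetic. The following is a minimal sketch, not the authors' procedure: the function name overlap_summary is hypothetical, and only the three counts (105 CodeLink DEGs, 42 GeneChip DEGs, 9 shared) come from the abstract.

```python
def overlap_summary(n_a: int, n_b: int, n_shared: int) -> dict:
    """Summarise agreement between two lists of differentially expressed genes.

    n_a, n_b  -- sizes of the two DEG lists
    n_shared  -- number of genes present in both lists
    """
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either list size")
    union = n_a + n_b - n_shared  # genes called on at least one platform
    return {
        "fraction_of_a": n_shared / n_a,
        "fraction_of_b": n_shared / n_b,
        "jaccard": n_shared / union,
    }


# Counts taken from the abstract: CodeLink = 105, GeneChip = 42, shared = 9.
summary = overlap_summary(105, 42, 9)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
# fraction_of_a: 8.6%
# fraction_of_b: 21.4%
# jaccard: 6.5%
```

    Under these counts, fewer than one in ten CodeLink DEGs and roughly one in five GeneChip DEGs were confirmed by the other platform, which is the low concordance the record describes.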

  19. The GOBLET training portal: a global repository of bioinformatics training materials, courses and trainers

    PubMed Central

    Corpas, Manuel; Jimenez, Rafael C.; Bongcam-Rudloff, Erik; Budd, Aidan; Brazas, Michelle D.; Fernandes, Pedro L.; Gaeta, Bruno; van Gelder, Celia; Korpelainen, Eija; Lewitter, Fran; McGrath, Annette; MacLean, Daniel; Palagi, Patricia M.; Rother, Kristian; Taylor, Jan; Via, Allegra; Watson, Mick; Schneider, Maria Victoria; Attwood, Teresa K.

    2015-01-01

    Summary: Rapid technological advances have led to an explosion of biomedical data in recent years. The pace of change has inspired new collaborative approaches for sharing materials and resources to help train life scientists both in the use of cutting-edge bioinformatics tools and databases and in how to analyse and interpret large datasets. A prototype platform for sharing such training resources was recently created by the Bioinformatics Training Network (BTN). Building on this work, we have created a centralized portal for sharing training materials and courses, including a catalogue of trainers and course organizers, and an announcement service for training events. For course organizers, the portal provides opportunities to promote their training events; for trainers, the portal offers an environment for sharing materials, for gaining visibility for their work and promoting their skills; for trainees, it offers a convenient one-stop shop for finding suitable training resources and identifying relevant training events and activities locally and worldwide. Availability and implementation: http://mygoblet.org/training-portal Contact: manuel.corpas@tgac.ac.uk PMID:25189782

  20. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform, Version 1.5 and 1.x.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chain, Patrick; Lo, Chien-Chi; Li, Po-E

    EDGE bioinformatics was developed to help biologists process Next Generation Sequencing data (in the form of raw FASTQ files), even if they have little to no bioinformatics expertise. EDGE is a highly integrated and interactive web-based platform that is capable of running many of the standard analyses that biologists require for viral, bacterial/archaeal, and metagenomic samples. EDGE provides the following analytical workflows: quality trimming and host removal, assembly and annotation, comparisons against known references, taxonomy classification of reads and contigs, whole genome SNP-based phylogenetic analysis, and PCR analysis. EDGE provides an intuitive web-based interface for user input, allows users to visualize and interact with selected results (e.g. JBrowse genome browser), and generates a final detailed PDF report. Results in the form of tables, text files, graphic files, and PDFs can be downloaded. A user management system allows tracking of an individual’s EDGE runs, along with the ability to share, post publicly, delete, or archive their results.

  1. An in-silico insight into the characteristics of β-propeller phytase.

    PubMed

    Mathew, Akash; Verma, Anukriti; Gaur, Smriti

    2014-06-01

    Phytase is an enzyme that is found extensively in the plant kingdom and in some species of bacteria and fungi. This paper identifies and analyses the available full length sequences of β-propeller phytases (BPP). BPP was chosen due to its potential applicability in the field of aquaculture. The sequences were obtained from the Uniprot database and subject to various online bioinformatics tools to elucidate the physio-chemical characteristics, secondary structures and active site compositions of BPP. Protparam and SOPMA were used to analyse the physiochemical and secondary structure characteristics, while the Expasy online modelling tool and CASTp were used to model the 3-D structure and identify the active sites of the BPP sequences. The amino acid compositions of the four sequences were compared and composed in a graphical format to identify similarities and highlight the potentially important amino acids that form the active site of BPP. This study aims to analyse BPP and contribute to the clarification of the molecular mechanism involved in the enzyme activity of BPP and contribute in part to the possibility of constructing a synthetic version of BPP.

  2. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor.

    PubMed

    Davis, Sean; Meltzer, Paul S

    2007-07-15

    Microarray technology has become a standard molecular biology tool. Experimental data have been generated on a huge number of organisms, tissue types, treatment conditions and disease states. The Gene Expression Omnibus (Barrett et al., 2005), developed by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health, is a repository of nearly 140,000 gene expression experiments. The BioConductor project (Gentleman et al., 2004) is an open-source and open-development software project built in the R statistical programming environment (R Development core Team, 2005) for the analysis and comprehension of genomic data. The tools contained in the BioConductor project represent many state-of-the-art methods for the analysis of microarray and genomics data. We have developed a software tool that allows access to the wealth of information within GEO directly from BioConductor, eliminating many of the formatting and parsing problems that have made such analyses labor-intensive in the past. The software, called GEOquery, effectively establishes a bridge between GEO and BioConductor. Easy access to GEO data from BioConductor will likely lead to new analyses of GEO data using novel and rigorous statistical and bioinformatic tools. Facilitating analyses and meta-analyses of microarray data will increase the efficiency with which biologically important conclusions can be drawn from published genomic data. GEOquery is available as part of the BioConductor project.

  3. The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation.

    PubMed

    McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D; Rodionov, Dmitry A; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick

    2007-01-01

    The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes in common or that differentiates two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.

  4. Interpreting the biological relevance of bioinformatic analyses with T-DNA sequence for protein allergenicity.

    PubMed

    Harper, B; McClain, S; Ganko, E W

    2012-08-01

    Global regulatory agencies require bioinformatic sequence analysis as part of their safety evaluation for transgenic crops. Analysis typically focuses on encoded proteins and adjacent endogenous flanking sequences. Recently, regulatory expectations have expanded to include all reading frames of the inserted DNA. The intent is to provide biologically relevant results that can be used in the overall assessment of safety. This paper evaluates the relevance of assessing the allergenic potential of all DNA reading frames found in common food genes using methods considered for the analysis of T-DNA sequences used in transgenic crops. FASTA and BLASTX algorithms were used to compare genes from maize, rice, soybean, cucumber, melon, watermelon, and tomato using international regulatory guidance. Results show that BLASTX for maize yielded 7254 alignments that exceeded allergen similarity thresholds and 210,772 alignments that matched eight or more consecutive amino acids with an allergen; other crops produced similar results. This analysis suggests that each nontransgenic crop has a much greater potential for allergenic risk than what has been observed clinically. We demonstrate that a meaningful safety assessment is unlikely to be provided by using methods with inherently high frequencies of false positive alignments when broadly applied to all reading frames of DNA sequence. Copyright © 2012 Elsevier Inc. All rights reserved.
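
    The "eight or more consecutive amino acids" criterion mentioned above can be illustrated with a short sketch. This is a toy illustration and not the pipeline used in the study (which relied on FASTA and BLASTX): the function names, the 10-residue "allergen" and the DNA string are all hypothetical, and only the window length of eight and the idea of scanning every reading frame come from the abstract.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translations(dna: str) -> list:
    """Translate a DNA string in all six reading frames ('*' marks a stop)."""
    dna = dna.upper()
    strands = (dna, dna.translate(COMPLEMENT)[::-1])  # forward, reverse complement
    frames = []
    for strand in strands:
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def shared_kmers(query: str, allergen: str, k: int = 8) -> set:
    """Return every length-k peptide that occurs in both sequences."""
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return {query[i:i + k] for i in range(len(query) - k + 1)
            if query[i:i + k] in allergen_kmers}


def frames_with_hits(dna: str, allergen: str, k: int = 8) -> dict:
    """Map frame index (0-5) to the exact k-mer matches found in that frame."""
    hits = {}
    for index, protein in enumerate(six_frame_translations(dna)):
        for segment in protein.split("*"):  # stops split a frame into stretches
            found = shared_kmers(segment, allergen, k)
            if found:
                hits.setdefault(index, set()).update(found)
    return hits


# Toy data (hypothetical): the DNA encodes the toy allergen in frame 0.
toy_allergen = "MKTAYIAKQR"
toy_dna = "ATGAAAACTGCTTATATTGCTAAACAACGT"
print(frames_with_hits(toy_dna, toy_allergen))
```

    Because every sequence is scanned in six frames rather than one, the number of chance matches grows accordingly, which is consistent with the record's point about high false-positive frequencies.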

  5. Clinical proteomic analysis of scrub typhus infection.

    PubMed

    Park, Edmond Changkyun; Lee, Sang-Yeop; Yun, Sung Ho; Choi, Chi-Won; Lee, Hayoung; Song, Hyun Seok; Jun, Sangmi; Kim, Gun-Hwa; Lee, Chang-Seop; Kim, Seung Il

    2018-01-01

    Scrub typhus is an acute and febrile infectious disease caused by the Gram-negative α-proteobacterium Orientia tsutsugamushi from the family Rickettsiaceae that is widely distributed in Northern, Southern and Eastern Asia. In the present study, we analysed the serum proteome of scrub typhus patients to investigate specific clinical protein patterns in an attempt to explain pathophysiology and discover potential biomarkers of infection. Serum samples were collected from three patients (before and after treatment with antibiotics) and three healthy subjects. One-dimensional sodium dodecyl sulphate-polyacrylamide gel electrophoresis followed by liquid chromatography-tandem mass spectrometry was performed to identify differentially abundant proteins using quantitative proteomic approaches. Bioinformatic analysis was then performed using Ingenuity Pathway Analysis. Proteomic analysis identified 236 serum proteins, of which 32 were differentially expressed in normal subjects, naive scrub typhus patients and patients treated with antibiotics. Comparative bioinformatic analysis of the identified proteins revealed up-regulation of proteins involved in immune responses, especially complement system, following infection with O. tsutsugamushi, and normal expression was largely rescued by antibiotic treatment. This is the first proteomic study of clinical serum samples from scrub typhus patients. Proteomic analysis identified changes in protein expression upon infection with O. tsutsugamushi and following antibiotic treatment. Our results provide valuable information for further investigation of scrub typhus therapy and diagnosis.

  6. Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.

    PubMed

    Almugbel, Reem; Hung, Ling-Hong; Hu, Jiaming; Almutairy, Abeer; Ortogero, Nicole; Tamta, Yashaswi; Yeung, Ka Yee

    2018-01-01

    Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. 

  7. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Bioinformatics has recently become a field of science in its own right, indispensable for analyzing the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins, and intergenic regions, including entire genomes of some species. Analyzing this information requires computer programs, which have been renewed through new mathematical methods and the introduction of artificial intelligence, together with the constant creation of supercomputing units able to withstand the heavy workload of sequence analysis. However, innovation is still needed in platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.

  8. Leveraging sequence-based faecal microbial community survey data to identify a composite biomarker for colorectal cancer.

    PubMed

    Shah, Manasi S; DeSantis, Todd Z; Weinmaier, Thomas; McMurdie, Paul J; Cope, Julia L; Altrichter, Adam; Yamal, Jose-Miguel; Hollister, Emily B

    2018-05-01

    Colorectal cancer (CRC) is the second leading cause of cancer-associated mortality in the USA. The faecal microbiome may provide non-invasive biomarkers of CRC and indicate transition in the adenoma-carcinoma sequence. Re-analysing raw sequence and metadata from several studies uniformly, we sought to identify a composite and generalisable microbial marker for CRC. Raw 16S rRNA gene sequence data sets from nine studies were processed with two pipelines, (1) QIIME closed reference (QIIME-CR) or (2) a strain-specific method herein termed SS-UP (Strain Select, UPARSE bioinformatics pipeline). A total of 509 samples (79 colorectal adenoma, 195 CRC and 235 controls) were analysed. Differential abundance, meta-analysis random effects regression and machine learning analyses were carried out to determine the consistency and diagnostic capabilities of potential microbial biomarkers. Definitive taxa, including Parvimonas micra ATCC 33270, Streptococcus anginosus and yet-to-be-cultured members of Proteobacteria, were frequently and significantly increased in stools from patients with CRC compared with controls across studies and had high discriminatory capacity in diagnostic classification. Microbiome-based CRC versus control classification produced an area under receiver operator characteristic (AUROC) curve of 76.6% in QIIME-CR and 80.3% in SS-UP. Combining clinical and microbiome markers gave a diagnostic AUROC of 83.3% for QIIME-CR and 91.3% for SS-UP. Despite technological differences across studies and methods, key microbial markers emerged as important in classifying CRC cases and such could be used in a universal diagnostic for the disease. The choice of bioinformatics pipeline influenced accuracy of classification. Strain-resolved microbial markers might prove crucial in providing a microbial diagnostic for CRC. Published by the BMJ Publishing Group Limited. 
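
    The AUROC values reported in this abstract can be read as the probability that a randomly chosen CRC sample receives a higher classifier score than a randomly chosen control. A minimal Python sketch of that pairwise definition follows; the function name and the toy scores are illustrative assumptions, not the classifiers or data used in the study.

```python
def auroc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for cases and controls.
cases = [0.9, 0.4]
controls = [0.5, 0.1]
print(auroc(cases, controls))
```

    The quadratic loop is only meant to make the definition explicit; a rank-based implementation would be preferable for large sample sets.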

  9. CBESW: sequence alignment on the Playstation 3.

    PubMed

    Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil

    2008-09-17

    The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. For large datasets, our implementation on the PlayStation 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. The results from our experiments demonstrate that the PlayStation 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications.
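
    To make the quantities in this abstract concrete, the following is a minimal Python sketch of Smith-Waterman local alignment scoring and of the MCUPS (million cell updates per second) metric. It assumes a linear gap penalty and arbitrary scoring parameters, and it is a plain reference implementation, not the Cell Broadband Engine code described by the authors.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (assumed scoring)."""
    prev = [0] * (len(target) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def mcups(query_len, target_len, seconds):
    """Million cell updates per second for one query/target comparison."""
    return query_len * target_len / seconds / 1e6


print(smith_waterman_score("ACGT", "ACGT"))
print(mcups(1000, 1000, 1.0))
```

    Each matrix cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal can be updated independently of one another; that independence is what makes the algorithm amenable to acceleration on parallel hardware.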

  10. CBESW: Sequence Alignment on the Playstation 3

    PubMed Central

    Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil

    2008-01-01

    Background: The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation® 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. Results: For large datasets, our implementation on the PlayStation® 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. Conclusion: The results from our experiments demonstrate that the PlayStation® 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications. PMID:18798993

  11. Accuracy of different bioinformatics methods in detecting antibiotic resistance and virulence factors from Staphylococcus aureus whole genome sequences.

    PubMed

    Mason, Amy; Foster, Dona; Bradley, Phelim; Golubchik, Tanya; Doumith, Michel; Gordon, N Claire; Pichon, Bruno; Iqbal, Zamin; Staves, Peter; Crook, Derrick; Walker, A Sarah; Kearns, Angela; Peto, Tim

    2018-06-20

    Background: In principle, whole genome sequencing (WGS) can predict phenotypic resistance directly from genotype, replacing laboratory-based tests. However, the contribution of different bioinformatics methods to genotype-phenotype discrepancies has not been systematically explored to date. Methods: We compared three WGS-based bioinformatics methods (Genefinder (read-based), Mykrobe (de Bruijn graph-based) and Typewriter (BLAST-based)) for predicting presence/absence of 83 different resistance determinants and virulence genes, and overall antimicrobial susceptibility, in 1379 Staphylococcus aureus isolates previously characterised by standard laboratory methods (disc diffusion, broth and/or agar dilution and PCR). Results: 99.5% (113830/114457) of individual resistance-determinant/virulence gene predictions were identical between all three methods, with only 627 (0.5%) discordant predictions, demonstrating high overall agreement (Fleiss' kappa=0.98, p<0.0001). Discrepancies when identified were in only one of the three methods for all genes except the cassette recombinase, ccrC(b). Genotypic antimicrobial susceptibility prediction matched laboratory phenotype in 98.3% (14224/14464) cases (2720 (18.8%) resistant, 11504 (79.5%) susceptible). There was greater disagreement between the laboratory phenotypes and the combined genotypic predictions (97 (0.7%) phenotypically-susceptible but all bioinformatic methods reported resistance; 89 (0.6%) phenotypically-resistant, but all bioinformatics methods reported susceptible) than within the three bioinformatics methods (54 (0.4%) cases, 16 phenotypically-resistant, 38 phenotypically-susceptible). However, in 36/54 (67%), the consensus genotype matched the laboratory phenotype. Conclusions: In this study, the choice between these three specific bioinformatic methods to identify resistance-determinants or other genes in S. aureus did not prove critical, with all demonstrating high concordance with each other and phenotypic/molecular methods. However, each has some limitations and therefore consensus methods provide some assurance. Copyright © 2018 American Society for Microbiology.
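
    The agreement statistic quoted in this abstract is Fleiss' kappa, which measures agreement among a fixed number of raters (here, three prediction methods) beyond what chance would give. The Python sketch below shows the standard calculation; the function name and the small count table are hypothetical illustrations and do not reproduce the study's data.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table of per-subject category counts.

    table[i][j] is the number of raters that assigned subject i to category j;
    every row must sum to the same number of raters.
    """
    n_subjects = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Observed agreement, averaged over subjects.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_subjects

    # Agreement expected by chance from the overall category proportions.
    n_categories = len(table[0])
    proportions = [
        sum(row[j] for row in table) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in proportions)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: three methods calling a gene [present, absent] in four isolates.
calls = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(fleiss_kappa(calls))
```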

  12. A multilevel probabilistic beam search algorithm for the shortest common supersequence problem.

    PubMed

    Gallardo, José E

    2012-01-01

    The shortest common supersequence problem is a classical problem with many applications in different fields such as planning, Artificial Intelligence and especially in Bioinformatics. Due to its NP-hardness, we can not expect to efficiently solve this problem using conventional exact techniques. This paper presents a heuristic to tackle this problem based on the use at different levels of a probabilistic variant of a classical heuristic known as Beam Search. The proposed algorithm is empirically analysed and compared to current approaches in the literature. Experiments show that it provides better quality solutions in a reasonable time for medium and large instances of the problem. For very large instances, our heuristic also provides better solutions, but required execution times may increase considerably.
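
    For orientation, the sketch below shows a plain, deterministic beam search for the shortest common supersequence problem in Python. It is not the multilevel probabilistic variant proposed in the paper; the ranking heuristic (fewest characters still to be covered) and the default beam width are assumptions chosen for illustration.

```python
def scs_beam_search(strings, beam_width=10):
    """Heuristic common supersequence built one character at a time."""
    alphabet = sorted(set("".join(strings)))

    def remaining(pos):
        # Total number of characters not yet covered across all strings.
        return sum(len(s) - p for s, p in zip(strings, pos))

    # A state is (positions reached in each string, supersequence built so far).
    beam = [(tuple(0 for _ in strings), "")]
    while True:
        for pos, seq in beam:
            if remaining(pos) == 0:
                return seq
        candidates = {}
        for pos, seq in beam:
            for ch in alphabet:
                # Appending ch advances every string whose next character is ch.
                new_pos = tuple(
                    p + 1 if p < len(s) and s[p] == ch else p
                    for s, p in zip(strings, pos)
                )
                if new_pos != pos and new_pos not in candidates:
                    candidates[new_pos] = seq + ch
        # Keep only the most promising states for the next step.
        ranked = sorted(candidates.items(), key=lambda item: remaining(item[0]))
        beam = ranked[:beam_width]


def is_supersequence(candidate, s):
    """True if s can be obtained from candidate by deleting characters."""
    it = iter(candidate)
    return all(ch in it for ch in s)


inputs = ["ACGT", "AGT", "CGTA"]
result = scs_beam_search(inputs)
print(result, all(is_supersequence(result, s) for s in inputs))
```

    A wider beam explores more partial solutions per step at a higher cost in time and memory; the result is always a valid common supersequence but is not guaranteed to be the shortest one.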

  13. Multi-trait analysis of genome-wide association summary statistics using MTAG.

    PubMed

    Turley, Patrick; Walters, Raymond K; Maghzian, Omeed; Okbay, Aysu; Lee, James J; Fontana, Mark Alan; Nguyen-Viet, Tuan Anh; Wedow, Robbee; Zacher, Meghan; Furlotte, Nicholas A; Magnusson, Patrik; Oskarsson, Sven; Johannesson, Magnus; Visscher, Peter M; Laibson, David; Cesarini, David; Neale, Benjamin M; Benjamin, Daniel J

    2018-02-01

    We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N_eff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.

  14. A bioinformatic pipeline for identifying informative SNP panels for parentage assignment from RADseq data.

    PubMed

    Andrews, Kimberly R; Adams, Jennifer R; Cassirer, E Frances; Plowright, Raina K; Gardner, Colby; Dwire, Maggie; Hohenlohe, Paul A; Waits, Lisette P

    2018-06-05

    The development of high-throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifying SNP panels that are informative for parentage analysis from restriction site-associated DNA sequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis across SNP panels generated with or without the use of a reference genome, and between SNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because true identities of parents were known a priori based on field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but true identities of parents were unknown. Analyses with and without a reference genome produced SNP panels with >95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across all SNP panels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. Accuracy and consistency of parentage analysis were not reduced when using as few as 284 SNPs for Mexican gray wolf and 142 SNPs for bighorn sheep, indicating our pipeline can be used to develop SNP genotyping assays for parentage analysis with relatively small numbers of loci. This article is protected by copyright. All rights reserved.

  15. The Landscape of Circular RNA Expression Profiles in Papillary Thyroid Carcinoma Based on RNA Sequencing.

    PubMed

    Lan, Xiabin; Xu, Jiajie; Chen, Chao; Zheng, Chuanming; Wang, Jiafeng; Cao, Jun; Zhu, Xuhang; Ge, Minghua

    2018-05-25

    Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer. However, the molecular mechanisms responsible for its tumorigenesis and progression remain largely unknown. Circular RNA (circRNA) is a novel type of noncoding RNA that can serve as an ideal biomarker due to its stability. Recent evidence suggests that circRNAs play important roles in tumorigenesis. This study aims to investigate circRNA expression profiles and their potential biological functions in PTC. High-throughput RNA sequencing was used to assess circRNA expression profiles in PTC, and quantitative real-time polymerase chain reaction (qRT-PCR) was used to validate dysregulated circRNAs. Receiver operating characteristic (ROC) curves were generated to evaluate the diagnostic value of circRNAs for PTC. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were employed to determine the biological functions of differentially expressed circRNAs. Bioinformatic analyses were applied to predict interactions between circRNAs and microRNAs (miRNAs), and a circRNA-miRNA-mRNA network was constructed using Cytoscape software. We identified a number of differentially expressed circRNAs in PTC tissues compared with paired normal thyroid tissues, with chr5: 160757890-160763776-, chr12: 40696591-40697936+, chr7: 22330794-22357656-, and chr21: 16386665-16415895- being upregulated, and chr7: 91924203-91957214+, chr2: 179514891-179516047-, chr9: 16435553-16437522-, and chr22: 36006931-36007153- being downregulated. These findings were confirmed by qRT-PCR, and ROC curves indicated that they can serve as potential biomarkers for PTC. GO and KEGG pathway analyses showed that some of these circRNAs are related to cancers. Additionally, bioinformatic analyses revealed a potential competing-endogenous-RNA-regulating network among circRNAs, miRNAs, and mRNAs. Our study results depict the landscape of circRNA expression profiles in PTC and also provide potential biomarkers for PTC. Further functional and mechanistic studies of these circRNAs may improve our understanding of PTC tumorigenesis. © 2018 The Author(s). Published by S. Karger AG, Basel.

  16. The Generation Challenge Programme Platform: Semantic Standards and Workbench for Crop Science

    PubMed Central

    Bruskiewich, Richard; Senger, Martin; Davenport, Guy; Ruiz, Manuel; Rouard, Mathieu; Hazekamp, Tom; Takeya, Masaru; Doi, Koji; Satoh, Kouji; Costa, Marcos; Simon, Reinhard; Balaji, Jayashree; Akintunde, Akinnola; Mauleon, Ramil; Wanchana, Samart; Shah, Trushar; Anacleto, Mylah; Portugal, Arllet; Ulat, Victor Jun; Thongjuea, Supat; Braak, Kyle; Ritter, Sebastian; Dereeper, Alexis; Skofic, Milko; Rojas, Edwin; Martins, Natalia; Pappas, Georgios; Alamban, Ryan; Almodiel, Roque; Barboza, Lord Hendrix; Detras, Jeffrey; Manansala, Kevin; Mendoza, Michael Jonathan; Morales, Jeffrey; Peralta, Barry; Valerio, Rowena; Zhang, Yi; Gregorio, Sergio; Hermocilla, Joseph; Echavez, Michael; Yap, Jan Michael; Farmer, Andrew; Schiltz, Gary; Lee, Jennifer; Casstevens, Terry; Jaiswal, Pankaj; Meintjes, Ayton; Wilkinson, Mark; Good, Benjamin; Wagner, James; Morris, Jane; Marshall, David; Collins, Anthony; Kikuchi, Shoshi; Metz, Thomas; McLaren, Graham; van Hintum, Theo

    2008-01-01

    The Generation Challenge programme (GCP) is a global crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding. A key consortium research activity is the development of a GCP crop bioinformatics platform to support GCP research. This platform includes the following: (i) shared, public platform-independent domain models, ontology, and data formats to enable interoperability of data and analysis flows within the platform; (ii) web service and registry technologies to identify, share, and integrate information across diverse, globally dispersed data sources, as well as to access high-performance computational (HPC) facilities for computationally intensive, high-throughput analyses of project data; (iii) platform-specific middleware reference implementations of the domain model integrating a suite of public (largely open-access/-source) databases and software tools into a workbench to facilitate biodiversity analysis, comparative analysis of crop genomic data, and plant breeding decision making. PMID:18483570

  17. Comparison of three quantitative phosphoproteomic strategies to study receptor tyrosine kinase signaling.

    PubMed

    Zhang, Guoan; Neubert, Thomas A

    2011-12-02

    There are three quantitative phosphoproteomic strategies most commonly used to study receptor tyrosine kinase (RTK) signaling. These strategies quantify changes in: (1) all three forms of phosphosites (phosphoserine, phosphothreonine and phosphotyrosine) following enrichment of phosphopeptides by titanium dioxide or immobilized metal affinity chromatography; (2) phosphotyrosine sites following anti-phosphotyrosine antibody enrichment of phosphotyrosine peptides; or (3) phosphotyrosine proteins and their binding partners following anti-phosphotyrosine protein immunoprecipitation. However, it is not clear from the literature which strategy is most effective. In this study, we assessed the utility of these three phosphoproteomic strategies in RTK signaling studies by using EphB receptor signaling as an example. We used all three strategies with stable isotope labeling with amino acids in cell culture (SILAC) to compare changes in phosphoproteomes upon EphB receptor activation. We used bioinformatic analysis to compare results from the three analyses. Our results show that the three strategies provide complementary information about RTK pathways.

  18. Diversity and evolution of the emerging Pandoraviridae family.

    PubMed

    Legendre, Matthieu; Fabre, Elisabeth; Poirot, Olivier; Jeudy, Sandra; Lartigue, Audrey; Alempic, Jean-Marie; Beucher, Laure; Philippe, Nadège; Bertaux, Lionel; Christo-Foroux, Eugène; Labadie, Karine; Couté, Yohann; Abergel, Chantal; Claverie, Jean-Michel

    2018-06-11

    With DNA genomes reaching 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting pandoraviruses remained up to now the most complex viruses since their discovery in 2013. Our isolation of three new strains from distant locations and environments is now used to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes combining transcriptomic, proteomic, and bioinformatic analyses reveals many non-coding transcripts and significantly reduces the former set of predicted protein-coding genes. Here we show that the pandoraviruses exhibit an open pan-genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation could contribute to the evolution of the giant pandoravirus genomes.

  19. PLI: a web-based tool for the comparison of protein-ligand interactions observed on PDB structures.

    PubMed

    Gallina, Anna Maria; Bisignano, Paola; Bergamino, Maurizio; Bordo, Domenico

    2013-02-01

    A large fraction of the entries contained in the Protein Data Bank describe proteins in complex with low molecular weight molecules such as physiological compounds or synthetic drugs. In many cases, the same molecule is found in distinct protein-ligand complexes. There is increasing interest in medicinal chemistry in comparing protein binding sites to gain insight into the interactions that modulate binding specificity, as this structural information can be correlated with other experimental data of a biochemical or physiological nature and may help in rational drug design. The protein-ligand interaction (PLI) web service presented here provides a tool to analyse and compare the binding pockets of homologous proteins in complex with a selected ligand. The information is deduced from protein-ligand complexes present in the Protein Data Bank and stored in the underlying database. Freely accessible at http://bioinformatics.istge.it/pli/.

  20. Phylogeny and evolution of plant cyclic nucleotide-gated ion channel (CNGC) gene family and functional analyses of tomato CNGCs

    PubMed Central

    Saand, Mumtaz Ali; Xu, You-Ping; Munyampundu, Jean-Pierre; Li, Wen; Zhang, Xuan-Rui; Cai, Xin-Zhong

    2015-01-01

    Cyclic nucleotide-gated ion channels (CNGCs) are calcium-permeable channels that are involved in various biological functions. Nevertheless, the phylogeny and function of plant CNGCs are not well understood. In this study, 333 CNGC genes from 15 plant species were identified using comprehensive bioinformatics approaches. Extensive bioinformatics analyses demonstrated that CNGCs of Group IVa were distinct from those of other groups in gene structure and in the amino acid sequence of the cyclic nucleotide-binding domain. A CNGC-specific motif that recognizes all identified plant CNGCs was generated. Phylogenetic analysis indicated that CNGC proteins of flowering plant species formed five groups. However, CNGCs of the non-vascular plant Physcomitrella patens clustered in only two groups (IVa and IVb), while those of the vascular non-flowering plant Selaginella moellendorffii gathered in four (IVa, IVb, I and II). These data suggest that Group IV CNGCs are the most ancient and Group III CNGCs are the most recently evolved in flowering plants. Furthermore, silencing analyses revealed that a set of CNGC genes might be involved in disease resistance and abiotic stress responses in tomato, and that the function of SlCNGCs does not correlate with the group to which they belong. Our results indicate that Group IVa CNGCs are structurally but not functionally unique among plant CNGCs. PMID:26546226

  1. Competing endogenous RNA and interactome bioinformatic analyses on human telomerase.

    PubMed

    Arancio, Walter; Pizzolanti, Giuseppe; Genovese, Swonild Ilenia; Baiamonte, Concetta; Giordano, Carla

    2014-04-01

    We present a classic interactome bioinformatic analysis and a study on competing endogenous (ce) RNAs for hTERT. The hTERT gene codes for the catalytic subunit and limiting component of the human telomerase complex. Human telomerase reverse transcriptase (hTERT) is essential for the integrity of telomeres. Telomere dysfunctions have been widely reported to be involved in aging, cancer, and cellular senescence. The hTERT gene network has been analyzed using the BioGRID interaction database (http://thebiogrid.org/) and related analysis tools such as Osprey (http://biodata.mshri.on.ca/osprey/servlet/Index) and GeneMANIA (http://genemania.org/). The network of interaction of hTERT transcripts has been further analyzed following the competing endogenous (ce) RNA hypotheses (messenger [m] RNAs cross-talk via micro [mi] RNAs) using the miRWalk database and tools (www.ma.uni-heidelberg.de/apps/zmf/mirwalk/). These analyses suggest a role for Akt, nuclear factor-κB (NF-κB), heat shock protein 90 (HSP90), p70/p80 autoantigen, 14-3-3 proteins, and dynein in telomere functions. Roles for histone acetylation/deacetylation and proteoglycan metabolism are also proposed.

  2. A bioinformatics analysis of Lamin-A regulatory network: a perspective on epigenetic involvement in Hutchinson-Gilford progeria syndrome.

    PubMed

    Arancio, Walter

    2012-04-01

    Hutchinson-Gilford progeria syndrome (HGPS) is a rare human genetic disease that leads to premature aging. HGPS is caused by mutation in the Lamin-A (LMNA) gene that leads, in affected young individuals, to the accumulation of the progerin protein, usually present only in aging differentiated cells. Bioinformatics analyses of the network of interactions of the LMNA gene and transcripts are presented. The LMNA gene network has been analyzed using the BioGRID database (http://thebiogrid.org/) and related analysis tools such as Osprey (http://biodata.mshri.on.ca/osprey/servlet/Index) and GeneMANIA (http://genemania.org/). The network of interaction of LMNA transcripts has been further analyzed following the competing endogenous RNA (ceRNA) hypothesis (RNA cross-talk via microRNAs [miRNAs]) and using the miRWalk database and tools (www.ma.uni-heidelberg.de/apps/zmf/mirwalk/). These analyses suggest particular relevance of epigenetic modifiers (via acetylase complexes and specifically HTATIP histone acetylase) and adenosine triphosphate (ATP)-dependent chromatin remodelers (via pBAF, BAF, and SWI/SNF complexes).

  3. Bioinformatic analyses to select phenotype affecting polymorphisms in HTR2C gene.

    PubMed

    Piva, Francesco; Giulietti, Matteo; Baldelli, Luisa; Nardi, Bernardo; Bellantuono, Cesario; Armeni, Tatiana; Saccucci, Franca; Principato, Giovanni

    2011-08-01

    Single nucleotide polymorphisms (SNPs) in serotonin-related genes influence mental disorders and responses to pharmacological and psychotherapeutic treatments. In planning association studies, researchers who want to investigate new SNPs have to select some from among a large number of candidates. Our aim is to guide researchers in the selection of the polymorphisms most likely to affect phenotype. Here, we studied serotonin receptor 2C (HTR2C) SNPs because, until now, only relatively few of about 2000 have been investigated. We used the most up-to-date and validated bioinformatic tools to predict which variations among 2450 HTR2C SNPs can give rise to biological effects. We suggest 48 SNPs that are worth considering in future association studies in the fields of psychiatry, psychology and pharmacogenomics. Moreover, our analyses point out the biological level probably affected, such as transcription, splicing, miRNA regulation and protein structure, thus suggesting future molecular investigations. Although few association studies are available in the literature, their results are in agreement with our predictions, showing that our selection methods can help to guide future association studies. Copyright © 2011 John Wiley & Sons, Ltd.

  4. RNA Sequencing and Bioinformatics Analysis Implicate the Regulatory Role of a Long Noncoding RNA-mRNA Network in Hepatic Stellate Cell Activation.

    PubMed

    Guo, Can-Jie; Xiao, Xiao; Sheng, Li; Chen, Lili; Zhong, Wei; Li, Hai; Hua, Jing; Ma, Xiong

    2017-01-01

    To analyze the long noncoding (lncRNA)-mRNA expression network and potential roles in rat hepatic stellate cells (HSCs) during activation. LncRNA expression was analyzed in quiescent and culture-activated HSCs by RNA sequencing, and differentially expressed lncRNAs verified by quantitative reverse transcription polymerase chain reaction (qRT-PCR) were subjected to bioinformatics analysis. In vivo analyses of differential lncRNA-mRNA expression were performed on a rat model of liver fibrosis. We identified upregulation of 12 lncRNAs and 155 mRNAs and downregulation of 12 lncRNAs and 374 mRNAs in activated HSCs. Additionally, we identified the differential expression of upregulated lncRNAs (NONRATT012636.2, NONRATT016788.2, and NONRATT021402.2) and downregulated lncRNAs (NONRATT007863.2, NONRATT019720.2, and NONRATT024061.2) in activated HSCs relative to levels observed in quiescent HSCs, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses showed that changes in lncRNAs associated with HSC activation revealed 11 significantly enriched pathways according to their predicted targets. Moreover, based on the predicted co-expression network, the relative dynamic levels of NONRATT013819.2 and lysyl oxidase (Lox) were compared during HSC activation both in vitro and in vivo. Our results confirmed the upregulation of lncRNA NONRATT013819.2 and Lox mRNA associated with the extracellular matrix (ECM)-related signaling pathway in HSCs and fibrotic livers. Our results detailing a dysregulated lncRNA-mRNA network might provide new treatment strategies for hepatic fibrosis based on findings indicating potentially critical roles for NONRATT013819.2 and Lox in ECM remodeling during HSC activation. © 2017 The Author(s). Published by S. Karger AG, Basel.

  5. Fungal Morphology, Iron Homeostasis, and Lipid Metabolism Regulated by a GATA Transcription Factor in Blastomyces dermatitidis

    PubMed Central

    Marty, Amber J.; Broman, Aimee T.; Zarnowski, Robert; Dwyer, Teigan G.; Bond, Laura M.; Lounes-Hadj Sahraoui, Anissa; Fontaine, Joël; Ntambi, James M.; Keleş, Sündüz; Kendziorski, Christina; Gauthier, Gregory M.

    2015-01-01

    In response to temperature, Blastomyces dermatitidis converts between yeast and mold forms. Knowledge of the mechanism(s) underlying this response to temperature remains limited. In B. dermatitidis, we identified a GATA transcription factor, SREB, important for the transition to mold. Null mutants (SREBΔ) fail to fully complete the conversion to mold and cannot properly regulate siderophore biosynthesis. To capture the transcriptional response regulated by SREB early in the phase transition (0–48 hours), gene expression microarrays were used to compare SREBΔ to an isogenic wild type isolate. Analysis of the time course microarray data demonstrated SREB functioned as a transcriptional regulator at 37°C and 22°C. Bioinformatic and biochemical analyses indicated SREB was involved in diverse biological processes including iron homeostasis, biosynthesis of triacylglycerol and ergosterol, and lipid droplet formation. Integration of microarray data, bioinformatics, and chromatin immunoprecipitation identified a subset of genes directly bound and regulated by SREB in vivo in yeast (37°C) and during the phase transition to mold (22°C). This included genes involved with siderophore biosynthesis and uptake, iron homeostasis, and genes unrelated to iron assimilation. Functional analysis suggested that lipid droplets were actively metabolized during the phase transition and lipid metabolism may contribute to filamentous growth at 22°C. Chromatin immunoprecipitation, RNA interference, and overexpression analyses suggested that SREB was in a negative regulatory circuit with the bZIP transcription factor encoded by HAPX. Both SREB and HAPX affected morphogenesis at 22°C; however, large changes in transcript abundance by gene deletion for SREB or strong overexpression for HAPX were required to alter the phase transition. PMID:26114571

  6. Bioinformatics and genomic analysis of transposable elements in eukaryotic genomes.

    PubMed

    Janicki, Mateusz; Rooke, Rebecca; Yang, Guojun

    2011-08-01

    A major portion of most eukaryotic genomes are transposable elements (TEs). During evolution, TEs have introduced profound changes to genome size, structure, and function. As integral parts of genomes, the dynamic presence of TEs will continue to be a major force in reshaping genomes. Early computational analyses of TEs in genome sequences focused on filtering out "junk" sequences to facilitate gene annotation. When the high abundance and diversity of TEs in eukaryotic genomes were recognized, these early efforts transformed into the systematic genome-wide categorization and classification of TEs. The availability of genomic sequence data reversed the classical genetic approaches to discovering new TE families and superfamilies. Curated TE databases and their accurate annotation of genome sequences in turn facilitated the studies on TEs in a number of frontiers including: (1) TE-mediated changes of genome size and structure, (2) the influence of TEs on genome and gene functions, (3) TE regulation by host, (4) the evolution of TEs and their population dynamics, and (5) genomic scale studies of TE activity. Bioinformatics and genomic approaches have become an integral part of large-scale studies on TEs to extract information with pure in silico analyses or to assist wet lab experimental studies. The current revolution in genome sequencing technology facilitates further progress in the existing frontiers of research and emergence of new initiatives. The rapid generation of large-sequence datasets at record low costs on a routine basis is challenging the computing industry on storage capacity and manipulation speed and the bioinformatics community for improvement in algorithms and their implementations.

  7. A case study of tuning MapReduce for efficient Bioinformatics in the cloud

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Lizhen; Wang, Zhong; Yu, Weikuan

    The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this paper, we aim to minimize the cloud computing cost spent on bioinformatics data analysis by optimizing the extracted significant Hadoop parameters. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our study case. Experimental results show that, with the fine-tuned parameters, we achieve a total of 4× speedup compared with the original performance (using the default settings). Finally, this paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.
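
    For readers unfamiliar with the study case, the following is a minimal single-machine sketch of k-mer counting written in a map/reduce style. It is not the authors' Hadoop implementation and does not reflect any of their tuned parameters; the function names (map_kmers, reduce_counts, count_kmers) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k):
    """Map step: count every length-k substring (k-mer) of a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step: merge two partial k-mer count tables."""
    return left + right


def count_kmers(reads, k):
    """Run the map step on each read, then fold the partial counts together."""
    return reduce(reduce_counts, (map_kmers(r, k) for r in reads), Counter())


if __name__ == "__main__":
    # Two toy reads; real NGS input would be streamed from FASTQ files.
    print(count_kmers(["ACGTAC", "GTAC"], 3))
```

    In a real MapReduce job the map and reduce steps would run on separate workers and the framework would shuffle keys between them; this sketch only shows the decomposition of the computation.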

  8. High-throughput protein analysis integrating bioinformatics and experimental assays

    PubMed Central

    del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan

    2004-01-01

    The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE, SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins. PMID:14762202

  9. Bioinformatics: indispensable, yet hidden in plain sight?

    PubMed

    Bartlett, Andrew; Penders, Bart; Lewis, Jamie

    2017-06-21

    Bioinformatics has multitudinous identities, organisational alignments and disciplinary links. This variety allows bioinformaticians and bioinformatic work to contribute to much (if not most) of life science research in profound ways. The multitude of bioinformatic work also translates into a multitude of credit-distribution arrangements, apparently dismissing that work. We report on the epistemic and social arrangements that characterise the relationship between bioinformatics and life science. We describe, in sociological terms, the character, power and future of bioinformatic work. The character of bioinformatic work is such that its cultural, institutional and technical structures allow for it to be black-boxed easily. The result is that bioinformatic expertise and contributions travel easily and quickly, yet remain largely uncredited. The power of bioinformatic work is shaped by its dependency on life science work, which combined with the black-boxed character of bioinformatic expertise further contributes to situating bioinformatics on the periphery of the life sciences. Finally, the imagined futures of bioinformatic work suggest that bioinformatics will become ever more indispensable without necessarily becoming more visible, forcing bioinformaticians into difficult professional and career choices. Bioinformatic expertise and labour is epistemically central but often institutionally peripheral. In part, this is a result of the ways in which the character, power distribution and potential futures of bioinformatics are constituted. However, alternative paths can be imagined.

  10. SoS Notebook: An Interactive Multi-Language Data Analysis Environment.

    PubMed

    Peng, Bo; Wang, Gao; Ma, Jun; Leong, Man Chong; Wakefield, Chris; Melott, James; Chiu, Yulun; Du, Di; Weinstein, John N

    2018-05-22

    Complex bioinformatic data analysis workflows involving multiple scripts in different languages can be difficult to consolidate, share, and reproduce. An environment that streamlines the entire process of data collection, analysis, visualization and reporting of such multi-language analyses is currently lacking. We developed Script of Scripts (SoS) Notebook, a web-based notebook environment that allows the use of multiple scripting languages in a single notebook, with data flowing freely within and across languages. SoS Notebook enables researchers to perform sophisticated bioinformatic analysis using the most suitable tools for different parts of the workflow, without the limitations of a particular language or complications of cross-language communications. SoS Notebook is hosted at http://vatlab.github.io/SoS/ and is distributed under a BSD license. Contact: bpeng@mdanderson.org.

  11. Meta-learning framework applied in bioinformatics inference system design.

    PubMed

    Arredondo, Tomás; Ormazábal, Wladimir

    2015-01-01

    This paper describes a meta-learner inference system development framework which is applied and tested in the implementation of bioinformatic inference systems. These inference systems are used for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps. This meta-learner-based approach utilises a workflow where the user provides feedback with final classification decisions which are stored in conjunction with analysed genetic sequences for periodic inference system training. The inference systems were trained and tested with three different data sets related to the bacterial degradation of aromatic compounds. The analysis of the meta-learner-based framework involved contrasting several different optimisation methods with various different parameters. The obtained inference systems were also contrasted with other standard classification methods with accurate prediction capabilities observed.

  12. XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data

    PubMed Central

    2015-01-01

    Background Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893

  13. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs.

    PubMed

    Lim, Chun Shen; Brown, Chris M

    2017-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community.

  14. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs

    PubMed Central

    Lim, Chun Shen; Brown, Chris M.

    2018-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community. PMID:29354101

  15. bioNerDS: exploring bioinformatics’ database and software use through literature mining

    PubMed Central

    2013-01-01

    Background Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While scholarly literature surveys can provide some insights, large-scale computer-based approaches to identify mentions of bioinformatics databases and software from primary literature would automate systematic cataloguing, facilitate the monitoring of usage, and provide the foundations for the recovery of computational methods for analysing biological data, with the long-term aim of identifying best/common practice in different areas of biology. Results We have developed bioNerDS, a named entity recogniser for the recovery of bioinformatics databases and software from primary literature. We identify such entities with an F-measure ranging from 63% to 91% at the mention level and 63-78% at the document level, depending on corpus. Not attaining a higher F-measure is mostly due to high ambiguity in resource naming, which is compounded by the on-going introduction of new resources. To demonstrate the software, we applied bioNerDS to full-text articles from BMC Bioinformatics and Genome Biology. General mention patterns reflect the remit of these journals, highlighting BMC Bioinformatics’s emphasis on new tools and Genome Biology’s greater emphasis on data analysis. The data also illustrates some shifts in resource usage: for example, the past decade has seen R and the Gene Ontology join BLAST and GenBank as the main components in bioinformatics processing. Conclusions We demonstrate the feasibility of automatically identifying resource names on a large-scale from the scientific literature and show that the generated data can be used for exploration of bioinformatics database and software usage. For example, our results help to investigate the rate of change in resource usage and corroborate the suspicion that a vast majority of resources are created, but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/. PMID:23768135

  16. Application of virtual phase-shifting speckle-interferometry for detection of polymorphism in the Chlamydia trachomatis omp1 gene

    NASA Astrophysics Data System (ADS)

    Feodorova, Valentina A.; Saltykov, Yury V.; Zaytsev, Sergey S.; Ulyanov, Sergey S.; Ulianova, Onega V.

    2018-04-01

    The method of phase-shifting speckle-interferometry has been used as a new tool with high potency for modern bioinformatics. Virtual phase-shifting speckle-interferometry has been applied for detection of polymorphism in the Chlamydia trachomatis omp1 gene. It has been shown that the suggested method is very sensitive to natural genetic mutations such as single nucleotide polymorphisms (SNPs). The effectiveness of the proposed method has been compared with that of the newest bioinformatic tools based on nucleotide sequence alignment.

  17. Advantages and disadvantages in usage of bioinformatic programs in promoter region analysis

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena E.; Skarzyńska, Agnieszka; Posyniak, Kacper; Ziąbska, Karolina; Pląder, Wojciech; Przybecki, Zbigniew

    2015-09-01

    An important computational challenge is finding the regulatory elements across the promoter region. In this work we present the advantages and disadvantages of applying different bioinformatics programs for localization of transcription factor binding sites in the upstream region of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promoter regions. The results have been compared and the possible function of chosen motifs has been described.

  18. Towards muscle-specific meat color stability of Chinese Luxi yellow cattle: A proteomic insight into post-mortem storage.

    PubMed

    Wu, Wei; Yu, Qian-Qian; Fu, Yu; Tian, Xiao-Jing; Jia, Fei; Li, Xing-Min; Dai, Rui-Tong

    2016-09-16

    Searching for potential predictors of meat color is a challenging task for the meat industry. In this study, the relationship between meat color parameters and the sarcoplasmic proteome of M. longissimus lumborum (LL) and M. psoas major (PM) from Chinese Luxi yellow cattle during post-mortem storage (0, 5, 10 and 15 days) was explored with the aid of integrated proteomics and bioinformatics approaches. Meat color attributes revealed that LL displayed better color stability than PM during storage. Furthermore, sarcoplasmic proteins of these two muscles were compared between days 5, 10, 15 and day 0. Several proteins were closely correlated with meat color attributes and they were muscle-specific and responsible for the meat color stability at different storage periods. Glycerol-3-phosphate dehydrogenase, fructose-bisphosphate aldolase A isoform, glycogen phosphorylase, peroxiredoxin-2, phosphoglucomutase-1, superoxide dismutase [Cu-Zn], and heat shock cognate protein (71 kDa) might serve as the candidate predictors of meat color stability during post-mortem storage. In addition, bioinformatics analyses indicated that more proteins were involved in glycolytic metabolism of LL, which contributed to better meat color stability of LL than PM. The present results could provide a proteomic insight into muscle-specific meat color stability of Chinese Luxi yellow cattle during post-mortem storage. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. Ambiguity and variability of database and software names in bioinformatics.

    PubMed

    Duck, Geraint; Kovacevic, Aleksandar; Robertson, David L; Stevens, Robert; Nenadic, Goran

    2015-01-01

    There are numerous options available to achieve various tasks in bioinformatics, but until recently, there were no tools that could systematically identify mentions of databases and tools within the literature. In this paper we explore the variability and ambiguity of database and software name mentions and compare dictionary and machine learning approaches to their identification. Through the development and analysis of a corpus of 60 full-text documents manually annotated at the mention level, we report high variability and ambiguity in database and software mentions. On a test set of 25 full-text documents, a baseline dictionary look-up achieved an F-score of 46 %, highlighting not only variability and ambiguity but also the extensive number of new resources introduced. A machine learning approach achieved an F-score of 63 % (with precision of 74 %) and 70 % (with precision of 83 %) for strict and lenient matching respectively. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature. Our analyses show that identification of mentions of databases and tools is a challenging task that cannot be achieved by relying on current manually-curated resource repositories. Although machine learning shows improvement and promise (primarily in precision), more contextual information needs to be taken into account to achieve a good degree of accuracy.
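
    The F-scores quoted above can be related to precision and recall with a short calculation. The sketch below assumes the reported F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; under that assumption the recall can be back-calculated from the reported F-score and precision. The function names are hypothetical, and the derived recall values are arithmetic consequences of the assumption, not figures reported by the authors.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(f, precision):
    """Solve F = 2PR / (P + R) for R, given F and P (requires 2P > F)."""
    return f * precision / (2 * precision - f)


if __name__ == "__main__":
    # Strict matching: F = 0.63, precision = 0.74 (values from the abstract).
    strict = implied_recall(0.63, 0.74)
    # Lenient matching: F = 0.70, precision = 0.83 (values from the abstract).
    lenient = implied_recall(0.70, 0.83)
    print(f"implied recall, strict:  {strict:.2f}")
    print(f"implied recall, lenient: {lenient:.2f}")
```

    Under the F1 assumption the implied recall is roughly 0.55 for strict and 0.61 for lenient matching, which is consistent with the abstract's remark that the improvement is primarily in precision.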

  20. Analytical criteria for performance characteristics of IgE binding methods for evaluating safety of biotech food products.

    PubMed

    Holzhauser, Thomas; Ree, Ronald van; Poulsen, Lars K; Bannon, Gary A

    2008-10-01

    There is detailed guidance on how to perform bioinformatic analyses and enzymatic degradation studies for genetically modified crops under consideration for approval by regulatory agencies; however, there is no consensus in the scientific community on the details of how to perform IgE serum studies. IgE serum studies are an important safety component to acceptance of genetically modified crops when the introduced protein is novel, the introduced protein is similar to known allergens, or the crop is allergenic. In this manuscript, we describe the characteristics of the reagents, validation of assay performance, and data analysis necessary to optimize the information obtained from serum testing of novel proteins and genetically modified (GM) crops and to make results more accurate and comparable between different investigations.

  1. Towards a career in bioinformatics

    PubMed Central

    2009-01-01

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508

  2. Towards a career in bioinformatics.

    PubMed

    Ranganathan, Shoba

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010.

  3. Bioinformatic and Biochemical Characterizations of C–S Bond Formation and Cleavage Enzymes in the Fungus Neurospora crassa Ergothioneine Biosynthetic Pathway

    PubMed Central

    2015-01-01

    Ergothioneine is a histidine thiol derivative. Its mycobacterial biosynthetic pathway has five steps (EgtA-E catalysis) with two novel reactions: a mononuclear nonheme iron enzyme (EgtB) catalyzed oxidative C–S bond formation and a PLP-mediated C–S lyase (EgtE) reaction. Our bioinformatic and biochemical analyses indicate that the fungus Neurospora crassa has a more concise ergothioneine biosynthetic pathway because its nonheme iron enzyme, Egt1, makes use of cysteine instead of γ-Glu-Cys as the substrate. Such a change of substrate preference eliminates the competition between ergothioneine and glutathione biosyntheses. In addition, we have identified the N. crassa C–S lyase (NCU11365) and reconstituted its activity in vitro, which makes the future ergothioneine production through metabolic engineering feasible. PMID:25275953

  4. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920
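
    To make the idea of detecting low-frequency variants against a background of sequencing error concrete, here is a minimal sketch of one generic approach: a one-sided binomial test at a single site. It is not taken from the review and is not any specific tool's algorithm; the uniform per-base error rate, the significance threshold, and the function names are all illustrative assumptions.

```python
from math import exp, lgamma, log


def log_binom_pmf(n, i, p):
    """Log of the Binomial(n, p) probability of exactly i successes (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    return min(1.0, sum(exp(log_binom_pmf(n, i, p)) for i in range(k, n + 1)))


def is_variant(depth, alt_reads, error_rate=0.01, alpha=1e-6):
    """Call a variant if alt_reads is implausibly high under error alone.

    error_rate and alpha are assumed values chosen for illustration only.
    """
    return binomial_tail(depth, alt_reads, error_rate) < alpha


if __name__ == "__main__":
    # At 1000x depth and a 1% error rate, about 10 erroneous reads are expected.
    print(is_variant(1000, 12))  # close to the error expectation
    print(is_variant(1000, 50))  # far above the error expectation
```

    Real pipelines typically also account for base quality, strand bias, and multiple testing across sites, none of which this sketch models.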

  5. Bioinformatics Analysis Reveals Genes Involved in the Pathogenesis of Ameloblastoma and Keratocystic Odontogenic Tumor.

    PubMed

    Santos, Eliane Macedo Sobrinho; Santos, Hércules Otacílio; Dos Santos Dias, Ivoneth; Santos, Sérgio Henrique; Batista de Paula, Alfredo Maurício; Feltenberger, John David; Sena Guimarães, André Luiz; Farias, Lucyana Conceição

    2016-01-01

    Pathogenesis of odontogenic tumors is not well known. It is important to identify genetic deregulations and molecular alterations. This study aimed to investigate, through bioinformatic analysis, the possible genes involved in the pathogenesis of ameloblastoma (AM) and keratocystic odontogenic tumor (KCOT). Genes involved in the pathogenesis of AM and KCOT were identified in GeneCards. The gene list was expanded, and the gene interactions network was mapped using the STRING software. "Weighted number of links" (WNL) was calculated to identify "leader genes" (highest WNL). Genes were ranked by K-means method and Kruskal-Wallis test was used (P<0.001). Total interactions score (TIS) was also calculated using all interaction data generated by the STRING database, in order to achieve global connectivity for each gene. The topological and ontological analyses were performed using Cytoscape software and BinGO plugin. Literature review data was used to corroborate the bioinformatics data. CDK1 was identified as leader gene for AM. In the KCOT group, the results show PCNA and TP53. Both tumors exhibit a power law behavior. Our topological analysis suggested leader genes possibly important in the pathogenesis of AM and KCOT, by clustering coefficient calculated for both odontogenic tumors (0.028 for AM, zero for KCOT). The results obtained in the scatter diagram suggest an important relationship of these genes with the molecular processes involved in AM and KCOT. Ontological analysis for both AM and KCOT demonstrated different mechanisms. Bioinformatics analyses were confirmed through literature review. These results may suggest the involvement of promising genes for a better understanding of the pathogenesis of AM and KCOT.

  6. Identification of SNPs associated with muscle yield and quality traits using allelic-imbalance analyses of pooled RNA-Seq samples in rainbow trout

    USDA-ARS's Scientific Manuscript database

    Coding/functional SNPs change the biological function of a gene and, therefore, could serve as “large-effect” genetic markers. In this study, we used two bioinformatics pipelines, GATK and SAMtools, for discovering coding/functional SNPs with allelic-imbalances associated with total body weight, mus...

  7. Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.

    PubMed

    Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H

    2013-12-01

    Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein-protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.

  8. Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

    PubMed Central

    Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.

    2013-01-01

    Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein–protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704

  9. OPPL-Galaxy, a Galaxy tool for enhancing ontology exploitation as part of bioinformatics workflows

    PubMed Central

    2013-01-01

    Background Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists’ toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment. Results We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy’s capability for enriching, modifying and querying biomedical ontologies. Conclusions Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses. PMID:23286517

  10. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. 
    Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315

  11. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

  12. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

    PubMed Central

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.

    2014-01-01

    Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. 
    We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: A software package is publicly available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu. Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607

  13. MAAMD: a workflow to standardize meta-analyses and comparison of affymetrix microarray data

    PubMed Central

    2014-01-01

    Background Mandatory deposit of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g. Affymetrix CEL files) can be time consuming and complex, and requires fundamental computational and bioinformatics skills. The development of analytical workflows to automate these tasks simplifies the processing of, improves the efficiency of, and serves to standardize multiple and sequential analyses. Once installed, workflows facilitate the tedious steps required to run rapid intra- and inter-dataset comparisons. Results We developed a workflow to facilitate and standardize Meta-Analysis of Affymetrix Microarray Data analysis (MAAMD) in Kepler. Two freely available stand-alone software tools, R and AltAnalyze, were embedded in MAAMD. The inputs of MAAMD are user-editable csv files, which contain sample information and parameters describing the locations of input files and required tools. MAAMD was tested by analyzing 4 different GEO datasets from mice and Drosophila. MAAMD automates data downloading, data organization, data quality control assessment, differential gene expression analysis, clustering analysis, pathway visualization, gene-set enrichment analysis, and cross-species orthologous-gene comparisons. MAAMD was utilized to identify gene orthologues responding to hypoxia or hyperoxia in both mice and Drosophila. The entire set of analyses for 4 datasets (34 total microarrays) finished in about one hour. Conclusions MAAMD saves time, minimizes the required computer skills, and offers a standardized procedure for users to analyze microarray datasets and make new intra- and inter-dataset comparisons. PMID:24621103

  14. Bioinformatics programs are 31-fold over-represented among the highest impact scientific papers of the past two decades.

    PubMed

    Wren, Jonathan D

    2016-09-01

    We analyzed the relative proportion of bioinformatics papers and their non-bioinformatics counterparts in the top 20 most cited papers annually for the past two decades. When defining bioinformatics papers as encompassing both those that provide software for data analysis and those that describe methods underlying data analysis software, we find that over the past two decades, more than a third (34%) of the most cited papers in science were bioinformatics papers, which is approximately a 31-fold enrichment relative to the total number of bioinformatics papers published. More than half of the most cited papers during this span were bioinformatics papers. Yet, the average 5-year JIF of top 20 bioinformatics papers was 7.7, whereas the average JIF for top 20 non-bioinformatics papers was 25.8, significantly higher (P < 4.5 × 10⁻²⁹). The 20-year trend in the average JIF between the two groups suggests that the gap is not significantly narrowing. For a sampling of the journals producing top papers, bioinformatics journals tended to have higher Gini coefficients, suggesting that development of novel bioinformatics resources may be somewhat 'hit or miss'. That is, relative to other fields, bioinformatics produces some programs that are extremely widely adopted and cited, yet there are fewer of intermediate success. Contact: jdwren@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
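
    The Gini coefficient mentioned in this abstract measures how unevenly citations are spread across a journal's papers. The following is a minimal sketch of the standard rank-weighted formula, not the author's code; the per-paper citation counts are invented for illustration. The last line is a back-of-envelope check that assumes fold enrichment means observed share divided by baseline share.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(rank * x) / (n * total) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical citation counts for four papers in two journals.
even_journal = [10, 10, 10, 10]        # every paper cited equally
hit_or_miss_journal = [0, 0, 0, 40]    # one paper collects every citation

print(gini(even_journal))         # evenly spread citations
print(gini(hit_or_miss_journal))  # concentrated citations

# Assumption: fold enrichment = observed share / baseline share, so a 34%
# share with 31-fold enrichment implies a baseline share of roughly 1.1%.
implied_baseline_share = 0.34 / 31
print(round(implied_baseline_share, 4))
```

    A journal whose citations are concentrated in a few papers scores closer to 1, which is the 'hit or miss' pattern the abstract describes.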

  15. PatGen--a consolidated resource for searching genetic patent sequences.

    PubMed

    Rouse, Richard J D; Castagnetto, Jesus; Niedner, Roland H

    2005-04-15

    Compared to the wealth of online resources covering genomic, proteomic and derived data, the bioinformatics community is rather underserved when it comes to patent information related to biological sequences. The current online resources are either incomplete or rather expensive. This paper describes PatGen, an integrated database containing data from bioinformatic and patent resources. This effort addresses the inconsistency of publicly available genetic patent data coverage by providing access to a consolidated dataset. PatGen can be searched at http://www.patgendb.com. Contact: rjdrouse@patentinformatics.com.

  16. StreptoBase: An Oral Streptococcus mitis Group Genomic Resource and Analysis Platform.

    PubMed

    Zheng, Wenning; Tan, Tze King; Paterson, Ian C; Mutha, Naresh V R; Siow, Cheuk Chuen; Tan, Shi Yang; Old, Lesley A; Jakubovics, Nicholas S; Choo, Siew Woh

    2016-01-01

    The oral streptococci are spherical Gram-positive bacteria categorized under the phylum Firmicutes which are among the most common causative agents of bacterial infective endocarditis (IE) and are also important agents in septicaemia in neutropenic patients. The Streptococcus mitis group is comprised of 13 species including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, which provides a specialized free resource focusing on the genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes including 27 novel mitis group strains that we sequenced using the high throughput Illumina HiSeq technology platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house designed bioinformatics web tools such as Pairwise Genome Comparison (PGC) tool and Pathogenomic Profiling Tool (PathoProT), which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare genome structure of different S. mitis group bacteria and putative virulence genes profile across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococci genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my.

  17. A comparison of common programming languages used in bioinformatics.

    PubMed

    Fourment, Mathieu; Gillings, Michael R

    2008-02-05

    The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.
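
    For readers unfamiliar with the first benchmark task, the Sellers algorithm finds the substrings of a text that best match a pattern under edit distance. The following is a minimal Python sketch of the usual dynamic-programming formulation with unit costs. It is not the benchmark's source code, and the function name and example sequences are illustrative assumptions.

```python
def sellers(pattern, text):
    """Return (best_distance, end_positions) for approximate matches.

    end_positions are 1-based positions in text where a best match ends.
    """
    m = len(pattern)
    # Column for the empty text prefix: i pattern characters cost i edits.
    prev = list(range(m + 1))
    best, ends = prev[m], [0]
    for j, tch in enumerate(text, start=1):
        # cur[0] stays 0 because a match may start anywhere in the text.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in text
                         cur[i - 1] + 1)      # pattern character skipped
        if cur[m] < best:
            best, ends = cur[m], [j]
        elif cur[m] == best:
            ends.append(j)
        prev = cur
    return best, ends


print(sellers("ACGT", "TTACGTTT"))  # exact occurrence present
print(sellers("ACGT", "GGACTTGG"))  # only approximate occurrences
```

    The sketch keeps two columns of the matrix at a time, so memory grows with the pattern length rather than with the text length.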

  18. Better bioinformatics through usability analysis.

    PubMed

    Bolchini, Davide; Finkelstein, Anthony; Perrone, Vito; Nagl, Sylvia

    2009-02-01

    Improving the usability of bioinformatics resources enables researchers to find, interact with, share, compare and manipulate important information more effectively and efficiently. It thus enables researchers to gain improved insights into biological processes with the potential, ultimately, of yielding new scientific results. Usability 'barriers' can pose significant obstacles to a satisfactory user experience and force researchers to spend unnecessary time and effort to complete their tasks. The number of online biological databases available is growing and there is an expanding community of diverse users. In this context there is an increasing need to ensure the highest standards of usability. Using 'state-of-the-art' usability evaluation methods, we have identified and characterized a sample of usability issues potentially relevant to web bioinformatics resources, in general. These specifically concern the design of the navigation and search mechanisms available to the user. The usability issues we have discovered in our substantial case studies are undermining the ability of users to find the information they need in their daily research activities. In addition to characterizing these issues, specific recommendations for improvements are proposed leveraging proven practices from web and usability engineering. The methods and approach we exemplify can be readily adopted by the developers of bioinformatics resources.

  19. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  20. Distribution of cold adaptation proteins in microbial mats in Lake Joyce, Antarctica: Analysis of metagenomic data by using two bioinformatics tools.

    PubMed

    Koo, Hyunmin; Hakim, Joseph A; Fisher, Phillip R E; Grueneberg, Alexander; Andersen, Dale T; Bej, Asim K

    2016-01-01

    In this study, we report the distribution and abundance of cold-adaptation proteins in microbial mat communities in the perennially ice-covered Lake Joyce, located in the McMurdo Dry Valleys, Antarctica. We have used MG-RAST and R code bioinformatics tools on Illumina HiSeq2000 shotgun metagenomic data and compared the filtering efficacy of these two methods on cold-adaptation proteins. Overall, cold-shock DEAD-box protein A (CSDA), antifreeze proteins (AFPs), fatty acid desaturase (FAD), trehalose synthase (TS), and the cold-shock family of proteins (CSPs) were present in all mat samples at high, moderate, or low levels, whereas the ice nucleation protein (INP) was present only in the ice and bulbous mat samples at insignificant levels. Considering the near homogeneous temperature profile of Lake Joyce (0.08-0.29 °C), the distribution and abundance of these proteins across various mat samples predictively correlated with known functional attributes necessary for microbial communities to thrive in this ecosystem. The comparison of the MG-RAST and the R code methods showed dissimilar occurrences of the cold-adaptation protein sequences, though with insignificant ANOSIM (R = 0.357; p-value = 0.012), ADONIS (R² = 0.274; p-value = 0.03) and STAMP (p-values = 0.521-0.984) statistical analyses. Furthermore, filtering targeted sequences using the R code accounted for taxonomic groups by avoiding sequence redundancies, whereas the MG-RAST provided total counts resulting in a higher sequence output. The results from this study revealed for the first time the distribution of cold-adaptation proteins in six different types of microbial mats in Lake Joyce, while suggesting a simpler and more manageable user-defined method of R code, as compared to a web-based MG-RAST pipeline.

  1. Improved, ACMG-Compliant, in silico prediction of pathogenicity for missense substitutions encoded by TP53 variants.

    PubMed

    Fortuno, Cristina; James, Paul A; Young, Erin L; Feng, Bing; Olivier, Magali; Pesaran, Tina; Tavtigian, Sean V; Spurdle, Amanda B

    2018-05-18

    Clinical interpretation of germline missense variants represents a major challenge, including those in the TP53 Li-Fraumeni syndrome gene. Bioinformatic prediction is a key part of variant classification strategies. We aimed to optimize the performance of the Align-GVGD tool used for p53 missense variant prediction, and compare its performance to other bioinformatic tools (SIFT, PolyPhen-2) and ensemble methods (REVEL, BayesDel). Reference sets of assumed pathogenic and assumed benign variants were defined using functional and/or clinical data. Area under the curve and Matthews correlation coefficient (MCC) values were used as objective functions to select an optimized protein multi-sequence alignment with best performance for Align-GVGD. MCC comparison of tools using binary categories showed optimized Align-GVGD (C15 cut-off) combined with BayesDel (0.16 cut-off), or with REVEL (0.5 cut-off), to have the best overall performance. Further, a semi-quantitative approach using multiple tiers of bioinformatic prediction, validated using an independent set of non-functional and functional variants, supported use of Align-GVGD and BayesDel prediction for different strength of evidence levels in ACMG/AMP rules. We provide rationale for bioinformatic tool selection for TP53 variant classification, and have also computed relevant bioinformatic predictions for every possible p53 missense variant to facilitate their use by the scientific and medical community. This article is protected by copyright. All rights reserved.
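
    The Matthews correlation coefficient used above to compare tools can be computed from a 2x2 confusion table. The sketch below is illustrative only: the scores and reference labels are invented, and it assumes that a score at or above the cut-off is called pathogenic, which may not match how each real tool is thresholded.

```python
import math


def confusion(scores, is_pathogenic, cutoff):
    """Count (tp, fp, tn, fn), calling a variant pathogenic if score >= cutoff."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, is_pathogenic):
        called = score >= cutoff
        if called and truth:
            tp += 1
        elif called and not truth:
            fp += 1
        elif not called and not truth:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical prediction scores and reference labels for six variants.
scores = [0.9, 0.4, 0.2, 0.1, 0.05, 0.3]
labels = [True, True, True, False, False, False]

counts = confusion(scores, labels, cutoff=0.16)
print(counts, round(mcc(*counts), 3))
```

    Unlike plain accuracy, MCC uses all four cells of the table, so it stays informative when the pathogenic and benign reference sets differ in size.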

  2. BioQueue: a novel pipeline framework to accelerate bioinformatics analysis.

    PubMed

    Yao, Li; Wang, Heming; Song, Yuanyuan; Sui, Guangchao

    2017-10-15

    With the rapid development of Next-Generation Sequencing, a large amount of data is now available for bioinformatics research. Meanwhile, the presence of many pipeline frameworks makes it possible to analyse these data. However, these tools concentrate mainly on their syntax and design paradigms, and dispatch jobs based on users' own estimates of the resources needed to execute a given step in a protocol. As a result, it is difficult for these tools to maximize the potential of computing resources and to avoid errors caused by overload, such as memory overflow. Here, we have developed BioQueue, a web-based framework that contains a checkpoint before each step to automatically estimate the system resources (CPU, memory and disk) needed by the step and then dispatch jobs accordingly. BioQueue possesses a shell command-like syntax instead of implementing a new script language, which means most biologists without a computer programming background can access the efficient queue system with ease. BioQueue is freely available at https://github.com/liyao001/BioQueue. The extensive documentation can be found at http://bioqueue.readthedocs.io. Contact: li_yao@outlook.com or gcsui@nefu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
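
    The idea of a checkpoint that compares each step's estimated needs with the free resources before dispatching can be sketched as follows. This is a hypothetical illustration, not BioQueue's code or API: the class names, fields and numbers are assumptions, and the estimates are supplied by hand because the abstract does not say how BioQueue derives them.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    cpu: int         # estimated cores needed (hypothetical estimate)
    mem_gb: float    # estimated peak memory
    disk_gb: float   # estimated scratch space


@dataclass
class Resources:
    cpu: int
    mem_gb: float
    disk_gb: float

    def fits(self, step):
        return (step.cpu <= self.cpu
                and step.mem_gb <= self.mem_gb
                and step.disk_gb <= self.disk_gb)

    def reserve(self, step):
        self.cpu -= step.cpu
        self.mem_gb -= step.mem_gb
        self.disk_gb -= step.disk_gb


def dispatch(queue, free):
    """Start each queued step only if its estimate fits what is still free."""
    started, waiting = [], []
    for step in queue:
        if free.fits(step):
            free.reserve(step)
            started.append(step.name)
        else:
            waiting.append(step.name)  # held back instead of overloading
    return started, waiting


queue = [Step("align", 4, 8.0, 20.0),
         Step("assemble", 4, 32.0, 50.0),
         Step("qc", 1, 1.0, 1.0)]
started, waiting = dispatch(queue, Resources(cpu=8, mem_gb=16.0, disk_gb=100.0))
print(started, waiting)
```

    Holding back a step whose estimate exceeds the free memory is what avoids the overload errors the abstract mentions; a real scheduler would also release resources when steps finish and retry the waiting ones.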

  3. Multi-loci diagnosis of acute lymphoblastic leukaemia with high-throughput sequencing and bioinformatics analysis.

    PubMed

    Ferret, Yann; Caillault, Aurélie; Sebda, Shéhérazade; Duez, Marc; Grardel, Nathalie; Duployez, Nicolas; Villenet, Céline; Figeac, Martin; Preudhomme, Claude; Salson, Mikaël; Giraud, Mathieu

    2016-05-01

    High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of the replacement of the first steps of the approach based on DNA isolation and Sanger sequencing, using a HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients. © 2016 John Wiley & Sons Ltd.

  4. A bioinformatics approach for identifying transgene insertion sites using whole genome sequencing data.

    PubMed

    Park, Doori; Park, Su-Hyun; Ban, Yong Wook; Kim, Youn Shic; Park, Kyoung-Cheul; Kim, Nam-Soo; Kim, Ju-Kon; Choi, Ik-Young

    2017-08-15

    Genetically modified crops (GM crops) have been developed to improve the agricultural traits of modern crop cultivars. Safety assessments of GM crops are of paramount importance in research at developmental stages and before releasing transgenic plants into the marketplace. Sequencing technology is developing rapidly, with higher output and labor efficiencies, and will eventually replace existing methods for the molecular characterization of genetically modified organisms. To detect the transgene insertion locations in three GM rice genomes, Illumina sequencing reads are mapped to the rice genome and the plasmid sequence and classified accordingly. Reads that map to both are then used to characterize the junction sites between plant and transgene sequences by sequence alignment. Herein, we present a next generation sequencing (NGS)-based molecular characterization method, using transgenic rice plants SNU-Bt9-5, SNU-Bt9-30, and SNU-Bt9-109. Specifically, using bioinformatics tools, we detected the precise insertion locations and copy numbers of transfer DNA, genetic rearrangements, and the absence of backbone sequences, which were equivalent to results obtained from Southern blot analyses. NGS methods have been suggested as an effective means of characterizing and detecting transgenic insertion locations in genomes. Our results demonstrate the use of a combination of NGS technology and bioinformatics approaches that offers cost- and time-effective methods for assessing the safety of transgenic plants.

  5. Genetic and bioinformatics analysis of four novel GCK missense variants detected in Caucasian families with GCK-MODY phenotype.

    PubMed

    Costantini, S; Malerba, G; Contreas, G; Corradi, M; Marin Vargas, S P; Giorgetti, A; Maffeis, C

    2015-05-01

    Heterozygous loss-of-function mutations in the glucokinase (GCK) gene cause maturity-onset diabetes of the young (MODY) subtype GCK (GCK-MODY/MODY2). GCK sequencing revealed 16 distinct mutations (13 missense, 1 nonsense, 1 splice site, and 1 frameshift-deletion) co-segregating with hyperglycaemia in 23 GCK-MODY families. Four missense substitutions (c.718A>G/p.Asn240Asp, c.757G>T/p.Val253Phe, c.872A>C/p.Lys291Thr, and c.1151C>T/p.Ala384Val) were novel and a founder effect for the nonsense mutation (c.76C>T/p.Gln26*) was hypothesized. We tested whether an accurate bioinformatics approach could strengthen family-genetic evidence for missense variant pathogenicity in routine diagnostics, where wet-lab functional assays are generally unviable. In silico analyses of the novel missense variants, including orthologous sequence conservation, amino acid substitution (AAS)-pathogenicity predictors, structural modeling and splicing predictors, suggested that the AASs and/or the underlying nucleotide changes are likely to be pathogenic. This study shows how a careful bioinformatics analysis could provide effective suggestions to help molecular-genetic diagnosis in the absence of wet-lab validations. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  6. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    DOE PAGES

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; ...

    2016-02-08

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
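
    The Markov Cluster algorithm named in this abstract alternates two matrix operations on a similarity graph, expansion (squaring) and inflation (elementwise powering followed by column normalization). The following is a small pure-Python sketch of that loop on a toy graph. It is not the PATtyFams pipeline or the `mcl` program; the inflation value, iteration count and example graph are assumptions chosen for illustration.

```python
def normalize(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=30):
    """Cluster a small weighted graph given as a square adjacency matrix."""
    n = len(adjacency)
    # Self-loops keep every column sum positive.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize(m)
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion
        m = [[v ** inflation for v in row] for row in m]  # inflation
        m = normalize(m)
    # Each row that retains mass marks the set of nodes it attracts.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy similarity graph: two triangles with no edges between them.
blocks = [0, 0, 0, 1, 1, 1]
graph = [[1.0 if i != j and blocks[i] == blocks[j] else 0.0 for j in range(6)]
         for i in range(6)]
print(mcl(graph))
```

    Higher inflation values tend to produce smaller, tighter clusters, which is the main tuning knob when the method is applied to protein similarity scores.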

  7. Cyclin D1 and Ewing's sarcoma/PNET: A microarray analysis.

    PubMed

    Fagone, Paolo; Nicoletti, Ferdinando; Salvatorelli, Lucia; Musumeci, Giuseppe; Magro, Gaetano

    2015-10-01

    Recent immunohistochemical analyses have shown that cyclin D1 is expressed in soft tissue Ewing's sarcoma/peripheral neuroectodermal tumor (PNET) of children and adolescents, while it is undetectable in both embryonal and alveolar rhabdomyosarcoma. In the present paper, microarray analysis provided evidence of a significant upregulation of cyclin D1 in Ewing's sarcoma as compared to normal tissues. In addition, we confirmed our previous findings of a significant over-expression of cyclin D1 in Ewing's sarcoma as compared to rhabdomyosarcoma. Bioinformatic analysis also identified other genes, strongly correlated with cyclin D1, which, although not previously studied in pediatric tumors, could represent novel markers for the diagnosis and prognosis of Ewing's sarcoma/PNET. The data provided herein support not only the use of cyclin D1 as a diagnostic marker of Ewing's sarcoma/PNET but also the possibility of using drugs targeting cyclin D1 as potential therapeutic strategies. Copyright © 2015 Elsevier GmbH. All rights reserved.

  8. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  9. A review of bioinformatics training applied to research in molecular medicine, agriculture and biodiversity in Costa Rica and Central America.

    PubMed

    Orozco, Allan; Morera, Jessica; Jiménez, Sergio; Boza, Ricardo

    2013-09-01

    Today, Bioinformatics has become a scientific discipline with great relevance for the Molecular Biosciences and for the Omics sciences in general. Although developed countries have progressed with large strides in Bioinformatics education and research, in other regions, such as Central America, the advances have occurred gradually and with little support from academia, either at the undergraduate or graduate level. To address this problem, the University of Costa Rica's Medical School, a regional leader in Bioinformatics in Central America, has been conducting a series of Bioinformatics workshops, seminars and courses, leading to the creation of the region's first Bioinformatics Master's Degree. The recent creation of the Central American Bioinformatics Network (BioCANET), together with the deployment of a supporting computational infrastructure (HPC Cluster) devoted to providing computing support for Molecular Biology in the region, is laying a foundation stone for the development of Bioinformatics in the area. Central American bioinformaticians have participated in the creation of, and co-founded, the Iberoamerican Bioinformatics Society (SOIBIO). In this article, we review the most recent activities in education and research in Bioinformatics from several regional institutions. These activities have resulted in further advances for Molecular Medicine, Agriculture and Biodiversity research in Costa Rica and the rest of the Central American countries. Finally, we provide summary information on the first Central America Bioinformatics International Congress, as well as the creation of the first Bioinformatics company (Indromics Bioinformatics), an academic spin-off, in Central America and the Caribbean.

  10. Bioinformatics and systems biology research update from the 15th International Conference on Bioinformatics (InCoB2016).

    PubMed

    Schönbach, Christian; Verma, Chandra; Bond, Peter J; Ranganathan, Shoba

    2016-12-22

    The International Conference on Bioinformatics (InCoB) has been publishing peer-reviewed conference papers in BMC Bioinformatics since 2006. Of the 44 articles accepted for publication in supplement issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics and BMC Systems Biology, 24 articles with a bioinformatics or systems biology focus are reviewed in this editorial. InCoB2017 is scheduled to be held in Shenzhen, China, September 20-22, 2017.

  11. Safety assessment and detection method of genetically modified Chinese Kale (Brassica oleracea cv. alboglabra).

    PubMed

    Lin, Chih-Hui; Lu, Chien-Te; Lin, Hsin-Tang; Pan, Tzu-Ming

    2009-03-11

    Sporamins are tuberous storage proteins with trypsin-inhibitory activity and account for 80% of soluble protein in sweet potato tubers. The expression of sporamin protein in transgenic Chinese kale (line BoA 3-1) conferred insecticidal activity toward corn earworm [Helicoverpa armigera (Hubner)] in a previous report. In this study, we present a preliminary safety assessment of transgenic Chinese kale BoA 3-1. Bioinformatic and simulated gastric fluid (SGF) analyses were performed to evaluate the allergenicity of sporamin protein. The substantial equivalence between transgenic Chinese kale and its wild-type host has been demonstrated by the comparison of important constituents. A reliable real-time polymerase chain reaction (PCR) detection method was also developed to control sample quality. Although the results of most evaluations in this study were negative, the safety of sporamin in transgenic Chinese kale BoA 3-1 could not be concluded because of the allergenic risk revealed by bioinformatic analysis.

  12. Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy

    PubMed Central

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-01-01

    Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313

  13. Proteomic and N-glycoproteomic quantification reveal aberrant changes in the human saliva of oral ulcer patients.

    PubMed

    Zhang, Ying; Wang, Xi; Cui, Dan; Zhu, Jun

    2016-12-01

    Human whole saliva is a vital body fluid for studying the physiology and pathology of the oral cavity. As a powerful technique for biomarker discovery, MS-based proteomic strategies have been introduced for saliva analysis and identified hundreds of proteins and N-glycosylation sites. However, there is still a lack of quantitative analysis, which is necessary for biomarker screening and biological research. In this study, we establish an integrated workflow by the combination of stable isotope dimethyl labeling, HILIC enrichment, and high resolution MS for both quantification of the global proteome and N-glycoproteome of human saliva from oral ulcer patients. With the help of advanced bioinformatics, we comprehensively studied oral ulcers at both protein and glycoprotein scales. Bioinformatics analyses revealed that starch digestion and protein degradation activities are inhibited while the immune response is promoted in oral ulcer saliva. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Fiat lux! Phylogeny and bioinformatics shed light on GABA functions in plants.

    PubMed

    Renault, Hugues

    2013-06-01

    The non-protein amino acid γ-aminobutyric acid (GABA) accumulates in plants in response to a wide variety of environmental cues. Recent data point toward an involvement of GABA in tricarboxylic acid (TCA) cycle activity and respiration, especially in stressed roots. To gain further insights into potential GABA functions in plants, phylogenetic and bioinformatic approaches were undertaken. Phylogenetic reconstruction of the GABA transaminase (GABA-T) protein family revealed the monophyletic nature of plant GABA-Ts. However, this analysis also pointed to the common origin of several plant aminotransferases families, which were found more similar to plant GABA-Ts than yeast and human GABA-Ts. A computational analysis of AtGABA-T co-expressed genes was performed in roots and in stress conditions. This second approach uncovered a strong connection between GABA metabolism and glyoxylate cycle during stress. Both in silico analyses open new perspectives and hypotheses for GABA metabolic functions in plants.

  15. Genomic big data hitting the storage bottleneck.

    PubMed

    Papageorgiou, Louis; Eleni, Picasi; Raftopoulou, Sofia; Mantaiou, Meropi; Megalooikonomou, Vasileios; Vlachakis, Dimitrios

    2018-01-01

    Over the last decades, there has been a vast data explosion in bioinformatics. Big data centres are trying to face this data crisis, reaching high storage capacity levels. Although several scientific giants examine how to handle the enormous pile of information in their cupboards, the problem remains unsolved. On a daily basis, a massive quantity of information is permanently lost due to infrastructure and storage space problems. The motivation for sequencing has fallen behind. Sometimes, the time spent solving storage space problems is longer than the time dedicated to collecting and analysing data. To bring sequencing to the foreground, scientists have to overcome such obstacles and find alternative ways to approach the issue of data volume. The scientific community is experiencing a data crisis era, in which out-of-the-box solutions may ease the typical research workflow until technological development meets the needs of bioinformatics.

  16. Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.

    PubMed

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-06-01

    Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.

  17. Diffany: an ontology-driven framework to infer, visualise and analyse differential molecular networks.

    PubMed

    Van Landeghem, Sofie; Van Parys, Thomas; Dubois, Marieke; Inzé, Dirk; Van de Peer, Yves

    2016-01-05

    Differential networks have recently been introduced as a powerful way to study the dynamic rewiring capabilities of an interactome in response to changing environmental conditions or stimuli. Currently, such differential networks are generated and visualised using ad hoc methods, and are often limited to the analysis of only one condition-specific response or one interaction type at a time. In this work, we present a generic, ontology-driven framework to infer, visualise and analyse an arbitrary set of condition-specific responses against one reference network. To this end, we have implemented novel ontology-based algorithms that can process highly heterogeneous networks, accounting for both physical interactions and regulatory associations, symmetric and directed edges, edge weights and negation. We propose this integrative framework as a standardised methodology that allows a unified view on differential networks and promotes comparability between differential network studies. As an illustrative application, we demonstrate its usefulness on a plant abiotic stress study, in which we experimentally confirmed a predicted regulator. Diffany is freely available as an open-source Java library and Cytoscape plugin from http://bioinformatics.psb.ugent.be/supplementary_data/solan/diffany/.

  18. Evolution of long centromeres in fire ants.

    PubMed

    Huang, Yu-Ching; Lee, Chih-Chi; Kao, Chia-Yi; Chang, Ni-Chen; Lin, Chung-Chi; Shoemaker, DeWayne; Wang, John

    2016-09-15

    Centromeres are essential for accurate chromosome segregation, yet sequence conservation is low even among closely related species. Centromere drive predicts rapid turnover because some centromeric sequences may compete better than others during female meiosis. In addition to sequence composition, longer centromeres may have a transmission advantage. We report the first observations of extremely long centromeres, covering on average 34% of the chromosomes, in the red imported fire ant Solenopsis invicta. By comparison, cytological examination of Solenopsis geminata revealed typical small centromeric constrictions. Bioinformatics and molecular analyses identified CenSol, the major centromeric satellite DNA repeat. We found that CenSol sequences are very similar between the two species but the CenSol copy number in S. invicta is much greater than that in S. geminata. In addition, centromere expansion in S. invicta is not correlated with the duplication of CenH3. Comparative analyses revealed that several closely related fire ant species also possess long centromeres. Our results are consistent with a model of simple runaway centromere expansion due to centromere drive. We suggest expanded centromeres may be more prevalent in hymenopteran insects, which use haplodiploid sex determination, than previously considered.

  19. MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

    PubMed Central

    Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir

    2011-01-01

    Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353
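
    As a small illustration of the evolutionary distance methods named in this record's title, the sketch below computes the uncorrected p-distance between two aligned nucleotide sequences and the standard Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3) p). It is a textbook formula written as a minimal example, not MEGA5's implementation; the function names are hypothetical, and gap handling (skipping any site with a gap) is an assumption made here.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

    The corrected distance is always at least as large as the p-distance, because it accounts for multiple substitutions at the same site that a simple count of differences cannot see.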

  20. Is there room for ethics within bioinformatics education?

    PubMed

    Taneri, Bahar

    2011-07-01

    When bioinformatics education is considered, several issues are addressed. At the undergraduate level, the main issue revolves around conveying information from two main and different fields: biology and computer science. At the graduate level, the main issue is bridging the gap between biology students and computer science students. However, there is an educational component that is rarely addressed within the context of bioinformatics education: the ethics component. Here, a different perspective is provided on bioinformatics education, and the current status of ethics is analyzed within the existing bioinformatics programs. Analysis of the existing undergraduate and graduate programs, in both Europe and the United States, reveals the minimal attention given to ethics within bioinformatics education. Given that bioinformaticians speedily and effectively shape the biomedical sciences and hence their implications for society, here redesigning of the bioinformatics curricula is suggested in order to integrate the necessary ethics education. Unique ethical problems awaiting bioinformaticians and bioinformatics ethics as a separate field of study are discussed. In addition, a template for an "Ethics in Bioinformatics" course is provided.

  1. Cancer Bioinformatics for Updating Anticancer Drug Developments and Personalized Therapeutics.

    PubMed

    Lu, Da-Yong; Qu, Rong-Xin; Lu, Ting-Ren; Wu, Hong-Ying

    2017-01-01

    Over the last two to three decades, the world has witnessed rapid progress in biomarker and bioinformatics technologies. Cancer bioinformatics is one such important omics branch for experimental and clinical studies and applications. Like other biological techniques or systems, bioinformatics techniques will be widely used, but they are presently not omnipotent. Despite great popularity and improvements, cancer bioinformatics has its own limitations and shortcomings at this stage of technical advancement. This article offers a panorama of bioinformatics in cancer research and clinical therapeutic applications, covering possible advantages and limitations relating to cancer therapeutics. Many beneficial capabilities and outcomes are described. A successful new era for cancer bioinformatics awaits if we adhere to scientific studies of cancer bioinformatics in malignant-origin mining, medical verification and clinical diagnostic applications. Cancer bioinformatics is of great significance in disease diagnosis and therapeutic prediction. Many creative ideas and future perspectives are highlighted. Copyright © Bentham Science Publishers; for any queries, please email epub@benthamscience.org.

  2. Bioinformatics analysis on molecular mechanism of Rheum officinale in treatment of jaundice

    NASA Astrophysics Data System (ADS)

    Shan, Si; Tu, Jun; Nie, Peng; Yan, Xiaojun

    2017-01-01

    Objective: To study the molecular mechanism of Rheum officinale in the treatment of jaundice by building molecular networks and comparing canonical pathways. Methods: Target proteins of Rheum officinale and jaundice-related genes were retrieved from the PubChem and Gene databases, respectively. Molecular network and canonical pathway comparison analyses were performed with Ingenuity Pathway Analysis (IPA). Results: The molecular networks of Rheum officinale and jaundice were complex and multifunctional. Forty target proteins of Rheum officinale and 33 Homo sapiens genes related to jaundice were found in the databases. Nineteen pathways were common to both networks. Rheum officinale could regulate endothelial differentiation, interleukin-1B (IL-1B) and tumor necrosis factor (TNF) in these pathways. Conclusions: Rheum officinale treats jaundice by regulating many effective nodes of the apoptotic pathway and cellular immunity-related pathways.

  3. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    PubMed Central

    Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  4. BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery.

    PubMed

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Pafilis, Evangelos; Theodosiou, Theodosios; Schneider, Reinhard; Satagopam, Venkata P; Ouzounis, Christos A; Eliopoulos, Aristides G; Promponas, Vasilis J; Iliopoulos, Ioannis

    2014-11-15

    The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. Contact: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
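
    The "analysis of term co-occurrence" mentioned in this abstract can be pictured with a minimal sketch that counts, for every pair of terms, how many abstracts mention both. This is only an illustration of the general idea, not the BioTextQuest(+) method: the function name is hypothetical, and matching is done by naive case-insensitive substring search rather than by bioentity recognition.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(abstracts, terms):
    """Count how many abstracts mention each pair of terms (case-insensitive)."""
    counts = Counter()
    lowered_terms = sorted(t.lower() for t in terms)
    for text in abstracts:
        lowered = text.lower()
        # Naive substring matching; a real system would tokenize and normalize terms.
        present = [t for t in lowered_terms if t in lowered]
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts
```

    Because the terms are sorted before pairing, each pair is stored once under a fixed ordering, so the counts can be looked up without worrying about which term came first in the text.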

  5. Correspondence regarding Zhong et al., BMC Bioinformatics 2013 Mar 7;14:89.

    PubMed

    Kuhn, Alexandre

    2014-11-28

    Computational expression deconvolution aims to estimate the contribution of individual cell populations to expression profiles measured in samples of heterogeneous composition. Zhong et al. recently proposed Digital Sorting Algorithm (BMC Bioinformatics 2013 Mar 7;14:89) and showed that they could accurately estimate population-specific expression levels and expression differences between two populations. They compared DSA with Population-Specific Expression Analysis (PSEA), a previous deconvolution method that we developed to detect expression changes occurring within the same population between two conditions (e.g. disease versus non-disease). However, Zhong et al. compared PSEA-derived specific expression levels across different cell populations. Specific expression levels obtained with PSEA cannot be directly compared across different populations as they are on a relative scale. They are accurate as we demonstrate by deconvolving the same dataset used by Zhong et al. and, importantly, allow for comparison of population-specific expression across conditions.
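
    The general idea of expression deconvolution described here can be sketched with a linear mixing model: the expression measured in a heterogeneous sample is treated as the sum of population-specific expression levels weighted by each population's contribution. The example below fits that model for two populations by ordinary least squares. It is a simplified illustration under the assumption that the population proportions are known, and it is neither the PSEA nor the DSA implementation; the function name is hypothetical.

```python
def deconvolve_two_populations(fractions, expression):
    """Least-squares estimate of population-specific expression levels.

    fractions: list of (f1, f2) population proportions per sample (assumed known)
    expression: measured expression of one gene in each mixed sample
    Model: expression[s] = f1[s] * a + f2[s] * b
    """
    # Build the 2x2 normal equations for the unknowns a and b.
    s11 = sum(f1 * f1 for f1, _ in fractions)
    s12 = sum(f1 * f2 for f1, f2 in fractions)
    s22 = sum(f2 * f2 for _, f2 in fractions)
    t1 = sum(f1 * y for (f1, _), y in zip(fractions, expression))
    t2 = sum(f2 * y for (_, f2), y in zip(fractions, expression))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("proportions do not vary enough to separate the populations")
    # Solve by Cramer's rule.
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b
```

    The estimates are only as good as the proportions supplied: if the population contributions are known on a relative rather than absolute scale, the fitted levels inherit that relative scale, which is the kind of comparability issue the correspondence raises.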

  6. The (in)complete organelle genome: exploring the use and nonuse of available technologies for characterizing mitochondrial and plastid chromosomes.

    PubMed

    Sanitá Lima, Matheus; Woods, Laura C; Cartwright, Matthew W; Smith, David Roy

    2016-11-01

    Not long ago, scientists paid dearly in time, money and skill for every nucleotide that they sequenced. Today, DNA sequencing technologies epitomize the slogan 'faster, easier, cheaper and more', and in many ways, sequencing an entire genome has become routine, even for the smallest laboratory groups. This is especially true for mitochondrial and plastid genomes. Given their relatively small sizes and high copy numbers per cell, organelle DNAs are currently among the most highly sequenced kind of chromosome. But accurately characterizing an organelle genome and the information it encodes can require much more than DNA sequencing and bioinformatics analyses. Organelle genomes can be surprisingly complex and can exhibit convoluted and unconventional modes of gene expression. Unravelling this complexity can demand a wide assortment of experiments, from pulsed-field gel electrophoresis to Southern and Northern blots to RNA analyses. Here, we show that it is exactly these types of 'complementary' analyses that are often lacking from contemporary organelle genome papers, particularly short 'genome announcement' articles. Consequently, crucial and interesting features of organelle chromosomes are going undescribed, which could ultimately lead to a poor understanding and even a misrepresentation of these genomes and the genes they express. High-throughput sequencing and bioinformatics have made it easy to sequence and assemble entire chromosomes, but they should not be used as a substitute for or at the expense of other types of genomic characterization methods. © 2016 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  7. Bioinformatic Analyses of Unique (Orphan) Core Genes of the Genus Acidithiobacillus: Functional Inferences and Use As Molecular Probes for Genomic and Metagenomic/Transcriptomic Interrogation

    PubMed Central

    González, Carolina; Lazcano, Marcelo; Valdés, Jorge; Holmes, David S.

    2016-01-01

    Using phylogenomic and gene compositional analyses, five highly conserved gene families have been detected in the core genome of the phylogenetically coherent genus Acidithiobacillus of the class Acidithiobacillia. These core gene families are absent in the closest extant genus Thermithiobacillus tepidarius that subtends the Acidithiobacillus genus and roots the deepest in this class. The predicted proteins encoded by these core gene families are not detected by a BLAST search in the NCBI non-redundant database of more than 90 million proteins using a relaxed cut-off of 1.0e−5. None of the five families has a clear functional prediction. However, bioinformatic scrutiny, using pI prediction, motif/domain searches, cellular location predictions, genomic context analyses, and chromosome topology studies together with previously published transcriptomic and proteomic data, suggests that some may have functions associated with membrane remodeling during cell division perhaps in response to pH stress. Despite the high level of amino acid sequence conservation within each family, there is sufficient nucleotide variation of the respective genes to permit the use of the DNA sequences to distinguish different species of Acidithiobacillus, making them useful additions to the armamentarium of tools for phylogenetic analysis. Since the protein families are unique to the Acidithiobacillus genus, they can also be leveraged as probes to detect the genus in environmental metagenomes and metatranscriptomes, including industrial biomining operations, and acid mine drainage (AMD). PMID:28082953
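
    The orphan-gene criterion in this abstract (no database hit at an E-value cut-off of 1.0e-5) can be sketched as a filter over BLAST results. The example assumes BLAST+ tabular output (`-outfmt 6`) with its default twelve columns, in which the first column is the query ID and the eleventh is the E-value; the function name and the input data are hypothetical, and the authors' actual pipeline is not described in the abstract.

```python
def orphan_queries(query_ids, blast_tabular_lines, evalue_cutoff=1e-5):
    """Return query IDs with no hit at or below the E-value cutoff.

    Assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns,
    where column 1 is the query ID and column 11 is the E-value.
    """
    with_hits = set()
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= evalue_cutoff:
            with_hits.add(fields[0])
    # Queries absent from the hit set, including those with no rows at all, are orphans.
    return [q for q in query_ids if q not in with_hits]
```

    A real analysis would also need to exclude hits to the genus itself before calling a family an orphan, since the proteins will match their own database entries; that step is left out of this sketch.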

  8. Bioinformatic Analyses of Unique (Orphan) Core Genes of the Genus Acidithiobacillus: Functional Inferences and Use As Molecular Probes for Genomic and Metagenomic/Transcriptomic Interrogation.

    PubMed

    González, Carolina; Lazcano, Marcelo; Valdés, Jorge; Holmes, David S

    2016-01-01

    Using phylogenomic and gene compositional analyses, five highly conserved gene families have been detected in the core genome of the phylogenetically coherent genus Acidithiobacillus of the class Acidithiobacillia. These core gene families are absent in the closest extant genus Thermithiobacillus tepidarius that subtends the Acidithiobacillus genus and roots the deepest in this class. The predicted proteins encoded by these core gene families are not detected by a BLAST search in the NCBI non-redundant database of more than 90 million proteins using a relaxed cut-off of 1.0e−5. None of the five families has a clear functional prediction. However, bioinformatic scrutiny, using pI prediction, motif/domain searches, cellular location predictions, genomic context analyses, and chromosome topology studies together with previously published transcriptomic and proteomic data, suggests that some may have functions associated with membrane remodeling during cell division perhaps in response to pH stress. Despite the high level of amino acid sequence conservation within each family, there is sufficient nucleotide variation of the respective genes to permit the use of the DNA sequences to distinguish different species of Acidithiobacillus, making them useful additions to the armamentarium of tools for phylogenetic analysis. Since the protein families are unique to the Acidithiobacillus genus, they can also be leveraged as probes to detect the genus in environmental metagenomes and metatranscriptomes, including industrial biomining operations, and acid mine drainage (AMD).

  9. Developing library bioinformatics services in context: the Purdue University Libraries bioinformationist program

    PubMed Central

    Rein, Diane C.

    2006-01-01

    Setting: Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. Program Components: After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. Evaluation Mechanisms: The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Next Steps/Future Directions: Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making. PMID:16888666

  10. Developing library bioinformatics services in context: the Purdue University Libraries bioinformationist program.

    PubMed

    Rein, Diane C

    2006-07-01

    Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making.

  11. LegumeDB1 bioinformatics resource: comparative genomic analysis and novel cross-genera marker identification in lupin and pasture legume species.

    PubMed

    Moolhuijzen, P; Cakir, M; Hunter, A; Schibeci, D; Macgregor, A; Smith, C; Francki, M; Jones, M G K; Appels, R; Bellgard, M

    2006-06-01

    The identification of markers in legume pasture crops, which can be associated with traits such as protein and lipid production, disease resistance, and reduced pod shattering, is generally accepted as an important strategy for improving the agronomic performance of these crops. It has been demonstrated that many quantitative trait loci (QTLs) identified in one species can be found in other plant species. Detailed legume comparative genomic analyses can characterize the genome organization between model legume species (e.g., Medicago truncatula, Lotus japonicus) and economically important crops such as soybean (Glycine max), pea (Pisum sativum), chickpea (Cicer arietinum), and lupin (Lupinus angustifolius), thereby identifying candidate gene markers that can be used to track QTLs in lupin and pasture legume breeding. LegumeDB is a Web-based bioinformatics resource for legume researchers. LegumeDB analysis of Medicago truncatula expressed sequence tags (ESTs) has identified novel simple sequence repeat (SSR) markers (16 tested), some of which have been putatively linked to symbiosome membrane proteins in root nodules and cell-wall proteins important in plant-pathogen defence mechanisms. In preliminary PCR assays, these novel markers have been detected in Medicago truncatula and in at least one other legume species: Lotus japonicus, Glycine max, Cicer arietinum, and/or Lupinus angustifolius (15/16 tested). Ongoing research has validated some of these markers to map them in a range of legume species that can then be used to compile composite genetic and physical maps. In this paper, we outline the features and capabilities of LegumeDB as an interactive application that provides legume genetic and physical comparative maps, and the efficient feature identification and annotation of the vast tracts of model legume sequences for convenient data integration and visualization.
    LegumeDB has been used to identify potential novel cross-genera polymorphic legume markers that map to agronomic traits, supporting the accelerated identification of molecular genetic factors underpinning important agronomic attributes in lupin.

  12. Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

    PubMed

    Li, Sanshu; Breaker, Ronald R

    2017-10-13

    With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. 
    Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.

  13. Prospective Molecular Profiling of Canine Cancers Provides a Clinically Relevant Comparative Model for Evaluating Personalized Medicine (PMed) Trials

    PubMed Central

    Mazcko, Christina; Cherba, David; Hendricks, William; Lana, Susan; Ehrhart, E. J.; Charles, Brad; Fehling, Heather; Kumar, Leena; Vail, David; Henson, Michael; Childress, Michael; Kitchell, Barbara; Kingsley, Christopher; Kim, Seungchan; Neff, Mark; Davis, Barbara

    2014-01-01

    Background: Molecularly-guided trials (i.e., PMed) now seek to aid clinical decision-making by matching cancer targets with therapeutic options. Progress has been hampered by the lack of cancer models that account for individual-to-individual heterogeneity within and across cancer types. Naturally occurring cancers in pet animals are heterogeneous and thus provide an opportunity to answer questions about these PMed strategies and optimize translation to human patients. In order to realize this opportunity, it is now necessary to demonstrate the feasibility of conducting molecularly-guided analysis of tumors from dogs with naturally occurring cancer in a clinically relevant setting. Methodology: A proof-of-concept study was conducted by the Comparative Oncology Trials Consortium (COTC) to determine if tumor collection, prospective molecular profiling, and PMed report generation within 1 week was feasible in dogs. Thirty-one dogs with cancers of varying histologies were enrolled. Twenty-four of 31 samples (77%) successfully met all predefined QA/QC criteria and were analyzed via Affymetrix gene expression profiling. A subsequent bioinformatics workflow transformed genomic data into a personalized drug report. Average turnaround from biopsy to report generation was 116 hours (4.8 days). Unsupervised clustering of canine tumor expression data clustered by cancer type, but supervised clustering of tumors based on the personalized drug report clustered by drug class rather than cancer type. Conclusions: Collection and turnaround of high quality canine tumor samples, centralized pathology, analyte generation, array hybridization, and bioinformatic analyses matching gene expression to therapeutic options is achievable in a practical clinical window (<1 week). Clustering data show robust signatures by cancer type but also showed patient-to-patient heterogeneity in drug predictions.
    This lends further support to the inclusion of a heterogeneous population of dogs with cancer into the preclinical modeling of personalized medicine. Future comparative oncology studies optimizing the delivery of PMed strategies may aid cancer drug development. PMID:24637659

  14. Prospective molecular profiling of canine cancers provides a clinically relevant comparative model for evaluating personalized medicine (PMed) trials.

    PubMed

    Paoloni, Melissa; Webb, Craig; Mazcko, Christina; Cherba, David; Hendricks, William; Lana, Susan; Ehrhart, E J; Charles, Brad; Fehling, Heather; Kumar, Leena; Vail, David; Henson, Michael; Childress, Michael; Kitchell, Barbara; Kingsley, Christopher; Kim, Seungchan; Neff, Mark; Davis, Barbara; Khanna, Chand; Trent, Jeffrey

    2014-01-01

    Molecularly-guided trials (i.e. PMed) now seek to aid clinical decision-making by matching cancer targets with therapeutic options. Progress has been hampered by the lack of cancer models that account for individual-to-individual heterogeneity within and across cancer types. Naturally occurring cancers in pet animals are heterogeneous and thus provide an opportunity to answer questions about these PMed strategies and optimize translation to human patients. In order to realize this opportunity, it is now necessary to demonstrate the feasibility of conducting molecularly-guided analysis of tumors from dogs with naturally occurring cancer in a clinically relevant setting. A proof-of-concept study was conducted by the Comparative Oncology Trials Consortium (COTC) to determine if tumor collection, prospective molecular profiling, and PMed report generation within 1 week was feasible in dogs. Thirty-one dogs with cancers of varying histologies were enrolled. Twenty-four of 31 samples (77%) successfully met all predefined QA/QC criteria and were analyzed via Affymetrix gene expression profiling. A subsequent bioinformatics workflow transformed genomic data into a personalized drug report. Average turnaround from biopsy to report generation was 116 hours (4.8 days). Unsupervised clustering of canine tumor expression data clustered by cancer type, but supervised clustering of tumors based on the personalized drug report clustered by drug class rather than cancer type. Collection and turnaround of high quality canine tumor samples, centralized pathology, analyte generation, array hybridization, and bioinformatic analyses matching gene expression to therapeutic options is achievable in a practical clinical window (<1 week). Clustering data show robust signatures by cancer type but also showed patient-to-patient heterogeneity in drug predictions. 
    This lends further support to the inclusion of a heterogeneous population of dogs with cancer into the preclinical modeling of personalized medicine. Future comparative oncology studies optimizing the delivery of PMed strategies may aid cancer drug development.

  15. miRNA Temporal Analyzer (mirnaTA): a bioinformatics tool for identifying differentially expressed microRNAs in temporal studies using normal quantile transformation.

    PubMed

    Cer, Regina Z; Herrera-Galeano, J Enrique; Anderson, Joseph J; Bishop-Lilly, Kimberly A; Mokashi, Vishwesh P

    2014-01-01

    Understanding the biological roles of microRNAs (miRNAs) is an active area of research that has produced a surge of publications in PubMed, particularly in cancer research. Along with this increasing interest, many open-source bioinformatics tools to identify existing and/or discover novel miRNAs in next-generation sequencing (NGS) reads have become available. While miRNA identification and discovery tools have improved significantly, the development of miRNA differential expression analysis tools, especially for temporal studies, remains substantially challenging. Further, the installation of currently available software is non-trivial, and the steps of testing with example datasets, trying one's own dataset, and interpreting the results require notable expertise and time. Consequently, there is a strong need for a tool that allows scientists to normalize raw data, perform statistical analyses, and obtain intuitive results without having to invest significant effort. We have developed miRNA Temporal Analyzer (mirnaTA), a bioinformatics package to identify differentially expressed miRNAs in temporal studies. mirnaTA is written in Perl and R (Version 2.13.0 or later) and can be run across multiple platforms, such as Linux, Mac and Windows. In the current version, mirnaTA requires users to provide a simple, tab-delimited matrix file containing miRNA names and count data from a minimum of two to a maximum of 20 time points and three replicates. To recalibrate data and remove technical variability, raw data are normalized using Normal Quantile Transformation (NQT), and a linear regression model is used to locate any miRNAs that are differentially expressed in a linear pattern. Subsequently, the remaining miRNAs that do not fit a linear model are further analyzed by one of two non-linear methods: 1) cumulative distribution function (CDF) or 2) analysis of variance (ANOVA).
    After both linear and non-linear analyses are completed, statistically significant miRNAs (P < 0.05) are plotted as heat maps using hierarchical cluster analysis and Euclidean distance matrix computation methods. mirnaTA is an open-source, bioinformatics tool to aid scientists in identifying differentially expressed miRNAs which could be further mined for biological significance. It is expected to provide researchers with a means of interpreting raw data to statistical summaries in a fast and intuitive manner.
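
    The normalization and linear-screening steps described in this abstract can be illustrated with a minimal Python sketch. This is not mirnaTA's code (the tool itself is written in Perl and R). It assumes a rank-based form of the normal quantile transformation applied to each time-point column, followed by an ordinary least-squares fit of each miRNA against time. The function names, the miRNA labels, and the count values are all hypothetical.

```python
from statistics import NormalDist, mean


def normal_quantile_transform(values):
    """Replace each value by the standard-normal quantile of its rank.

    Assumption: ranks are mapped to probabilities with (rank - 0.5) / n;
    ties are broken by input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        transformed[idx] = NormalDist().inv_cdf((rank - 0.5) / n)
    return transformed


def linear_fit(x, y):
    """Ordinary least-squares slope, intercept and R^2 for y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical count matrix: one row per miRNA, one column per time point.
time_points = [0, 1, 2, 3]
counts = {
    "miR-a": [5, 30, 60, 90],
    "miR-b": [20, 25, 22, 24],
    "miR-c": [40, 45, 43, 41],
}

# Normalize each time-point column across miRNAs, then rebuild the rows.
names = list(counts)
columns = [
    normal_quantile_transform([counts[name][t] for name in names])
    for t in range(len(time_points))
]
normalized = {
    name: [columns[t][i] for t in range(len(time_points))]
    for i, name in enumerate(names)
}

# Fit each miRNA against time; a clearly non-zero slope suggests a linear trend.
fits = {name: linear_fit(time_points, normalized[name]) for name in names}
```

    In this toy matrix only miR-a rises steadily in rank across the time points, so it is the one with a clearly positive slope. A real analysis would also need replicates and a significance test on the slope, which this sketch omits.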

  16. College of American Pathologists' laboratory standards for next-generation sequencing clinical tests.

    PubMed

    Aziz, Nazneen; Zhao, Qin; Bry, Lynn; Driscoll, Denise K; Funke, Birgit; Gibson, Jane S; Grody, Wayne W; Hegde, Madhuri R; Hoeltge, Gerald A; Leonard, Debra G B; Merker, Jason D; Nagarajan, Rakesh; Palicki, Linda A; Robetorye, Ryan S; Schrijver, Iris; Weck, Karen E; Voelkerding, Karl V

    2015-04-01

    The higher throughput and lower per-base cost of next-generation sequencing (NGS) as compared to Sanger sequencing has led to its rapid adoption in clinical testing. The number of laboratories offering NGS-based tests has also grown considerably in the past few years, despite the fact that specific Clinical Laboratory Improvement Amendments of 1988/College of American Pathologists (CAP) laboratory standards had not yet been developed to regulate this technology. The objective was to develop a checklist for clinical testing using NGS technology that sets standards for the analytic wet bench process and for bioinformatics or "dry bench" analyses. As NGS-based clinical tests are new to diagnostic testing and are of much greater complexity than traditional Sanger sequencing-based tests, there is an urgent need to develop new regulatory standards for laboratories offering these tests. To develop the necessary regulatory framework for NGS and to facilitate appropriate adoption of this technology for clinical testing, CAP formed a committee in 2011, the NGS Work Group, to deliberate upon the contents to be included in the checklist. Results: A total of 18 laboratory accreditation checklist requirements for the analytic wet bench process and bioinformatics analysis processes have been included within CAP's molecular pathology checklist (MOL). This report describes the important issues considered by the CAP committee during the development of the new checklist requirements, which address documentation, validation, quality assurance, confirmatory testing, exception logs, monitoring of upgrades, variant interpretation and reporting, incidental findings, data storage, version traceability, and data transfer confidentiality.

  17. Altered Molecular Expression of the TLR4/NF-κB Signaling Pathway in Mammary Tissue of Chinese Holstein Cattle with Mastitis

    PubMed Central

    Wu, Jie; Li, Lian; Sun, Yu; Huang, Shuai; Tang, Juan; Yu, Pan; Wang, Genlin

    2015-01-01

    Toll-like receptor 4 (TLR4) mediated activation of the nuclear transcription factor κB (NF-κB) signaling pathway by mastitis initiates expression of genes associated with inflammation and the innate immune response. In this study, the profile of mastitis-induced differential gene expression in the mammary tissue of Chinese Holstein cattle was investigated by Gene-Chip microarray and bioinformatics. The microarray results revealed that 79 genes associated with the TLR4/NF-κB signaling pathway were differentially expressed. Of these genes, 19 were up-regulated and 29 were down-regulated in mastitis tissue compared to normal, healthy tissue. Statistical analysis of transcript and protein level expression changes indicated that 10 genes, namely TLR4, MyD88, IL-6, and IL-10, were up-regulated, while CD14, TNF-α, MD-2, IL-β, NF-κB, and IL-12 were significantly down-regulated in mastitis tissue in comparison with normal tissue. Analyses using bioinformatics database resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and the Gene Ontology Consortium (GO) for term enrichment analysis, suggested that these differentially expressed genes implicate different regulatory pathways for immune function in the mammary gland. In conclusion, our study provides new evidence for better understanding the differential expression and mechanisms of the TLR4/NF-κB signaling pathway in Chinese Holstein cattle with mastitis. PMID:25706977

  18. Altered molecular expression of the TLR4/NF-κB signaling pathway in mammary tissue of Chinese Holstein cattle with mastitis.

    PubMed

    Wu, Jie; Li, Lian; Sun, Yu; Huang, Shuai; Tang, Juan; Yu, Pan; Wang, Genlin

    2015-01-01

    Toll-like receptor 4 (TLR4) mediated activation of the nuclear transcription factor κB (NF-κB) signaling pathway by mastitis initiates expression of genes associated with inflammation and the innate immune response. In this study, the profile of mastitis-induced differential gene expression in the mammary tissue of Chinese Holstein cattle was investigated by Gene-Chip microarray and bioinformatics. The microarray results revealed that 79 genes associated with the TLR4/NF-κB signaling pathway were differentially expressed. Of these genes, 19 were up-regulated and 29 were down-regulated in mastitis tissue compared to normal, healthy tissue. Statistical analysis of transcript and protein level expression changes indicated that 10 genes, namely TLR4, MyD88, IL-6, and IL-10, were up-regulated, while CD14, TNF-α, MD-2, IL-β, NF-κB, and IL-12 were significantly down-regulated in mastitis tissue in comparison with normal tissue. Analyses using bioinformatics database resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and the Gene Ontology Consortium (GO) for term enrichment analysis, suggested that these differentially expressed genes implicate different regulatory pathways for immune function in the mammary gland. In conclusion, our study provides new evidence for better understanding the differential expression and mechanisms of the TLR4/NF-κB signaling pathway in Chinese Holstein cattle with mastitis.

  19. 'Isotopo' a database application for facile analysis and management of mass isotopomer data.

    PubMed

    Ahmed, Zeeshan; Zeeshan, Saman; Huber, Claudia; Hensel, Michael; Schomburg, Dietmar; Münch, Richard; Eylert, Eva; Eisenreich, Wolfgang; Dandekar, Thomas

    2014-01-01

    The composition of stable-isotope labelled isotopologues/isotopomers in metabolic products can be measured by mass spectrometry and supports the analysis of pathways and fluxes. As a prerequisite, the original mass spectra have to be processed, managed and stored to rapidly calculate, analyse and compare isotopomer enrichments to study, for instance, bacterial metabolism in infection. For such applications, we provide here the database application 'Isotopo'. This software package includes (i) a database to store and process isotopomer data, (ii) a parser to upload and translate different data formats for such data and (iii) an improved application to process and convert signal intensities from mass spectra of (13)C-labelled metabolites such as tertbutyldimethylsilyl-derivatives of amino acids. Relative mass intensities and isotopomer distributions are calculated applying a partial least squares method with iterative refinement for high precision data. The data output includes formats such as graphs for overall enrichments in amino acids. The package is user-friendly for easy and robust data management of multiple experiments. The 'Isotopo' software is available at the following web link (section Download): http://spp1316.uni-wuerzburg.de/bioinformatics/isotopo/. The package contains three additional files: software executable setup (installer), one data set file (discussed in this article) and one Excel file (which can be used to convert data from Excel to '.iso' format). The 'Isotopo' software is compatible only with the Microsoft Windows operating system. © The Author(s) 2014. Published by Oxford University Press.

  20. Genomewide effects of peroxisome proliferator-activated receptor gamma in macrophages and dendritic cells--revealing complexity through systems biology.

    PubMed

    Cuaranta-Monroy, Ixchelt; Kiss, Mate; Simandi, Zoltan; Nagy, Laszlo

    2015-09-01

    Systems biology approaches have become indispensable tools in biomedical and basic research. These data integrating bioinformatic methods gained prominence after high-throughput technologies became available to investigate complex cellular processes, such as transcriptional regulation and protein-protein interactions, on a scale that had not been studied before. Immunology is one of the medical fields that systems biology impacted profoundly due to the plasticity of cell types involved and the accessibility of a wide range of experimental models. In this review, we summarize the most important recent genomewide studies exploring the function of peroxisome proliferator-activated receptor γ in macrophages and dendritic cells. PPARγ ChIP-seq experiments were performed in adipocytes derived from embryonic stem cells to complement the existing data sets and to provide comparators to macrophage data. Finally, lists of regulated genes generated from such experiments were analysed with bioinformatics and system biology approaches. We show that genomewide studies utilizing high-throughput data acquisition methods made it possible to gain deeper insights into the role of PPARγ in these immune cell types. We also demonstrate that analysis and visualization of data using network-based approaches can be used to identify novel genes and functions regulated by the receptor. The example of PPARγ in macrophages and dendritic cells highlights the crucial importance of systems biology approaches in establishing novel cellular functions for long-known signaling pathways. © 2015 Stichting European Society for Clinical Investigation Journal Foundation.

  1. Proteomic profiling of early degenerative retina of RCS rats.

    PubMed

    Zhu, Zhi-Hong; Fu, Yan; Weng, Chuan-Huang; Zhao, Cong-Jian; Yin, Zheng-Qin

    2017-01-01

    This study aimed to identify the underlying cellular and molecular changes in retinitis pigmentosa (RP). Label-free quantification-based proteomics analysis, with its advantages of being more economic and consisting of simpler procedures, has been used with increasing frequency in modern biological research. Dystrophic RCS rats, the first laboratory animal model for the study of RP, follow a pathological course similar to that of humans with the disease. Thus, we employed a comparative proteomics analysis approach for in-depth proteome profiling of retinas from dystrophic RCS rats and non-dystrophic congenic controls through Linear Trap Quadrupole-Orbitrap MS/MS, to identify the significant differentially expressed proteins (DEPs). Bioinformatics analyses, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation and upstream regulatory analysis, were then performed on these retina proteins. Finally, a Western blotting experiment was carried out to verify the difference in the abundance of transcription factor E2F1. In this study, we identified a total of 2375 protein groups from the retinal protein samples of RCS rats and non-dystrophic congenic controls. Four hundred thirty-four significantly DEPs were selected by Student's t-test. Based on the results of the bioinformatics analysis, we identified mitochondrial dysfunction and transcription factor E2F1 as the key initiation factors in early retinal degenerative process. We showed that the mitochondrial dysfunction and the transcription factor E2F1 substantially contribute to the disease etiology of RP. The results provide a new potential therapeutic approach for this retinal degenerative disease.
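
    The abstract above selects DEPs with Student's t-test. As a rough illustration of what that statistic computes, the following Python sketch implements the classical two-sample, pooled-variance form. The intensity values are hypothetical, and the sketch returns only the t statistic and degrees of freedom; converting them to a P value requires a t-distribution table or a statistics library, and the study's actual software and thresholds are not stated in the abstract.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups,
    which is what distinguishes Student's test from Welch's variant.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical label-free intensities for one protein in two groups.
dystrophic = [1.0, 2.0, 3.0]
control = [4.0, 5.0, 6.0]
t_stat, dof = student_t(dystrophic, control)
```

    A protein would be called differentially expressed when the absolute t statistic exceeds the critical value for the chosen significance level at the given degrees of freedom.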

  2. Integration of bioinformatics into an undergraduate biology curriculum and the impact on development of mathematical skills.

    PubMed

    Wightman, Bruce; Hark, Amy T

    2012-01-01

    The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this study, we deliberately integrated bioinformatics instruction at multiple course levels into an existing biology curriculum. Students in an introductory biology course, intermediate lab courses, and advanced project-oriented courses all participated in new course components designed to sequentially introduce bioinformatics skills and knowledge, as well as computational approaches that are common to many bioinformatics applications. In each course, bioinformatics learning was embedded in an existing disciplinary instructional sequence, as opposed to having a single course where all bioinformatics learning occurs. We designed direct and indirect assessment tools to follow student progress through the course sequence. Our data show significant gains in both student confidence and ability in bioinformatics during individual courses and as course level increases. Despite evidence of substantial student learning in both bioinformatics and mathematics, students were skeptical about the link between learning bioinformatics and learning mathematics. While our approach resulted in substantial learning gains, student "buy-in" and engagement might be better in longer project-based activities that demand application of skills to research problems. Nevertheless, in situations where a concentrated focus on project-oriented bioinformatics is not possible or desirable, our approach of integrating multiple smaller components into an existing curriculum provides an alternative. Copyright © 2012 Wiley Periodicals, Inc.

  3. Identification of host cellular proteins that interact with the M protein of a highly pathogenic porcine reproductive and respiratory syndrome virus vaccine strain.

    PubMed

    Wang, Qian; Li, Yanwei; Dong, Hong; Wang, Li; Peng, Jinmei; An, Tongqing; Yang, Xufu; Tian, Zhijun; Cai, Xuehui

    2017-02-22

    The highly pathogenic porcine reproductive and respiratory syndrome virus (HP-PRRSV) continues to pose one of the greatest threats to the swine industry. M protein is the most conserved and important structural protein of PRRSV. However, information about the host cellular proteins that interact with M protein remains limited. Host cellular proteins that interact with the M protein of HP-PRRSV were immunoprecipitated from MARC-145 cells infected with PRRSV HuN4-F112 using the M monoclonal antibody (mAb). The differentially expressed proteins were identified by LC-MS/MS. The screened proteins were used for bioinformatics analysis including Gene Ontology, the interaction network, and the enriched KEGG pathways. Some cellular proteins of interest were validated to interact with M protein by Co-IP. The PRRSV HuN4-F112 infection group had 10 bands compared with the control group. The bands included 219 non-redundant cellular proteins that interact with M protein, which were identified by LC-MS/MS with high confidence. The Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway bioinformatic analyses indicated that the identified proteins could be assigned to several different subcellular locations and functional classes. Functional analysis of the interactome profile highlighted cellular pathways associated with protein translation, infectious disease, and signal transduction. Two cellular proteins of interest, nuclear factor of activated T cells 45 kDa (NF45) and proliferating cell nuclear antigen (PCNA), that could interact with M protein were validated by Co-IP and confocal analyses. The interactome data between PRRSV M protein and cellular proteins were identified and contribute to the understanding of the roles of M protein in the replication and pathogenesis of PRRSV. The interactome of M protein will aid studies of virus/host interactions and provide means to decrease the threat of PRRSV to the swine industry in the future.

  4. Identification of key genes and pathways associated with neuropathic pain in uninjured dorsal root ganglion by using bioinformatic analysis.

    PubMed

    Chen, Chao-Jin; Liu, De-Zhao; Yao, Wei-Feng; Gu, Yu; Huang, Fei; Hei, Zi-Qing; Li, Xiang

    2017-01-01

    Neuropathic pain is a complex chronic condition occurring after nervous system damage. The transcriptional reprogramming of injured dorsal root ganglia (DRGs) drives neuropathic pain. However, few comparative analyses using high-throughput platforms have investigated uninjured DRG in neuropathic pain, and potential interactions among differentially expressed genes (DEGs) and pathways were not taken into consideration. The aim of this study was to identify changes in genes and pathways associated with neuropathic pain in uninjured L4 DRG after L5 spinal nerve ligation (SNL) by using bioinformatic analysis. The microarray profile GSE24982 was downloaded from the Gene Expression Omnibus database to identify DEGs between DRGs in SNL and sham rats. The prioritization for these DEGs was performed using the Toppgene database followed by gene ontology and pathway enrichment analyses. The relationships among DEGs from the protein interactive perspective were analyzed using protein-protein interaction (PPI) network and module analysis. Real-time polymerase chain reaction (PCR) and Western blotting were used to confirm the expression of DEGs in the rodent neuropathic pain model. A total of 206 DEGs that might play a role in neuropathic pain were identified in L4 DRG, of which 75 were upregulated and 131 were downregulated. The upregulated DEGs were enriched in biological processes related to transcription regulation and molecular functions such as DNA binding, cell cycle, and the FoxO signaling pathway. Ctnnb1 protein had the highest connectivity degree in the PPI network. The in vivo studies also validated that mRNA and protein levels of Ctnnb1 were upregulated in both L4 and L5 DRGs. This study provides insight into the functional gene sets and pathways associated with neuropathic pain in L4 uninjured DRG after L5 SNL, which might promote our understanding of the molecular mechanisms underlying the development of neuropathic pain.
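
The "connectivity degree" used in this abstract to rank proteins is the number of distinct interaction partners a node has in the PPI network. The following is a minimal Python sketch of that ranking, not the authors' pipeline. The function names and the toy edge list (placeholder proteins `P1` to `P4`) are assumptions made for illustration and carry no data from the study.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return {node: degree} for an undirected interaction network.

    Duplicate edges and self-interactions are ignored so that the degree
    counts distinct partners only.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(partners) for node, partners in neighbors.items()}


def top_hub(edges):
    """Return (node, degree) for the most connected node.

    Ties are broken alphabetically so the result is deterministic.
    """
    degrees = node_degrees(edges)
    return min(degrees.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical edge list; the names are placeholders, not real proteins.
toy_edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P2", "P1"),  # duplicate of the first edge, counted once
]

print(top_hub(toy_edges))
```

In this toy network `P1` has three distinct partners, so it is reported as the hub. Real analyses usually also weight edges by confidence score, which this sketch leaves out.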

  5. The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures.

    PubMed

    Kikuchi, Norihiro; Kameyama, Akihiko; Nakaya, Shuuichi; Ito, Hiromi; Sato, Takashi; Shikanai, Toshihide; Takahashi, Yoriko; Narimatsu, Hisashi

    2005-04-15

    Bioinformatics resources for glycomics are very poor compared with those for genomics and proteomics. The complexity of carbohydrate sequences makes it difficult to define a common language to represent them, and the development of bioinformatics tools for glycomics has not progressed. In this study, we developed a carbohydrate sequence markup language (CabosML), an XML description of carbohydrate structures. The language definition (XML Schema) and an experimental database of carbohydrate structures using an XML database management system are available at http://www.phoenix.hydra.mki.co.jp/CabosDemo.html. Contact: kikuchi@hydra.mki.co.jp.

  6. Molecular Genetic Characterization of Terreic Acid Pathway in Aspergillus terreus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guo, Chun-Jun; Sun, Wei-wen; Bruno, Kenneth S.

    Terreic acid is a natural product derived from 6-methylsalicylic acid (6-MSA). A compact gene cluster for its biosynthesis was characterized. Isolation of the intermediates and shunt products from the mutant strains, in combination with bioinformatic analyses, allowed us to propose a biosynthetic pathway for terreic acid. Lastly, defining the pathway and the genes involved will facilitate the engineering of this molecule with interesting antimicrobial and antitumor bioactivities.

  7. Molecular Genetic Characterization of Terreic Acid Pathway in Aspergillus terreus

    DOE PAGES

    Guo, Chun-Jun; Sun, Wei-wen; Bruno, Kenneth S.; ...

    2014-09-29

    Terreic acid is a natural product derived from 6-methylsalicylic acid (6-MSA). A compact gene cluster for its biosynthesis was characterized. Isolation of the intermediates and shunt products from the mutant strains, in combination with bioinformatic analyses, allowed us to propose a biosynthetic pathway for terreic acid. Lastly, defining the pathway and the genes involved will facilitate the engineering of this molecule with interesting antimicrobial and antitumor bioactivities.

  8. Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT

    PubMed Central

    Wang, Dantong; Fong, Simon; Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Wong, Kelvin K. L.

    2017-01-01

    Outlier detection in bioinformatics data stream mining has received significant attention from research communities in recent years. The problems of how to distinguish noise from an exception, and of deciding whether to discard it or to devise an extra decision path to accommodate it, pose a dilemma. In this paper, we propose a novel algorithm called ODR with incrementally Optimized Very Fast Decision Tree (ODR-ioVFDT) for taking care of outliers in the course of continuous data learning. By using an adaptive interquartile-range based identification method, a tolerance threshold is set. It is then used to judge whether a data point of exceptional value should be included for training. This is different from the traditional outlier detection/removal approaches, which are two separate steps in processing the data. The proposed algorithm is tested using datasets from five bioinformatics scenarios, comparing the performance of our model with that of other models without ODR. The results show that ODR-ioVFDT has better performance in classification accuracy, kappa statistics, and time consumption. ODR-ioVFDT, applied to bioinformatics streaming data processing for detecting and quantifying the information of life phenomena, states, characters, variables and components of the organism, can help to diagnose and treat disease more effectively. PMID:28230161
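
The interquartile-range idea mentioned in this abstract can be illustrated with a small streaming filter. This is only a sketch of a generic sliding-window IQR rule, not the authors' ODR algorithm. The class name `IqrStreamFilter`, the window size, the multiplier `k = 1.5`, and the warm-up length are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import quantiles


class IqrStreamFilter:
    """Flag stream values that fall outside an adaptive IQR tolerance band.

    The band is recomputed from a sliding window of previously accepted
    values, so the threshold adapts as the stream drifts.
    """

    def __init__(self, window=50, k=1.5, warmup=8):
        self.values = deque(maxlen=window)  # recent accepted values
        self.k = k                          # width of the tolerance band
        self.warmup = warmup                # values accepted unconditionally

    def bounds(self):
        """Return (low, high) tolerance limits from the current window."""
        q1, _, q3 = quantiles(self.values, n=4)
        spread = self.k * (q3 - q1)
        return q1 - spread, q3 + spread

    def accept(self, x):
        """Return True if x should be used for training, False if outlier."""
        if len(self.values) >= self.warmup:
            low, high = self.bounds()
            if x < low or x > high:
                return False  # outlier: kept out of the window
        self.values.append(x)
        return True


# Hypothetical stream with one obvious spike.
stream = [10, 11, 9, 10, 12, 11, 10, 9, 100, 11]
flt = IqrStreamFilter()
decisions = [flt.accept(x) for x in stream]
print(decisions)
```

Only the spike at 100 is rejected in this toy stream. A learner such as a decision tree would then be updated only with the accepted values, which is the single-pass behaviour the abstract contrasts with separate detection and removal steps.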

  9. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine.

    PubMed

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-02-16

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. TCM-based new drug development, especially for the treatment of complex diseases, is promising. However, owing to TCM's diverse ingredients and their complex interactions with the human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients' target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and, combined with subsequent experimental validation, revealed for the first time the functions of the renin-angiotensin system responsible for QSYQ's cardioprotective effects. BATMAN-TCM will contribute to the understanding of the "multi-component, multi-target and multi-pathway" combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM's molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm.

  10. Identifying intrinsically disordered protein regions likely to undergo binding-induced helical transitions.

    PubMed

    Glover, Karen; Mei, Yang; Sinha, Sangita C

    2016-10-01

    Many proteins contain intrinsically disordered regions (IDRs) lacking stable secondary and ordered tertiary structure. IDRs are often implicated in macromolecular interactions, and may undergo structural transitions upon binding to interaction partners. However, as binding partners of many protein IDRs are unknown, these structural transitions are difficult to verify and often are poorly understood. In this study we describe a method to identify IDRs that are likely to undergo helical transitions upon binding. This method combines bioinformatics analyses followed by circular dichroism spectroscopy to monitor 2,2,2-trifluoroethanol (TFE)-induced changes in secondary structure content of these IDRs. Our results demonstrate that there is no significant change in the helicity of IDRs that are not predicted to fold upon binding. IDRs that are predicted to fold fall into two groups: one group does not become helical in the presence of TFE and includes examples of IDRs that form β-strands upon binding, while the other group becomes more helical and includes examples that are known to fold into helices upon binding. Therefore, we propose that bioinformatics analyses combined with experimental evaluation using TFE may provide a general method to identify IDRs that undergo binding-induced disorder-to-helix transitions. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies.

    PubMed

    de Brevern, Alexandre G; Meyniel, Jean-Philippe; Fairhead, Cécile; Neuvéglise, Cécile; Malpertuy, Alain

    2015-01-01

    Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data is. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries.

  12. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies

    PubMed Central

    de Brevern, Alexandre G.; Meyniel, Jean-Philippe; Fairhead, Cécile; Neuvéglise, Cécile; Malpertuy, Alain

    2015-01-01

    Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data is. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries. PMID:26125026

  13. Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

    PubMed Central

    del Val, Coral; White, Stephen H.

    2014-01-01

    We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667

  14. Molecular Signatures of Membrane Protein Complexes Underlying Muscular Dystrophy

    PubMed Central

    Turk, Rolf; Hsiao, Jordy J.; Smits, Melinda M.; Ng, Brandon H.; Pospisil, Tyler C.; Jones, Kayla S.; Campbell, Kevin P.; Wright, Michael E.

    2016-01-01

    Mutations in genes encoding components of the sarcolemmal dystrophin-glycoprotein complex (DGC) are responsible for a large number of muscular dystrophies. As such, molecular dissection of the DGC is expected both to reveal pathological mechanisms and to provide a biological framework for validating new DGC components. Establishment of the molecular composition of plasma-membrane protein complexes has been hampered by a lack of suitable biochemical approaches. Here we present an analytical workflow based upon the principles of protein correlation profiling that has enabled us to model the molecular composition of the DGC in mouse skeletal muscle. We also report our analysis of protein complexes in mice harboring mutations in DGC components. Bioinformatic analyses suggested that cell-adhesion pathways were under the transcriptional control of NFκB in DGC mutant mice, a finding supported by previous studies showing that NFκB-regulated pathways underlie the pathophysiology of DGC-related muscular dystrophies. Moreover, the bioinformatic analyses suggested that inflammatory and compensatory mechanisms were activated in skeletal muscle of DGC mutant mice. Additionally, this proteomic study provides a molecular framework to refine our understanding of the DGC, identification of protein biomarkers of neuromuscular disease, and pharmacological interrogation of the DGC in adult skeletal muscle https://www.mda.org/disease/congenital-muscular-dystrophy/research. PMID:27099343

  15. clubber: removing the bioinformatics bottleneck in big data analyses.

    PubMed

    Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana

    2017-06-13

    With the advent of modern day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these "big data" analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber's goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, the latter can be added on demand, including cloud-based resources, and/or featuring heterogeneous environments. The second goal of making HPCs user-friendly is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment.

  16. clubber: removing the bioinformatics bottleneck in big data analyses

    PubMed Central

    Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana

    2018-01-01

    With the advent of modern day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these “big data” analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber’s goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, the latter can be added on demand, including cloud-based resources, and/or featuring heterogeneous environments. The second goal of making HPCs user-friendly is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment. PMID:28609295

  17. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers

    PubMed Central

    Brazas, Michelle D.; Ouellette, B. F. Francis

    2016-01-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression. PMID:27281025

  18. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers.

    PubMed

    Brazas, Michelle D; Ouellette, B F Francis

    2016-06-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression.

  19. Bioinformatics research in the Asia Pacific: a 2007 update.

    PubMed

    Ranganathan, Shoba; Gribskov, Michael; Tan, Tin Wee

    2008-01-01

    We provide a 2007 update on the bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB) bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2007 Conference was organized as the 6th annual conference of the Asia-Pacific Bioinformatics Network, on Aug. 27-30, 2007 at Hong Kong, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea) and New Delhi (India). Besides a scientific meeting at Hong Kong, satellite events organized are a pre-conference training workshop at Hanoi, Vietnam and a post-conference workshop at Nansha, China. This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. We have organized the papers into thematic areas, highlighting the growing contribution of research excellence from this region, to global bioinformatics endeavours.

  20. Phylogenomic, Pan-genomic, Pathogenomic and Evolutionary Genomic Insights into the Agronomically Relevant Enterobacteria Pantoea ananatis and Pantoea stewartii

    PubMed Central

    De Maayer, Pieter; Aliyu, Habibu; Vikram, Surendra; Blom, Jochen; Duffy, Brion; Cowan, Don A.; Smits, Theo H. M.; Venter, Stephanus N.; Coutinho, Teresa A.

    2017-01-01

    Pantoea ananatis is ubiquitously found in the environment and causes disease on a wide range of plant hosts. By contrast, its sister species, Pantoea stewartii subsp. stewartii, is the host-specific causative agent of the devastating maize disease Stewart’s wilt. This pathogen has a restricted lifecycle, overwintering in an insect vector before being introduced into susceptible maize cultivars, causing disease and returning to overwinter in its vector. The other subspecies, P. stewartii subsp. indologenes, has been isolated from different plant hosts and is predicted to proliferate in different environmental niches. Here we have, by the use of comparative genomics and a comprehensive suite of bioinformatic tools, analyzed the genomes of ten P. stewartii and nineteen P. ananatis strains. Our phylogenomic analyses have revealed that there are two distinct clades within P. ananatis, while far less phylogenetic diversity was observed among the P. stewartii subspecies. Pan-genome analyses revealed that a large core genome comprising 3,571 protein-coding sequences is shared among the twenty-nine compared strains. Furthermore, we showed that an extensive accessory genome made up largely of a mobilome of plasmids, integrated prophages, integrative and conjugative elements and insertion elements has resulted in extensive diversification of P. stewartii and P. ananatis. While these organisms share many pathogenicity determinants, our comparative genomic analyses show that they differ in terms of the secretion systems they encode. The genomic differences identified in this study have allowed us to postulate on the divergent evolutionary histories of the analyzed P. ananatis and P. stewartii strains and on the molecular basis underlying their ecological success and host range. PMID:28959245
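
The core and accessory genome described in this abstract can be pictured as set operations over per-strain gene-family sets: the core is what every strain shares, and the accessory genome is everything else in the pan-genome. The sketch below assumes that ortholog grouping has already been done by some upstream tool, and it is not the workflow used in the study. The strain names and family identifiers are hypothetical placeholders.

```python
def split_pan_genome(gene_families_by_strain):
    """Split a pan-genome into (core, accessory) gene-family sets.

    gene_families_by_strain maps a strain name to the collection of
    gene-family identifiers found in that strain.
    """
    family_sets = [set(families) for families in gene_families_by_strain.values()]
    if not family_sets:
        return set(), set()
    pan = set().union(*family_sets)          # every family seen anywhere
    core = set.intersection(*family_sets)    # families present in all strains
    return core, pan - core


# Hypothetical input; identifiers are placeholders, not real gene families.
toy_strains = {
    "strain_a": {"fam1", "fam2", "fam3"},
    "strain_b": {"fam1", "fam2", "fam4"},
    "strain_c": {"fam1", "fam2", "fam5"},
}

core, accessory = split_pan_genome(toy_strains)
print(sorted(core), sorted(accessory))
```

Here `fam1` and `fam2` form the core, and the three strain-specific families form the accessory set. Strict intersection is a simplifying choice; some analyses relax it to tolerate draft assemblies with missing genes.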

  1. RNA-Seq Reveals Infection-Induced Gene Expression Changes in the Snail Intermediate Host of the Carcinogenic Liver Fluke, Opisthorchis viverrini

    PubMed Central

    Prasopdee, Sattrachai; Sotillo, Javier; Tesana, Smarn; Laha, Thewarach; Kulsantiwong, Jutharat; Nolan, Matthew J.

    2014-01-01

    Background: Bithynia siamensis goniomphalos is the snail intermediate host of the liver fluke, Opisthorchis viverrini, the leading cause of cholangiocarcinoma (CCA) in the Greater Mekong sub-region of Thailand. Despite the severe public health impact of Opisthorchis-induced CCA, knowledge of the molecular interactions occurring between the parasite and its snail intermediate host is scant. The examination of differences in gene expression profiling between uninfected and O. viverrini-infected B. siamensis goniomphalos could provide clues on fundamental pathways involved in the regulation of snail-parasite interplay. Methodology/Principal Findings: Using high-throughput (Illumina) sequencing and extensive bioinformatic analyses, we characterized the transcriptomes of uninfected and O. viverrini-infected B. siamensis goniomphalos. Comparative analyses of gene expression profiling allowed the identification of 7,655 differentially expressed genes (DEGs), associated with 43 distinct biological pathways, including pathways associated with immune defense mechanisms against parasites. Amongst the DEGs with immune functions, transcripts encoding distinct proteases displayed the highest down-regulation in Bithynia specimens infected by O. viverrini; conversely, transcription of genes encoding heat-shock proteins and actins was significantly up-regulated in parasite-infected snails when compared to the uninfected counterparts. Conclusions/Significance: The present study lays the foundation for functional studies of genes and gene products potentially involved in immune-molecular mechanisms implicated in the ability of the parasite to successfully colonize its snail intermediate host. The annotated dataset provided herein represents a ready-to-use molecular resource for the discovery of molecular pathways underlying susceptibility and resistance mechanisms of B. siamensis goniomphalos to O. viverrini and for comparative analyses with pulmonate snail intermediate hosts of other platyhelminths including schistosomes. PMID:24676090

  2. Phylogenomic, Pan-genomic, Pathogenomic and Evolutionary Genomic Insights into the Agronomically Relevant Enterobacteria Pantoea ananatis and Pantoea stewartii.

    PubMed

    De Maayer, Pieter; Aliyu, Habibu; Vikram, Surendra; Blom, Jochen; Duffy, Brion; Cowan, Don A; Smits, Theo H M; Venter, Stephanus N; Coutinho, Teresa A

    2017-01-01

    Pantoea ananatis is ubiquitously found in the environment and causes disease on a wide range of plant hosts. By contrast, its sister species, Pantoea stewartii subsp. stewartii, is the host-specific causative agent of the devastating maize disease Stewart's wilt. This pathogen has a restricted lifecycle, overwintering in an insect vector before being introduced into susceptible maize cultivars, causing disease and returning to overwinter in its vector. The other subspecies, P. stewartii subsp. indologenes, has been isolated from different plant hosts and is predicted to proliferate in different environmental niches. Here we have, by the use of comparative genomics and a comprehensive suite of bioinformatic tools, analyzed the genomes of ten P. stewartii and nineteen P. ananatis strains. Our phylogenomic analyses have revealed that there are two distinct clades within P. ananatis, while far less phylogenetic diversity was observed among the P. stewartii subspecies. Pan-genome analyses revealed that a large core genome comprising 3,571 protein-coding sequences is shared among the twenty-nine compared strains. Furthermore, we showed that an extensive accessory genome made up largely of a mobilome of plasmids, integrated prophages, integrative and conjugative elements and insertion elements has resulted in extensive diversification of P. stewartii and P. ananatis. While these organisms share many pathogenicity determinants, our comparative genomic analyses show that they differ in terms of the secretion systems they encode. The genomic differences identified in this study have allowed us to postulate on the divergent evolutionary histories of the analyzed P. ananatis and P. stewartii strains and on the molecular basis underlying their ecological success and host range.

  3. Emerging strengths in Asia Pacific bioinformatics.

    PubMed

    Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee

    2008-12-12

    The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20-23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts.

  4. Emerging strengths in Asia Pacific bioinformatics

    PubMed Central

    Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee

    2008-01-01

    The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20–23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts. PMID:19091008

  5. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    PubMed

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.

  6. Prospects and limitations of full-text index structures in genome analysis

    PubMed Central

    Vyverman, Michaël; De Baets, Bernard; Fack, Veerle; Dawyndt, Peter

    2012-01-01

    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared. PMID:22584621

  7. Extending Asia Pacific bioinformatics into new realms in the "-omics" era.

    PubMed

    Ranganathan, Shoba; Eisenhaber, Frank; Tong, Joo Chuan; Tan, Tin Wee

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation dating back to 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 7-11, 2009 at Biopolis, Singapore. Besides bringing together scientists from the field of bioinformatics in this region, InCoB has actively engaged clinicians and researchers from the area of systems biology, to facilitate greater synergy between these two groups. InCoB2009 followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India), Hong Kong and Taipei (Taiwan), with InCoB2010 scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. The Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and symposia on Clinical Bioinformatics (CBAS), the Singapore Symposium on Computational Biology (SYMBIO) and training tutorials were scheduled prior to the scientific meeting, and provided ample opportunity for in-depth learning and special interest meetings for educators, clinicians and students. We provide a brief overview of the peer-reviewed bioinformatics manuscripts accepted for publication in this supplement, grouped into thematic areas. In order to facilitate scientific reproducibility and accountability, we have, for the first time, introduced minimum information criteria for our publications, including compliance to a Minimum Information about a Bioinformatics Investigation (MIABi). As the regional research expertise in bioinformatics matures, we have delineated a minimum set of bioinformatics skills required for addressing the computational challenges of the "-omics" era.

  8. Bioinformatics approach for choosing the correct reference genes when studying gene expression in human keratinocytes.

    PubMed

    Beer, Lucian; Mlitz, Veronika; Gschwandtner, Maria; Berger, Tanja; Narzt, Marie-Sophie; Gruber, Florian; Brunner, Patrick M; Tschachler, Erwin; Mildner, Michael

    2015-10-01

    Reverse transcription polymerase chain reaction (qRT-PCR) has become a mainstay in many areas of skin research. To enable quantitative analysis, it is necessary to analyse expression of reference genes (RGs) for normalization of target gene expression. The selection of reliable RGs therefore has an important impact on the experimental outcome. In this study, we aimed to identify and validate the best suited RGs for qRT-PCR in human primary keratinocytes (KCs) over a broad range of experimental conditions using the novel bioinformatics tool 'RefGenes', which is based on a manually curated database of published microarray data. Expression of 6 RGs identified by RefGenes software and 12 commonly used RGs were validated by qRT-PCR. We assessed whether these 18 markers fulfilled the requirements for a valid RG by the comprehensive ranking of four bioinformatics tools and the coefficient of variation (CV). In an overall ranking, we found GUSB to be the most stably expressed RG, whereas the expression values of the commonly used RGs, GAPDH and B2M were significantly affected by varying experimental conditions. Our results identify RefGenes as a powerful tool for the identification of valid RGs and suggest GUSB as the most reliable RG for KCs. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
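
    The coefficient of variation (CV) mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the gene labels and the toy values are all hypothetical, and it assumes the CV is the sample standard deviation divided by the mean, with a lower CV read as more stable expression.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(values) / m


def rank_by_cv(expression):
    """Sort gene labels from lowest CV (most stable) to highest."""
    return sorted(expression, key=lambda g: coefficient_of_variation(expression[g]))


# Hypothetical toy measurements across four conditions (not data from the study).
toy = {
    "geneC": [5.0, 15.0, 5.0, 15.0],
    "geneA": [10.0, 10.0, 10.0, 10.0],
    "geneB": [8.0, 12.0, 8.0, 12.0],
}
ranking = rank_by_cv(toy)
```

    The study combines the CV with the rankings of four dedicated tools; this sketch covers only the CV step.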

  9. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples

    PubMed Central

    Naccache, Samia N.; Federman, Scot; Veeraraghavan, Narayanan; Zaharia, Matei; Lee, Deanna; Samayoa, Erik; Bouquet, Jerome; Greninger, Alexander L.; Luk, Ka-Cheung; Enge, Barryett; Wadford, Debra A.; Messenger, Sharon L.; Genrich, Gillian L.; Pellegrino, Kristen; Grard, Gilda; Leroy, Eric; Schneider, Bradley S.; Fair, Joseph N.; Martínez, Miguel A.; Isa, Pavel; Crump, John A.; DeRisi, Joseph L.; Sittler, Taylor; Hackett, John; Miller, Steve; Chiu, Charles Y.

    2014-01-01

    Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI (“sequence-based ultrarapid pathogen identification”), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7–500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times. PMID:24899342

  10. Statistics and bioinformatics in nutritional sciences: analysis of complex data in the era of systems biology

    PubMed Central

    Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao

    2009-01-01

    Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have the advanced methodologies for the analysis of DNA, RNA, protein, low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlighted various sources of experimental variations in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provided guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine fetal retardation). PMID:20233650

  11. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.

    PubMed

    Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).

  12. A comparative proteomic strategy for subcellular proteome research: ICAT approach coupled with bioinformatics prediction to ascertain rat liver mitochondrial proteins and indication of mitochondrial localization for catalase.

    PubMed

    Jiang, Xiao-Sheng; Dai, Jie; Sheng, Quan-Hu; Zhang, Lei; Xia, Qi-Chang; Wu, Jia-Rui; Zeng, Rong

    2005-01-01

    Subcellular proteomics, as an important step to functional proteomics, has been a focus in proteomic research. However, the co-purification of "contaminating" proteins has been the major problem in all the subcellular proteomic research including all kinds of mitochondrial proteome research. It is often difficult to conclude whether these "contaminants" represent true endogenous partners or artificial associations induced by cell disruption or incomplete purification. To solve such a problem, we applied a high-throughput comparative proteome experimental strategy, ICAT approach performed with two-dimensional LC-MS/MS analysis, coupled with combinational usage of different bioinformatics tools, to study the proteome of rat liver mitochondria prepared with traditional centrifugation (CM) or further purified with a Nycodenz gradient (PM). A total of 169 proteins were identified and quantified convincingly in the ICAT analysis, in which 90 proteins have an ICAT ratio of PM:CM>1.0, while another 79 proteins have an ICAT ratio of PM:CM<1.0. Almost all the proteins annotated as mitochondrial according to Swiss-Prot annotation, bioinformatics prediction, and literature reports have a ratio of PM:CM>1.0, while proteins annotated as extracellular or secreted, cytoplasmic, endoplasmic reticulum, ribosomal, and so on have a ratio of PM:CM<1.0. Catalase and AP endonuclease 1, which have been known as peroxisomal and nuclear, respectively, have shown a ratio of PM:CM>1.0, confirming the reports about their mitochondrial location. Moreover, the 125 proteins with subcellular location annotation have been used as a testing dataset to evaluate the efficiency for ascertaining mitochondrial proteins by ICAT analysis and the bioinformatics tools such as PSORT, TargetP, SubLoc, MitoProt, and Predotar. 
    The results indicated that ICAT analysis coupled with combinational usage of different bioinformatics tools could effectively ascertain mitochondrial proteins and distinguish contaminant proteins and even multilocation proteins. Using such a strategy, many novel proteins, known proteins without subcellular location annotation, and even known proteins that have been annotated as other locations have been strongly indicated for their mitochondrial location.
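
    The PM:CM ratio rule described in this abstract can be sketched in a few lines of Python. This is an illustrative assumption rather than the authors' code: the function name, the protein labels and the ratio values are hypothetical, and only the 1.0 cut-off comes from the abstract.

```python
def partition_by_ratio(ratios, threshold=1.0):
    """Split proteins by PM:CM ratio.

    Ratios above the threshold suggest enrichment in the purified
    mitochondrial fraction (PM); ratios below it suggest a contaminant.
    """
    enriched = sorted(name for name, r in ratios.items() if r > threshold)
    depleted = sorted(name for name, r in ratios.items() if r < threshold)
    return enriched, depleted


# Hypothetical toy ratios (not values from the study).
toy_ratios = {"protein_1": 1.8, "protein_2": 0.4, "protein_3": 1.2}
enriched, depleted = partition_by_ratio(toy_ratios)
```

    A ratio exactly equal to the threshold falls into neither group here, since the abstract only reports ratios above or below 1.0.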

  13. snpAD: An ancient DNA genotype caller.

    PubMed

    Prüfer, Kay

    2018-06-21

    The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.

  14. Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics.

    PubMed

    James, Katherine; Cockell, Simon J; Zenkin, Nikolay

    2017-05-01

    The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Bioinformatic analysis suggests that the Orbivirus VP6 cistron encodes an overlapping gene

    PubMed Central

    Firth, Andrew E

    2008-01-01

    Background The genus Orbivirus includes several species that infect livestock – including Bluetongue virus (BTV) and African horse sickness virus (AHSV). These viruses have linear dsRNA genomes divided into ten segments, all of which have previously been assumed to be monocistronic. Results Bioinformatic evidence is presented for a short overlapping coding sequence (CDS) in the Orbivirus genome segment 9, overlapping the VP6 cistron in the +1 reading frame. In BTV, a 77–79 codon AUG-initiated open reading frame (hereafter ORFX) is present in all 48 segment 9 sequences analysed. The pattern of base variations across the 48-sequence alignment indicates that ORFX is subject to functional constraints at the amino acid level (even when the constraints due to coding in the overlapping VP6 reading frame are taken into account; MLOGD software). In fact the translated ORFX shows greater amino acid conservation than the overlapping region of VP6. The ORFX AUG codon has a strong Kozak context in all 48 sequences. Each has only one or two upstream AUG codons, always in the VP6 reading frame, and (with a single exception) always with weak or medium Kozak context. Thus, in BTV, ORFX may be translated via leaky scanning. A long (83–169 codon) ORF is present in a corresponding location and reading frame in all other Orbivirus species analysed except Saint Croix River virus (SCRV; the most divergent). Again, the pattern of base variations across sequence alignments indicates multiple coding in the VP6 and ORFX reading frames. Conclusion At ~9.5 kDa, the putative ORFX product in BTV is too small to appear on most published protein gels. Nonetheless, a review of past literature reveals a number of possible detections. We hope that presentation of this bioinformatic analysis will stimulate an attempt to experimentally verify the expression and functional role of ORFX, and hence lead to a greater understanding of the molecular biology of these important pathogens. PMID:18489030

  16. Edge Bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Chien-Chi

    2015-08-03

    Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human-resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed based on the following use cases and is specific to Illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen or co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance

  17. Gene expression patterns combined with bioinformatics analysis identify genes associated with cholangiocarcinoma.

    PubMed

    Li, Chen; Shen, Weixing; Shen, Sheng; Ai, Zhilong

    2013-12-01

    To explore the molecular mechanisms of cholangiocarcinoma (CC), microarray technology was used to find biomarkers for early detection and diagnosis. The gene expression profiles from 6 patients with CC and 5 normal controls were downloaded from Gene Expression Omnibus and compared. As a result, 204 differentially co-expressed genes (DCGs) in CC patients compared to normal controls were identified using a computational bioinformatics analysis. These genes were mainly involved in coenzyme metabolic process, peptidase activity and oxidation reduction. A regulatory network was constructed by mapping the DCGs to known regulation data. Four transcription factors, FOXC1, ZIC2, NKX2-2 and GCGR, were hub nodes in the network. In conclusion, this study provides a set of targets useful for future investigations into molecular biomarker studies. Copyright © 2013 Elsevier Ltd. All rights reserved.

  18. Validation of Methods to Assess the Immunoglobulin Gene Repertoire in Tissues Obtained from Mice on the International Space Station.

    PubMed

    Rettig, Trisha A; Ward, Claire; Pecaut, Michael J; Chapes, Stephen K

    2017-07-01

    Spaceflight is known to affect immune cell populations. In particular, splenic B cell numbers decrease during spaceflight and in ground-based physiological models. Although antibody isotype changes have been assessed during and after space flight, an extensive characterization of the impact of spaceflight on antibody composition has not been conducted in mice. Next Generation Sequencing and bioinformatic tools are now available to assess antibody repertoires. We can now identify immunoglobulin gene-segment usage, junctional regions, and modifications that contribute to specificity and diversity. Due to limitations on the International Space Station, alternate sample collection and storage methods must be employed. Our group compared Illumina MiSeq sequencing data from multiple sample preparation methods in normal C57Bl/6J mice to validate that sample preparation and storage would not bias the outcome of antibody repertoire characterization. In this report, we also compared sequencing techniques and a bioinformatic workflow on the data output when we assessed the IgH and Igκ variable gene usage. This included assessments of our bioinformatic workflow on Illumina HiSeq and MiSeq datasets and is specifically designed to reduce bias, capture the most information from Ig sequences, and produce a data set that provides other data mining options. We validated our workflow by comparing our normal mouse MiSeq data to existing murine antibody repertoire studies validating it for future antibody repertoire studies.

  19. Bioinformatics goes back to the future.

    PubMed

    Miller, Crispin J; Attwood, Teresa K

    2003-02-01

    The need to turn raw data into knowledge has led the bioinformatics field to focus increasingly on the manipulation of information. By drawing parallels with both cryptography and artificial intelligence, we can develop an understanding of the changes that are occurring in bioinformatics, and how these changes are likely to influence the bioinformatics job market.

  20. Introductory Bioinformatics Exercises Utilizing Hemoglobin and Chymotrypsin to Reinforce the Protein Sequence-Structure-Function Relationship

    ERIC Educational Resources Information Center

    Inlow, Jennifer K.; Miller, Paige; Pittman, Bethany

    2007-01-01

    We describe two bioinformatics exercises intended for use in a computer laboratory setting in an upper-level undergraduate biochemistry course. To introduce students to bioinformatics, the exercises incorporate several commonly used bioinformatics tools, including BLAST, that are freely available online. The exercises build upon the students'…

  1. Design and Implementation of an Interdepartmental Bioinformatics Program across Life Science Curricula

    ERIC Educational Resources Information Center

    Miskowski, Jennifer A.; Howard, David R.; Abler, Michael L.; Grunwald, Sandra K.

    2007-01-01

    Over the past 10 years, there has been a technical revolution in the life sciences leading to the emergence of a new discipline called bioinformatics. In response, bioinformatics-related topics have been incorporated into various undergraduate courses along with the development of new courses solely focused on bioinformatics. This report describes…

  2. Component-Based Approach for Educating Students in Bioinformatics

    ERIC Educational Resources Information Center

    Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.

    2009-01-01

    There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…

  3. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    ERIC Educational Resources Information Center

    Shachak, Aviv; Ophir, Ron; Rubin, Eitan

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of…

  4. Vertical and Horizontal Integration of Bioinformatics Education: A Modular, Interdisciplinary Approach

    ERIC Educational Resources Information Center

    Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D. Blaine; Langeland, James A.

    2009-01-01

    Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option.…

  5. Computational biology and bioinformatics in Nigeria.

    PubMed

    Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-04-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  6. Computational Biology and Bioinformatics in Nigeria

    PubMed Central

    Fatumo, Segun A.; Adoga, Moses P.; Ojo, Opeolu O.; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-01-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries. PMID:24763310

  7. Technosciences in Academia: Rethinking a Conceptual Framework for Bioinformatics Undergraduate Curricula

    NASA Astrophysics Data System (ADS)

    Symeonidis, Iphigenia Sofia

    This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatics objectives rather than enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty in regards to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinformatics educational needs and goals as expressed by different authorities, five undergraduate bioinformatics degrees case-studies, educational implications of bioinformatics as a technoscience and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.

  8. Bioinformatics core competencies for undergraduate life sciences education.

    PubMed

    Wilson Sayres, Melissa A; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G; Smith, Todd M; Triplett, Eric W; Williams, Jason J; Dinsdale, Elizabeth; Morgan, William R; Burnette, James M; Donovan, Samuel S; Drew, Jennifer C; Elgin, Sarah C R; Fowlks, Edison R; Galindo-Gonzalez, Sebastian; Goodman, Anya L; Grandgenett, Nealy F; Goller, Carlos C; Jungck, John R; Newman, Jeffrey D; Pearson, William; Ryder, Elizabeth F; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C; Toro-Martínez, Arlín; Welch, Lonnie R; Wright, Robin; Barone, Lindsay; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C; Pauley, Mark A

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent's degree of training, time since degree earned, and/or the Carnegie Classification of the respondent's institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.

  9. Bioinformatics core competencies for undergraduate life sciences education

    PubMed Central

    Wilson Sayres, Melissa A.; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G.; Smith, Todd M.; Triplett, Eric W.; Williams, Jason J.; Dinsdale, Elizabeth; Morgan, William R.; Burnette, James M.; Donovan, Samuel S.; Drew, Jennifer C.; Elgin, Sarah C. R.; Fowlks, Edison R.; Galindo-Gonzalez, Sebastian; Goodman, Anya L.; Grandgenett, Nealy F.; Goller, Carlos C.; Jungck, John R.; Newman, Jeffrey D.; Pearson, William; Ryder, Elizabeth F.; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C.; Toro-Martínez, Arlín; Welch, Lonnie R.; Wright, Robin; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C.

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent’s degree of training, time since degree earned, and/or the Carnegie Classification of the respondent’s institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula. PMID:29870542

  10. The growing need for microservices in bioinformatics.

    PubMed

    Williams, Christopher L; Sica, Jeffrey C; Killen, Robert T; Balis, Ulysses G J

    2016-01-01

    Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Bioinformatics relies on a nimble IT framework that can adapt to changing requirements. The aim is to present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting.

  11. The growing need for microservices in bioinformatics

    PubMed Central

    Williams, Christopher L.; Sica, Jeffrey C.; Killen, Robert T.; Balis, Ulysses G. J.

    2016-01-01

    Objective: Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Context: Bioinformatics relies on a nimble IT framework that can adapt to changing requirements. Aims: To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Conclusions: Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting. PMID:27994937

  12. ETE: a python Environment for Tree Exploration.

    PubMed

    Huerta-Cepas, Jaime; Dopazo, Joaquín; Gabaldón, Toni

    2010-01-13

    Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Here we present the Environment for Tree Exploration (ETE), a Python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, supports the extended Newick format, provides an integrated node annotation system and allows trees to be linked to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. ETE provides a complete set of methods to manipulate tree data structures that extends the current functionality of other, more general-purpose bioinformatic toolkits. ETE is free software and can be downloaded from http://ete.cgenomics.org.
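
    To make the idea of programmatic tree handling concrete, the following is a minimal sketch of reading a Newick string and traversing the resulting tree. It deliberately does not use ETE or reproduce its API: the names Node, parse_newick and root_to_leaf_distances are hypothetical, and only plain Newick (node labels and branch lengths) is handled, not the extended format mentioned in the abstract.

```python
# Minimal Newick reader for illustration; NOT the ETE API.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    dist: float = 0.0                      # branch length to the parent
    children: list = field(default_factory=list)

    def leaves(self):
        """Return leaf names in left-to-right order."""
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.leaves())
        return names


def parse_newick(text):
    """Parse a basic Newick string (labels and branch lengths only)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        node = Node()
        if text[pos] == "(":               # internal node: read its children
            pos += 1
            while True:
                node.children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
        start = pos                        # optional node label
        while text[pos] not in ",():;":
            pos += 1
        node.name = text[start:pos]
        if text[pos] == ":":               # optional branch length
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            node.dist = float(text[start:pos])
        return node

    root = parse_node()
    if pos != len(text) - 1:
        raise ValueError("unexpected characters before the final ';'")
    return root


def root_to_leaf_distances(node, offset=0.0):
    """Map each leaf name to its summed branch length from the root."""
    total = offset + node.dist
    if not node.children:
        return {node.name: total}
    result = {}
    for child in node.children:
        result.update(root_to_leaf_distances(child, total))
    return result


tree = parse_newick("((A:1.0,B:2.0)AB:0.5,C:3.0);")
print(tree.leaves())                  # ['A', 'B', 'C']
print(root_to_leaf_distances(tree))   # {'A': 1.5, 'B': 2.5, 'C': 3.0}
```

    A dedicated toolkit adds annotation, rendering and phylogenetic methods on top of this kind of structure; the sketch only shows why a scriptable tree object suits large-scale surveys better than a standalone viewer.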

  13. ETE: a python Environment for Tree Exploration

    PubMed Central

    2010-01-01

    Background: Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results: Here we present the Environment for Tree Exploration (ETE), a Python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, supports the extended Newick format, provides an integrated node annotation system and allows trees to be linked to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions: ETE provides a complete set of methods to manipulate tree data structures that extends the current functionality of other, more general-purpose bioinformatic toolkits. ETE is free software and can be downloaded from http://ete.cgenomics.org. PMID:20070885

  14. Report on the EMBER Project--A European Multimedia Bioinformatics Educational Resource

    ERIC Educational Resources Information Center

    Attwood, Terri K.; Selimas, Ioannis; Buis, Rob; Altenburg, Ruud; Herzog, Robert; Ledent, Valerie; Ghita, Viorica; Fernandes, Pedro; Marques, Isabel; Brugman, Marc

    2005-01-01

    EMBER was a European project aiming to develop bioinformatics teaching materials on the Web and CD-ROM to help address the recognised skills shortage in bioinformatics. The project grew out of pilot work on the development of an interactive web-based bioinformatics tutorial and the desire to repackage that resource with the help of a professional…

  15. The 2017 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather

    2017-01-01

    The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC (http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year’s theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest. PMID:29118973

  16. The 2017 Bioinformatics Open Source Conference (BOSC).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather

    2017-01-01

    The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC (http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year's theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest.

  17. Rising Strengths Hong Kong SAR in Bioinformatics.

    PubMed

    Chakraborty, Chiranjib; George Priya Doss, C; Zhu, Hailong; Agoramoorthy, Govindasamy

    2017-06-01

    Hong Kong's bioinformatics sector is attaining new heights in combination with its economic boom and the predominance of the working-age group in its population. Factors such as a knowledge-based and free-market economy have contributed towards a prominent position on the world map of bioinformatics. In this review, we consider the educational measures, landmark research activities, the achievements of bioinformatics companies and the role of the Hong Kong government in establishing bioinformatics as a strength. However, several hurdles remain. New government policies will assist computational biologists in overcoming these hurdles and further raise the profile of the field. There is a high expectation that bioinformatics in Hong Kong will be a promising area for the next generation.

  18. AncestrySNPminer: A bioinformatics tool to retrieve and develop ancestry informative SNP panels

    PubMed Central

    Amirisetty, Sushil; Khurana Hershey, Gurjit K.; Baye, Tesfaye M.

    2012-01-01

    A wealth of genomic information is available in public and private databases. However, this information is underutilized for uncovering population-specific and functionally relevant markers underlying complex human traits. Given the huge amount of SNP data available from the annotation of human genetic variation, data mining is a fast and cost-effective approach for investigating the number of SNPs that are informative for ancestry. In this study, we present AncestrySNPminer, the first web-based bioinformatics tool specifically designed to retrieve Ancestry Informative Markers (AIMs) from genomic data sets and link these informative markers to genes and ontological annotation classes. The tool includes an automated and simple “scripting at the click of a button” functionality that enables researchers to apply various population genomics statistical analysis methods, with user-friendly querying and filtering of data sets across various populations through a single web interface. AncestrySNPminer can be freely accessed at https://research.cchmc.org/mershalab/AncestrySNPminer/login.php. PMID:22584067

  19. The impact of next-generation sequencing on genomics

    PubMed Central

    Zhang, Jun; Chiodini, Rod; Badr, Ahmed; Zhang, Genfa

    2011-01-01

    This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. However, the massive data produced by NGS also present a significant challenge for data storage, analysis, and management solutions. Advanced bioinformatic tools are essential for the successful application of NGS technology. As evidenced throughout this review, NGS technologies will have a striking impact on genomic research and the entire biological field. With its ability to tackle challenges left unsolved by previous genomic technologies, NGS is likely to unravel the complexity of the human genome in terms of genetic variations, some of which may be confined to susceptible loci for some common human conditions. The impact of NGS technologies on genomics will be far reaching and likely change the field for years to come. PMID:21477781

  20. Mapping the miRNA interactome by crosslinking ligation and sequencing of hybrids (CLASH)

    PubMed Central

    Helwak, Aleksandra; Tollervey, David

    2014-01-01

    RNA-RNA interactions play critical roles in many cellular processes but studying them is difficult and laborious. Here, we describe an experimental procedure, termed crosslinking ligation and sequencing of hybrids (CLASH), which allows high-throughput identification of sites of RNA-RNA interaction. During CLASH, a tagged bait protein is UV crosslinked in vivo to stabilise RNA interactions and purified under denaturing conditions. RNAs associated with the bait protein are partially truncated, and the ends of RNA-duplexes are ligated together. Following linker addition, cDNA library preparation and high-throughput sequencing, the ligated duplexes give rise to chimeric cDNAs, which unambiguously identify RNA-RNA interaction sites independent of bioinformatic predictions. This protocol is optimized for studying miRNA targets bound by Argonaute proteins, but should be easily adapted for other RNA-binding proteins and classes of RNA. The protocol requires around 5 days to complete, excluding the time required for high-throughput sequencing and bioinformatic analyses. PMID:24577361

  1. Global computing for bioinformatics.

    PubMed

    Loewe, Laurence

    2002-12-01

    Global computing, the collaboration of idle PCs via the Internet in a SETI@home style, emerges as a new way of massive parallel multiprocessing with potentially enormous CPU power. Its relations to the broader, fast-moving field of Grid computing are discussed without attempting a review of the latter. This review (i) includes a short table of milestones in global computing history, (ii) lists opportunities global computing offers for bioinformatics, (iii) describes the structure of problems well suited for such an approach, (iv) analyses the anatomy of successful projects and (v) points to existing software frameworks. Finally, an evaluation of the various costs shows that global computing indeed has merit, if the problem to be solved is already coded appropriately and a suitable global computing framework can be found. Then, either significant amounts of computing power can be recruited from the general public, or--if employed in an enterprise-wide Intranet for security reasons--idle desktop PCs can substitute for an expensive dedicated cluster.

  2. Analysis of Molecular Cytogenetic Alteration in Rhabdomyosarcoma by Array Comparative Genomic Hybridization

    PubMed Central

    Liu, Chunxia; Li, Dongliang; Jiang, Jinfang; Hu, Jianming; Zhang, Wei; Chen, Yunzhao; Cui, Xiaobin; Qi, Yan; Zou, Hong; Zhang, WenJie; Li, Feng

    2014-01-01

    Rhabdomyosarcoma (RMS) is the most common pediatric soft tissue sarcoma with poor prognosis. The genetic etiology underlying the development and progression of RMS remains largely unclear. To identify novel genes and new therapeutic targets associated with RMS more precisely, we used high-resolution array comparative genomic hybridization (aCGH) to explore tumor-associated copy number variations (CNVs) and genes in RMS. We confirmed several important genes by quantitative real-time polymerase chain reaction (QRT-PCR). We then performed bioinformatics-based functional enrichment analysis for genes located in the genomic regions with CNVs. In addition, we identified miRNAs located in the corresponding amplification and deletion regions and performed miRNA functional enrichment analysis. aCGH analyses revealed that all RMS showed specific gains and losses. The amplification regions were 12q13.12, 12q13.3, and 12q13.3–q14.1. The deletion regions were 1p21.1, 2q14.1, 5q13.2, 9p12, and 9q12. The recurrent regions with gains were 12q13.3, 12q13.3–q14.1, 12q14.1, and 17q25.1. The recurrent regions with losses were 9p12–p11.2, 10q11.21–q11.22, 14q32.33, 16p11.2, and 22q11.1. The mean mRNA level of GLI1 in RMS was 6.61-fold higher than that in controls (p = 0.0477) by QRT-PCR. Meanwhile, the mean mRNA level of GEFT in RMS samples was 3.92-fold higher than that in controls (p = 0.0354). Bioinformatic analysis showed that genes were enriched in functions such as immunoglobulin domain, induction of apoptosis, and defensin. Proto-oncogene functions were involved in alveolar RMS. miRNAs located in the amplified regions in RMS tended to be enriched in oncogenic activity (miR-24 and miR-27a). In conclusion, this study identified a number of CNVs in RMS, and functional analyses showed enrichment for genes and miRNAs located in these CNV regions. These findings may potentially help the identification of novel biomarkers and/or drug targets implicated in diagnosis of and targeted therapy for RMS. PMID:24743780

  3. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

    PubMed

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L

    2014-10-15

    Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. A software package is publicly available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu. Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
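
    As a reading aid, the textbook identity behind Gaussian imputation can be written out. This is a sketch under stated assumptions, not the authors' exact estimator: association z-scores at typed SNPs (z_t) and untyped SNPs (z_u) are assumed jointly Gaussian with mean zero under the null, with covariance given by the linkage disequilibrium (correlation) matrix Sigma estimated from a reference panel. The symbols z_t, z_u and Sigma are notation introduced here, and the adjustment for finite reference panel size mentioned in the abstract is not shown.

```latex
\begin{aligned}
\begin{pmatrix} z_t \\ z_u \end{pmatrix}
  &\sim \mathcal{N}\!\left(0,\;
  \begin{pmatrix} \Sigma_{tt} & \Sigma_{tu} \\ \Sigma_{ut} & \Sigma_{uu} \end{pmatrix}\right), \\
\mathbb{E}\,[z_u \mid z_t] &= \Sigma_{ut}\,\Sigma_{tt}^{-1}\,z_t, \\
\operatorname{Cov}\,[z_u \mid z_t] &= \Sigma_{uu} - \Sigma_{ut}\,\Sigma_{tt}^{-1}\,\Sigma_{tu}.
\end{aligned}
```

    The conditional mean serves as the imputed statistic and the conditional covariance indicates how much information the typed SNPs carry about each untyped SNP; only summary statistics and reference-panel correlations enter, which is why no individual-level genotypes are needed.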

  4. Evolving from bioinformatics in-the-small to bioinformatics in-the-large.

    PubMed

    Parker, D Stott; Gorlick, Michael M; Lee, Christopher J

    2003-01-01

    We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small: constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples, and extract lessons from them showing how it has helped. These examples both suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.
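
    The make-style dependency idea can be sketched in a few lines. The example below is illustrative only and assumes Python 3.9 or later for the standard-library graphlib module; the file names in PIPELINE and the helper functions build_order and stale_targets are hypothetical and do not come from the paper.

```python
# Make-style dependency tracking; file names and helpers are hypothetical.
from graphlib import TopologicalSorter

# Each target maps to the set of files it is built from (its prerequisites).
PIPELINE = {
    "aligned.bam": {"reads.fastq", "reference.fa"},
    "variants.vcf": {"aligned.bam", "reference.fa"},
    "report.html": {"variants.vcf"},
}


def build_order(deps):
    """Return all files ordered so that prerequisites come first."""
    return list(TopologicalSorter(deps).static_order())


def stale_targets(deps, changed):
    """Return the targets that must be rebuilt after `changed` files change."""
    stale = set(changed)
    for name in build_order(deps):
        # A target becomes stale as soon as any prerequisite is stale.
        if deps.get(name, set()) & stale:
            stale.add(name)
    return [n for n in build_order(deps) if n in stale and n not in changed]


print(stale_targets(PIPELINE, {"reads.fastq"}))
# ['aligned.bam', 'variants.vcf', 'report.html']
print(stale_targets(PIPELINE, {"variants.vcf"}))
# ['report.html']
```

    Recording the dependencies as data, rather than burying them in scripts, is what lets a tool decide which steps to rerun; that is the metadata-first view the abstract calls bioinformatics in-the-large.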

  5. Bioinformatics education dissemination with an evolutionary problem solving perspective.

    PubMed

    Jungck, John R; Donovan, Samuel S; Weisstein, Anton E; Khiripet, Noppadon; Everse, Stephen J

    2010-11-01

    Bioinformatics is central to biology education in the 21st century. With the generation of terabytes of data per day, the application of computer-based tools to stored and distributed data is fundamentally changing research and its application to problems in medicine, agriculture, conservation and forensics. In light of this 'information revolution,' undergraduate biology curricula must be redesigned to prepare the next generation of informed citizens as well as those who will pursue careers in the life sciences. The BEDROCK initiative (Bioinformatics Education Dissemination: Reaching Out, Connecting and Knitting together) has fostered an international community of bioinformatics educators. The initiative's goals are to: (i) Identify and support faculty who can take leadership roles in bioinformatics education; (ii) Highlight and distribute innovative approaches to incorporating evolutionary bioinformatics data and techniques throughout undergraduate education; (iii) Establish mechanisms for the broad dissemination of bioinformatics resource materials and teaching models; (iv) Emphasize phylogenetic thinking and problem solving; and (v) Develop and publish new software tools to help students develop and test evolutionary hypotheses. Since 2002, BEDROCK has offered more than 50 faculty workshops around the world, published many resources and supported an environment for developing and sharing bioinformatics education approaches. The BEDROCK initiative builds on the established pedagogical philosophy and academic community of the BioQUEST Curriculum Consortium to assemble the diverse intellectual and human resources required to sustain an international reform effort in undergraduate bioinformatics education.

  6. PHYSICO2: an UNIX based standalone procedure for computation of physicochemical, window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool, version 2.

    PubMed

    Banerjee, Shyamashree; Gupta, Parth Sarthi Sen; Nayek, Arnab; Das, Sunit; Sur, Vishma Pratap; Seth, Pratyay; Islam, Rifat Nawaz Ul; Bandyopadhyay, Amal K

    2015-01-01

    Automated genome sequencing is enriching the sequence databases very quickly. To keep the analysis of sequences in balance with their entry into the databases, efficient software is required. To this end, PHYSICO2, compared to the earlier PHYSICO and other public domain tools, is most efficient in that it (i) extracts physicochemical, window-dependent and homologous-position-based substitution (PWS) properties, including positional and BLOCK-specific diversity and conservation, (ii) provides users with flexibility in setting relevant input parameters, (iii) helps users to prepare a BLOCK-FASTA file using the program's Automated Block Preparation Tool, (iv) performs fast, accurate and user-friendly analyses and (v) redirects itemized outputs in Excel format along with detailed methodology. The program package contains documentation describing the application of the methods. Overall, the program acts as an efficient PWS analyzer and finds application in sequence bioinformatics. PHYSICO2 is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users.
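
    For readers unfamiliar with window-dependent properties, the sketch below averages a per-residue scale over a sliding window along a protein sequence. It is a generic illustration and not PHYSICO2's implementation: the choice of the Kyte-Doolittle hydropathy scale and the function name window_average are assumptions made here, and the abstract does not say which properties or window scheme PHYSICO2 uses.

```python
# Sliding-window average of a per-residue property; not PHYSICO2 code.
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence, window, scale=KYTE_DOOLITTLE):
    """Return the mean property value for every full window in the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# One value per window position: a 6-residue sequence with window 3 gives 4.
profile = window_average("ILVKDE", 3)
print([round(v, 2) for v in profile])
```

    Changing the scale dictionary or the window size changes the profile, which is the kind of user-set input parameter the abstract refers to.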

  7. PHYSICO2: an UNIX based standalone procedure for computation of physicochemical, window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool, version 2

    PubMed Central

    Banerjee, Shyamashree; Gupta, Parth Sarthi Sen; Nayek, Arnab; Das, Sunit; Sur, Vishma Pratap; Seth, Pratyay; Islam, Rifat Nawaz Ul; Bandyopadhyay, Amal K

    2015-01-01

    Automated genome sequencing is enriching sequence databases very quickly. To keep analysis in step with the rate at which sequences enter the databases, efficient software is required. To this end PHYSICO2, compared to the earlier PHYSICO and other public domain tools, is most efficient in that it i] extracts physicochemical, window-dependent and homologous-position-based substitution (PWS) properties, including positional and BLOCK-specific diversity and conservation, ii] gives users flexibility in setting relevant input parameters, iii] helps users prepare a BLOCK-FASTA file using the program's Automated Block Preparation Tool, iv] performs fast, accurate and user-friendly analyses and v] redirects itemized outputs in Excel format along with detailed methodology. The program package contains documentation describing application of the methods. Overall, the program acts as an efficient PWS analyzer and finds application in sequence bioinformatics. Availability: PHYSICO2 is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users. PMID:26339154

  8. Post-genomic insights into the plant polysaccharide degradation potential of Aspergillus nidulans and comparison to Aspergillus niger and Aspergillus oryzae.

    PubMed

    Coutinho, Pedro M; Andersen, Mikael R; Kolenova, Katarina; vanKuyk, Patricia A; Benoit, Isabelle; Gruben, Birgit S; Trejo-Aguilar, Blanca; Visser, Hans; van Solingen, Piet; Pakula, Tiina; Seiboth, Bernard; Battaglia, Evy; Aguilar-Osorio, Guillermo; de Jong, Jan F; Ohm, Robin A; Aguilar, Mariana; Henrissat, Bernard; Nielsen, Jens; Stålbrand, Henrik; de Vries, Ronald P

    2009-03-01

    The plant polysaccharide degradative potential of Aspergillus nidulans was analysed in detail and compared to that of Aspergillus niger and Aspergillus oryzae using a combination of bioinformatics, physiology and transcriptomics. Manual verification indicated that 28.4% of the A. nidulans ORFs analysed in this study do not contain a secretion signal, of which 40% may be secreted through a non-classical method. While significant differences were found between the species in the numbers of ORFs assigned to the relevant CAZy families, no significant difference was observed in growth on polysaccharides. Growth differences were observed between the Aspergilli and Podospora anserina, whose genomic potential for polysaccharide degradation differs more strongly, suggesting that large genomic differences are required to cause growth differences on polysaccharides. Differences were also detected between the Aspergilli in the presence of putative regulatory sequences in the promoters of the ORFs of this study, and a correlation between the presence of putative XlnR binding sites and induction by xylose was detected for A. niger. These data demonstrate differences in genome content, substrate specificity of the enzymes and gene regulation in these three Aspergilli, which likely reflect their individual adaptation to their natural biotope.

  9. Privacy-preserving microbiome analysis using secure computation.

    PubMed

    Wagner, Justin; Paulson, Joseph N; Wang, Xiao; Bhattacharjee, Bobby; Corrada Bravo, Héctor

    2016-06-15

    Developing targeted therapeutics and identifying biomarkers relies on large amounts of research participant data. Beyond human DNA, scientists now investigate the DNA of micro-organisms inhabiting the human body. Recent work shows that an individual's collection of microbial DNA consistently identifies that person and could be used to link a real-world identity to a sensitive attribute in a research dataset. Unfortunately, the current suite of DNA-specific privacy-preserving analysis tools does not meet the requirements for microbiome sequencing studies. To address privacy concerns around microbiome sequencing, we implement metagenomic analyses using secure computation. Our implementation allows comparative analysis over combined data without revealing the feature counts for any individual sample. We focus on three analyses and perform an evaluation on datasets currently used by the microbiome research community. We use our implementation to simulate sharing data between four policy-domains. Additionally, we describe an application of our implementation for patients to combine data that allows drug developers to query against and compensate patients for the analysis. The software is freely available for download at: http://cbcb.umd.edu/~hcorrada/projects/secureseq.html. Supplementary data are available at Bioinformatics online. Contact: hcorrada@umiacs.umd.edu. © The Author 2016. Published by Oxford University Press.
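
    The idea of analysing combined data "without revealing the feature counts for any individual sample" can be sketched with additive secret sharing. This is a swapped-in technique chosen for brevity: the abstract does not describe the protocol the authors implemented, and the modulus, function names and example counts below are hypothetical. Each party splits its private count into random shares that sum to the count, and only sums of shares are ever published.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical share modulus, far larger than any realistic total


def split_into_shares(value, n_parties):
    """Split a non-negative integer into n additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value (mod MODULUS).
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_counts):
    """Total one feature count per party; no raw count is sent to another party."""
    n = len(private_counts)
    # Row i holds the shares that party i hands out, one per recipient.
    distributed = [split_into_shares(count, n) for count in private_counts]
    # Recipient j adds the shares it received and publishes only that subtotal.
    subtotals = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    return sum(subtotals) % MODULUS


if __name__ == "__main__":
    # Hypothetical counts of one microbial feature held by four data owners.
    print(secure_sum([12, 0, 7, 30]))
```

    Any single share is uniformly random and carries no information about the count it came from; the sketch ignores collusion, malicious parties and network transport, all of which a real deployment must address.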

  10. JEnsembl: a version-aware Java API to Ensembl data systems.

    PubMed

    Paterson, Trevor; Law, Andy

    2012-11-01

    The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing 'through time' comparative analyses to be performed. Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net).

  11. Identification of proteins likely to be involved in morphogenesis, cell division, and signal transduction in Planctomycetes by comparative genomics.

    PubMed

    Jogler, Christian; Waldmann, Jost; Huang, Xiaoluo; Jogler, Mareike; Glöckner, Frank Oliver; Mascher, Thorsten; Kolter, Roberto

    2012-12-01

    Members of the Planctomycetes clade share many unusual features for bacteria. Their cytoplasm contains membrane-bound compartments, they lack peptidoglycan and FtsZ, they divide by polar budding, and they are capable of endocytosis. Planctomycete genomes have remained enigmatic, generally being quite large (up to 9 Mb), and on average, 55% of their predicted proteins are of unknown function. Importantly, proteins related to the unusual traits of Planctomycetes remain largely unknown. Thus, we embarked on bioinformatic analyses of these genomes in an effort to predict proteins that are likely to be involved in compartmentalization, cell division, and signal transduction. We used three complementary strategies. First, we defined the Planctomycetes core genome and subtracted genes of well-studied model organisms. Second, we analyzed the gene content and synteny of morphogenesis and cell division genes and combined both methods using a "guilt-by-association" approach. Third, we identified signal transduction systems as well as sigma factors. These analyses provide a manageable list of candidate genes for future genetic studies and provide evidence for complex signaling in the Planctomycetes akin to that observed for bacteria with complex life-styles, such as Myxococcus xanthus.
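
    The first strategy, defining a core genome and subtracting genes of well-studied model organisms, reduces to set operations once genes have been assigned to ortholog groups. The sketch below assumes that assignment has already been done; the genome names, ortholog-group identifiers and function names are hypothetical and not taken from the study.

```python
def core_genome(genomes):
    """Ortholog groups present in every genome (set intersection)."""
    return set.intersection(*(set(groups) for groups in genomes.values()))


def candidate_groups(target_genomes, model_genomes):
    """Core groups of the target clade that are absent from all model organisms."""
    in_models = set().union(*(set(groups) for groups in model_genomes.values()))
    return core_genome(target_genomes) - in_models


if __name__ == "__main__":
    # Hypothetical ortholog-group content of three target genomes.
    targets = {
        "genome_A": {"og1", "og2", "og3", "og4"},
        "genome_B": {"og1", "og2", "og3"},
        "genome_C": {"og1", "og2", "og3", "og5"},
    }
    # Hypothetical content of two model organisms to subtract.
    models = {"model_X": {"og1", "og9"}, "model_Y": {"og3"}}
    print(sorted(candidate_groups(targets, models)))
```

    What remains is the set of groups shared by the whole clade yet missing from the models, which is the kind of shortlist the abstract describes as a starting point for genetic studies.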

  12. Defining objective clusters for rabies virus sequences using affinity propagation clustering

    PubMed Central

    Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

    2018-01-01

    Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses run into technical difficulties in defining verifiable criteria for cluster allocations in phylogenetic trees, which calls for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for analysis of microarray and gene expression data; however, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361
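
    Affinity propagation needs only a pairwise similarity matrix, which is why it can be applied to sequences. The following is a compact sketch of the standard message-passing updates (responsibilities and availabilities with damping), not the workflow used in the study: the toy sequences, the negative mismatch count as similarity, the median preference and all function names are assumptions made for illustration.

```python
def mismatch_similarity(seqs):
    """Similarity = negative number of mismatches between equal-length sequences."""
    n = len(seqs)
    S = [[-float(sum(a != b for a, b in zip(s, t))) for t in seqs] for s in seqs]
    # Assumed preference: the median off-diagonal similarity, placed on the diagonal.
    off_diag = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
    preference = off_diag[len(off_diag) // 2]
    for k in range(n):
        S[k][k] = preference
    return S


def affinity_propagation(S, damping=0.7, iterations=300):
    """Return, for each item, the index of the exemplar it is assigned to."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        for i in range(n):
            for k in range(n):
                # Strongest competing candidate exemplar for item i.
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]  # excludes the self term
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fall back to the single best-supported item
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


if __name__ == "__main__":
    # Two hypothetical groups of toy sequences, three variants each.
    toy = ["CAAAAAAAAA", "AAAAAAAAAA", "ACAAAAAAAA",
           "TGGGGGGGGG", "GGGGGGGGGG", "GTGGGGGGGG"]
    print(affinity_propagation(mismatch_similarity(toy)))
```

    The diagonal preference controls how many clusters emerge: lower values make items less willing to act as exemplars and therefore yield fewer clusters. For real data a tested library implementation is preferable to this sketch.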

  13. Molecular Diet Analysis of Two African Free-Tailed Bats (Molossidae) Using High Throughput Sequencing

    PubMed Central

    Bohmann, Kristine; Monadjem, Ara; Lehmkuhl Noer, Christina; Rasmussen, Morten; Zeale, Matt R. K.; Clare, Elizabeth; Jones, Gareth; Willerslev, Eske; Gilbert, M. Thomas P.

    2011-01-01

    Given the diversity of prey consumed by insectivorous bats, it is difficult to discern the composition of their diet using morphological or conventional PCR-based analyses of their faeces. We demonstrate the use of a powerful alternate tool, the use of the Roche FLX sequencing platform to deep-sequence uniquely 5′ tagged insect-generic barcode cytochrome c oxidase I (COI) fragments, that were PCR amplified from faecal pellets of two free-tailed bat species Chaerephon pumilus and Mops condylurus (family: Molossidae). Although the analyses were challenged by the paucity of southern African insect COI sequences in the GenBank and BOLD databases, similarity to existing collections allowed the preliminary identification of 25 prey families from six orders of insects within the diet of C. pumilus, and 24 families from seven orders within the diet of M. condylurus. Insects identified to families within the orders Lepidoptera and Diptera were widely present among the faecal samples analysed. The two families that were observed most frequently were Noctuidae and Nymphalidae (Lepidoptera). Species-level analysis of the data was accomplished using novel bioinformatics techniques for the identification of molecular operational taxonomic units (MOTU). Based on these analyses, our data provide little evidence of resource partitioning between sympatric M. condylurus and C. pumilus in the Simunye region of Swaziland at the time of year when the samples were collected, although as more complete databases against which to compare the sequences are generated this may have to be re-evaluated. PMID:21731749

  14. Bioinformatics education in India.

    PubMed

    Kulkarni-Kale, Urmila; Sawant, Sangeeta; Chavan, Vishwas

    2010-11-01

    An account of bioinformatics education in India is presented along with future prospects. The establishment of the BTIS network by the Department of Biotechnology (DBT), Government of India, in the 1980s was a systematic effort to develop bioinformatics infrastructure in India and provide services to the scientific community. Advances in the field of bioinformatics underscored the need for well-trained professionals with skills in information technology and biotechnology. As a result, programmes for capacity building in terms of human resource development were initiated. Educational programmes gradually evolved from the organisation of short-term workshops to the institution of formal diploma/degree programmes. A case study of the Master's degree course offered at the Bioinformatics Centre, University of Pune is discussed. Currently, many universities and institutes offer bioinformatics courses at different levels, with variations in course content and level of detail. The BioInformatics National Certification (BINC) examination, initiated in 2005 by DBT, provides a common yardstick to assess the knowledge and skill sets of students graduating from various institutions. The potential for broadening the scope of bioinformatics to transform it into a data-intensive discovery discipline is discussed. This necessitates amendments to the existing curricula to accommodate upcoming developments.

  15. A Web-based assessment of bioinformatics end-user support services at US universities.

    PubMed

    Messersmith, Donna J; Benson, Dennis A; Geer, Renata C

    2006-07-01

    This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group.
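
    The counts in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; reading "N = 38" as the number of bioinformatics centers that offered workshops is an assumption, as are the variable names.

```python
reviewed = 239          # universities reviewed
offering = 72           # universities offering workshops
not_offering = 167      # universities with no such training noted
providers = {"libraries": 15, "bioinformatics centers": 38, "other facilities": 35}
centers_identified = 115

# The two groups of universities should partition the reviewed set.
assert offering + not_offering == reviewed

pct_offering = round(100 * offering / reviewed)
pct_not_offering = round(100 * not_offering / reviewed)

# Provider counts add up to more than the number of offering universities,
# so some universities must have had more than one provider.
provider_total = sum(providers.values())
extra_provider_listings = provider_total - offering

# Assumption: the 38 centers counted above are the ones that offered workshops.
centers_without = centers_identified - providers["bioinformatics centers"]
share_without = centers_without / centers_identified

if __name__ == "__main__":
    print(pct_offering, pct_not_offering)
    print(provider_total, extra_provider_listings)
    print(centers_without, round(share_without, 3))
```

    Under that assumption, 77 of the 115 centers did not offer workshops, which agrees with the "two-thirds" stated in the abstract.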

  16. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance.

    PubMed

    Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Cisneros, Jose Luis Bellod; Jurtz, Vanessa; Larsen, Mette Voldby; Hasman, Henrik; Aarestrup, Frank Møller; Lund, Ole

    2016-01-01

    Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle for using this technology is the availability of simple and automatic bioinformatics tools. Based on previously published and already available web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services. The reported results enable a rapid overview of the major results, and comparing that to the previously found results showed that the platform is reliable and able to correctly predict the species and find most of the expected genes automatically. In conclusion, a combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https://cge.cbs.dtu.dk/services/CGEpipeline-1.1 and it is the intention that it will continue to be expanded with new features as these become available.

  17. Genomic and functional characterisation of two Enterococcus strains isolated from Cotija cheese and their potential role in ripening.

    PubMed

    Olvera-García, Myrna; Sanchez-Flores, Alejandro; Quirasco Baruch, Maricarmen

    2018-03-01

    Enterococcus spp. are present in the native microbiota of many traditional fermented foods. Their ability to produce antibacterial compounds, mainly against Listeria monocytogenes, has raised interest recently. However, there is scarce information about their proteolytic and lipolytic potential, and their biotechnological application is currently limited because enterococcal strains have been related to nosocomial infections. In this work, next-generation sequencing and optimised bioinformatic pipelines were used to annotate the genomes of two Enterococcus strains (one E. faecium and one E. faecalis) isolated from the Mexican artisanal ripened Cotija cheese. A battery of genes involved in their proteolytic system was annotated. Genes coding for lipases, esterases and other enzymes whose final products contribute to cheese aroma and flavour were identified as well. As for the production of antibacterial compounds, several peptidoglycan hydrolase- and bacteriocin-coding genes were identified in both genomes experimentally and by bioinformatic analyses. E. faecalis showed resistance to aminoglycosides and E. faecium to aminoglycosides and macrolides, as predicted by the genome functional annotation. No pathogenicity islands were found in any of the strains, although traits such as the ability of biofilm formation and cell aggregation were observed. Finally, a comparative genomic analysis was able to discriminate between the food strains isolated and nosocomial strains. In summary, pathogenic strains are resistant to a wide range of antibiotics and contain virulence factors that cause host damage; in contrast, food strains display less antibiotic resistance, include genes that encode class II bacteriocins and express virulence factors associated with host colonisation rather than invasion.

  18. Advances in genome-wide RNAi cellular screens: a case study using the Drosophila JAK/STAT pathway

    PubMed Central

    2012-01-01

    Background: Genome-scale RNA-interference (RNAi) screens are becoming ever more common gene discovery tools. However, whilst every screen identifies interacting genes, less attention has been given to how factors such as library design and post-screening bioinformatics may be affecting the data generated. Results: Here we present a new genome-wide RNAi screen of the Drosophila JAK/STAT signalling pathway undertaken in the Sheffield RNAi Screening Facility (SRSF). This screen was carried out using a second-generation, computationally optimised dsRNA library and analysed using current methods and bioinformatic tools. To examine advances in RNAi screening technology, we compare this screen to a biologically very similar screen undertaken in 2005 with a first-generation library. Both screens used the same cell line, reporters and experimental design, with the SRSF screen identifying 42 putative regulators of JAK/STAT signalling, 22 of which verified in a secondary screen and 16 verified with an independent probe design. Following reanalysis of the original screen data, comparison of the two gene lists allows us to make estimates of false discovery rates in the SRSF data and to conduct an assessment of off-target effects (OTEs) associated with both libraries. We discuss the differences and similarities between the resulting data sets and examine the relative improvements in gene discovery protocols. Conclusions: Our work represents one of the first direct comparisons between first- and second-generation libraries and shows that modern library designs together with methodological advances have had a significant influence on genome-scale RNAi screens. PMID:23006893

  19. Behavioral genomics of honeybee foraging and nest defense

    NASA Astrophysics Data System (ADS)

    Hunt, Greg J.; Amdam, Gro V.; Schlipalius, David; Emore, Christine; Sardesai, Nagesh; Williams, Christie E.; Rueppell, Olav; Guzmán-Novoa, Ernesto; Arechavaleta-Velasco, Miguel; Chandra, Sathees; Fondrk, M. Kim; Beye, Martin; Page, Robert E.

    2007-04-01

    The honeybee has been the most important insect species for study of social behavior. The recently released draft genomic sequence for the bee will accelerate honeybee behavioral genetics. Although we lack sufficient tools to manipulate this genome easily, quantitative trait loci (QTLs) that influence natural variation in behavior have been identified and tested for their effects on correlated behavioral traits. We review what is known about the genetics and physiology of two behavioral traits in honeybees, foraging specialization (pollen versus nectar), and defensive behavior, and present evidence that map-based cloning of genes is more feasible in the bee than in other metazoans. We also present bioinformatic analyses of candidate genes within QTL confidence intervals (CIs). The high recombination rate of the bee made it possible to narrow the search to regions containing only 17-61 predicted peptides for each QTL, although CIs covered large genetic distances. Knowledge of correlated behavioral traits, comparative bioinformatics, and expression assays facilitated evaluation of candidate genes. An overrepresentation of genes involved in ovarian development and insulin-like signaling components within pollen foraging QTL regions suggests that an ancestral reproductive gene network was co-opted during the evolution of foraging specialization. The major QTL influencing defensive/aggressive behavior contains orthologs of genes involved in central nervous system activity and neurogenesis. Candidates at the other two defensive-behavior QTLs include modulators of sensory signaling (Am5HT7 serotonin receptor, AmArr4 arrestin, and GABA-B-R1 receptor). These studies are the first step in linking natural variation in honeybee social behavior to the identification of underlying genes.

  20. Proteomic profiling of early degenerative retina of RCS rats

    PubMed Central

    Zhu, Zhi-Hong; Fu, Yan; Weng, Chuan-Huang; Zhao, Cong-Jian; Yin, Zheng-Qin

    2017-01-01

    AIM: To identify the underlying cellular and molecular changes in retinitis pigmentosa (RP). METHODS: Label-free quantification-based proteomics analysis, which is more economical and involves simpler procedures, has been used with increasing frequency in modern biological research. Dystrophic RCS rats, the first laboratory animal model for the study of RP, follow a pathological course similar to that of humans with the disease. Thus, we employed a comparative proteomics analysis approach for in-depth proteome profiling of retinas from dystrophic RCS rats and non-dystrophic congenic controls through Linear Trap Quadrupole - orbitrap MS/MS, to identify the significant differentially expressed proteins (DEPs). Bioinformatics analyses, including Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation and upstream regulatory analysis, were then performed on these retina proteins. Finally, a Western blotting experiment was carried out to verify the difference in the abundance of transcription factor E2F1. RESULTS: In this study, we identified a total of 2375 protein groups from the retinal protein samples of RCS rats and non-dystrophic congenic controls. Four hundred thirty-four significant DEPs were selected by Student's t-test. Based on the results of the bioinformatics analysis, we identified mitochondrial dysfunction and transcription factor E2F1 as the key initiation factors in the early retinal degenerative process. CONCLUSION: We showed that mitochondrial dysfunction and the transcription factor E2F1 substantially contribute to the disease etiology of RP. The results provide a new potential therapeutic approach for this retinal degenerative disease. PMID:28730077
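
    Selecting differentially expressed proteins with Student's t-test can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the protein names and intensity values are invented, three replicates per group are assumed, and the fixed critical value 2.776 (two-sided 5% level, 4 degrees of freedom) is valid only for that 3-versus-3 design. The abstract does not state the replicate number, the threshold or any multiple-testing correction.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5% critical value of Student's t with 4 degrees of freedom (3 + 3 - 2).
T_CRITICAL_DF4 = 2.776


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled * (1 / n_a + 1 / n_b))


def differential_proteins(intensities, critical=T_CRITICAL_DF4):
    """Names of proteins whose |t| exceeds the critical value."""
    return [
        name
        for name, (control, dystrophic) in intensities.items()
        if abs(student_t(control, dystrophic)) > critical
    ]


if __name__ == "__main__":
    # Hypothetical intensities: (control replicates, dystrophic replicates).
    toy = {
        "protein_1": ([10.0, 11.0, 12.0], [20.0, 21.0, 22.0]),
        "protein_2": ([5.0, 6.0, 7.0], [5.5, 6.0, 6.5]),
    }
    print(differential_proteins(toy))
```

    With thousands of protein groups tested at once, a real analysis would normally also control the false discovery rate; that step is omitted here.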

  1. Bioinformatics Goes to School—New Avenues for Teaching Contemporary Biology

    PubMed Central

    Wood, Louisa; Gebhardt, Philipp

    2013-01-01

    Since 2010, the European Molecular Biology Laboratory's (EMBL) Heidelberg laboratory and the European Bioinformatics Institute (EMBL-EBI) have jointly run bioinformatics training courses developed specifically for secondary school science teachers within Europe and EMBL member states. These courses focus on introducing bioinformatics, databases, and data-intensive biology, allowing participants to explore resources and providing classroom-ready materials to support them in sharing this new knowledge with their students. In this article, we chart our progress made in creating and running three bioinformatics training courses, including how the course resources are received by participants and how these, and bioinformatics in general, are subsequently used in the classroom. We assess the strengths and challenges of our approach, and share what we have learned through our interactions with European science teachers. PMID:23785266

  2. The 2016 Bioinformatics Open Source Conference (BOSC).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC (http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science.

  3. Bioinformatics clouds for big data manipulation.

    PubMed

    Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang

    2012-11-28

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.

  4. Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics

    PubMed Central

    Brusniak, Mi-Youn; Bodenmiller, Bernd; Campbell, David; Cooke, Kelly; Eddes, James; Garbutt, Andrew; Lau, Hollis; Letarte, Simon; Mueller, Lukas N; Sharma, Vagisha; Vitek, Olga; Zhang, Ning; Aebersold, Ruedi; Watts, Julian D

    2008-01-01

    Background: Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics. Results: We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, and statistical algorithms, originally developed for microarray data analyses, appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling. Conclusion: The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field. PMID:19087345

  5. Comparative BioInformatics and Computational Toxicology

    EPA Science Inventory

    Reflecting the numerous changes in the field since the publication of the previous edition, this third edition of Developmental Toxicology focuses on the mechanisms of developmental toxicity and incorporates current technologies for testing in the risk assessment process.

  6. Imaging mass spectrometry and genome mining reveal highly antifungal virulence factor of mushroom soft rot pathogen.

    PubMed

    Graupner, Katharina; Scherlach, Kirstin; Bretschneider, Tom; Lackner, Gerald; Roth, Martin; Gross, Harald; Hertweck, Christian

    2012-12-21

    Caught in the act: imaging mass spectrometry of a button mushroom infected with the soft rot pathogen Janthinobacterium agaricidamnosum in conjunction with genome mining revealed jagaricin as a highly antifungal virulence factor that is not produced under standard cultivation conditions. The structure of jagaricin was rigorously elucidated by a combination of physicochemical analyses, chemical derivatization, and bioinformatics. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Neutrophilic Iron-Oxidizing Zetaproteobacteria and Mild Steel Corrosion in Nearshore Marine Environments

    DTIC Science & Technology

    2011-02-16

    were checked for the presence of heterotrophic bacteria by streaking a sample on ASW-R2A agar plates. DNA extraction and analysis of phylogenetic ...Bellerophon v. 3 (greengenes.lbl.gov) and Pintail (www.bioinformatics-toolkit.org/Web-Pintail/). Phylogenetic trees were constructed for SSU rRNA gene...CLUSTALW (44), and phylogenetic analyses were conducted in MEGA4 (42). The evolutionary history was inferred using the neighbor-joining method (39), and

  8. Occurrence of lignin degradation genotypes and phenotypes among prokaryotes.

    PubMed

    Tian, Jiang-Hao; Pourcher, Anne-Marie; Bouchez, Théodore; Gelhaye, Eric; Peu, Pascal

    2014-12-01

    A number of prokaryotes actively contribute to lignin degradation in nature and their activity could be of interest for many applications including the production of biogas/biofuel from lignocellulosic biomass and biopulping. This review compares the reliability and efficiency of the culture-dependent screening methods currently used for the isolation of ligninolytic prokaryotes. Isolated prokaryotes exhibiting lignin-degrading potential are presented according to their phylogenetic groups. With the development of bioinformatics, culture-independent techniques are emerging that allow larger-scale data mining for ligninolytic prokaryotic functions but today, these techniques still have some limits. In this work, two phylogenetic affiliations of isolated prokaryotes exhibiting ligninolytic potential and laccase-encoding prokaryotes were determined on the basis of 16S rDNA sequences, providing a comparative view of results obtained by the two types of screening techniques. The combination of laboratory culture and bioinformatics approaches is a promising way to explore lignin-degrading prokaryotes.

  9. Bioinformatics in the secondary science classroom: A study of state content standards and students' perceptions of, and performance in, bioinformatics lessons

    NASA Astrophysics Data System (ADS)

    Wefer, Stephen H.

    The proliferation of bioinformatics in modern Biology marks a new revolution in science, which promises to influence science education at all levels. This thesis examined state standards for content that articulated bioinformatics, and explored secondary students' affective and cognitive perceptions of, and performance in, a bioinformatics mini-unit. The results are presented as three studies. The first study analyzed secondary science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics at the introductory high school biology level. The bioinformatics content of each state's Biology standards was categorized into nine areas and the prevalence of each area documented. The nine areas were: The Human Genome Project, Forensics, Evolution, Classification, Nucleotide Variations, Medicine, Computer Use, Agriculture/Food Technology, and Science Technology and Society/Socioscientific Issues (STS/SSI). Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas. Recommendations are made for reworking existing standards to incorporate bioinformatics and to facilitate the goal of promoting science literacy in this emerging new field among secondary school students. The second study examined thirty-two students' affective responses to, and content mastery of, a two-week bioinformatics mini-unit. The findings indicate that the students generally were positive relative to their interest level, the usefulness of the lessons, the difficulty level of the lessons, likeliness to engage in additional bioinformatics, and were overall successful on the assessments. A discussion of the results and significance is followed by suggestions for future research and implementation for transferability.
The third study presents a case study of individual differences among ten secondary school students, whose cognitive and affective percepts were analyzed in relation to their experience in learning a bioinformatics mini-unit. There were distinct individual differences among the participants, especially in the way they processed information and integrated procedural and analytical thought during bioinformatics learning. These differences may provide insights into some of the specific needs of students that educators and curriculum designers should consider when designing bioinformatics learning experiences. Implications for teacher education and curriculum design are presented in addition to some suggestions for further research.

  10. UTOPIA-User-Friendly Tools for Operating Informatics Applications.

    PubMed

    Pettifer, S R; Sinnott, J R; Attwood, T K

    2004-01-01

    Bioinformaticians routinely analyse vast amounts of information held both in large remote databases and in flat data files hosted on local machines. The contemporary toolkit available for this purpose consists of an ad hoc collection of data manipulation tools, scripting languages and visualization systems; these must often be combined in complex and bespoke ways, the result frequently being an unwieldy artefact capable of one specific task, which cannot easily be exploited or extended by other practitioners. Owing to the sizes of current databases and the scale of the analyses necessary, routine bioinformatics tasks are often automated, but many still require the unique experience and intuition of human researchers: this requires tools that support real-time interaction with complex datasets. Many existing tools have poor user interfaces and limited real-time performance when applied to realistically large datasets; much of the user's cognitive capacity is therefore focused on controlling the tool rather than on performing the research. The UTOPIA project is addressing some of these issues by building reusable software components that can be combined to make useful applications in the field of bioinformatics. Expertise in the fields of human computer interaction, high-performance rendering, and distributed systems is being guided by bioinformaticians and end-user biologists to create a toolkit that is both architecturally sound from a computing point of view, and directly addresses end-user and application-developer requirements.

  11. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine

    NASA Astrophysics Data System (ADS)

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-02-01

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. TCM-based new drug development, especially for the treatment of complex diseases, is promising. However, owing to TCM’s diverse ingredients and their complex interactions with the human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients’ target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined it with subsequent experimental validation to reveal, for the first time, the functions of the renin-angiotensin system responsible for QSYQ’s cardioprotective effects. BATMAN-TCM will contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM’s molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm.

  12. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine

    PubMed Central

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-01-01

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. TCM-based new drug development, especially for the treatment of complex diseases, is promising. However, owing to TCM’s diverse ingredients and their complex interactions with the human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients’ target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined it with subsequent experimental validation to reveal, for the first time, the functions of the renin-angiotensin system responsible for QSYQ’s cardioprotective effects. BATMAN-TCM will contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM’s molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm. PMID:26879404

  13. Microbial bioinformatics 2020.

    PubMed

    Pallen, Mark J

    2016-09-01

    Microbial bioinformatics in 2020 will remain a vibrant, creative discipline, adding value to the ever-growing flood of new sequence data, while embracing novel technologies and fresh approaches. Databases and search strategies will struggle to cope and manual curation will not be sustainable during the scale-up to the million-microbial-genome era. Microbial taxonomy will have to adapt to a situation in which most microorganisms are discovered and characterised through the analysis of sequences. Genome sequencing will become a routine approach in clinical and research laboratories, with fresh demands for interpretable user-friendly outputs. The "internet of things" will penetrate healthcare systems, so that even a piece of hospital plumbing might have its own IP address that can be integrated with pathogen genome sequences. Microbiome mania will continue, but the tide will turn from molecular barcoding towards metagenomics. Crowd-sourced analyses will collide with cloud computing, but eternal vigilance will be the price of preventing the misinterpretation and overselling of microbial sequence data. Output from hand-held sequencers will be analysed on mobile devices. Open-source training materials will address the need for the development of a skilled labour force. As we boldly go into the third decade of the twenty-first century, microbial sequence space will remain the final frontier! © 2016 The Author. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  14. Systems biology of cancer biomarker detection.

    PubMed

    Mitra, Sanga; Das, Smarajit; Chakrabarti, Jayprokas

    2013-01-01

    Cancer systems biology is an ever-growing area of research owing to the explosion of data; the problem is how to mine these data and extract useful information. To gain insight into carcinogenesis, one needs to systematically mine several resources, such as databases, microarray data and next-generation sequences. This review encompasses management and analysis of cancer data, database construction and data deposition, whole transcriptome and genome comparison, analysis of results from high-throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq can be applied to get epigenetic information transformed into a high-throughput endeavour to which systems biology and bioinformatics are making significant inroads. The data from the ENCODE and GENCODE projects, available through the UCSC genome browser, can be considered a benchmark for comparison and meta-analysis. A pipeline for integrating next-generation sequencing data and microarray data, and putting them together with existing databases, is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To give a better understanding of how to utilize available resources, we have chosen oral cancer to show how and what kind of analysis can be done. This review is a computational genomic primer that provides a bird's-eye view of the computational and bioinformatics tools currently available to perform integrated genomic and systems biology analyses of several carcinomas.

  15. Multiple instances of paraphyletic species and cryptic taxa revealed by mitochondrial and nuclear RAD data for Calandrella larks (Aves: Alaudidae).

    PubMed

    Stervander, Martin; Alström, Per; Olsson, Urban; Ottosson, Ulf; Hansson, Bengt; Bensch, Staffan

    2016-09-01

    The avian genus Calandrella (larks) was recently suggested to be non-monophyletic, and was divided into two genera, of which Calandrella sensu stricto comprises 4-5 species in Eurasia and Africa. We analysed mitochondrial cytochrome b (cytb) and nuclear Restriction-site Associated DNA (RAD) sequences from all species, and for cytb we studied 21 of the 22 recognised subspecies, with the aim to clarify the phylogenetic relationships within the genus and to compare large-scale nuclear sequence patterns with a widely used mitochondrial marker. Cytb indicated deep splits among the currently recognised species, although it failed to support the interrelationships among most of these. It also revealed unexpected deep divergences within C. brachydactyla, C. blanfordi/C. erlangeri, C. cinerea, and C. acutirostris. It also suggested that both C. brachydactyla and C. blanfordi, as presently circumscribed, are paraphyletic. In contrast, most of the many subspecies of C. brachydactyla and C. cinerea were unsupported by cytb, although two populations of C. cinerea were found to be genetically distinct. The RAD data corroborated the cytb tree (for the smaller number of taxa analysed) and recovered strongly supported interspecific relationships. However, coalescence analyses of the RAD data, analysed in SNAPP both with and without an outgroup, received equally strong support for two conflicting topologies. We suggest that the tree rooted with an outgroup - which is not recommended for SNAPP - is more trustworthy, and suggest that the reliability of analyses performed without any outgroup species should be thoroughly evaluated. We also demonstrate that degraded museum samples can be phylogenetically informative in RAD analyses following careful bioinformatic treatment. We note that the genus Calandrella is in need of taxonomic revision. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. PoPLAR: Portal for Petascale Lifescience Applications and Research

    PubMed Central

    2013-01-01

    Background We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches--such as proteomics, genomics, metabolomics, and meta-genomics--drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. Methods The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. Results This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. 
Conclusions This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers. PMID:23902523

  17. SBION: A Program for Analyses of Salt-Bridges from Multiple Structure Files.

    PubMed

    Gupta, Parth Sarthi Sen; Mondal, Sudipta; Mondal, Buddhadev; Islam, Rifat Nawaz Ul; Banerjee, Shyamashree; Bandyopadhyay, Amal K

    2014-01-01

    Salt-bridges and network salt-bridges are specific electrostatic interactions that contribute to the overall stability of proteins. In the hierarchical protein folding model, these interactions play a crucial role in the nucleation process. The advent and growth of protein structure databases and their availability in the public domain have created an urgent need for context-dependent rapid analysis of salt-bridges. While these analyses on a single protein are cumbersome and time-consuming, batch analyses need efficient software for a rapid topological scan of a large number of proteins to extract details on (i) the fraction of salt-bridge residues (acidic and basic), (ii) chain-specific intra-molecular salt-bridges, (iii) inter-molecular salt-bridges (protein-protein interactions) in all possible binary combinations, (iv) network salt-bridges and (v) the secondary structure distribution of salt-bridge residues. To the best of our knowledge, such efficient software is not available in the public domain. We have therefore developed a program, SBION, which can perform all the above-mentioned computations for any number of proteins with any number of chains at any given ion-pair distance. It is highly efficient, fast, error-free and user friendly. SBION has potential for applications in structural and comparative bioinformatics studies. SBION is freely available for non-commercial/academic institutions on formal request to the corresponding author (akbanerjee@biotech.buruniv.ac.in).
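
    To make the kind of computation described in this record concrete, the following is a rough sketch of distance-based salt-bridge detection. It is not SBION's code, which is not public according to the abstract. The atom record layout, the 4.0 Å default cutoff, the choice of charged side-chain atoms, and the function names are all assumptions made for illustration.

```python
import math

# Assumed charged side-chain atoms (a common convention, not taken from SBION).
ACIDIC_ATOMS = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC_ATOMS = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2"),
               ("HIS", "ND1"), ("HIS", "NE2")}


def find_salt_bridges(atoms, cutoff=4.0):
    """Return sorted (acidic_residue, basic_residue) pairs within the cutoff.

    atoms: iterable of (chain, res_name, res_num, atom_name, (x, y, z)).
    A residue is identified by (chain, res_name, res_num).
    """
    atoms = list(atoms)
    acidic = [a for a in atoms if (a[1], a[3]) in ACIDIC_ATOMS]
    basic = [a for a in atoms if (a[1], a[3]) in BASIC_ATOMS]
    bridges = set()
    for a in acidic:
        for b in basic:
            # Euclidean distance between the two charged atoms.
            if math.dist(a[4], b[4]) <= cutoff:
                bridges.add((a[:3], b[:3]))
    return sorted(bridges)


def classify(bridge):
    """Label a bridge by whether both residues sit on the same chain."""
    acid, base = bridge
    return "intra-molecular" if acid[0] == base[0] else "inter-molecular"


if __name__ == "__main__":
    # Hypothetical coordinates, in angstroms.
    demo = [
        ("A", "ASP", 10, "OD1", (0.0, 0.0, 0.0)),
        ("A", "LYS", 20, "NZ", (3.0, 0.0, 0.0)),
        ("B", "ARG", 5, "NH1", (0.0, 3.5, 0.0)),
        ("B", "GLU", 7, "OE1", (50.0, 50.0, 50.0)),
    ]
    for bridge in find_salt_bridges(demo):
        print(bridge, classify(bridge))
```

    The all-pairs loop is quadratic in the number of charged atoms; for batch scans over many structures a spatial grid or k-d tree would be the usual replacement.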

  18. Bioinformatics clouds for big data manipulation

    PubMed Central

    2012-01-01

    Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475

  19. The 2016 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC (http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science. PMID:27781083

  20. A bioinformatics potpourri.

    PubMed

    Schönbach, Christian; Li, Jinyan; Ma, Lan; Horton, Paul; Sjaugi, Muhammad Farhan; Ranganathan, Shoba

    2018-01-19

    The 16th International Conference on Bioinformatics (InCoB) was held at Tsinghua University, Shenzhen from September 20 to 22, 2017. The annual conference of the Asia-Pacific Bioinformatics Network featured six keynotes, two invited talks, a panel discussion on big data driven bioinformatics and precision medicine, and 66 oral presentations of accepted research articles or posters. Fifty-seven articles comprising a topic assortment of algorithms, biomolecular networks, cancer and disease informatics, drug-target interactions and drug efficacy, gene regulation and expression, imaging, immunoinformatics, metagenomics, next generation sequencing for genomics and transcriptomics, ontologies, post-translational modification, and structural bioinformatics are the subject of this editorial for the InCoB2017 supplement issues in BMC Genomics, BMC Bioinformatics, BMC Systems Biology and BMC Medical Genomics. New Delhi will be the location of InCoB2018, scheduled for September 26-28, 2018.

  1. The 2015 Bioinformatics Open Source Conference (BOSC 2015).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Lapp, Hilmar; Chapman, Brad; Davey, Rob; Fields, Christopher; Hokamp, Karsten; Munoz-Torres, Monica

    2016-02-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule.

  2. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface.

    PubMed

    Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz

    2009-08-25

    Bioinformatics often leverages on recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms.
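
    The abstract lists Gene Ontology term enrichment among the analyses but does not describe the statistic behind it. As an illustrative sketch, assuming the commonly used one-sided hypergeometric test (an assumption, not something the abstract states about dictyExpress), the enrichment p-value for one GO term could be computed as follows. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: total number of genes considered
    annotated:  genes in the population annotated with the GO term
    selected:   genes in the user's selection (e.g. one cluster)
    overlap:    selected genes that carry the GO term
    """
    denominator = comb(population, selected)
    upper = min(annotated, selected)
    # Sum the probabilities of seeing `overlap` or more annotated genes.
    numerator = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical numbers: 40 of 1000 genes carry the term; a 50-gene
    # cluster contains 8 of them.
    print(enrichment_p_value(1000, 40, 50, 8))
```

    When many GO terms are tested at once, the resulting p-values would normally be corrected for multiple testing before interpretation.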

  3. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples.

    PubMed

    Naccache, Samia N; Federman, Scot; Veeraraghavan, Narayanan; Zaharia, Matei; Lee, Deanna; Samayoa, Erik; Bouquet, Jerome; Greninger, Alexander L; Luk, Ka-Cheung; Enge, Barryett; Wadford, Debra A; Messenger, Sharon L; Genrich, Gillian L; Pellegrino, Kristen; Grard, Gilda; Leroy, Eric; Schneider, Bradley S; Fair, Joseph N; Martínez, Miguel A; Isa, Pavel; Crump, John A; DeRisi, Joseph L; Sittler, Taylor; Hackett, John; Miller, Steve; Chiu, Charles Y

    2014-07-01

    Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times. © 2014 Naccache et al.; Published by Cold Spring Harbor Laboratory Press.

  4. Phosphoproteomics and Bioinformatics Analyses of Spinal Cord Proteins in Rats with Morphine Tolerance

    PubMed Central

    Liaw, Wen-Jinn; Tsao, Cheng-Ming; Huang, Go-Shine; Wu, Chin-Chen; Ho, Shung-Tai; Wang, Jhi-Joung; Tao, Yuan-Xiang; Shui, Hao-Ai

    2014-01-01

    Introduction Morphine is the most effective pain-relieving drug, but it can cause unwanted side effects. Direct neuraxial administration of morphine to spinal cord not only can provide effective, reliable pain relief but also can prevent the development of supraspinal side effects. However, repeated neuraxial administration of morphine may still lead to morphine tolerance. Methods To better understand the mechanism that causes morphine tolerance, we induced tolerance in rats at the spinal cord level by giving them twice-daily injections of morphine (20 µg/10 µL) for 4 days. We confirmed tolerance by measuring paw withdrawal latencies and maximal possible analgesic effect of morphine on day 5. We then carried out phosphoproteomic analysis to investigate the global phosphorylation of spinal proteins associated with morphine tolerance. Finally, pull-down assays were used to identify phosphorylated types and sites of 14-3-3 proteins, and bioinformatics was applied to predict biological networks impacted by the morphine-regulated proteins. Results Our proteomics data showed that repeated morphine treatment altered phosphorylation of 10 proteins in the spinal cord. Pull-down assays identified 2 serine/threonine phosphorylated sites in 14-3-3 proteins. Bioinformatics further revealed that morphine impacted on cytoskeletal reorganization, neuroplasticity, protein folding and modulation, signal transduction and biomolecular metabolism. Conclusions Repeated morphine administration may affect multiple biological networks by altering protein phosphorylation. These data may provide insight into the mechanism that underlies the development of morphine tolerance. PMID:24392096

  5. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface

    PubMed Central

    Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz

    2009-01-01

    Background Bioinformatics often leverages on recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. Results We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. Conclusion dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms. PMID:19706156

  6. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology

    PubMed Central

    Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552

  7. Recommendations for Accurate Resolution of Gene and Isoform Allele-Specific Expression in RNA-Seq Data

    PubMed Central

    Wood, David L. A.; Nones, Katia; Steptoe, Anita; Christ, Angelika; Harliwong, Ivon; Newell, Felicity; Bruxner, Timothy J. C.; Miller, David; Cloonan, Nicole; Grimmond, Sean M.

    2015-01-01

    Genetic variation modulates gene expression transcriptionally or post-transcriptionally, and can profoundly alter an individual’s phenotype. Measuring allelic differential expression at heterozygous loci within an individual, a phenomenon called allele-specific expression (ASE), can assist in identifying such factors. Massively parallel DNA and RNA sequencing and advances in bioinformatic methodologies provide an outstanding opportunity to measure ASE genome-wide. In this study, matched DNA and RNA sequencing, genotyping arrays and computationally phased haplotypes were integrated to comprehensively and conservatively quantify ASE in a single human brain and liver tissue sample. We describe a methodological evaluation and assessment of common bioinformatic steps for ASE quantification, and recommend a robust approach to accurately measure SNP, gene and isoform ASE through the use of personalized haplotype genome alignment, strict alignment quality control and intragenic SNP aggregation. Our results indicate that accurate ASE quantification requires careful bioinformatic analyses and is adversely affected by sample specific alignment confounders and random sampling even at moderate sequence depths. We identified multiple known and several novel ASE genes in liver, including WDR72, DSP and UBD, as well as genes that contained ASE SNPs with imbalance direction discordant with haplotype phase, explainable by annotated transcript structure, suggesting isoform derived ASE. The methods evaluated in this study will be of use to researchers performing highly conservative quantification of ASE, and the genes and isoforms identified as ASE of interest to researchers studying those loci. PMID:25965996
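
The intragenic SNP aggregation step mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming that haplotype-phased read counts per heterozygous SNP are already available; the function names, the example counts, and the choice of an exact two-sided binomial test against an expected ratio of 0.5 are assumptions made here for illustration, not the authors' published procedure.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to floating-point noise.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def aggregate_gene_ase(snp_counts):
    """Sum haplotype-A and haplotype-B read counts over all SNPs in one gene.

    snp_counts: list of (hap_a_reads, hap_b_reads) tuples, one per phased SNP
    (hypothetical input layout).
    Returns (hap_a_total, hap_b_total, allelic_ratio, p_value).
    """
    hap_a = sum(a for a, _ in snp_counts)
    hap_b = sum(b for _, b in snp_counts)
    total = hap_a + hap_b
    if total == 0:
        raise ValueError("no reads cover the gene's heterozygous SNPs")
    return hap_a, hap_b, hap_a / total, binom_two_sided_p(hap_a, total)
```

Pooling counts across SNPs in the same gene raises the effective depth, which is one way to reduce the random-sampling effect the abstract warns about; it only makes sense when the SNPs are phased onto the same haplotypes.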

  8. Big Data Bioinformatics

    PubMed Central

    GREENE, CASEY S.; TAN, JIE; UNG, MATTHEW; MOORE, JASON H.; CHENG, CHAO

    2017-01-01

    Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the “big data” era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both “machine learning” algorithms as well as “unsupervised” and “supervised” examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. PMID:27908398

  9. Big Data Bioinformatics

    PubMed Central

    GREENE, CASEY S.; TAN, JIE; UNG, MATTHEW; MOORE, JASON H.; CHENG, CHAO

    2017-01-01

    Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the “big data” era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both “machine learning” algorithms as well as “unsupervised” and “supervised” examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. PMID:24799088

  10. The Ancient Evolutionary History of Polyomaviruses

    PubMed Central

    Buck, Christopher B.; Van Doorslaer, Koenraad; Peretti, Alberto; Geoghegan, Eileen M.; Tisza, Michael J.; An, Ping; Katz, Joshua P.; Pipas, James M.; McBride, Alison A.; Camus, Alvin C.; McDermott, Alexa J.; Dill, Jennifer A.; Delwart, Eric; Ng, Terry F. F.; Farkas, Kata; Austin, Charlotte; Kraberger, Simona; Davison, William; Pastrana, Diana V.; Varsani, Arvind

    2016-01-01

    Polyomaviruses are a family of DNA tumor viruses that are known to infect mammals and birds. To investigate the deeper evolutionary history of the family, we used a combination of viral metagenomics, bioinformatics, and structural modeling approaches to identify and characterize polyomavirus sequences associated with fish and arthropods. Analyses drawing upon the divergent new sequences indicate that polyomaviruses have been gradually co-evolving with their animal hosts for at least half a billion years. Phylogenetic analyses of individual polyomavirus genes suggest that some modern polyomavirus species arose after ancient recombination events involving distantly related polyomavirus lineages. The improved evolutionary model provides a useful platform for developing a more accurate taxonomic classification system for the viral family Polyomaviridae. PMID:27093155

  11. Big data bioinformatics.

    PubMed

    Greene, Casey S; Tan, Jie; Ung, Matthew; Moore, Jason H; Cheng, Chao

    2014-12-01

    Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the "big data" era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both "machine learning" algorithms as well as "unsupervised" and "supervised" examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. © 2014 Wiley Periodicals, Inc.

  12. [Bioinformatics analysis of mosquito densovirus nostructure protein NS1].

    PubMed

    Dong, Yun-qiao; Ma, Wen-li; Gu, Jin-bao; Zheng, Wen-ling

    2009-12-01

    The aim was to analyze and predict the structure and function of mosquito densovirus (MDV) nonstructural protein 1 (NS1). Several bioinformatics tools (the ExPASy ProtParam tool, ClustalX 1.83, BioEdit, MEGA 3.1, ScanProsite, and MotifScan) were used to comparatively analyze and predict the physicochemical parameters, homology, evolutionary relationships, secondary structure and main functional motifs of NS1. MDV NS1 was an unstable hydrophilic protein with a highly conserved amino acid sequence and a relatively close evolutionary distance to infectious hypodermal and hematopoietic necrosis virus (IHHNV). MDV NS1 has a domain specific to the superfamily 3 helicases of small DNA viruses. This domain contains the NTP-binding region with metal ion-dependent ATPase activity. A rolling-circle replication (RCR) initiation domain was found near the N terminus of the protein. The protein has the biological function of a single-strand incision enzyme. The bioinformatics predictions suggest that MDV NS1 plays a key role in viral replication, packaging, and other stages of the viral life cycle.

  13. [Construction and application of bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer].

    PubMed

    Fang, Xiang; Li, Ning-qiu; Fu, Xiao-zhe; Li, Kai-bin; Lin, Qiang; Liu, Li-hui; Shi, Cun-bin; Wu, Shu-qin

    2015-07-01

    As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the need for high-performance computers rather than common personal computers to build a bioinformatics platform has significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogens based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules: genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analyses on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via BLAST searches and GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on outer membrane protein A of Aeromonas hydrophila, and the changes in system temperature, total energy, root mean square deviation and loop conformation during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogens has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platforms for other subjects.

  14. Buying in to bioinformatics: an introduction to commercial sequence analysis software

    PubMed Central

    2015-01-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. PMID:25183247

  15. Buying in to bioinformatics: an introduction to commercial sequence analysis software.

    PubMed

    Smith, David Roy

    2015-07-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. © The Author 2014. Published by Oxford University Press.

  16. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa

    PubMed Central

    Mulder, Nicola J.; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M.; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C. Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-01-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985

  17. InCoB2012 Conference: from biological data to knowledge to technological breakthroughs

    PubMed Central

    2012-01-01

    Ten years ago when Asia-Pacific Bioinformatics Network held the first International Conference on Bioinformatics (InCoB) in Bangkok its theme was North-South Networking. At that time InCoB aimed to provide biologists and bioinformatics researchers in the Asia-Pacific region a forum to meet, interact with, and disseminate knowledge about the burgeoning field of bioinformatics. Meanwhile InCoB has evolved into a major regional bioinformatics conference that attracts not only talented and established scientists from the region but increasingly also from East Asia, North America and Europe. Since 2006 InCoB yielded 114 articles in BMC Bioinformatics supplement issues that have been cited nearly 1,000 times to date. In part, these developments reflect the success of bioinformatics education and continuous efforts to integrate and utilize bioinformatics in biotechnology and biosciences in the Asia-Pacific region. A cross-section of research leading from biological data to knowledge and to technological applications, the InCoB2012 theme, is introduced in this editorial. Other highlights included sessions organized by the Pan-Asian Pacific Genome Initiative and a Machine Learning in Immunology competition. InCoB2013 is scheduled for September 18-21, 2013 at Suzhou, China. PMID:23281929

  18. OpenHelix: bioinformatics education outside of a different box.

    PubMed

    Williams, Jennifer M; Mangan, Mary E; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C

    2010-11-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review.

  19. OpenHelix: bioinformatics education outside of a different box

    PubMed Central

    Mangan, Mary E.; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C.

    2010-01-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review. PMID:20798181

  20. Translational bioinformatics: linking the molecular world to the clinical world.

    PubMed

    Altman, R B

    2012-06-01

    Translational bioinformatics represents the union of translational medicine and bioinformatics. Translational medicine moves basic biological discoveries from the research bench into the patient-care setting and uses clinical observations to inform basic biology. It focuses on patient care, including the creation of new diagnostics, prognostics, prevention strategies, and therapies based on biological discoveries. Bioinformatics involves algorithms to represent, store, and analyze basic biological data, including DNA sequence, RNA expression, and protein and small-molecule abundance within cells. Translational bioinformatics spans these two fields; it involves the development of algorithms to analyze basic molecular and cellular data with an explicit goal of affecting clinical care.

  1. Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

    PubMed

    Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

    2016-02-27

    In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.

  2. MG-Digger: An Automated Pipeline to Search for Giant Virus-Related Sequences in Metagenomes

    PubMed Central

    Verneau, Jonathan; Levasseur, Anthony; Raoult, Didier; La Scola, Bernard; Colson, Philippe

    2016-01-01

    The number of metagenomic studies conducted each year is growing dramatically. Storage and analysis of such big data is difficult and time-consuming. Interestingly, analysis shows that environmental and human metagenomes include a significant amount of non-annotated sequences, representing a ‘dark matter.’ We established a bioinformatics pipeline that automatically detects metagenome reads matching query sequences from a given set and applied this tool to the detection of sequences matching large and giant DNA viral members of the proposed order Megavirales or virophages. A total of 1,045 environmental and human metagenomes (≈ 1 Terabase) were collected, processed, and stored on our bioinformatics server. In addition, nucleotide and protein sequences from 93 Megavirales representatives, including 19 giant viruses of amoeba, and 5 virophages, were collected. The pipeline was generated by scripts written in Python and named MG-Digger. Metagenomes previously found to contain megavirus-like sequences were tested as controls. MG-Digger was able to annotate hundreds of metagenome sequences as best matching those of giant viruses. These sequences were most often found to be similar to phycodnavirus or mimivirus sequences, but included reads related to recently available pandoraviruses, Pithovirus sibericum, and faustoviruses. Compared to other tools, MG-Digger combined stand-alone use on Linux or Windows operating systems through a user-friendly interface, implementation of ready-to-use customized metagenome databases and query sequence databases, adjustable parameters for BLAST searches, and creation of output files containing selected reads with best match identification. Compared to Metavir 2, a reference tool in viral metagenome analysis, MG-Digger detected 8% more true positive Megavirales-related reads in a control metagenome. The present work shows that massive, automated and recurrent analyses of metagenomes are effective in improving knowledge about the presence and prevalence of giant viruses in the environment and the human body. PMID:27065984
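
The "best match identification" step described in this abstract can be illustrated with a small sketch. It assumes the BLAST search has already been run and its results saved in the standard 12-column tabular layout (`-outfmt 6`); the function name, the E-value cutoff, and the read and subject identifiers in the example are hypothetical, and this is not MG-Digger's own code.

```python
import csv
import io


def best_hits(blast_tsv, max_evalue=1e-5):
    """Return {read_id: (subject_id, bitscore)}, keeping the top-scoring hit per read.

    blast_tsv: iterable of lines in BLAST tabular format (outfmt 6), whose
    columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart,
    qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        read_id, subject_id = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit too weak under the assumed cutoff
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (subject_id, bitscore)
    return best


# Hypothetical hits for two metagenome reads.
example = io.StringIO(
    "read1\tmimivirus_A\t88.0\t100\t12\t0\t1\t100\t5\t104\t1e-30\t120.0\n"
    "read1\tphycodnavirus_B\t80.0\t100\t20\t0\t1\t100\t9\t108\t1e-20\t95.0\n"
    "read2\tpandoravirus_C\t70.0\t90\t27\t0\t1\t90\t3\t92\t0.5\t30.0\n"
)
hits = best_hits(example)
```

Here `read1` is assigned to its higher-scoring subject and `read2` is dropped because its only hit fails the E-value cutoff.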

  3. CompGO: an R package for comparing and visualizing Gene Ontology enrichment differences between DNA binding experiments.

    PubMed

    Waardenberg, Ashley J; Basset, Samuel D; Bouveret, Romaric; Harvey, Richard P

    2015-09-02

    Gene ontology (GO) enrichment is commonly used for inferring biological meaning from systems biology experiments. However, determining differential GO and pathway enrichment between DNA-binding experiments or using the GO structure to classify experiments has received little attention. Herein, we present a bioinformatics tool, CompGO, for identifying Differentially Enriched Gene Ontologies, called DiEGOs, and pathways, through the use of a z-score derivation of log odds ratios, and visualizing these differences at GO and pathway level. Through public experimental data focused on the cardiac transcription factor NKX2-5, we illustrate the problems associated with comparing GO enrichments between experiments using a simple overlap approach. We have developed an R/Bioconductor package, CompGO, which implements a new statistic normally used in epidemiological studies for performing comparative GO analyses and visualizing comparisons, accepting .BED data containing genomic coordinates as well as gene lists as inputs. We justify the statistic through inclusion of experimental data and compare to the commonly used overlap method. CompGO is freely available as an R/Bioconductor package enabling easy integration into existing pipelines and is available at: http://www.bioconductor.org/packages/release/bioc/html/CompGO.html.
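
A z-score built from log odds ratios, as named in this abstract, can be sketched with the textbook epidemiological formulas. The code below is a minimal illustration in Python rather than R, assuming each experiment is summarized for one GO term as a 2x2 table of gene counts; the function names and example counts are hypothetical, and the exact computation inside CompGO may differ.

```python
from math import log, sqrt


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error for a 2x2 table.

    a: genes in the list annotated with the term
    b: genes in the list without the term
    c: background genes with the term
    d: background genes without the term
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    return log((a * d) / (b * c)), sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def differential_enrichment_z(table1, table2):
    """z-score for the difference between two log odds ratios."""
    lor1, se1 = log_odds_ratio(*table1)
    lor2, se2 = log_odds_ratio(*table2)
    return (lor1 - lor2) / sqrt(se1**2 + se2**2)


# Hypothetical counts for one GO term in two experiments.
z = differential_enrichment_z((40, 160, 200, 9600), (10, 190, 200, 9600))
```

A positive `z` means the term is more strongly enriched in the first experiment; unlike a simple overlap of significant terms, the score reflects both the size of the difference and the uncertainty in each estimate.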

  4. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome.

    PubMed

    Allali, Imane; Arnold, Jason W; Roach, Jeffrey; Cadenas, Maria Belen; Butz, Natasha; Hassan, Hosni M; Koci, Matthew; Ballou, Anne; Mendoza, Mary; Ali, Rizwana; Azcarate-Peril, M Andrea

    2017-09-13

    Advancements in Next Generation Sequencing (NGS) technologies regarding throughput, read length and accuracy had a major impact on microbiome research by significantly improving 16S rRNA amplicon sequencing. As rapid improvements in sequencing platforms and new data analysis pipelines are introduced, it is essential to evaluate their capabilities in specific applications. The aim of this study was to assess whether the same project-specific biological conclusions regarding microbiome composition could be reached using different sequencing platforms and bioinformatics pipelines. Chicken cecum microbiome was analyzed by 16S rRNA amplicon sequencing using Illumina MiSeq, Ion Torrent PGM, and Roche 454 GS FLX Titanium platforms, with standard and modified protocols for library preparation. We labeled the bioinformatics pipelines included in our analysis QIIME1 and QIIME2 (de novo OTU picking [not to be confused with QIIME version 2 commonly referred to as QIIME2]), QIIME3 and QIIME4 (open reference OTU picking), UPARSE1 and UPARSE2 (each pair differs only in the use of chimera depletion methods), and DADA2 (for Illumina data only). GS FLX+ yielded the longest reads and highest quality scores, while MiSeq generated the largest number of reads after quality filtering. Declines in quality scores were observed starting at bases 150-199 for GS FLX+ and bases 90-99 for MiSeq. Scores were stable for PGM-generated data. Overall microbiome compositional profiles were comparable between platforms; however, average relative abundance of specific taxa varied depending on sequencing platform, library preparation method, and bioinformatics analysis. Specifically, QIIME with de novo OTU picking yielded the highest number of unique species and alpha diversity was reduced with UPARSE and DADA2 compared to QIIME. The three platforms compared in this study were capable of discriminating samples by treatment, despite differences in diversity and abundance, leading to similar biological conclusions. Our results demonstrate that while there were differences in depth of coverage and phylogenetic diversity, all workflows revealed comparable treatment effects on microbial diversity. To increase reproducibility and reliability and to retain consistency between similar studies, it is important to consider the impact on data quality and relative abundance of taxa when selecting NGS platforms and analysis tools for microbiome studies.
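
The two quantities compared across pipelines in this abstract, relative abundance and alpha diversity, can be computed from a per-sample count table roughly as follows. This is a sketch only: the abstract does not say which alpha diversity metric was used, so the Shannon index is an assumption made here, and the function names are hypothetical.

```python
from math import log


def relative_abundance(counts):
    """Convert {taxon: read_count} into {taxon: fraction of all reads}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    return -sum(p * log(p) for p in relative_abundance(counts).values() if p > 0)
```

Because both values are derived from the same counts, a pipeline that merges or splits OTUs differently changes the diversity estimate even when the underlying reads are identical, which is consistent with the pipeline-dependent differences the study reports.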

  5. Deep learning in bioinformatics.

    PubMed

    Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh

    2017-09-01

    In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  6. A Web-based assessment of bioinformatics end-user support services at US universities

    PubMed Central

    Messersmith, Donna J.; Benson, Dennis A.; Geer, Renata C.

    2006-01-01

    Objectives: This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Methods: Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Results: Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. Conclusions: This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group. PMID:16888663

  7. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.

    PubMed

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian

    2017-04-27

    The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.

  8. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background: Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings: Unlike most existing live Linux distributions for bioinformatics, which limit their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions: LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  9. LXtoo: an integrated live Linux distribution for the bioinformatics community.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu

    2012-07-19

    Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most existing live Linux distributions for bioinformatics, which limit their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.

  10. Expanding roles in a library-based bioinformatics service program: a case study

    PubMed Central

    Li, Meng; Chen, Yi-Bu; Clintworth, William A

    2013-01-01

    Question: How can a library-based bioinformatics support program be implemented and expanded to continuously support the growing and changing needs of the research community? Setting: A program at a health sciences library serving a large academic medical center with a strong research focus is described. Methods: The bioinformatics service program was established at the Norris Medical Library in 2005. As part of program development, the library assessed users' bioinformatics needs, acquired additional funds, established and expanded service offerings, and explored additional roles in promoting on-campus collaboration. Results: Personnel and software have increased along with the number of registered software users and use of the provided services. Conclusion: With strategic efforts and persistent advocacy within the broader university environment, library-based bioinformatics service programs can become a key part of an institution's comprehensive solution to researchers' ever-increasing bioinformatics needs. PMID:24163602

  11. 4273π: Bioinformatics education on low cost ARM hardware

    PubMed Central

    2013-01-01

    Background: Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. Results: We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012–2013. Conclusions: 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost. PMID:23937194

  12. 4273π: bioinformatics education on low cost ARM hardware.

    PubMed

    Barker, Daniel; Ferrier, David Ek; Holland, Peter Wh; Mitchell, John Bo; Plaisier, Heleen; Ritchie, Michael G; Smart, Steven D

    2013-08-12

    Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012-2013. 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.

  13. A decade of Web Server updates at the Bioinformatics Links Directory: 2003-2012.

    PubMed

    Brazas, Michelle D; Yim, David; Yeung, Winston; Ouellette, B F Francis

    2012-07-01

    The 2012 Bioinformatics Links Directory update marks the 10th special Web Server issue from Nucleic Acids Research. Beginning with content from their 2003 publication, the Bioinformatics Links Directory in collaboration with Nucleic Acids Research has compiled and published a comprehensive list of freely accessible, online tools, databases and resource materials for the bioinformatics and life science research communities. The past decade has exhibited significant growth and change in the types of tools, databases and resources being put forth, reflecting both technology changes and the nature of research over that time. With the addition of 90 web server tools and 12 updates from the July 2012 Web Server issue of Nucleic Acids Research, the Bioinformatics Links Directory at http://bioinformatics.ca/links_directory/ now contains an impressive 134 resources, 455 databases and 1205 web server tools, mirroring the continued activity and efforts of our field.

  14. RNA-Seq Analysis to Measure the Expression of SINE Retroelements.

    PubMed

    Román, Ángel Carlos; Morales-Hernández, Antonio; Fernández-Salguero, Pedro M

    2016-01-01

    The intrinsic features of retroelements, like their repetitive nature and disseminated presence in their host genomes, demand the use of advanced methodologies for their bioinformatic and functional study. The short length of SINE (short interspersed elements) retrotransposons makes such analyses even more complex. Next-generation sequencing (NGS) technologies are currently one of the most widely used tools to characterize the whole repertoire of gene expression in a specific tissue. In this chapter, we will review the molecular and computational methods needed to perform NGS analyses on SINE elements. We will also describe new methods of potential interest for researchers studying repetitive elements. We intend to outline the general ideas behind the computational analyses of NGS data obtained from SINE elements, and to stimulate other scientists to expand our current knowledge on SINE biology using RNA-seq and other NGS tools.

  15. Making authentic science accessible—the benefits and challenges of integrating bioinformatics into a high-school science curriculum

    PubMed Central

    Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled ‘Bioinformatics in the Service of Biotechnology’. Students’ learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students’ difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students’ cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students’ scientific ‘toolbox’. For students, questions stemming from the ‘old world’ biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers’ prediction. Analysis of students’ affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher’s role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. PMID:26801769

  16. The 2015 Bioinformatics Open Source Conference (BOSC 2015)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J. A.; Lapp, Hilmar

    2016-01-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included “Data Science;” “Standards and Interoperability;” “Open Science and Reproducibility;” “Translational Bioinformatics;” “Visualization;” and “Bioinformatics Open Source Project Updates”. In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled “Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community,” that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule. PMID:26914653

  17. Making authentic science accessible-the benefits and challenges of integrating bioinformatics into a high-school science curriculum.

    PubMed

    Machluf, Yossy; Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled 'Bioinformatics in the Service of Biotechnology'. Students' learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students' difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students' cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students' scientific 'toolbox'. For students, questions stemming from the 'old world' biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers' prediction. Analysis of students' affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher's role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. © The Author 2016. Published by Oxford University Press.

  18. Systems Bioinformatics: increasing precision of computational diagnostics and therapeutics through network-based approaches.

    PubMed

    Oulas, Anastasis; Minadakis, George; Zachariou, Margarita; Sokratous, Kleitos; Bourdakou, Marilena M; Spyrou, George M

    2017-11-27

    Systems Bioinformatics is a relatively new approach, which lies in the intersection of systems biology and classical bioinformatics. It focuses on integrating information across different levels using a bottom-up approach as in systems biology with a data-driven top-down approach as in bioinformatics. The advent of omics technologies has provided the stepping-stone for the emergence of Systems Bioinformatics. These technologies provide a spectrum of information ranging from genomics, transcriptomics and proteomics to epigenomics, pharmacogenomics, metagenomics and metabolomics. Systems Bioinformatics is the framework in which systems approaches are applied to such data, setting the level of resolution as well as the boundary of the system of interest and studying the emerging properties of the system as a whole rather than the sum of the properties derived from the system's individual components. A key approach in Systems Bioinformatics is the construction of multiple networks representing each level of the omics spectrum and their integration in a layered network that exchanges information within and between layers. Here, we provide evidence on how Systems Bioinformatics enhances computational therapeutics and diagnostics, hence paving the way to precision medicine. The aim of this review is to familiarize the reader with the emerging field of Systems Bioinformatics and to provide a comprehensive overview of its current state-of-the-art methods and technologies. Moreover, we provide examples of success stories and case studies that utilize such methods and tools to significantly advance research in the fields of systems biology and systems medicine. © The Author 2017. Published by Oxford University Press.

  19. Identification of potential target genes of ROR-alpha in THP1 and HUVEC cell lines.

    PubMed

    Gulec, Cagri; Coban, Neslihan; Ozsait-Selcuk, Bilge; Sirma-Ekmekci, Sema; Yildirim, Ozlem; Erginel-Unaltuna, Nihan

    2017-04-01

    ROR-alpha is a nuclear receptor whose activity can be modulated by natural or synthetic ligands. Because of its possible involvement in atherosclerosis and its potential as a therapeutic target, we aimed to identify ROR-alpha target genes in monocytic and endothelial cell lines. We performed chromatin immunoprecipitation (ChIP) followed by tiling array (ChIP-on-chip) for ROR-alpha in the monocytic cell line THP1 and the endothelial cell line HUVEC. Following bioinformatic analysis of the array data, we tested four candidate genes in terms of dependence of their expression level on ligand-mediated ROR-alpha activity, and two of them in terms of promoter occupancy by ROR-alpha. Bioinformatic analyses of ChIP-on-chip data suggested that ROR-alpha binds to genomic regions near the transcription start site (TSS) of more than 3000 genes in THP1 and HUVEC. Potential ROR-alpha target genes in both cell types seem to be involved mainly in membrane receptor activity, signal transduction and ion transport. While SPP1 and IKBKA were shown to be direct target genes of ROR-alpha in THP1 monocytes, the inflammation-related gene HMOX1 and the heat shock protein gene HSPA8 were shown to be potential target genes of ROR-alpha. Our results suggest that ROR-alpha may regulate signaling receptor activity and transmembrane transport activity through its potential target genes. ROR-alpha also seems to play a role in cellular sensitivity to environmental substances like arsenite and chloroprene. Although the expression analyses have shown that synthetic ROR-alpha ligands can modulate some of the potential ROR-alpha target genes, the functional significance of ligand-dependent modulation of gene expression needs to be confirmed with further analyses. Copyright © 2017 Elsevier Inc. All rights reserved.
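
    Linking binding regions to genes with a nearby TSS, as described above, might be sketched as follows. The abstract does not state how "near" was defined, so the 1000 bp window, the function `genes_near_peaks` and all coordinates and gene names below are assumptions made purely for illustration.

```python
def genes_near_peaks(peaks, tss, window=1000):
    """Return sorted gene names whose TSS lies within `window` bp of a peak.

    peaks : list of (chromosome, start, end) binding regions
    tss   : dict mapping gene name -> (chromosome, TSS position)
    window: assumed distance cutoff in bp (not taken from the study)
    """
    hits = set()
    for chrom, start, end in peaks:
        for gene, (gene_chrom, pos) in tss.items():
            # A gene is a candidate target if its TSS falls inside the
            # peak interval extended by `window` on both sides.
            if gene_chrom == chrom and start - window <= pos <= end + window:
                hits.add(gene)
    return sorted(hits)


# Hypothetical toy data, not real ChIP-on-chip results.
peaks = [("chr1", 5000, 5400), ("chr2", 100, 300)]
tss = {
    "geneA": ("chr1", 5900),
    "geneB": ("chr1", 9000),
    "geneC": ("chr2", 250),
}

candidates = genes_near_peaks(peaks, tss)
print(candidates)
```

    The nested loop is adequate for a toy example; with thousands of peaks and genes, an interval index per chromosome would be the more usual choice.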

  20. Selection Pressure in CD8+ T-cell Epitopes in the pol Gene of HIV-1 Infected Individuals in Colombia. A Bioinformatic Approach

    PubMed Central

    Acevedo-Sáenz, Liliana; Ochoa, Rodrigo; Rugeles, Maria Teresa; Olaya-García, Patricia; Velilla-Hernández, Paula Andrea; Diaz, Francisco J.

    2015-01-01

    One of the main characteristics of the human immunodeficiency virus is its genetic variability and rapid adaptation to changing environmental conditions. This variability, resulting from the lack of proofreading activity of the viral reverse transcriptase, generates mutations that could be fixed either by random genetic drift or by positive selection. Among the forces driving positive selection are antiretroviral therapy and CD8+ T-cells, the most important immune mechanism involved in viral control. Here, we describe mutations induced by these selective forces acting on the pol gene of HIV in a group of infected individuals. We used Maximum Likelihood analyses of the ratio of non-synonymous to synonymous mutations per site (dN/dS) to study the extent of positive selection in the protease and the reverse transcriptase, using 614 viral sequences from Colombian patients. We also performed computational approaches, docking and algorithmic analyses, to assess whether the positively selected mutations affected binding to the HLA molecules. We found 19 positively-selected codons in drug resistance-associated sites and 22 located within CD8+ T-cell epitopes. A high percentage of mutations in these epitopes has not been previously reported. According to the docking analyses only one of those mutations affected HLA binding. However, algorithmic methods predicted a decrease in the affinity for the HLA molecule in seven mutated peptides. The bioinformatics strategies described here are useful to identify putative positively selected mutations associated with immune escape but should be complemented with an experimental approach to define the impact of these mutations on the functional profile of the CD8+ T-cells. PMID:25803098
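
    The dN/dS ratio used above can be made concrete with a small sketch. Note that this is a simpler counting-based estimate (proportions of differences per site with a Jukes-Cantor correction, in the spirit of the Nei-Gojobori method) for a whole alignment, not the per-site maximum-likelihood analysis used in the study. The function names and the input counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, syn_diffs, nonsyn_sites, syn_sites):
    """Estimate dN/dS from difference counts and site counts.

    A ratio above 1 is conventionally read as a sign of positive
    selection, a ratio below 1 as a sign of purifying selection.
    """
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


# Hypothetical counts: 30 nonsynonymous differences over 300 nonsynonymous
# sites and 5 synonymous differences over 100 synonymous sites.
omega = dn_ds(30, 5, 300, 100)
print(f"dN/dS = {omega:.2f}")
```

    A gene-wide ratio like this can mask selection acting on a few codons, which is one reason site-level likelihood models are preferred for epitope analyses.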

  1. Toward the Replacement of Animal Experiments through the Bioinformatics-driven Analysis of 'Omics' Data from Human Cell Cultures.

    PubMed

    Grafström, Roland C; Nymark, Penny; Hongisto, Vesa; Spjuth, Ola; Ceder, Rebecca; Willighagen, Egon; Hardy, Barry; Kaski, Samuel; Kohonen, Pekka

    2015-11-01

    This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently-developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing), to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serve to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data, to information relevant to human health and environmental safety. 2015 FRAME.

  2. Thorough analysis of unorthodox ABO deletions called by the 1000 Genomes project.

    PubMed

    Möller, M; Hellberg, Å; Olsson, M L

    2018-02-01

    ABO remains the clinically most important blood group system, but despite earlier extensive research, significant findings are still being made. The vast majority of catalogued ABO null alleles are based on the c.261delG polymorphism. Apart from c.802G>A, other mechanisms for O alleles are rare. While analysing the data set from the 1000 Genomes (1000G) project, we encountered two previously uncharacterized deletions, which needed further exploration. The Erythrogene database, complemented with bioinformatics software, was used to analyse ABO in 2504 individuals from 1000G. DNA samples from selected 1000G donors and African blood donors were examined by allele-specific PCR and Sanger sequencing to characterize predicted deletions. A 5821-bp deletion encompassing exons 5-7 was called in twenty 1000G individuals, predominantly Africans. This allele was confirmed and its exact deletion point defined by bioinformatic analyses and in vitro experiments. A PCR assay was developed, and screening of African samples revealed three donors heterozygous for this deletion, which was thereby phenotypically established as an O allele. Analysis of upstream genetic markers indicated an ancestral origin from ABO*O.01.02. We estimate this deletion as the 3rd most common mechanism behind O alleles. A 24-bp deletion was called in nine individuals and showed greater diversity regarding ethnic distribution and allelic background. It could neither be confirmed by in silico nor in vitro experiments. A previously uncharacterized ABO deletion among Africans was comprehensively mapped and a genotyping strategy devised. The false prediction of another deletion emphasizes the need for cautious interpretation of NGS data and calls for strict validation routines. © 2017 International Society of Blood Transfusion.

  3. CSB: a Python framework for structural bioinformatics.

    PubMed

    Kalev, Ivan; Mechelke, Martin; Kopec, Klaus O; Holder, Thomas; Carstens, Simeon; Habeck, Michael

    2012-11-15

    Computational Structural Biology Toolbox (CSB) is a cross-platform Python class library for reading, storing and analyzing biomolecular structures with rich support for statistical analyses. CSB is designed for reusability and extensibility and comes with a clean, well-documented API following good object-oriented engineering practice. Stable release packages are available for download from the Python Package Index (PyPI) as well as from the project's website http://csb.codeplex.com. Contact: ivan.kalev@gmail.com or michael.habeck@tuebingen.mpg.de

  4. Bioinformatics and functional analyses of coronavirus nonstructural proteins involved in the formation of replicative organelles.

    PubMed

    Neuman, Benjamin W

    2016-11-01

    Replication of eukaryotic positive-stranded RNA viruses is usually linked to the presence of membrane-associated replicative organelles. The purpose of this review is to discuss the function of proteins responsible for formation of the coronavirus replicative organelle. This will be done by identifying domains that are conserved across the order Nidovirales, and by summarizing what is known about function and structure at the level of protein domains. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Human Dental Pulp Stem Cells Are More Effective Than Human Bone Marrow-Derived Mesenchymal Stem Cells in Cerebral Ischemic Injury.

    PubMed

    Song, Miyeoun; Lee, Jae-Hyung; Bae, Jinhyun; Bu, Youngmin; Kim, Eun-Cheol

    2017-06-09

    We compared the therapeutic effects and mechanism of transplanted human dental pulp stem cells (hDPSCs) and human bone marrow-derived mesenchymal stem cells (hBM-MSCs) in a rat stroke model and an in vitro model of ischemia. Rats were intravenously injected with hDPSCs or hBM-MSCs 24 h after middle cerebral artery occlusion (MCAo), and both groups showed improved functional recovery and reduced infarct volume versus control rats, but the hDPSC group showed greater reduction in infarct volume than the hBM-MSC group. The positive area for the endothelial cell marker was greater in the lesion boundary areas in the hDPSC group than in the hBM-MSC group. Administration of hDPSCs to rats with stroke significantly decreased reactive gliosis, as evidenced by the attenuation of MCAo-induced GFAP+/nestin+ and GFAP+/Musashi-1+ cells, compared with hBM-MSCs. In vivo findings were confirmed by in vitro data illustrating that hDPSCs showed superior neuroprotective, migratory, and in vitro angiogenic effects in oxygen-glucose deprivation (OGD)-injured human astrocytes (hAs) versus hBM-MSCs. Comprehensive comparative bioinformatics analyses from hDPSC- and hBM-MSC-treated in vitro OGD-injured hAs were examined by RNA sequencing technology. In gene ontology and KEGG pathway analyses, significant pathways in the hDPSC-treated group were the MAPK and TGF-β signaling pathways. Thus, hDPSCs may be a better cell therapy source for ischemic stroke than hBM-MSCs.

  6. Interdisciplinary Introductory Course in Bioinformatics

    ERIC Educational Resources Information Center

    Kortsarts, Yana; Morris, Robert W.; Utell, Janine M.

    2010-01-01

    Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical and biophysical information. We present our experience in teaching an interdisciplinary course, Introduction to Bioinformatics, which was developed…

  7. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  8. Comparative Transcriptomic Exploration Reveals Unique Molecular Adaptations of Neuropathogenic Trichobilharzia to Invade and Parasitize Its Avian Definitive Host

    PubMed Central

    Leontovyč, Roman; Young, Neil D.; Korhonen, Pasi K.; Hall, Ross S.; Tan, Patrick; Mikeš, Libor; Kašný, Martin; Horák, Petr; Gasser, Robin B.

    2016-01-01

    To date, most molecular investigations of schistosomatids have focused principally on blood flukes (schistosomes) of humans. Despite the clinical importance of cercarial dermatitis in humans caused by Trichobilharzia regenti and the serious neuropathologic disease that this parasite causes in its permissive avian hosts and accidental mammalian hosts, almost nothing is known about the molecular aspects of how this fluke invades its hosts, migrates in host tissues and how it interacts with its hosts’ immune system. Here, we explored selected aspects using a transcriptomic-bioinformatic approach. To do this, we sequenced, assembled and annotated the transcriptome representing two consecutive life stages (cercariae and schistosomula) of T. regenti involved in the first phases of infection of the avian host. We identified key biological and metabolic pathways specific to each of these two developmental stages and also undertook comparative analyses using data available for taxonomically related blood flukes of the genus Schistosoma. Detailed comparative analyses revealed the unique involvement of carbohydrate metabolism, translation and amino acid metabolism, and calcium in T. regenti cercariae during their invasion and in growth and development, as well as the roles of cell adhesion molecules, microaerobic metabolism (citrate cycle and oxidative phosphorylation), peptidases (cathepsins) and other histolytic and lysosomal proteins in schistosomula during their particular migration in neural tissues of the avian host. In conclusion, the present transcriptomic exploration provides new and significant insights into the molecular biology of T. regenti, which should underpin future genomic and proteomic investigations of T. regenti and, importantly, provides a useful starting point for a range of comparative studies of schistosomatids and other trematodes. PMID:26863542

  9. Characterization of hepatitis B virus surface antigen variability and impact on HBs antigen clearance under nucleos(t)ide analogue therapy.

    PubMed

    Velay, A; Jeulin, H; Eschlimann, M; Malvé, B; Goehringer, F; Bensenane, M; Frippiat, J-P; Abraham, P; Ismail, A M; Murray, J M; Combet, C; Zoulim, F; Bronowicki, J-P; Schvoerer, E

    2016-05-01

    For hepatitis B virus (HBV)-related chronic infection under treatment by nucleos(t)ide analogues (NUCs), HBsAg clearance is the ultimate therapeutic goal but very infrequent. We investigated how HBV envelope protein variability could lead to differential HBsAg clearance on NUCs. For 12 HBV genotype D patients receiving NUCs, six resolvers (HBsAg clearance) were compared to six matched nonresolvers (HBsAg persistence). PreS/S amino acid (aa) sequences were analysed with bioinformatics to predict HBV envelope antigenicity and aa covariance. To enrich our analyses on very rare resolvers, these were compared with other HBV genotype D strains in three characterized clinical cohorts including common chronically infected patients. The sT125M+sP127T combination was observed in four nonresolvers of six, corroborated by aa covariance analysis, associated with a lower predicted antigenicity than sT125T+sP127P. Concordant features within this HBV key functional domain, at positions 125 and 127, were reported from two of the three comparative cohorts. In our hands, a lower ELISA reactivity of HBV-vaccinated mice sera was observed against the sT125M mutant. In the S gene, 56 aa changes in minor variants were detected in non-resolvers, mainly in the major hydrophilic region, vs 28 aa changes in resolvers. Molecular features in patients showing HBsAg persistence on NUCs argue in favour of a different aa pattern in the HBV S gene compared to those showing HBsAg clearance. In nonresolvers, a decrease in HBs 'a' determinant antigenicity and more frequent mutations in the S gene suggest a role for the HBV envelope characteristics in HBsAg persistence. © 2016 John Wiley & Sons Ltd.

  10. Exosomes/tricalcium phosphate combination scaffolds can enhance bone regeneration by activating the PI3K/Akt signaling pathway.

    PubMed

    Zhang, Jieyuan; Liu, Xiaolin; Li, Haiyan; Chen, Chunyuan; Hu, Bin; Niu, Xin; Li, Qing; Zhao, Bizeng; Xie, Zongping; Wang, Yang

    2016-09-20

    Recently, accumulating evidence has shown that exosomes, the naturally secreted nanocarriers of cells, can exert therapeutic effects in various disease models in the absence of parent cells. However, application of exosomes in bone defect repair and regeneration has been rarely reported, and little is known regarding their underlying mechanisms. Exosomes derived from human-induced pluripotent stem cell-derived mesenchymal stem cells (hiPS-MSC-Exos) were combined with tricalcium phosphate (β-TCP) to repair critical-sized calvarial bone defects, and the efficacy was assessed by histological examination. We evaluated the in vitro effects of hiPS-MSC-Exos on the proliferation, migration, and osteogenic differentiation of human bone marrow-derived mesenchymal stem cells (hBMSCs) by cell-counting, scratch assays, and qRT-PCR, respectively. Gene expression profiling and bioinformatics analyses were also used to identify the underlying mechanisms in the repair. We found that the exosome/β-TCP combination scaffolds could enhance osteogenesis as compared to pure β-TCP scaffolds. In vitro assays showed that the exosomes could be released from β-TCP and could be internalized by hBMSCs. In addition, the internalization of exosomes into hBMSCs could profoundly enhance the proliferation, migration, and osteogenic differentiation of hBMSCs. Furthermore, gene expression profiling and bioinformatics analyses demonstrated that exosome/β-TCP combination scaffolds significantly altered the expression of a network of genes involved in the PI3K/Akt signaling pathway. Functional studies further confirmed that the PI3K/Akt signaling pathway was the critical mediator during the exosome-induced osteogenic responses of hBMSCs. We propose that the exosomes can enhance the osteoinductivity of β-TCP through activating the PI3K/Akt signaling pathway of hBMSCs, which means that the exosome/β-TCP combination scaffolds possess better osteogenesis activity than pure β-TCP scaffolds. These results indicate that exosomes, as naturally secreted nanocarriers, can be used as a bioactive material to improve the bioactivity of biomaterials, and that hiPS-MSC-Exos combined with β-TCP scaffolds can potentially be used for repairing bone defects.

  11. A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education

    PubMed Central

    2012-01-01

    Background Amazona vittata is a critically endangered Puerto Rican endemic bird, the only surviving native parrot species in the United States territory, and the first parrot in the large Neotropical genus Amazona, to be studied on a genomic scale. Findings In a unique community-based funded project, DNA from an A. vittata female was sequenced using a HiSeq Illumina platform, resulting in a total of ~42.5 billion nucleotide bases. This provided approximately 26.89x average coverage depth at the completion of this funding phase. Filtering followed by assembly resulted in 259,423 contigs (N50 = 6,983 bp, longest = 75,003 bp), which was further scaffolded into 148,255 fragments (N50 = 19,470, longest = 206,462 bp). This provided ~76% coverage of the genome based on an estimated size of 1.58 Gb. The assembled scaffolds allowed basic genomic annotation and comparative analyses with other available avian whole-genome sequences. Conclusions The current data represents the first genomic information from and work carried out with a unique source of funding. This analysis further provides a means for directed training of young researchers in genetic and bioinformatics analyses and will facilitate progress towards a full assembly and annotation of the Puerto Rican parrot genome. It also adds extensive genomic data to a new branch of the avian tree, making it useful for comparative analyses with other avian species. Ultimately, the knowledge acquired from these data will contribute to an improved understanding of the overall population health of this species and aid in ongoing and future conservation efforts. PMID:23587420
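
    The coverage and N50 figures quoted in the abstract above can be checked with simple arithmetic. The following is a minimal Python sketch, not the authors' pipeline: the function names `mean_coverage` and `n50` are hypothetical, and the N50 rule used here (the length of the sequence at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size) is the commonly used definition, assumed because the abstract does not spell it out.

```python
from typing import Iterable


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty positive input


# Figures from the abstract: ~42.5 billion bases, estimated genome size 1.58 Gb.
depth = mean_coverage(42.5e9, 1.58e9)
print(f"approximate mean coverage: {depth:.1f}x")

# Toy contig lengths (illustrative only, not data from the study).
print("toy N50:", n50([2, 3, 4, 5, 6]))
```

    With the rounded inputs given in the abstract, the depth comes out close to the reported value of about 26.89x; the small difference is expected because 42.5 billion is itself an approximation.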

  12. Small non-coding RNA profiling in human biofluids and surrogate tissues from healthy individuals: description of the diverse and most represented species.

    PubMed

    Ferrero, Giulio; Cordero, Francesca; Tarallo, Sonia; Arigoni, Maddalena; Riccardo, Federica; Gallo, Gaetano; Ronco, Guglielmo; Allasia, Marco; Kulkarni, Neha; Matullo, Giuseppe; Vineis, Paolo; Calogero, Raffaele A; Pardini, Barbara; Naccarati, Alessio

    2018-01-09

    The role of non-coding RNAs in different biological processes and diseases is continuously expanding. Next-generation sequencing together with the parallel improvement of bioinformatics analyses allows the accurate detection and quantification of an increasing number of RNA species. With the aim of exploring new potential biomarkers for disease classification, a clear overview of the expression levels of common/unique small RNA species among different biospecimens is necessary. However, except for miRNAs in plasma, there are no substantial indications about the pattern of expression of various small RNAs in multiple specimens among healthy humans. By analysing small RNA-sequencing data from 243 samples, we have identified and compared the most abundantly and uniformly expressed miRNAs and non-miRNA species of comparable size with the library preparation in four different specimens (plasma exosomes, stool, urine, and cervical scrapes). Eleven miRNAs were commonly detected among all different specimens while 231 miRNAs were globally unique across them. Classification analysis using these miRNAs provided an accuracy of 99.6% to recognize the sample types. piRNAs and tRNAs were the most represented non-miRNA small RNAs detected in all specimen types that were analysed, particularly in urine samples. With the present data, the most uniformly expressed small RNAs in each sample type were also identified. A signature of small RNAs for each specimen could represent a reference gene set in validation studies by RT-qPCR. Overall, the data reported hereby provide an insight of the constitution of the human miRNome and of other small non-coding RNAs in various specimens of healthy individuals.

  13. Influenza research database: an integrated bioinformatics resource for influenza virus research

    USDA-ARS's Scientific Manuscript database

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics, an...

  14. Rapid Development of Bioinformatics Education in China

    ERIC Educational Resources Information Center

    Zhong, Yang; Zhang, Xiaoyan; Ma, Jian; Zhang, Liang

    2003-01-01

    As the Human Genome Project experiences remarkable success and a flood of biological data is produced, bioinformatics becomes a very "hot" cross-disciplinary field, yet experienced bioinformaticians are urgently needed worldwide. This paper summarises the rapid development of bioinformatics education in China, especially related…

  15. Bioinformatics.

    PubMed

    Moore, Jason H

    2007-11-01

    Bioinformatics is an interdisciplinary field that blends computer science and biostatistics with biological and biomedical sciences such as biochemistry, cell biology, developmental biology, genetics, genomics, and physiology. An important goal of bioinformatics is to facilitate the management, analysis, and interpretation of data from biological experiments and observational studies. The goal of this review is to introduce some of the important concepts in bioinformatics that must be considered when planning and executing a modern biological research study. We review database resources as well as data mining software tools.

  16. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package

    PubMed Central

    2012-01-01

    Background Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941

  17. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.

    PubMed

    El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter

    2012-01-01

    Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.

  18. The construction and assessment of a statistical model for the prediction of protein assay data.

    PubMed

    Pittman, J; Sacks, J; Young, S Stanley

    2002-01-01

    The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.

  19. A roadmap of clustering algorithms: finding a match for a biomedical application.

    PubMed

    Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

    2009-05-01

    Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.

  20. Two interactive Bioinformatics courses at the Bielefeld University Bioinformatics Server.

    PubMed

    Sczyrba, Alexander; Konermann, Susanne; Giegerich, Robert

    2008-05-01

    Conferences in computational biology continue to provide tutorials on classical and new methods in the field. This can be taken as an indicator that education is still a bottleneck in our field's process of becoming an established scientific discipline. Bielefeld University has been one of the early providers of bioinformatics education, both locally and via the internet. The Bielefeld Bioinformatics Server (BiBiServ) offers a variety of older and new materials. Here, we report on two online courses made available recently, one introductory and one on the advanced level: (i) SADR: Sequence Analysis with Distributed Resources (http://bibiserv.techfak.uni-bielefeld.de/sadr/) and (ii) ADP: Algebraic Dynamic Programming in Bioinformatics (http://bibiserv.techfak.uni-bielefeld.de/dpcourse/).

  1. Bioinformatics-based tools in drug discovery: the cartography from single gene to integrative biological networks.

    PubMed

    Ramharack, Pritika; Soliman, Mahmoud E S

    2018-06-01

    Originally developed for the analysis of biological sequences, bioinformatics has advanced into one of the most widely recognized domains in the scientific community. Despite this technological evolution, there is still an urgent need for nontoxic and efficient drugs. The onus now falls on the 'omics domain to meet this need by implementing bioinformatics techniques that will allow for the introduction of pioneering approaches in the rational drug design process. Here, we categorize an updated list of informatics tools and explore the capabilities of integrative bioinformatics in disease control. We believe that our review will serve as a comprehensive guide toward bioinformatics-oriented disease and drug discovery research. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. PanWeb: A web interface for pan-genomic analysis.

    PubMed

    Pantoja, Yan; Pinheiro, Kenny; Veras, Allan; Araújo, Fabrício; Lopes de Sousa, Ailton; Guimarães, Luis Carlos; Silva, Artur; Ramos, Rommel T J

    2017-01-01

    With increased production of genomic data since the advent of next-generation sequencing (NGS), there has been a need to develop new bioinformatics tools and areas, such as comparative genomics. In comparative genomics, the genetic material of an organism is directly compared to that of another organism to better understand biological species. Moreover, the exponentially growing number of deposited prokaryote genomes has enabled the investigation of several genomic characteristics that are intrinsic to certain species. Thus, a new approach to comparative genomics, termed pan-genomics, was developed. In pan-genomics, various organisms of the same species or genus are compared. Currently, there are many tools that can perform pan-genomic analyses, such as PGAP (Pan-Genome Analysis Pipeline), Panseq (Pan-Genome Sequence Analysis Program) and PGAT (Prokaryotic Genome Analysis Tool). Among these software tools, PGAP was developed in the Perl scripting language and its reliance on UNIX platform terminals and its requirement for an extensive parameterized command line can become a problem for users without previous computational knowledge. Thus, the aim of this study was to develop a web application, known as PanWeb, that serves as a graphical interface for PGAP. In addition, using the output files of the PGAP pipeline, the application generates graphics using custom-developed scripts in the R programming language. PanWeb is freely available at http://www.computationalbiology.ufpa.br/panweb.

  3. Using Comparative Genomics for Inquiry-Based Learning to Dissect Virulence of "Escherichia coli" O157:H7 and "Yersinia pestis"

    ERIC Educational Resources Information Center

    Baumler, David J.; Banta, Lois M.; Hung, Kai F.; Schwarz, Jodi A.; Cabot, Eric L.; Glasner, Jeremy D.; Perna, Nicole T.

    2012-01-01

    Genomics and bioinformatics are topics of increasing interest in undergraduate biological science curricula. Many existing exercises focus on gene annotation and analysis of a single genome. In this paper, we present two educational modules designed to enable students to learn and apply fundamental concepts in comparative genomics using examples…

  4. RNA-sequencing data analysis of uterus in ovariectomized rats fed with soy protein isolate,17B-estradiol and casein

    USDA-ARS's Scientific Manuscript database

    This data file describes the bioinformatics analysis of uterine RNA-seq data comparing genome wide effects of feeding soy protein isolate compared to casein to ovariectomized female rats age 64 days relative to treatment of casein fed rats with 5 ug/kg/d estradiol and relative to rats treated with e...

  5. An Interspecies Comparative Analysis of the Predicted Secretomes of the Necrotrophic Plant Pathogens Sclerotinia sclerotiorum and Botrytis cinerea

    PubMed Central

    2015-01-01

    Phytopathogenic fungi form intimate associations with host plant species and cause disease. To be successful, fungal pathogens communicate with a susceptible host through the secretion of proteinaceous effectors, hydrolytic enzymes and metabolites. Sclerotinia sclerotiorum and Botrytis cinerea are economically important necrotrophic fungal pathogens that cause disease on numerous crop species. Here, a powerful bioinformatics pipeline was used to predict the refined S. sclerotiorum and B. cinerea secretomes, identifying 432 and 499 proteins respectively. Analyses focusing on S. sclerotiorum revealed that 16% of the secretome encoding genes resided in small, sequence heterogeneous, gene clusters that were distributed over 13 of the 16 predicted chromosomes. Functional analyses highlighted the importance of plant cell hydrolysis, oxidation-reduction processes and the redox state to the S. sclerotiorum and B. cinerea secretomes and potentially host infection. Only 8% of the predicted proteins were distinct between the two secretomes. In contrast to S. sclerotiorum, the B. cinerea secretome lacked CFEM- or LysM-containing proteins. The 115 fungal and oomycete genome comparison identified 30 proteins specific to S. sclerotiorum and B. cinerea, plus 11 proteins specific to S. sclerotiorum and 32 proteins specific to B. cinerea. Expressed sequence tag (EST) and proteomic analyses showed that 246 S. sclerotiorum secretome encoding genes had EST support, including 101 which were only expressed in vitro and 49 which were only expressed in planta, whilst 42 predicted proteins were experimentally proven to be secreted. These detailed in silico analyses of two important necrotrophic pathogens will permit informed choices to be made when candidate effector proteins are selected for function analyses in planta. PMID:26107498

  6. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…

  7. BioStar: an online question & answer resource for the bioinformatics community

    USDA-ARS's Scientific Manuscript database

    Although the era of big data has produced many bioinformatics tools and databases, using them effectively often requires specialized knowledge. Many groups lack bioinformatics expertise, and frequently find that software documentation is inadequate and local colleagues may be overburdened or unfamil...

  8. Evolving Strategies for the Incorporation of Bioinformatics Within the Undergraduate Cell Biology Curriculum

    PubMed Central

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489

  9. Comparative analysis of amino acid composition in the active site of nirk gene encoding copper-containing nitrite reductase (CuNiR) in bacterial spp.

    PubMed

    Adhikari, Utpal Kumar; Rahman, M Mizanur

    2017-04-01

    The nirk gene encodes the copper-containing nitrite reductase (CuNiR), a key catalytic enzyme in the environmental denitrification process that helps to produce nitric oxide from nitrite. The molecular mechanism of the denitrification process is complex, and a theoretical investigation was therefore conducted to determine the sequence information and amino acid composition of the active site of the CuNiR enzyme using various bioinformatics tools. Ten FASTA-formatted sequences were retrieved from the NCBI database, and domain and disordered-region identification and phylogenetic analyses were performed on these sequences. Comparative modeling of the protein was performed with the Modeller 9v14 program and visualized with PyMOL. Validated protein models were deposited in the Protein Model Database (PMDB) (PMDB id: PM0080150 to PM0080159). Active sites of the nirk-encoded CuNiR enzyme were identified with the CASTp server. PROCHECK showed significant scores for four protein models in the most favored regions of the Ramachandran plot. Prediction of active sites and cavities showed that the amino acids glycine, alanine, histidine, aspartic acid, glutamic acid, threonine, and glutamine were common in four predicted protein models. The present in silico study anticipates that the results of the active site analyses will pave the way for further experimental laboratory research on the complex denitrification mechanism of the selected species. Copyright © 2016. Published by Elsevier Ltd.

  10. Parallel comparative proteomics and phosphoproteomics reveal that cattle myostatin regulates phosphorylation of key enzymes in glycogen metabolism and glycolysis pathway

    PubMed Central

    Yang, Shuping; Li, Xin; Liu, Xinfeng; Ding, Xiangbin; Xin, Xiangbo; Jin, Congfei; Zhang, Sheng; Li, Guangpeng; Guo, Hong

    2018-01-01

    MSTN-encoded myostatin is a negative regulator of skeletal muscle development. Here, we used gluteus tissues from MSTN gene-edited and wild-type Luxi beef cattle, a breed native to China, and performed tandem mass tag (TMT)-based comparative proteomics and phosphoproteomics analyses to investigate the regulatory mechanism of MSTN related to cellular metabolism and signaling pathways in muscle development. Out of 1,315 proteins, 69 differentially expressed proteins (DEPs) were found in global proteomics analysis. Meanwhile, 149 differentially changed phosphopeptides corresponding to 76 unique phosphorylated proteins (DEPPs) were detected from 2,600 identified phosphopeptides in 702 phosphorylated proteins. Bioinformatics analyses suggested that the majority of DEPs and DEPPs were closely related to glycolysis, glycogenolysis, and muscle contractile fibre processes. The global discovery results were validated by Multiple Reaction Monitoring (MRM)-based targeted peptide quantitation analysis, western blotting, and muscle glycogen content measurement. Our data revealed that increased abundance of key enzymes and phosphorylation of their regulatory sites appear responsible for the enhanced glycogenolysis and glycolysis in MSTN−/−. The elevated glycogenolysis was associated with enhanced phosphorylation of Ser1018 in PHKA1, and Ser641/Ser645 in GYS1, which were regulated by the upstream phosphorylated AKT-GSK3β pathway and highly consistent with the lower glycogen content in gluteus of MSTN−/−. Collectively, this study provides new insights into the regulatory mechanisms of MSTN involved in energy metabolism and muscle growth. PMID:29541418

  11. Dramatic expansion of the black widow toxin arsenal uncovered by multi-tissue transcriptomics and venom proteomics.

    PubMed

    Haney, Robert A; Ayoub, Nadia A; Clarke, Thomas H; Hayashi, Cheryl Y; Garb, Jessica E

    2014-06-11

    Animal venoms attract enormous interest given their potential for pharmacological discovery and understanding the evolution of natural chemistries. Next-generation transcriptomics and proteomics provide unparalleled, but underexploited, capabilities for venom characterization. We combined multi-tissue RNA-Seq with mass spectrometry and bioinformatic analyses to determine venom gland specific transcripts and venom proteins from the Western black widow spider (Latrodectus hesperus) and investigated their evolution. We estimated expression of 97,217 L. hesperus transcripts in venom glands relative to silk and cephalothorax tissues. We identified 695 venom gland specific transcripts (VSTs), many of which BLAST and GO term analyses indicate may function as toxins or their delivery agents. ~38% of VSTs had BLAST hits, including latrotoxins, inhibitor cystine knot toxins, CRISPs, hyaluronidases, chitinase, and proteases, and 59% of VSTs had predicted protein domains. Latrotoxins are venom toxins that cause massive neurotransmitter release from vertebrate or invertebrate neurons. We discovered ≥ 20 divergent latrotoxin paralogs expressed in L. hesperus venom glands, significantly increasing this biomedically important family. Mass spectrometry of L. hesperus venom identified 49 proteins from VSTs, 24 of which BLAST to toxins. Phylogenetic analyses showed venom gland specific gene family expansions and shifts in tissue expression. Quantitative expression analyses comparing multiple tissues are necessary to identify venom gland specific transcripts. We present a black widow venom specific exome that uncovers a trove of diverse toxins and associated proteins, suggesting a dynamic evolutionary history. This justifies a reevaluation of the functional activities of black widow venom in light of its emerging complexity.

  12. ClonoCalc and ClonoPlot: immune repertoire analysis from raw files to publication figures with graphical user interface.

    PubMed

    Fähnrich, Anke; Krebbel, Moritz; Decker, Normann; Leucker, Martin; Lange, Felix D; Kalies, Kathrin; Möller, Steffen

    2017-03-11

    Next generation sequencing (NGS) technologies enable studies and analyses of the diversity of both T and B cell receptors (TCR and BCR) in human and animal systems to elucidate immune functions in health and disease. Over the last few years, several algorithms and tools have been developed to support respective analyses of raw sequencing data of the immune repertoire. These tools focus on distinct aspects of the data processing and require a strong bioinformatics background. To facilitate the analysis of T and B cell repertoires by less experienced users, software is needed that combines the most common tools for repertoire analysis. We introduce a graphical user interface (GUI) providing a complete analysis pipeline for processing raw NGS data for human and animal TCR and BCR clonotype determination and advanced differential repertoire studies. It provides two applications. ClonoCalc prepares the raw data for downstream analyses. It combines a demultiplexer for barcode splitting and employs MiXCR for paired-end read merging and the extraction of human and animal TCR/BCR sequences. ClonoPlot wraps the R package tcR and further contributes self-developed plots for the descriptive comparative investigation of immune repertoires. This workflow reduces the amount of programming required to perform the respective analyses and supports both communication and training between scientists and technicians, and across scientific disciplines. The Open Source development in Java and R is modular and invites advanced users to extend its functionality. Software and documentation are freely available at https://bitbucket.org/ClonoSuite/clonocalc-plot .

  13. Novel SINEs families in Medicago truncatula and Lotus japonicus: bioinformatic analysis.

    PubMed

    Gadzalski, Marek; Sakowicz, Tomasz

    2011-07-01

    Although short interspersed elements (SINEs) were discovered nearly 30 years ago, the studies of these genomic repeats were mostly limited to animal genomes. Very little is known about SINEs in legumes, one of the most important plant families. Here we report the identification, genomic distribution and molecular features of six novel SINE elements in Lotus japonicus (named LJ_SINE-1, -2, -3) and Medicago truncatula (MT_SINE-1, -2, -3), model species of legume. They possess all the structural features commonly found in short interspersed elements, including an RNA polymerase III promoter, polyA tail and flanking repeats. The SINEs described here are present in low to moderate copy numbers, from 150 to 3000. Bioinformatic analyses were used to search public databases; we have shown that three of the new SINE elements from M. truncatula seem to be characteristic of the Medicago and Trifolium genera. Two SINE families have been found in L. japonicus and one is present in both M. truncatula and L. japonicus. In addition, we discuss potential activities of the described elements. Copyright © 2011 Elsevier B.V. All rights reserved.

  14. Antimicrobial activity and mechanism of the human milk-sourced peptide Casein201

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Fan; Department of Endocrinology, Children's Hospital of Nanjing Medical University, Nanjing; Cui, Xianwei

    Introduction: Casein201 is one of the human milk sourced peptides that differed significantly in preterm and full-term mothers. This study is designed to demonstrate the biological characteristics, antibacterial activity and mechanisms of Casein201 against common pathogens in neonatal infection. Methodology: The analysis of biological characteristics was done by bioinformatics. Disk diffusion method and flow cytometry were used to detect the antimicrobial activity of Casein201. Killing kinetics of Casein201 was measured using microplate reader. The antimicrobial mechanism of Casein201 was studied by electron microscopy and electrophoresis. Results: Bioinformatics analysis indicates that Casein201 derived from β-casein and showed significant sequence overlap. Antibacterial assays showed Casein201 inhibited the growth of Staphylococcus aureus and Yersinia enterocolitica. Ultrastructural analyses revealed that the antibacterial activity of Casein201 is through cytoplasmic structures disintegration and bacterial cell envelope alterations but not combination with DNA. Conclusion: We conclude the antimicrobial activity and mechanism of Casein201. Our data demonstrate that Casein201 has potential therapeutic value for the prevention and treatment of pathogens in neonatal infection.

  15. Antimicrobial activity and mechanism of the human milk-sourced peptide Casein201.

    PubMed

    Zhang, Fan; Cui, Xianwei; Fu, Yanrong; Zhang, Jun; Zhou, Yahui; Sun, Yazhou; Wang, Xing; Li, Yun; Liu, Qianqi; Chen, Ting

    2017-04-08

    Casein201 is one of the human milk sourced peptides that differed significantly in preterm and full-term mothers. This study is designed to demonstrate the biological characteristics, antibacterial activity and mechanisms of Casein201 against common pathogens in neonatal infection. The analysis of biological characteristics was done by bioinformatics. Disk diffusion method and flow cytometry were used to detect the antimicrobial activity of Casein201. Killing kinetics of Casein201 was measured using microplate reader. The antimicrobial mechanism of Casein201 was studied by electron microscopy and electrophoresis. Bioinformatics analysis indicates that Casein201 derived from β-casein and showed significant sequence overlap. Antibacterial assays showed Casein201 inhibited the growth of Staphylococcus aureus and Yersinia enterocolitica. Ultrastructural analyses revealed that the antibacterial activity of Casein201 is through cytoplasmic structures disintegration and bacterial cell envelope alterations but not combination with DNA. We conclude the antimicrobial activity and mechanism of Casein201. Our data demonstrate that Casein201 has potential therapeutic value for the prevention and treatment of pathogens in neonatal infection. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. Charged groups at binding interfaces of the PsbO subunit of photosystem II: A combined bioinformatics and simulation study.

    PubMed

    Del Val, Coral; Bondar, Ana-Nicoleta

    2017-06-01

    PsbO is an extrinsic subunit of photosystem II engaged in complex binding interactions within photosystem II. At the interface between PsbO, D1 and D2 subunits of photosystem II, a cluster of charged and polar groups of PsbO is part of an extended hydrogen-bond network thought to participate in proton transfer. The precise role of specific amino acid residues at this complex binding interface remains a key open question. Here, we address this question by carrying out extensive bioinformatics analyses and molecular dynamics simulations of PsbO proteins with mutations at the binding interface. We find that PsbO proteins from cyanobacteria vs. plants have specific preferences for the number and composition of charged amino acid residues that may ensure that PsbO proteins avoid aggregation and expose long unstructured loops for binding to photosystem II. A cluster of conserved charged groups with dynamic hydrogen bonds provides PsbO with structural plasticity at the binding interface with photosystem II. Copyright © 2017. Published by Elsevier B.V.

  17. The role of proteosome-mediated proteolysis in modulating potentially harmful transcription factor activity in Saccharomyces cerevisiae

    PubMed Central

    Bonzanni, Nicola; Zhang, Nianshu; Oliver, Stephen G.; Fisher, Jasmin

    2011-01-01

    Motivation: The appropriate modulation of the stress response to variable environmental conditions is necessary to maintain sustained viability in Saccharomyces cerevisiae. Particularly, controlling the abundance of proteins that may have detrimental effects on cell growth is crucial for rapid recovery from stress-induced quiescence. Results: Prompted by qualitative modeling of the nutrient starvation response in yeast, we investigated in vivo the effect of proteolysis after nutrient starvation showing that, for the Gis1 transcription factor at least, proteasome-mediated control is crucial for a rapid return to growth. Additional bioinformatics analyses show that potentially toxic transcriptional regulators have a significantly lower protein half-life, a higher fraction of unstructured regions and more potential PEST motifs than the non-detrimental ones. Furthermore, inhibiting proteasome activity tends to increase the expression of genes induced during the Environmental Stress Response more than those in the rest of the genome. Our combined results suggest that proteasome-mediated proteolysis of potentially toxic transcription factors tightly modulates the stress response in yeast. Contact: jasmin.fisher@microsoft.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21685082

  18. Isothiocyanates from Broccolini seeds induce apoptosis in human colon cancer cells: proteomic and bioinformatic analyses.

    PubMed

    Yang, Yanjing; Yan, Huidan; Li, Yuqin; Yang, Shang-Tian; Zhang, Xuewu

    2011-05-01

    Isothiocyanates (ITCs) have been shown to possess antitumor activity in colon cancer; however, the detailed mechanism is still unclear. The objective of this study was to investigate the apoptosis-inducing activity of ITCs from Broccolini seeds and proteomic changes in SW480 cells, and to identify the molecular pathways responsible for the anticancer action of ITCs. Using the MTT assay, phase contrast microscopy and flow cytometry, we found that ITCs induce apoptosis of SW480 cells in a dose-dependent manner, and the IC50 was calculated to be 77.72 microg/ml, superior to the chemotherapeutic drug 5-fluorouracil. Subsequently, 15 altered proteins in ITC-treated SW480 cells were identified. Further bioinformatics analysis predicted the potential pathways by which ITCs induce apoptosis of SW480 cells. In conclusion, this is the first report to investigate the anticancer activity of ITCs from Broccolini seeds and its mechanism of action by proteomics analysis. Our observations provide potential therapeutic targets for colon cancer inhibitor intervention and implicate the development of novel anti-cancer therapeutic strategies.

  19. Using bio.tools to generate and annotate workbench tool descriptions

    PubMed Central

    Hillion, Kenzo-Hugo; Kuzmin, Ivan; Khodak, Anton; Rasche, Eric; Crusoe, Michael; Peterson, Hedi; Ison, Jon; Ménager, Hervé

    2017-01-01

    Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks, facilitate the access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time consuming and error-prone process. A major consequence is the incomplete or outdated description of tools that are often missing important information, including parameters and metadata such as publication or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools - which have been registered in the ELIXIR tools registry (https://bio.tools) - into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins, and generates a skeleton for a Galaxy XML or CWL tool description. The second module is dedicated to the enrichment of the generated tool description, using metadata provided by bio.tools. This last module can also be used on its own to complete or correct existing tool descriptions with missing metadata. PMID:29333231

  20. Graphics processing units in bioinformatics, computational biology and systems biology.

    PubMed

    Nobile, Marco S; Cazzaniga, Paolo; Tangherloni, Andrea; Besozzi, Daniela

    2017-09-01

    Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), therefore limiting their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining an increasing attention by the scientific community, as they can considerably reduce the running time required by standard CPU-based software, and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks in the use of these parallel architectures. The complete list of GPU-powered tools here reviewed is available at http://bit.ly/gputools. © The Author 2016. Published by Oxford University Press.

  1. Profiling and bioinformatics analyses reveal differential circular RNA expression in radioresistant esophageal cancer cells.

    PubMed

    Su, Huafang; Lin, Fuqiang; Deng, Xia; Shen, Lanxiao; Fang, Ya; Fei, Zhenghua; Zhao, Lihao; Zhang, Xuebang; Pan, Huanle; Xie, Deyao; Jin, Xiance; Xie, Congying

    2016-07-28

    Acquired radioresistance during radiotherapy is considered the most important reason for local tumor recurrence or treatment failure. Circular RNAs (circRNAs) have recently been identified as microRNA sponges and are involved in various biological processes. The purpose of this study is to investigate the role of circRNAs in the radioresistance of esophageal cancer. Total RNA was isolated from the human parental cell line KYSE-150 and the self-established radioresistant esophageal cancer cell line KYSE-150R, and hybridized to the Arraystar Human circRNA Array. Quantitative real-time PCR was used to confirm the circRNA expression profiles obtained from the microarray data. Bioinformatic analyses including gene ontology (GO) analysis, KEGG pathway analysis and network analysis were performed for further assessment. Among the 3752 candidate circRNA genes detected, significant upregulation of 57 circRNAs and downregulation of 17 circRNAs were observed in the human radioresistant esophageal cancer cell line KYSE-150R compared with the parental cell line KYSE-150 (fold change ≥2.0 and P < 0.05). Nine of these candidate circRNAs were validated by real-time PCR. GO analysis revealed that numerous target genes, including most microRNAs, were involved in the biological processes. More than 400 target genes were enriched in the Wnt signaling pathway. CircRNA_001059 and circRNA_000167 were the two largest nodes in the circRNA/microRNA co-expression network. Our study revealed a comprehensive expression and functional profile of differentially expressed circRNAs in radioresistant esophageal cancer cells, indicating possible involvement of these dysregulated circRNAs in the development of radiation resistance.
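
    As an illustration of the selection rule quoted in the abstract above (fold change ≥2.0 and P < 0.05), the following Python sketch classifies a toy set of records. The record names, the values, and the `classify` helper are hypothetical and are not taken from the study. The sketch also assumes that fold change is expressed as a resistant/parental ratio, so that a two-fold downregulation corresponds to a ratio of 0.5 or less; the abstract does not say how the authors encoded direction.

```python
from typing import NamedTuple, Optional


class CircRNA(NamedTuple):
    name: str           # hypothetical identifier, not from the study
    fold_change: float  # assumed ratio: resistant / parental (must be > 0)
    p_value: float


def classify(rec: CircRNA, min_fc: float = 2.0, alpha: float = 0.05) -> Optional[str]:
    """Return 'up', 'down', or None for one record.

    A record passes only if P < alpha AND the change is at least
    min_fc-fold in either direction.
    """
    if rec.p_value >= alpha:
        return None
    if rec.fold_change >= min_fc:
        return "up"
    if rec.fold_change <= 1.0 / min_fc:
        return "down"
    return None


# Toy data, invented purely to exercise each branch of the rule.
toy = [
    CircRNA("circ_A", 3.1, 0.01),   # large change, significant
    CircRNA("circ_B", 0.4, 0.02),   # large decrease, significant
    CircRNA("circ_C", 2.5, 0.20),   # large change, not significant
    CircRNA("circ_D", 1.3, 0.001),  # significant, but change too small
]
calls = {rec.name: classify(rec) for rec in toy}
print(calls)
```

    Requiring both thresholds at once is what keeps `circ_C` and `circ_D` out of the result: each satisfies only one of the two criteria.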

  2. Task 1.5 Genomic Shift and Drift Trends of Emerging Pathogens

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Borucki, M

    2010-01-05

    The Lawrence Livermore National Laboratory (LLNL) Bioinformatics group has recently taken on a role in DTRA's Transformation Medical Technologies Initiative (TMTI). The high-level goal of TMTI is to accelerate the development of broad-spectrum countermeasures. To achieve those goals, TMTI has a near term need to conduct analyses of genomic shift and drift trends of emerging pathogens, with a focused eye on select agent pathogens, as well as antibiotic and virulence markers. Most emerging human pathogens are zoonotic viruses with a genome composed of RNA. The high mutation rate of the replication enzymes of RNA viruses contributes to sequence drift and provides one mechanism for these viruses to adapt to diverse hosts (interspecies transmission events) and cause new human and zoonotic diseases. Additionally, new viral pathogens frequently emerge due to genetic shift (recombination and segment reassortment) which allows for dramatic genotypic and phenotypic changes to occur rapidly. Bacterial pathogens also evolve via genetic drift and shift, although sequence drift generally occurs at a much slower rate for bacteria as compared to RNA viruses. However, genetic shift such as lateral gene transfer and inter- and intragenomic recombination enables bacteria to rapidly acquire new mechanisms of survival and antibiotic resistance. New technologies such as rapid whole genome sequencing of bacterial genomes, ultra-deep sequencing of RNA virus populations, metagenomic studies of environments rich in antibiotic resistance genes, and the use of microarrays for the detection and characterization of emerging pathogens provide mechanisms to address the challenges posed by the rapid emergence of pathogens. Bioinformatic algorithms that enable efficient analysis of the massive amounts of data generated by these technologies, as well as computational modeling of protein structures and evolutionary processes, need to be developed to allow the technology to fulfill its potential.

  3. Use of Free/Libre Open Source Software in Sepsis "-Omics" Research: A Bibliometric, Comparative Analysis Among the United States, EU-28 Member States, and China.

    PubMed

    Evangelatos, Nikolaos; Satyamourthy, Kapaettu; Levidou, Georgia; Brand, Helmut; Bauer, Pia; Kouskouti, Christina; Brand, Angela

    2018-05-01

    "-Omics" systems sciences are at the epicenter of personalized medicine and public health, and drivers of knowledge-based biotechnology innovation. Bioinformatics, a core component of omics research, is one of the disciplines that first employed Free/Libre Open Source Software (FLOSS), and thus provided a fertile ground for its further development. Understanding the use and characteristics of FLOSS deployed in the omics field is valuable for future innovation strategies, policy and funding priorities. We conducted a bibliometric, longitudinal study of the use of FLOSS in sepsis omics research from 2011 to 2015 in the United States, EU-28 and China. Because sepsis is an interdisciplinary field at the intersection of multiple omics technologies and medical specialties, it was chosen as a model innovation ecosystem for this empirical analysis, which used publicly available data. Despite development of and competition from proprietary commercial software, scholars in omics continue to employ FLOSS routinely, and independent of the type of omics technology they work with. The number of articles using FLOSS increased significantly over time in the EU-28, as opposed to the United States and China (R = 0.96, p = 0.004). Furthermore, in an era where sharing of knowledge is being strongly advocated and promoted by public agencies and social institutions, we discuss possible correlations between the use of FLOSS and various funding sources in omics research. These observations and analyses provide new insights into the use of FLOSS in sepsis omics research across three (supra)national regions. Further benchmarking studies are warranted for FLOSS trends in other omics fields and geographical settings. These could, in time, lead to the development of new composite innovation and technology use metrics in omics systems sciences and bioinformatics communities.

  4. Expression characteristics of long noncoding RNA uc.322 and its effects on pancreatic islet function.

    PubMed

    Zhao, Xiaoqin; Rong, Can; Pan, Fenghui; Xiang, Lizhi; Wang, Xinlei; Hu, Yun

    2018-06-28

    Increasing evidence indicates that long noncoding RNAs (lncRNAs) perform special biological functions by regulating gene expression through multiple pathways and molecular mechanisms. The aim of this study was to explore the expression characteristics of lncRNA uc.322 in pancreatic islet cells and its effects on the secretion function of islet cells. Bioinformatics analysis was used to detect the lncRNA uc.322 sequence, location, and structural features. Expression of lncRNA uc.322 in different tissues was detected by quantitative polymerase chain reaction analyses. Quantitative polymerase chain reaction, Western blot analysis, adenosine triphosphate determination, glucose-stimulated insulin secretion, and enzyme-linked immunosorbent assay were used to evaluate the effects of lncRNA uc.322 on insulin secretion. The results showed that the full length of lncRNA uc.322 is 224 bp and that it is highly conserved in various species. Bioinformatics analysis revealed that lncRNA uc.322 is located on chr7:122893196-122893419 (GRCh37/hg19) within the SRY-related HMG-box 6 gene exon region. Compared with other tissues, lncRNA uc.322 is highly expressed in pancreatic tissue. Upregulation of lncRNA uc.322 expression increases the expression of the insulin transcription factors pancreatic and duodenal homeobox 1 and Forkhead box O1, promotes insulin secretion in the extracellular fluid of Min6 cells, and increases the adenosine triphosphate concentration. On the other hand, knockdown of lncRNA uc.322 has opposite effects on Min6 cells. Overall, this study showed that upregulation of lncRNA uc.322 in islet β-cells can increase the expression of insulin transcription factors and promote insulin secretion, and it may be a new therapeutic target for diabetes. © 2018 Wiley Periodicals, Inc.
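
    The two figures reported in the abstract above, a length of 224 bp and the location chr7:122893196-122893419, can be checked against each other with a short calculation. The Python sketch below is only a consistency check; the helper names `parse_region` and `region_length` are hypothetical, and it assumes the coordinates are 1-based and inclusive at both ends, which the abstract does not state explicitly.

```python
import re
from typing import Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split a 'chrN:start-end' string into (chromosome, start, end)."""
    match = re.fullmatch(r"(chr\w+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


def region_length(region: str) -> int:
    """Length in bp, assuming 1-based coordinates inclusive at both ends."""
    _, start, end = parse_region(region)
    return end - start + 1


length = region_length("chr7:122893196-122893419")
print(length)
```

    Under the inclusive assumption the interval spans 122893419 - 122893196 + 1 = 224 positions, which agrees with the stated length; a half-open convention would give 223 instead.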

  5. Comparative bioinformatics, temporal and spatial expression analyses of Ixodes scapularis organic anion transporting polypeptides

    PubMed Central

    Radulović, Željko; Porter, Lindsay M.; Kim, Tae K.; Mulenga, Albert

    2015-01-01

    Organic anion-transporting polypeptides (Oatps) are an integral part of the detoxification mechanism in vertebrates and invertebrates. These cell surface proteins are involved in mediating the sodium-independent uptake and/or distribution of a broad array of organic amphipathic compounds and xenobiotic drugs. This study describes bioinformatics and biological characterization of 9 Oatp sequences in the Ixodes scapularis genome. These sequences have been annotated on the basis of 12 transmembrane domains, consensus motif D-X-RW-(I,V)-GAWW-X-G-(F,L)-L, and 11 conserved cysteine amino acid residues in the large extracellular loop 5 that characterize the Oatp superfamily. Ixodes scapularis Oatps may regulate non-redundant cross-tick species conserved functions in that they did not cluster as a monolithic group on the phylogeny tree and that they have orthologs in other ticks. Phylogeny clustering patterns also suggest that some tick Oatp sequences transport substrates that are similar to those of body louse, mosquito, eye worm, and filarial worm Oatps. Semi-quantitative RT-PCR analysis demonstrated that all 9 I. scapularis Oatp sequences were expressed during tick feeding. Ixodes scapularis Oatp genes potentially regulate functions during early and/or late-stage tick feeding as revealed by normalized mRNA profiles. Normalized transcript abundance indicates that I. scapularis Oatp genes are strongly expressed in unfed ticks during the first 24 h of feeding and/or at the end of the tick feeding process. Except for 2 I. scapularis Oatps, which were expressed in the salivary glands and ovaries, all other genes were expressed in all tested organs, suggesting the significance of I. scapularis Oatps in maintaining tick homeostasis. Different I. scapularis Oatp mRNA expression patterns were detected and discussed with reference to different physiological states of unfed and feeding ticks. PMID:24582512
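
    The consensus motif quoted in the abstract above, D-X-RW-(I,V)-GAWW-X-G-(F,L)-L, can be written as a regular expression for scanning protein sequences. The Python sketch below assumes that X stands for any single residue and that parenthesized residues are alternatives at one position; the test sequences and the `find_motif` helper are hypothetical and do not come from the Ixodes scapularis annotation.

```python
import re
from typing import Optional

# D-X-RW-(I,V)-GAWW-X-G-(F,L)-L, with X read as "any one uppercase residue letter".
OATP_MOTIF = re.compile(r"D[A-Z]RW[IV]GAWW[A-Z]G[FL]L")


def find_motif(sequence: str) -> Optional[int]:
    """Return the 0-based start of the first motif match, or None."""
    match = OATP_MOTIF.search(sequence.upper())
    return match.start() if match else None


# Invented sequences: the first embeds a motif instance, the second
# replaces the (I,V) position with A and should therefore not match.
with_motif = "MKTLDSRWIGAWWLGFLAAV"
without_motif = "MKTLDSRWAGAWWLGFLAAV"

hit = find_motif(with_motif)
miss = find_motif(without_motif)
print(hit, miss)
```

    A motif match alone would not be enough to annotate a sequence as an Oatp; the abstract also relies on the 12 transmembrane domains and the 11 conserved cysteine residues.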

  6. Glimpsing over the event horizon: evolution of nuclear pores and envelope.

    PubMed

    Jékely, Gáspár

    2005-02-01

    The origin of eukaryotes from prokaryotic ancestors is one of the major evolutionary transitions in the history of life. The nucleus, a membrane bound compartment for confining the genome, is a central feature of eukaryotic cells and its origin also has to be a central feature of any workable theory that ventures to explain eukaryotic origins. Recent bioinformatic analyses of components of the nuclear pore complex (NPC), the nuclear envelope (NE), and the nuclear transport systems revealed exciting evolutionary connections (e.g., between NPC and coated vesicles) and provided a useful record of the phyletic distribution and history of NPC and NE components. These analyses allow us to refine theories on the origin and evolution of the nucleus, and consequently, of the eukaryotic cell.

  7. Online Bioinformatics Tutorials | Office of Cancer Genomics

    Cancer.gov

    Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.

  8. Evaluating an Inquiry-Based Bioinformatics Course Using Q Methodology

    ERIC Educational Resources Information Center

    Ramlo, Susan E.; McConnell, David; Duan, Zhong-Hui; Moore, Francisco B.

    2008-01-01

    Faculty at a Midwestern metropolitan public university recently developed a course on bioinformatics that emphasized collaboration and inquiry. Bioinformatics, essentially the application of computational tools to biological data, is inherently interdisciplinary. Thus part of the challenge of creating this course was serving the needs and…

  9. Semi-quantitative proteomics of mammalian cells upon short-term exposure to non-ionizing electromagnetic fields.

    PubMed

    Kuzniar, Arnold; Laffeber, Charlie; Eppink, Berina; Bezstarosti, Karel; Dekkers, Dick; Woelders, Henri; Zwamborn, A Peter M; Demmers, Jeroen; Lebbink, Joyce H G; Kanaar, Roland

    2017-01-01

    The potential effects of non-ionizing electromagnetic fields (EMFs), such as those emitted by power-lines (in extremely low frequency range), mobile cellular systems and wireless networking devices (in radio frequency range) on human health have been intensively researched and debated. However, how exposure to these EMFs may lead to biological changes underlying possible health effects is still unclear. To reveal EMF-induced molecular changes, unbiased experiments (without a priori focusing on specific biological processes) with sensitive readouts are required. We present the first proteome-wide semi-quantitative mass spectrometry analysis of human fibroblasts, osteosarcomas and mouse embryonic stem cells exposed to three types of non-ionizing EMFs (ELF 50 Hz, UMTS 2.1 GHz and WiFi 5.8 GHz). We performed controlled in vitro EMF exposures of metabolically labeled mammalian cells followed by reliable statistical analyses of differential protein- and pathway-level regulations using an array of established bioinformatics methods. Our results indicate that less than 1% of the quantitated human or mouse proteome responds to the EMFs by small changes in protein abundance. Further network-based analysis of the differentially regulated proteins did not detect significantly perturbed cellular processes or pathways in human and mouse cells in response to ELF, UMTS or WiFi exposure. In conclusion, our extensive bioinformatics analyses of semi-quantitative mass spectrometry data do not support the notion that the short-time exposures to non-ionizing EMFs have a consistent biologically significant bearing on mammalian cells in culture.

  10. Structural, Bioinformatic, and In Vivo Analyses of Two Treponema pallidum Lipoproteins Reveal a Unique TRAP Transporter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deka, Ranjit K.; Brautigam, Chad A.; Goldberg, Martin

    2012-05-25

    Treponema pallidum, the bacterial agent of syphilis, is predicted to encode one tripartite ATP-independent periplasmic transporter (TRAP-T). TRAP-Ts typically employ a periplasmic substrate-binding protein (SBP) to deliver the cognate ligand to the transmembrane symporter. Herein, we demonstrate that the genes encoding the putative TRAP-T components from T. pallidum, tp0957 (the SBP), and tp0958 (the symporter), are in an operon with an uncharacterized third gene, tp0956. We determined the crystal structure of recombinant Tp0956; the protein is trimeric and perforated by a pore. Part of Tp0956 forms an assembly similar to those of 'tetratricopeptide repeat' (TPR) motifs. The crystal structure of recombinant Tp0957 was also determined; like the SBPs of other TRAP-Ts, there are two lobes separated by a cleft. In these other SBPs, the cleft binds a negatively charged ligand. However, the cleft of Tp0957 has a strikingly hydrophobic chemical composition, indicating that its ligand may be substantially different and likely hydrophobic. Analytical ultracentrifugation of the recombinant versions of Tp0956 and Tp0957 established that these proteins associate avidly. This unprecedented interaction was confirmed for the native molecules using in vivo cross-linking experiments. Finally, bioinformatic analyses suggested that this transporter exemplifies a new subfamily of TPATs (TPR-protein-associated TRAP-Ts) that require the action of a TPR-containing accessory protein for the periplasmic transport of a potentially hydrophobic ligand(s).

  11. Secretome analysis of rat osteoblasts during icariin treatment induced osteogenesis

    PubMed Central

    Qian, Weiqing; Su, Yan; Zhang, Yajie; Yao, Nianwei; Gu, Nin; Zhang, Xu; Yin, Hong

    2018-01-01

    Osteoporosis is a serious public health problem and icariin (ICA) is the active component of the Epimedium sagittatum, a traditional Chinese medicinal herb. The present study aimed to investigate the effects and underlying mechanisms of ICA as a potential therapy for osteoporosis. Calvaria osteoblasts were isolated from newborn rats and treated with ICA. Cell viability, apoptosis, alkaline phosphatase activity and calcium deposition were analyzed. Bioinformatics analyses were performed to identify differentially expressed proteins (DEPs) in response to ICA treatment. Western blot analysis was performed to validate the expression of DEPs. ICA administration promoted osteoblast viability, alkaline phosphatase activity, calcium deposition and inhibited osteoblast apoptosis. Secretome analysis of ICA-treated cells was performed using two-dimensional gel electrophoresis and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. A total of 56 DEPs were identified, including serpin family F member 1 (PEDF), protein disulfide isomerase family A, member 3 (PDIA3), nuclear protein, co-activator of histone transcription (NPAT), c-Myc and heat shock protein 70 (HSP70). These proteins were associated with signaling pathways, including Fas and p53. Bioinformatics and western blot analyses confirmed that the expression levels of the six DEPs were upregulated following ICA treatment. These genes may be directly or indirectly involved in ICA-mediated osteogenic differentiation and osteogenesis. It was demonstrated that ICA treatment promoted osteogenesis by modulating the expression of PEDF, PDIA3, NPAT and HSP70 through signaling pathways, including Fas and p53. PMID:29532868

  12. A systems biology approach toward understanding seed composition in soybean.

    PubMed

    Li, Ling; Hur, Manhoi; Lee, Joon-Yong; Zhou, Wenxu; Song, Zhihong; Ransom, Nick; Demirkale, Cumhur Yusuf; Nettleton, Dan; Westgate, Mark; Arendsee, Zebulun; Iyer, Vidya; Shanks, Jackie; Nikolau, Basil; Wurtele, Eve Syrkin

    2015-01-01

    The molecular, biochemical, and genetic mechanisms that regulate the complex metabolic network of soybean seed development determine the ultimate balance of protein, lipid, and carbohydrate stored in the mature seed. Many of the genes and metabolites that participate in seed metabolism are unknown or poorly defined; even more remains to be understood about the regulation of their metabolic networks. A global omics analysis can provide insights into the regulation of seed metabolism, even without a priori assumptions about the structure of these networks. With the future goal of predictive biology in mind, we have combined metabolomics, transcriptomics, and metabolic flux technologies to reveal the global developmental and metabolic networks that determine the structure and composition of the mature soybean seed. We have coupled this global approach with interactive bioinformatics and statistical analyses to gain insights into the biochemical programs that determine soybean seed composition. For this purpose, we used the Plant/Eukaryotic and Microbial Metabolomics Systems Resource (PMR, http://www.metnetdb.org/pmr), a platform that incorporates metabolomics data to develop hypotheses concerning the organization and regulation of metabolic networks, and the MetNet systems biology tools (http://www.metnetdb.org) for plant omics data, a framework to enable interactive visualization of metabolic and regulatory networks. This combination of high-throughput experimental data and bioinformatics analyses has revealed sets of specific genes, genetic perturbations and mechanisms, and metabolic changes that are associated with the developmental variation in soybean seed composition. Researchers can explore these metabolomics and transcriptomics data interactively at PMR.

  13. Proteomic Interaction Patterns between Human Cyclins, the Cyclin-Dependent Kinase Ortholog pUL97 and Additional Cytomegalovirus Proteins

    PubMed Central

    Steingruber, Mirjam; Kraut, Alexandra; Socher, Eileen; Sticht, Heinrich; Reichel, Anna; Stamminger, Thomas; Amin, Bushra; Couté, Yohann; Hutterer, Corina; Marschall, Manfred

    2016-01-01

    The human cytomegalovirus (HCMV)-encoded cyclin-dependent kinase (CDK) ortholog pUL97 associates with human cyclin B1 and other types of cyclins. Here, the question was addressed whether cyclin interaction of pUL97 and additional viral proteins is detectable by mass spectrometry-based approaches. Proteomic data were validated by coimmunoprecipitation (CoIP), Western blot, in vitro kinase and bioinformatic analyses. Our findings suggest that: (i) pUL97 shows differential affinities to human cyclins; (ii) pUL97 inhibitor maribavir (MBV) disrupts the interaction with cyclin B1, but not with other cyclin types; (iii) cyclin H is identified as a new high-affinity interactor of pUL97 in HCMV-infected cells; (iv) even more viral phosphoproteins, including all known substrates of pUL97, are detectable in the cyclin-associated complexes; and (v) a first functional validation of pUL97-cyclin B1 interaction, analyzed by in vitro kinase assay, points to a cyclin-mediated modulation of pUL97 substrate preference. In addition, our bioinformatic analyses suggest individual, cyclin-specific binding interfaces for pUL97-cyclin interaction, which could explain the different strengths of interactions and the selective inhibitory effect of MBV on pUL97-cyclin B1 interaction. Combined, the detection of cyclin-associated proteins in HCMV-infected cells suggests a complex pattern of substrate phosphorylation and a role of cyclins in the fine-modulation of pUL97 activities. PMID:27548200

  14. Semi-quantitative proteomics of mammalian cells upon short-term exposure to non-ionizing electromagnetic fields

    PubMed Central

    Laffeber, Charlie; Eppink, Berina; Bezstarosti, Karel; Dekkers, Dick; Woelders, Henri; Zwamborn, A. Peter M.; Demmers, Jeroen; Lebbink, Joyce H. G.; Kanaar, Roland

    2017-01-01

    The potential effects of non-ionizing electromagnetic fields (EMFs), such as those emitted by power-lines (in extremely low frequency range), mobile cellular systems and wireless networking devices (in radio frequency range) on human health have been intensively researched and debated. However, how exposure to these EMFs may lead to biological changes underlying possible health effects is still unclear. To reveal EMF-induced molecular changes, unbiased experiments (without a priori focusing on specific biological processes) with sensitive readouts are required. We present the first proteome-wide semi-quantitative mass spectrometry analysis of human fibroblasts, osteosarcomas and mouse embryonic stem cells exposed to three types of non-ionizing EMFs (ELF 50 Hz, UMTS 2.1 GHz and WiFi 5.8 GHz). We performed controlled in vitro EMF exposures of metabolically labeled mammalian cells followed by reliable statistical analyses of differential protein- and pathway-level regulations using an array of established bioinformatics methods. Our results indicate that less than 1% of the quantitated human or mouse proteome responds to the EMFs by small changes in protein abundance. Further network-based analysis of the differentially regulated proteins did not detect significantly perturbed cellular processes or pathways in human and mouse cells in response to ELF, UMTS or WiFi exposure. In conclusion, our extensive bioinformatics analyses of semi-quantitative mass spectrometry data do not support the notion that the short-time exposures to non-ionizing EMFs have a consistent biologically significant bearing on mammalian cells in culture. PMID:28234898

  15. Sequencing and annotation of mitochondrial genomes from individual parasitic helminths.

    PubMed

    Jex, Aaron R; Littlewood, D Timothy; Gasser, Robin B

    2015-01-01

    Mitochondrial (mt) genomics has significant implications in a range of fundamental areas of parasitology, including evolution, systematics, and population genetics as well as explorations of mt biochemistry, physiology, and function. Mt genomes also provide a rich source of markers to aid molecular epidemiological and ecological studies of key parasites. However, there is still a paucity of information on mt genomes for many metazoan organisms, particularly parasitic helminths, which has often related to challenges linked to sequencing from tiny amounts of material. The advent of next-generation sequencing (NGS) technologies has paved the way for low cost, high-throughput mt genomic research, but there have been obstacles, particularly in relation to post-sequencing assembly and analyses of large datasets. In this chapter, we describe protocols for the efficient amplification and sequencing of mt genomes from small portions of individual helminths, and highlight the utility of NGS platforms to expedite mt genomics. In addition, we recommend approaches for manual or semi-automated bioinformatic annotation and analyses to overcome the bioinformatic "bottleneck" to research in this area. Taken together, these approaches have demonstrated applicability to a range of parasites and provide prospects for using complete mt genomic sequence datasets for large-scale molecular systematic and epidemiological studies. In addition, these methods have broader utility and might be readily adapted to a range of other medium-sized molecular regions (i.e., 10-100 kb), including large genomic operons, and other organellar (e.g., plastid) and viral genomes.

  16. Composition of the mitochondrial electron transport chain in acanthamoeba castellanii: structural and evolutionary insights.

    PubMed

    Gawryluk, Ryan M R; Chisholm, Kenneth A; Pinto, Devanand M; Gray, Michael W

    2012-11-01

    The mitochondrion, derived in evolution from an α-proteobacterial progenitor, plays a key metabolic role in eukaryotes. Mitochondria house the electron transport chain (ETC) that couples oxidation of organic substrates and electron transfer to proton pumping and synthesis of ATP. The ETC comprises several multiprotein enzyme complexes, all of which have counterparts in bacteria. However, mitochondrial ETC assemblies from animals, plants and fungi are generally more complex than their bacterial counterparts, with a number of 'supernumerary' subunits appearing early in eukaryotic evolution. Little is known, however, about the ETC of unicellular eukaryotes (protists), which are key to understanding the evolution of mitochondria and the ETC. We present an analysis of the ETC proteome from Acanthamoeba castellanii, an ecologically, medically and evolutionarily important member of Amoebozoa (sister to Opisthokonta). Data obtained from tandem mass spectrometric (MS/MS) analyses of purified mitochondria as well as ETC complexes isolated via blue native polyacrylamide gel electrophoresis are combined with the results of bioinformatic queries of sequence databases. Our bioinformatic analyses have identified most of the ETC subunits found in other eukaryotes, confirming and extending previous observations. The assignment of proteins as ETC subunits by MS/MS provides important insights into the primary structures of ETC proteins and makes possible, through the use of sensitive profile-based similarity searches, the identification of novel constituents of the ETC along with the annotation of highly divergent but phylogenetically conserved ETC subunits. © 2012 Elsevier B.V. All rights reserved.

  17. Autonomous Metabolomics for Rapid Metabolite Identification in Global Profiling

    DOE PAGES

    Benton, H. Paul; Ivanisevic, Julijana; Mahieu, Nathaniel G.; ...

    2014-12-12

    An autonomous metabolomic workflow combining mass spectrometry analysis with tandem mass spectrometry data acquisition was designed to allow for simultaneous data processing and metabolite characterization. Although previously tandem mass spectrometry data have been generated on the fly, the experiments described herein combine this technology with the bioinformatic resources of XCMS and METLIN. We can analyze large profiling datasets and simultaneously obtain structural identifications, as a result of this unique integration. Furthermore, validation of the workflow on bacterial samples allowed the profiling on the order of a thousand metabolite features with simultaneous tandem mass spectra data acquisition. The tandem mass spectrometry data acquisition enabled automatic search and matching against the METLIN tandem mass spectrometry database, shortening the current workflow from days to hours. Overall, the autonomous approach to untargeted metabolomics provides an efficient means of metabolomic profiling, and will ultimately allow the more rapid integration of comparative analyses, metabolite identification, and data analysis at a systems biology level.

  18. RhoA Regulation of Cardiomyocyte Differentiation

    PubMed Central

    Kaarbø, Mari; Crane, Denis I.; Murrell, Wayne G.

    2013-01-01

    Earlier findings from our laboratory implicated RhoA in heart developmental processes. To investigate factors that potentially regulate RhoA expression, RhoA gene organisation and promoter activity were analysed. Comparative analysis indicated strict conservation of both gene organisation and coding sequence of the chick, mouse, and human RhoA genes. Bioinformatics analysis of the derived promoter region of mouse RhoA identified putative consensus sequence binding sites for several transcription factors involved in heart formation and organogenesis generally. Using luciferase reporter assays, RhoA promoter activity was shown to increase in mouse-derived P19CL6 cells that were induced to differentiate into cardiomyocytes. Overexpression of a dominant negative mutant of mouse RhoA (mRhoAN19) blocked this cardiomyocyte differentiation of P19CL6 cells and led to the accumulation of the cardiac transcription factors SRF and GATA4 and the early cardiac marker cardiac α-actin. Taken together, these findings indicate a fundamental role for RhoA in the differentiation of cardiomyocytes. PMID:23935420

  19. Relating genes to function: identifying enriched transcription factors using the ENCODE ChIP-Seq significance tool.

    PubMed

    Auerbach, Raymond K; Chen, Bin; Butte, Atul J

    2013-08-01

    Biological analysis has shifted from identifying genes and transcripts to mapping these genes and transcripts to biological functions. The ENCODE Project has generated hundreds of ChIP-Seq experiments spanning multiple transcription factors and cell lines for public use, but tools for a biomedical scientist to analyze these data are either non-existent or tailored to narrow biological questions. We present the ENCODE ChIP-Seq Significance Tool, a flexible web application leveraging public ENCODE data to identify enriched transcription factors in a gene or transcript list for comparative analyses. The ENCODE ChIP-Seq Significance Tool is written in JavaScript on the client side and has been tested on Google Chrome, Apple Safari and Mozilla Firefox browsers. Server-side scripts are written in PHP and leverage R and a MySQL database. The tool is available at http://encodeqt.stanford.edu. Contact: abutte@stanford.edu. Supplementary material is available at Bioinformatics online.
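
    The abstract above does not say which statistical test the tool applies. As a rough illustration of what "enriched in a gene list" can mean, the sketch below assumes a one-sided hypergeometric test, a common choice for list enrichment; the function name and all parameter names are hypothetical and are not taken from the tool.

```python
from math import comb


def hypergeom_enrichment_p(population, targets_in_population, list_size, targets_in_list):
    """Return P(X >= targets_in_list) for a hypergeometric X.

    population            -- number of background genes (assumed known)
    targets_in_population -- background genes bound by the transcription factor
    list_size             -- number of genes in the user's list
    targets_in_list       -- genes in the list bound by the transcription factor
    """
    upper = min(targets_in_population, list_size)
    total = comb(population, list_size)
    # Sum the probability mass of every outcome at least as extreme as observed.
    tail = sum(
        comb(targets_in_population, i)
        * comb(population - targets_in_population, list_size - i)
        for i in range(targets_in_list, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```

    A small p-value here would suggest that the factor binds more genes in the list than expected by chance; a real analysis would also need correction for testing many factors.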

  20. Quantitative Proteomics Analysis of Streptomyces coelicolor Development Demonstrates That Onset of Secondary Metabolism Coincides with Hypha Differentiation*

    PubMed Central

    Manteca, Angel; Sanchez, Jesus; Jung, Hye R.; Schwämmle, Veit; Jensen, Ole N.

    2010-01-01

    Streptomyces species produce many clinically important secondary metabolites, including antibiotics and antitumorals. They have a complex developmental cycle, including programmed cell death phenomena, that makes this bacterium a multicellular prokaryotic model. There are two differentiated mycelial stages: an early compartmentalized vegetative mycelium (first mycelium) and a multinucleated reproductive mycelium (second mycelium) arising after programmed cell death processes. In the present study, we made a detailed proteomics analysis of the distinct developmental stages of solid confluent Streptomyces coelicolor cultures using iTRAQ (isobaric tags for relative and absolute quantitation) labeling and LC-MS/MS. A new experimental approach was developed to obtain homogeneous samples at each developmental stage (temporal protein analysis) and also to obtain membrane and cytosolic protein fractions (spatial protein analysis). A total of 345 proteins were quantified in two biological replicates. Comparative bioinformatics analyses revealed the switch from primary to secondary metabolism between the initial compartmentalized mycelium and the multinucleated hyphae. PMID:20224110

  1. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers.

    PubMed

    Campbell, Kieran R; Yau, Christopher

    2017-03-15

    Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov-Chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an Empirical-Bayes like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.

  2. Bioinformatical and in vitro approaches to essential oil-induced matrix metalloproteinase inhibition.

    PubMed

    Zeidán-Chuliá, Fares; Rybarczyk-Filho, José L; Gursoy, Mervi; Könönen, Eija; Uitto, Veli-Jukka; Gursoy, Orhan V; Cakmakci, Lutfu; Moreira, José C F; Gursoy, Ulvi K

    2012-06-01

    Essential oils carry diverse antimicrobial and anti-enzymatic properties. Matrix metalloproteinase (MMP) inhibition characteristics of Salvia fruticosa Miller (Labiatae), Myrtus communis Linnaeus (Myrtaceae), Juniperus communis Linnaeus (Cupressaceae), and Lavandula stoechas Linnaeus (Labiatae) essential oils were evaluated. Chemical compositions of the essential oils were analyzed by gas chromatography-mass spectrometry (GC-MS). Bioinformatical database analysis was performed by STRING 9.0 and STITCH 2.0 databases, and ViaComplex software. Antibacterial activity of essential oils against periodontopathogens was tested by the disc diffusion assay and the agar dilution method. Cellular proliferation and cytotoxicity were determined by commercial kits. MMP-2 and MMP-9 activities were measured by zymography. Bioinformatical database analyses, under a score of 0.4 (medium) and a prior correction of 0.0, gave rise to a model of protein (MMPs and tissue inhibitors of metalloproteinases) vs. chemical (essential oil components) interaction network, in which MMPs and essential oil components interconnected through interaction with hydroxyl radicals, molecular oxygen, and hydrogen peroxide. Components from L. stoechas potentially displayed a higher grade of interaction with MMP-2 and -9. Although antibacterial and growth inhibitory effects of essential oils on the tested periodontopathogens were limited, all of them inhibited MMP-2 in vitro at concentrations of 1 and 5 µL/mL. Moreover, the same concentrations of M. communis and L. stoechas also inhibited MMP-9. MMP-inhibiting concentrations of essential oils were not cytotoxic against keratinocytes. We propose that essential oils may be useful therapeutic agents as MMP inhibitors through a mechanism possibly based on their antioxidant potential.

  3. Fungal Screening on Olive Oil for Extracellular Triacylglycerol Lipases: Selection of a Trichoderma harzianum Strain and Genome Wide Search for the Genes

    PubMed Central

    Canseco-Pérez, Miguel Angel; Castillo-Avila, Genny Margarita; Islas-Flores, Ignacio; Apolinar-Hernández, Max M.; Rivera-Muñoz, Gerardo; Gamboa-Angulo, Marcela; Couoh-Uicab, Yeny

    2018-01-01

    A lipolytic screening with fungal strains isolated from lignocellulosic waste collected in banana plantation dumps was carried out. A Trichoderma harzianum strain (B13-1) showed good extracellular lipolytic activity (205 UmL−1). Subsequently, functional screening of the lipolytic activity on Rhodamine B enriched with olive oil as the only carbon source was performed. The successful growth of the strain allows us to suggest that a true lipase is responsible for the lipolytic activity in the B13-1 strain. In order to identify the gene(s) encoding the protein responsible for the lipolytic activity, in silico identification and characterization of triacylglycerol lipases from T. harzianum is reported for the first time. A survey in the genome of this fungus retrieved 50 lipases; however, bioinformatic analyses and putative functional descriptions in different databases allowed us to choose seven lipases as candidates. Suitability of the bioinformatic screening to select the candidates was confirmed by reverse transcription polymerase chain reaction (RT-PCR). The gene codifying 526309 was expressed when the fungus grew in a medium with olive oil as carbon source. This protein shares homology with commercial lipases, making it a candidate for further applications. The success in identifying a lipase gene inducible with olive oil and the suitability of the functional screening and bioinformatic survey carried out herein, support the premise that the strategy can be used in other microorganisms with sequenced genomes to search for true lipases, or other enzymes belonging to large protein families. PMID:29370083

  4. Generative Topic Modeling in Image Data Mining and Bioinformatics Studies

    ERIC Educational Resources Information Center

    Chen, Xin

    2012-01-01

    Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval, computer vision, and bioinformatics. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…

  5. A Portable Bioinformatics Course for Upper-Division Undergraduate Curriculum in Sciences

    ERIC Educational Resources Information Center

    Floraino, Wely B.

    2008-01-01

    This article discusses the challenges that bioinformatics education is facing and describes a bioinformatics course that is successfully taught at the California State Polytechnic University, Pomona, to the fourth year undergraduate students in biological sciences, chemistry, and computer science. Information on lecture and computer practice…

  6. Incorporating a Collaborative Web-Based Virtual Laboratory in an Undergraduate Bioinformatics Course

    ERIC Educational Resources Information Center

    Weisman, David

    2010-01-01

    Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a…

  7. A Mathematical Optimization Problem in Bioinformatics

    ERIC Educational Resources Information Center

    Heyer, Laurie J.

    2008-01-01

    This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…
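
    To make the dynamic-programming formulation mentioned above concrete, here is a minimal sketch of global alignment scoring in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, gap -2) and the function name are illustrative assumptions, not taken from the article.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the DP table,
    so memory grows with len(b) rather than len(a) * len(b).
    """
    # Row 0: aligning the empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = prev[j - 1] + substitution  # align a[i-1] with b[j-1]
            up = prev[j] + gap                     # gap in b
            left = curr[j - 1] + gap               # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

    The sketch returns only the score; recovering the alignment itself would require keeping the full table (or traceback pointers) instead of two rows.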

  8. Biology in 'silico': The Bioinformatics Revolution.

    ERIC Educational Resources Information Center

    Bloom, Mark

    2001-01-01

    Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and characterizes it as the Swiss Army knife of genetics, with many different uses in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…

  9. Green Fluorescent Protein-Focused Bioinformatics Laboratory Experiment Suitable for Undergraduates in Biochemistry Courses

    ERIC Educational Resources Information Center

    Rowe, Laura

    2017-01-01

    An introductory bioinformatics laboratory experiment focused on protein analysis has been developed that is suitable for undergraduate students in introductory biochemistry courses. The laboratory experiment is designed to be potentially used as a "stand-alone" activity in which students are introduced to basic bioinformatics tools and…

  10. Virtual Bioinformatics Distance Learning Suite

    ERIC Educational Resources Information Center

    Tolvanen, Martti; Vihinen, Mauno

    2004-01-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…

  11. A Summer Program Designed to Educate College Students for Careers in Bioinformatics

    ERIC Educational Resources Information Center

    Krilowicz, Beverly; Johnston, Wendie; Sharp, Sandra B.; Warter-Perez, Nancy; Momand, Jamil

    2007-01-01

    A summer program was created for undergraduates and graduate students that teaches bioinformatics concepts, offers skills in professional development, and provides research opportunities in academic and industrial institutions. We estimate that 34 of 38 graduates (89%) are in a career trajectory that will use bioinformatics. Evidence from…

  12. Assessment of a Bioinformatics across Life Science Curricula Initiative

    ERIC Educational Resources Information Center

    Howard, David R.; Miskowski, Jennifer A.; Grunwald, Sandra K.; Abler, Michael L.

    2007-01-01

    At the University of Wisconsin-La Crosse, we have undertaken a program to integrate the study of bioinformatics across the undergraduate life science curricula. Our efforts have included incorporating bioinformatics exercises into courses in the biology, microbiology, and chemistry departments, as well as coordinating the efforts of faculty within…

  13. Computer Programming and Biomolecular Structure Studies: A Step beyond Internet Bioinformatics

    ERIC Educational Resources Information Center

    Likic, Vladimir A.

    2006-01-01

    This article describes the experience of teaching structural bioinformatics to third year undergraduate students in a subject titled "Biomolecular Structure and Bioinformatics." Students were introduced to computer programming and used this knowledge in a practical application as an alternative to the well established Internet bioinformatics…

  14. Teaching Bioinformatics and Neuroinformatics by Using Free Web-Based Tools

    ERIC Educational Resources Information Center

    Grisham, William; Schottler, Natalie A.; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson

    2010-01-01

    This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with…

  15. When cloud computing meets bioinformatics: a review.

    PubMed

    Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong

    2013-10-01

    In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
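
    The MapReduce model described above can be sketched on a single machine to show its three phases. The example below counts k-mers in sequencing reads; the task, the function names, and the in-memory shuffle are assumptions made for illustration, whereas a real deployment would distribute these phases with a framework such as Hadoop.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map phase: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(count_kmers(["ACGT", "CGTA"], 2))
```

    Because each read is mapped independently and each key is reduced independently, both phases can in principle run in parallel across machines, which is the property the review relies on.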

  16. Preliminary Study of Bioinformatics Patents and Their Classifications Registered in the KIPRIS Database.

    PubMed

    Park, Hyun-Seok

    2012-12-01

    Whereas a vast amount of new information on bioinformatics is made available to the public through patents, only a small set of patents are cited in academic papers. A detailed analysis of registered bioinformatics patents, using the existing patent search system, can provide valuable information links between science and technology. However, it is extremely difficult to select keywords to capture bioinformatics patents, reflecting the convergence of several underlying technologies. No single word or even several words are sufficient to identify such patents. The analysis of patent subclasses can provide valuable information. In this paper, I did a preliminary study of the current status of bioinformatics patents and their International Patent Classification (IPC) groups registered in the Korea Intellectual Property Rights Information Service (KIPRIS) database.

  17. GLAD: a system for developing and deploying large-scale bioinformatics grid.

    PubMed

    Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

    2005-03-01

    Grid computing is used to solve large-scale bioinformatics problems involving gigabyte-scale databases by distributing the computation across multiple platforms. Until now, developing bioinformatics grid applications has required the extremely tedious work of designing and implementing the component algorithms and parallelization techniques for different classes of problems, and of accessing remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware, which exploits task-based parallelism. Two bioinformatics benchmark applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

  18. BIAS: Bioinformatics Integrated Application Software.

    PubMed

    Finak, G; Godin, N; Hallett, M; Pepin, F; Rajabi, Z; Srivastava, V; Tang, Z

    2005-04-15

    We introduce a development platform especially tailored to Bioinformatics research and software development. BIAS (Bioinformatics Integrated Application Software) provides the tools necessary for carrying out integrative Bioinformatics research requiring multiple datasets and analysis tools. It follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system and supports standards and data-exchange protocols common to Bioinformatics. BIAS is an OpenSource project and is freely available to all interested users at http://www.mcb.mcgill.ca/~bias/. This website also contains a paper containing a more detailed description of BIAS and a sample implementation of a Bayesian network approach for the simultaneous prediction of gene regulation events and of mRNA expression from combinations of gene regulation events. Contact: hallett@mcb.mcgill.ca.

  19. Practical applications of the bioinformatics toolbox for narrowing quantitative trait loci.

    PubMed

    Burgess-Herbert, Sarah L; Cox, Allison; Tsaih, Shirng-Wern; Paigen, Beverly

    2008-12-01

    Dissecting the genes involved in complex traits can be confounded by multiple factors, including extensive epistatic interactions among genes, the involvement of epigenetic regulators, and the variable expressivity of traits. Although quantitative trait locus (QTL) analysis has been a powerful tool for localizing the chromosomal regions underlying complex traits, systematically identifying the causal genes remains challenging. Here, through its application to plasma levels of high-density lipoprotein cholesterol (HDL) in mice, we demonstrate a strategy for narrowing QTL that utilizes comparative genomics and bioinformatics techniques. We show how QTL detected in multiple crosses are subjected to both combined cross analysis and haplotype block analysis; how QTL from one species are mapped to the concordant regions in another species; and how genomewide scans associating haplotype groups with their phenotypes can be used to prioritize the narrowed regions. Then we illustrate how these individual methods for narrowing QTL can be systematically integrated for mouse chromosomes 12 and 15, resulting in a significantly reduced number of candidate genes, often from hundreds to <10. Finally, we give an example of how additional bioinformatics resources can be combined with experiments to determine the most likely quantitative trait genes.

  20. A comparison of common programming languages used in bioinformatics

    PubMed Central

    Fourment, Mathieu; Gillings, Michael R

    2008-01-01

    Background The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Results Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from Conclusion This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language. PMID:18251993
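
    One of the benchmarked methods, the Sellers algorithm, finds approximate occurrences of a pattern in a text by dynamic programming. The Python sketch below is a minimal illustration under stated assumptions, not the code benchmarked in the paper: the function name, the unit edit costs and the (end position, distance) return format are chosen here for clarity.

```python
def sellers_search(pattern, text, max_dist):
    """Report where `pattern` matches a substring of `text` within `max_dist` edits.

    Returns (end, distance) pairs, where `end` is the exclusive end index in
    `text` of the best-matching substring ending there. Unit costs are assumed
    for substitution, insertion and deletion.
    """
    m = len(pattern)
    # Column for the empty text prefix: matching i pattern characters
    # against nothing costs i deletions.
    prev = list(range(m + 1))
    hits = []
    for j, text_char in enumerate(text, start=1):
        # Row 0 is always 0: a match may start anywhere in the text for free.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text_char else 1
            curr[i] = min(prev[i - 1] + cost,  # match or substitution
                          prev[i] + 1,         # extra character in the text
                          curr[i - 1] + 1)     # pattern character skipped
        if curr[m] <= max_dist:
            hits.append((j, curr[m]))
        prev = curr
    return hits


exact = sellers_search("ACGT", "TTACGTTT", 0)
fuzzy = sellers_search("ACGT", "AAGT", 1)
```

    Only two columns of the dynamic programming table are kept at a time, so memory use grows with the pattern length rather than with the text length.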

  1. Ergatis: a web interface and scalable software system for bioinformatics workflows

    PubMed Central

    Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

    2010-01-01

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634

  2. Measuring the distance between multiple sequence alignments.

    PubMed

    Blackburne, Benjamin P; Whelan, Simon

    2012-02-15

    Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses. We introduce four metrics to compare MSAs, which include the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them. MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/.
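
    To make the idea of a distance between two alignments of the same sequences concrete, the Python sketch below compares the sets of residue pairs that each alignment places in the same column. It is a simplified illustration and an assumption of this note, not a reimplementation of the four metrics in the paper or of MetAl; gaps are written as "-" and the function names are hypothetical.

```python
def aligned_pairs(alignment):
    """Collect residue pairs placed in the same column of an alignment.

    `alignment` is a list of equal-length gapped strings. Each residue is
    identified by (sequence index, position in the ungapped sequence).
    """
    next_pos = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_index, char in enumerate(column):
            if char != "-":
                present.append((seq_index, next_pos[seq_index]))
                next_pos[seq_index] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs


def pair_distance(aln_a, aln_b):
    """Fraction of aligned residue pairs on which two alignments disagree (0 to 1)."""
    pairs_a, pairs_b = aligned_pairs(aln_a), aligned_pairs(aln_b)
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return len(pairs_a ^ pairs_b) / len(union)


# Two alignments of the same sequences (ACGT and ACCGT) that differ only in
# which C of the second sequence is paired with the C of the first.
aln_a = ["AC-GT", "ACCGT"]
aln_b = ["A-CGT", "ACCGT"]
distance = pair_distance(aln_a, aln_b)
```

    A pair-based measure like this ignores where gaps fall, which is one reason gap-aware and tree-aware metrics such as those described in the abstract can be more informative.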

  3. Molecular evolution across the Asteraceae: micro- and macroevolutionary processes.

    PubMed

    Kane, Nolan C; Barker, Michael S; Zhan, Shing H; Rieseberg, Loren H

    2011-12-01

    The Asteraceae (Compositae) is a large family of over 20,000 wild, weedy, and domesticated species that comprise approximately 10% of all angiosperms, including annual and perennial herbs, shrubs and trees, and species on every continent except Antarctica. As a result, the Asteraceae provide a unique opportunity to understand the evolutionary genomics of lineage radiation and diversification at numerous phylogenetic scales. Using publicly available expressed sequence tags from 22 species representing four of the major Asteraceae lineages, we assessed neutral and nonneutral evolutionary processes across this diverse plant family. We used bioinformatic tools to identify candidate genes under selection in each species. Evolution at silent and coding sites was assessed for different Gene Ontology functional categories to compare rates of evolution over both short and long evolutionary timescales. Our results indicate that patterns of molecular change across the family are surprisingly consistent on a macroevolutionary timescale, much more so than would be predicted from the analysis of one (or many) examples of microevolution. These analyses also point to particular classes of genes that may be crucial in shaping the radiation of this diverse plant family. Similar analyses of nuclear and chloroplast genes in six other plant families confirm that many of these patterns are common features of the plant kingdom.

  4. JEnsembl: a version-aware Java API to Ensembl data systems

    PubMed Central

    Paterson, Trevor; Law, Andy

    2012-01-01

    Motivation: The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. Results: The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing ‘through time’ comparative analyses to be performed. Availability: Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net). Contact: jensembl-develop@lists.sf.net, andy.law@roslin.ed.ac.uk, trevor.paterson@roslin.ed.ac.uk PMID:22945789

  5. Genome-wide identification of WRKY family genes in peach and analysis of WRKY expression during bud dormancy.

    PubMed

    Chen, Min; Tan, Qiuping; Sun, Mingyue; Li, Dongmei; Fu, Xiling; Chen, Xiude; Xiao, Wei; Li, Ling; Gao, Dongsheng

    2016-06-01

    Bud dormancy in deciduous fruit trees is an important adaptive mechanism for their survival in cold climates. The WRKY genes participate in several developmental and physiological processes, including dormancy. However, the dormancy mechanisms of WRKY genes have not been studied in detail. We conducted a genome-wide analysis and identified 58 WRKY genes in peach. These putative genes were located on all eight chromosomes. In bioinformatics analyses, we compared the sequences of WRKY genes from peach, rice, and Arabidopsis. In a cluster analysis, the gene sequences formed three groups, of which group II was further divided into five subgroups. Gene structure was highly conserved within each group, especially in groups IId and III. Gene expression analyses by qRT-PCR showed that WRKY genes showed different expression patterns in peach buds during dormancy. The mean expression levels of six WRKY genes (Prupe.6G286000, Prupe.1G393000, Prupe.1G114800, Prupe.1G071400, Prupe.2G185100, and Prupe.2G307400) increased during endodormancy and decreased during ecodormancy, indicating that these six WRKY genes may play a role in dormancy in a perennial fruit tree. This information will be useful for selecting fruit trees with desirable dormancy characteristics or for manipulating dormancy in genetic engineering programs.

  6. In silico comparative analysis of SSR markers in plants

    PubMed Central

    2011-01-01

    Background Adverse environmental conditions severely limit plant growth and development, restricting genetic potential and causing yield losses. The progress obtained by classic plant breeding methods aimed at increasing abiotic stress tolerance has not been enough to cope with increasing food demands. New target genes need to be identified to reach this goal, which requires extensive studies of the related biological mechanisms. Comparative analyses in ancestral plant groups can help to elucidate as yet unclear biological processes. Results In this study, we surveyed the occurrence patterns of expressed sequence tag-derived microsatellite markers for model plants. A total of 13,133 SSR markers were discovered using the SSRLocator software in non-redundant EST databases made for all eleven species chosen for this study. The dimer motifs are more frequent in lower plant species, such as green algae and mosses, and the trimer motifs are more frequent for the majority of higher plant groups, such as monocots and dicots. With this in silico study we confirm several microsatellite plant survey results made with available bioinformatics tools. Conclusions Comparative studies of EST-SSR markers among all plant lineages are well suited for plant evolution studies as well as for future studies of transferability of molecular markers. PMID:21247422
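
    Dimer and trimer microsatellites of the kind counted in this study are tandem repeats of a 2 or 3 base motif. The Python sketch below shows one simple way to locate them with a regular expression; it is an illustration only and does not reproduce SSRLocator. The function name, the minimum repeat counts and the choice to skip single-base runs are assumptions.

```python
import re


def find_ssrs(seq, motif_len, min_repeats):
    """Find tandem repeats of a motif of `motif_len` bases.

    Returns (start, motif, repeat_count) tuples for runs with at least
    `min_repeats` copies. Runs of a single base (e.g. AAAAAA) are skipped,
    since they are mononucleotide repeats rather than dimers or trimers.
    """
    # Capture one motif, then require it to repeat min_repeats - 1 more times.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


dimers = find_ssrs("GGACACACACTT", motif_len=2, min_repeats=4)
homopolymer = find_ssrs("AAAAAAAA", motif_len=2, min_repeats=4)
```

    Running the same function with `motif_len=3` over a set of EST sequences and tallying the hits per motif length would give the kind of dimer versus trimer comparison the abstract reports.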

  7. Distinct Biological Potential of Streptococcus gordonii and Streptococcus sanguinis Revealed by Comparative Genome Analysis.

    PubMed

    Zheng, Wenning; Tan, Mui Fern; Old, Lesley A; Paterson, Ian C; Jakubovics, Nicholas S; Choo, Siew Woh

    2017-06-07

    Streptococcus gordonii and Streptococcus sanguinis are pioneer colonizers of dental plaque and important agents of bacterial infective endocarditis (IE). To gain a greater understanding of these two closely related species, we performed comparative analyses on 14 new S. gordonii and 5 S. sanguinis strains using various bioinformatics approaches. We found that S. gordonii and S. sanguinis harbor open pan-genomes and share generally high sequence homology and a high number of core genes, including virulence genes. However, we observed subtle differences in genomic islands and prophages between the species. Comparative pathogenomics analysis showed that S. sanguinis strains have genes encoding IgA proteases, mitogenic factor deoxyribonucleases, nickel/cobalt uptake and cobalamin biosynthesis. In contrast, genomic islands of S. gordonii strains contain additional copies of comCDE quorum-sensing system components involved in genetic competence. Two distinct polysaccharide locus architectures were identified, one of which was exclusively present in S. gordonii strains. We present the first evidence of genes encoding the CylA and CylB system in the α-haemolytic S. gordonii. This study provides new insights into the genetic distinctions between S. gordonii and S. sanguinis, improving understanding of tooth surface colonization and contributions to dental plaque formation, as well as their potential roles in the pathogenesis of IE.

  8. Structural insights into a key carotenogenesis related enzyme phytoene synthase of P. falciparum: a novel drug target for malaria.

    PubMed

    Agarwal, Shalini; Sharma, Vijeta; Phulera, Swastik; Abdin, M Z; Ayana, R; Singh, Shailja

    2015-12-01

    Carotenoids represent a diverse group of pigments derived from the common isoprenoid precursors and fulfill a variety of critical functions in plants and animals. Phytoene synthase (PSY), a transferase enzyme that catalyzes the first specific step in carotenoid biosynthesis, plays a central role in the regulation of a number of essential functions mediated via carotenoids. PSYs have been investigated in depth in plants, bacteria and algae; in apicomplexans, however, they remain poorly studied. In an effort to characterize PSY in apicomplexans, especially the malaria parasite Plasmodium falciparum (P. falciparum), we undertook a detailed bioinformatics analysis. We analysed the phylogenetic relationship of PSY, also referred to as octaprenyl pyrophosphate synthase (OPPS), in P. falciparum with other taxonomic groups. Further, we characterized in silico the secondary and tertiary structures of P. falciparum PSY/OPPS and compared the tertiary structure with the crystal structure of Thermotoga maritima (T. maritima) OPPS. Our results showed a resemblance between P. falciparum PSY and the active site of T. maritima OPPS. Interestingly, the comparative structural analysis revealed a unique, non-conserved loop in P. falciparum OPPS/PSY. Such structural features might contribute novel accessory functions to the protein, thus offering potential drug targets.

  9. Comparing GWAS Results of Complex Traits Using Full Genetic Model and Additive Models for Revealing Genetic Architecture

    PubMed Central

    Monir, Md. Mamun; Zhu, Jun

    2017-01-01

    Most genome-wide association studies (GWASs) for human complex diseases have ignored dominance, epistasis and ethnic interactions. We conducted comparative GWASs for total cholesterol using a full model and additive models, which illustrate the impact of ignoring such genetic effects on analysis results and demonstrate how the genetic effects of multiple loci could differ across ethnic groups. The full model identified 15 quantitative trait loci, with 13 individual loci and 3 pairs of epistasis loci, whereas the multi-loci additive model identified only 14 loci (9 common loci and 5 different loci). In addition, 4 loci detected by the full model were not detected by the multi-loci additive model. PLINK analysis identified two loci and GCTA analysis detected only one locus with genome-wide significance. The full model identified three previously reported genes as well as several new genes. Bioinformatics analysis showed that some of the new genes are related to cholesterol-related chemicals and/or diseases. Analyses of cholesterol data and simulation studies revealed that the full model performed better than the additive models in terms of detection power and unbiased estimation of the genetic variants of complex traits. PMID:28079101

  10. Bioinformatics in High School Biology Curricula: A Study of State Science Standards

    ERIC Educational Resources Information Center

    Wefer, Stephen H.; Sheppard, Keith

    2008-01-01

    The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics…

  11. Exploring Cystic Fibrosis Using Bioinformatics Tools: A Module Designed for the Freshman Biology Course

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2011-01-01

    We incorporated a bioinformatics component into the freshman biology course that allows students to explore cystic fibrosis (CF), a common genetic disorder, using bioinformatics tools and skills. Students learn about CF through searching genetic databases, analyzing genetic sequences, and observing the three-dimensional structures of proteins…

  12. Implementing a Web-Based Introductory Bioinformatics Course for Non-Bioinformaticians That Incorporates Practical Exercises

    ERIC Educational Resources Information Center

    Vincent, Antony T.; Bourbonnais, Yves; Brouard, Jean-Simon; Deveau, Hélène; Droit, Arnaud; Gagné, Stéphane M.; Guertin, Michel; Lemieux, Claude; Rathier, Louis; Charette, Steve J.; Lagüe, Patrick

    2018-01-01

    A recent scientific discipline, bioinformatics, defined as using informatics for the study of biological problems, is now a requirement for the study of biological sciences. Bioinformatics has become such a powerful and popular discipline that several academic institutions have created programs in this field, allowing students to become…

  13. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    ERIC Educational Resources Information Center

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  14. XML schemas for common bioinformatic data types and their application in workflow systems

    PubMed Central

    Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

    2006-01-01

    Background Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios. PMID:17087823

  15. Vignettes: diverse library staff offering diverse bioinformatics services*

    PubMed Central

    Osterbur, David L.; Alpi, Kristine; Canevari, Catharine; Corley, Pamela M.; Devare, Medha; Gaedeke, Nicola; Jacobs, Donna K.; Kirlew, Peter; Ohles, Janet A.; Vaughan, K.T.L.; Wang, Lili; Wu, Yongchun; Geer, Renata C.

    2006-01-01

    Objectives: The paper gives examples of the bioinformatics services provided in a variety of different libraries by librarians with a broad range of educational background and training. Methods: Two investigators sent an email inquiry to attendees of the “National Center for Biotechnology Information's (NCBI) Introduction to Molecular Biology Information Resources” or “NCBI Advanced Workshop for Bioinformatics Information Specialists (NAWBIS)” courses. The thirty-five-item questionnaire addressed areas such as educational background, library setting, types and numbers of users served, and bioinformatics training and support services provided. Answers were compiled into program vignettes. Discussion: The bioinformatics support services addressed in the paper are based in libraries with academic and clinical settings. Services have been established through different means: in collaboration with biology faculty as part of formal courses, through teaching workshops in the library, through one-on-one consultations, and by other methods. Librarians with backgrounds from art history to doctoral degrees in genetics have worked to establish these programs. Conclusion: Successful bioinformatics support programs can be established in libraries in a variety of different settings and by staff with a variety of different backgrounds and approaches. PMID:16888664

  16. Vertical and horizontal integration of bioinformatics education: A modular, interdisciplinary approach.

    PubMed

    Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D Blaine; Langeland, James A

    2009-01-01

    Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with a largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option. Furthermore, we believe that a true interdisciplinary science experience would be best served by introduction of bioinformatics modules within existing courses in biology and chemistry and other complementary departments. To that end, with support from the Howard Hughes Medical Institute, we have developed over a dozen independent bioinformatics modules for our students that are incorporated into courses ranging from general chemistry and biology to advanced specialty courses and classes in complementary disciplines such as computer science, mathematics, and physics. These activities have largely promoted active learning in our classrooms and have enhanced student understanding of course materials. Herein, we describe our program, the activities we have developed, and assessment of our endeavors in this area. Copyright © 2009 International Union of Biochemistry and Molecular Biology, Inc.

  17. Generalized Centroid Estimators in Bioinformatics

    PubMed Central

    Hamada, Michiaki; Kiryu, Hisanori; Iwasaki, Wataru; Asai, Kiyoshi

    2011-01-01

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit commonly used accuracy measures (e.g. sensitivity, PPV, MCC and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only gives a useful framework for designing MEA-based estimators but is also highly extendable and sheds new light on many problems in bioinformatics. PMID:21365017
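
    The accuracy measures named in this abstract (sensitivity, PPV, MCC and F-score) are all functions of the confusion matrix between a predicted and a reference binary vector. The Python sketch below computes them using their standard definitions; it illustrates the measures only, not the estimators proposed in the paper. The function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
import math


def accuracy_measures(truth, pred):
    """Compute sensitivity, PPV, F-score and MCC for two binary vectors."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)

    def ratio(num, den):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "f_score": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Example: 8 positions (e.g. candidate base pairs), 3 truly positive.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = accuracy_measures(truth, pred)
```

    In this example there are 2 true positives, 1 false negative, 1 false positive and 4 true negatives, so sensitivity, PPV and F-score all equal 2/3 while MCC equals 7/15.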

  18. The Online Bioinformatics Resources Collection at the University of Pittsburgh Health Sciences Library System--a one-stop gateway to online bioinformatics databases and software tools.

    PubMed

    Chen, Yi-Bu; Chattopadhyay, Ansuman; Bergen, Phillip; Gadd, Cynthia; Tannery, Nancy

    2007-01-01

    To bridge the gap between the rising information needs of biological and medical researchers and the rapidly growing number of online bioinformatics resources, we have created the Online Bioinformatics Resources Collection (OBRC) at the Health Sciences Library System (HSLS) at the University of Pittsburgh. The OBRC, containing 1542 major online bioinformatics databases and software tools, was constructed using the HSLS content management system built on the Zope Web application server. To enhance the output of search results, we further implemented the Vivísimo Clustering Engine, which automatically organizes the search results into categories created dynamically based on the textual information of the retrieved records. As the largest online collection of its kind and the only one with advanced search results clustering, OBRC is aimed at becoming a one-stop guided information gateway to the major bioinformatics databases and software tools on the Web. OBRC is available at the University of Pittsburgh's HSLS Web site (http://www.hsls.pitt.edu/guides/genetics/obrc).

  19. Carving a niche: establishing bioinformatics collaborations

    PubMed Central

    Lyon, Jennifer A.; Tennant, Michele R.; Messner, Kevin R.; Osterbur, David L.

    2006-01-01

    Objectives: The paper describes collaborations and partnerships developed between library bioinformatics programs and other bioinformatics-related units at four academic institutions. Methods: A call for information on bioinformatics partnerships was made via email to librarians who have participated in the National Center for Biotechnology Information's Advanced Workshop for Bioinformatics Information Specialists. Librarians from Harvard University, the University of Florida, the University of Minnesota, and Vanderbilt University responded and expressed willingness to contribute information on their institutions, programs, services, and collaborating partners. Similarities and differences in programs and collaborations were identified. Results: The four librarians have developed partnerships with other units on their campuses that can be categorized into the following areas: knowledge management, instruction, and electronic resource support. All primarily support freely accessible electronic resources, while other campus units deal with fee-based ones. These demarcations are apparent in resource provision as well as in subsequent support and instruction. Conclusions and Recommendations: Through environmental scanning and networking with colleagues, librarians who provide bioinformatics support can develop fruitful collaborations. Visibility is key to building collaborations, as is broad-based thinking in terms of potential partners. PMID:16888668

  20. Development of a cloud-based Bioinformatics Training Platform.

    PubMed

    Revote, Jerico; Watson-Haigh, Nathan S; Quenette, Steve; Bethwaite, Blair; McGrath, Annette; Shang, Catherine A

    2017-05-01

    The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. © The Author 2016. Published by Oxford University Press.

  1. Development of a cloud-based Bioinformatics Training Platform

    PubMed Central

    Revote, Jerico; Watson-Haigh, Nathan S.; Quenette, Steve; Bethwaite, Blair; McGrath, Annette

    2017-01-01

    The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. PMID:27084333

  2. BioShaDock: a community driven bioinformatics shared Docker-based tools registry

    PubMed Central

    Moreews, François; Sallou, Olivier; Ménager, Hervé; Le Bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier

    2015-01-01

    Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the Elixir registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community. PMID:26913191

  3. BioShaDock: a community driven bioinformatics shared Docker-based tools registry.

    PubMed

    Moreews, François; Sallou, Olivier; Ménager, Hervé; Le Bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier

    2015-01-01

    Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry in authentication and permissions management that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user-defined tags to facilitate discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the ELIXIR registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community.

  4. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community

    PubMed Central

    Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius

    2016-01-01

    The increasing availability and decreasing cost of high-throughput sequencing have transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418

  5. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community.

    PubMed

    Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J

    2016-09-01

    The increasing availability and decreasing cost of high-throughput sequencing have transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.

  6. An ontology-based framework for bioinformatics workflows.

    PubMed

    Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer

    2007-01-01

    The proliferation of bioinformatics activities brings new challenges - how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and to provide interoperability among data and tools. This paper describes an effort toward these directions. It is based on combining research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.

  7. Endodontic Microbiology and Pathobiology: Current State of Knowledge.

    PubMed

    Fouad, Ashraf F

    2017-01-01

    Newer research tools and basic science knowledge base have allowed the exploration of endodontic diseases in the pulp and periapical tissues in novel ways. The use of next generation sequencing, bioinformatics analyses, genome-wide association studies, to name just a few of these innovations, has allowed the identification of hundreds of microorganisms and of host response factors. This review addresses recent advances in endodontic microbiology and the host response and discusses the potential for future innovations in this area. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Biosynthesis of the antimetabolite 6-thioguanine in Erwinia amylovora plays a key role in fire blight pathogenesis.

    PubMed

    Coyne, Sébastien; Chizzali, Cornelia; Khalil, Mohammed N A; Litomska, Agnieszka; Richter, Klaus; Beerhues, Ludger; Hertweck, Christian

    2013-09-27

    Sulfur for fire: The molecular basis for the biosynthesis of the antimetabolite 6-thioguanine (6TG) was unveiled in Erwinia amylovora, the causative agent of fire blight. Bioinformatics, heterologous pathway reconstitution in E. coli, and mutational analyses indicate that the protein YcfA mediates guanine thionation in analogy to 2-thiouridylase. Assays in planta and in cell cultures reveal for the first time a crucial role of 6TG in fire blight pathogenesis. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Methicillin-resistant Staphylococcus argenteus misidentified as methicillin-resistant Staphylococcus aureus emerging in western Sweden.

    PubMed

    Tång Hallbäck, Erika; Karami, Nahid; Adlerberth, Ingegerd; Cardew, Sofia; Ohlén, Maria; Engström Jakobsson, Hedvig; Svensson Stadler, Liselott

    2018-05-17

    Two strains included in a whole-genome sequencing project for methicillin-resistant Staphylococcus aureus (MRSA) were identified as non-Staphylococcus aureus when the sequences were analysed using the bioinformatics software ALEX (www.1928diagnostics.com, Gothenburg, Sweden). Sequencing of the sodA gene of these strains identified them as Staphylococcus argenteus. The collection of MRSA in western Sweden was checked for additional strains of this species. A total of 18 strains of S. argenteus isolated between 2011 and December 2017 were identified.

  10. A Trans-omics Mathematical Analysis Reveals Novel Functions of the Ornithine Metabolic Pathway in Cancer Stem Cells

    NASA Astrophysics Data System (ADS)

    Koseki, Jun; Matsui, Hidetoshi; Konno, Masamitsu; Nishida, Naohiro; Kawamoto, Koichi; Kano, Yoshihiro; Mori, Masaki; Doki, Yuichiro; Ishii, Hideshi

    2016-02-01

    Bioinformatics and computational modelling are expected to offer innovative approaches in human medical science. In the present study, we performed computational analyses and made predictions using transcriptome and metabolome datasets obtained from fluorescence-based visualisations of chemotherapy-resistant cancer stem cells (CSCs) in the human oesophagus. This approach revealed an uncharacterized role for the ornithine metabolic pathway in the survival of chemotherapy-resistant CSCs. The present study fastens this rationale for further characterisation that may lead to the discovery of innovative drugs against robust CSCs.

  11. The implementation of e-learning tools to enhance undergraduate bioinformatics teaching and learning: a case study in the National University of Singapore

    PubMed Central

    2009-01-01

    Background: The rapid advancement of computer and information technology in recent years has resulted in the rise of e-learning technologies to enhance and complement traditional classroom teaching in many fields, including bioinformatics. This paper records the experience of implementing e-learning technology to support problem-based learning (PBL) in the teaching of two undergraduate bioinformatics classes in the National University of Singapore. Results: Survey results further established the efficiency and suitability of e-learning tools to supplement PBL in bioinformatics education. 63.16% of year three bioinformatics students showed a positive response regarding the usefulness of the Learning Activity Management System (LAMS) e-learning tool in guiding the learning and discussion process involved in PBL and in enhancing the learning experience by breaking down PBL activities into a sequential workflow. On the other hand, 89.81% of year two bioinformatics students indicated that their revision process was positively impacted with the use of LAMS for guiding the learning process, while 60.19% agreed that the breakdown of activities into a sequential step-by-step workflow by LAMS enhances the learning experience. Conclusion: We show that e-learning tools are useful for supplementing PBL in bioinformatics education. The results suggest that it is feasible to develop and adopt e-learning tools to supplement a variety of instructional strategies in the future. PMID:19958511

  12. Comparative Transcriptomic Analyses of Vegetable and Grain Pea (Pisum sativum L.) Seed Development

    PubMed Central

    Liu, Na; Zhang, Guwen; Xu, Shengchun; Mao, Weihua; Hu, Qizan; Gong, Yaming

    2015-01-01

    Understanding the molecular mechanisms regulating the pea seed developmental process is extremely important for pea breeding. In this study, we used high-throughput RNA-Seq and bioinformatics analyses to examine the changes in gene expression during seed development in vegetable pea and grain pea, and compare the gene expression profiles of these two pea types. RNA-Seq generated 18.7 G of raw data, which were then de novo assembled into 77,273 unigenes with a mean length of 930 bp. Our results illustrate that transcriptional control during pea seed development is a highly coordinated process. There were 459 and 801 genes differentially expressed at early and late seed maturation stages between vegetable pea and grain pea, respectively. Soluble sugar and starch metabolism related genes were significantly activated during the development of pea seeds coinciding with the onset of accumulation of sugar and starch in the seeds. A comparative analysis of genes involved in sugar and starch biosynthesis in vegetable pea (high seed soluble sugar and low starch) and grain pea (high seed starch and low soluble sugar) revealed that differential expression of related genes at late development stages results in a negative correlation between soluble sugar and starch biosynthetic flux in vegetable and grain pea seeds. RNA-Seq data were validated using real-time quantitative RT-PCR analysis for 30 randomly selected genes. To our knowledge, this work represents the first report of seed development transcriptomics in pea. The obtained results provide a foundation to support future efforts to unravel the underlying mechanisms that control the developmental biology of pea seeds, and serve as a valuable resource for improving pea breeding. PMID:26635856

  13. Comparative analyses of population-scale phenomic data in electronic medical records reveal race-specific disease networks

    PubMed Central

    Glicksberg, Benjamin S.; Li, Li; Badgeley, Marcus A.; Shameer, Khader; Kosoy, Roman; Beckmann, Noam D.; Pho, Nam; Hakenberg, Jörg; Ma, Meng; Ayers, Kristin L.; Hoffman, Gabriel E.; Dan Li, Shuyu; Schadt, Eric E.; Patel, Chirag J.; Chen, Rong; Dudley, Joel T.

    2016-01-01

    Motivation: Underrepresentation of racial groups represents an important challenge and major gap in phenomics research. Most of the current human phenomics research is based primarily on European populations; hence it is an important challenge to expand it to consider other population groups. One approach is to utilize data from EMR databases that contain patient data from diverse demographics and ancestries. The implications of this racial underrepresentation of data can be profound regarding effects on the healthcare delivery and actionability. To the best of our knowledge, our work is the first attempt to perform comparative, population-scale analyses of disease networks across three different populations, namely Caucasian (EA), African American (AA) and Hispanic/Latino (HL). Results: We compared susceptibility profiles and temporal connectivity patterns for 1988 diseases and 37 282 disease pairs represented in a clinical population of 1 025 573 patients. Accordingly, we revealed appreciable differences in disease susceptibility, temporal patterns, network structure and underlying disease connections between EA, AA and HL populations. We found 2158 significantly comorbid diseases for the EA cohort, 3265 for AA and 672 for HL. We further outlined key disease pair associations unique to each population as well as categorical enrichments of these pairs. Finally, we identified 51 key ‘hub’ diseases that are the focal points in the race-centric networks and of particular clinical importance. Incorporating race-specific disease comorbidity patterns will produce a more accurate and complete picture of the disease landscape overall and could support more precise understanding of disease relationships and patient management towards improved clinical outcomes. Contacts: rong.chen@mssm.edu or joel.dudley@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307606

  14. Comparative and evolutionary studies of vertebrate ALDH1A-like genes and proteins.

    PubMed

    Holmes, Roger S

    2015-06-05

    Vertebrate ALDH1A-like genes encode cytosolic enzymes capable of metabolizing all-trans-retinaldehyde to retinoic acid which is a molecular 'signal' guiding vertebrate development and adipogenesis. Bioinformatic analyses of vertebrate and invertebrate genomes were undertaken using known ALDH1A1, ALDH1A2 and ALDH1A3 amino acid sequences. Comparative analyses of the corresponding human genes provided evidence for distinct modes of gene regulation and expression with putative transcription factor binding sites (TFBS), CpG islands and micro-RNA binding sites identified for the human genes. ALDH1A-like sequences were identified for all mammalian, bird, lizard and frog genomes examined, whereas fish genomes displayed a more restricted distribution pattern for ALDH1A1 and ALDH1A3 genes. The ALDH1A1 gene was absent in many bony fish genomes examined, with the ALDH1A3 gene also absent in the medaka and tilapia genomes. Multiple ALDH1A1-like genes were identified in mouse, rat and marsupial genomes. Vertebrate ALDH1A1, ALDH1A2 and ALDH1A3 subunit sequences were highly conserved throughout vertebrate evolution. Comparative amino acid substitution rates showed that mammalian ALDH1A2 sequences were more highly conserved than for the ALDH1A1 and ALDH1A3 sequences. Phylogenetic studies supported an hypothesis for ALDH1A2 as a likely primordial gene originating in invertebrate genomes and undergoing sequential gene duplication to generate two additional genes, ALDH1A1 and ALDH1A3, in most vertebrate genomes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  15. Integration of Bioinformatics into an Undergraduate Biology Curriculum and the Impact on Development of Mathematical Skills

    ERIC Educational Resources Information Center

    Wightman, Bruce; Hark, Amy T.

    2012-01-01

    The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this…

  16. Bioinformatics in Middle East Program Curricula--A Focus on the Arabian Gulf

    ERIC Educational Resources Information Center

    Loucif, Samia

    2014-01-01

    The purpose of this paper is to investigate the inclusion of bioinformatics in program curricula in the Middle East, focusing on educational institutions in the Arabian Gulf. Bioinformatics is a multidisciplinary field which has emerged in response to the need for efficient data storage and retrieval, and accurate and fast computational and…

  17. Making Bioinformatics Projects a Meaningful Experience in an Undergraduate Biotechnology or Biomedical Science Programme

    ERIC Educational Resources Information Center

    Sutcliffe, Iain C.; Cummings, Stephen P.

    2007-01-01

    Bioinformatics has emerged as an important discipline within the biological sciences that allows scientists to decipher and manage the vast quantities of data (such as genome sequences) that are now available. Consequently, there is an obvious need to provide graduates in biosciences with generic, transferable skills in bioinformatics. We present…

  18. Evaluating the Effectiveness of a Practical Inquiry-Based Learning Bioinformatics Module on Undergraduate Student Engagement and Applied Skills

    ERIC Educational Resources Information Center

    Brown, James A. L.

    2016-01-01

    A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion,…

  19. The S-Star Trial Bioinformatics Course: An On-line Learning Success

    ERIC Educational Resources Information Center

    Lim, Yun Ping; Hoog, Jan-Olov; Gardner, Phyllis; Ranganathan, Shoba; Andersson, Siv; Subbiah, Subramanian; Tan, Tin Wee; Hide, Winston; Weiss, Anthony S.

    2003-01-01

    The S-Star Trial Bioinformatics on-line course (www.s-star.org) is a global experiment in bioinformatics distance education. Six universities from five continents have participated in this project. One hundred and fifty students participated in the first trial course of which 96 followed through the entire course and 70 fulfilled the overall…

  20. Identification and characterization of large DNA deletions affecting oil quality traits in soybean seeds through transcriptome sequencing analysis

    USDA-ARS's Scientific Manuscript database

    Understanding the molecular and genetic mechanisms underlying variation in seed composition and contents among different genotypes is important for soybean oil quality improvement. We designed a bioinformatics approach to compare seed transcriptomes of 9 soybean genotypes varying in oil composition ...
