Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn
2009-01-01
The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented. PMID:18971242
Agile parallel bioinformatics workflow management using Pwrake.
Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro
2011-09-08
Background: In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings: We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions: Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. 
Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774
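The abstract above distinguishes a workflow definition phase from a parameter adjustment phase, and sequential from parallel steps. The following is a minimal Python sketch of that separation, not Pwrake's Ruby DSL and not the authors' GATK or Dindel workflows; the step functions, the read records, and the `PARAMS` keys are all hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Parameter adjustment phase: values that are tuned between runs.
# These keys are hypothetical and exist only for this sketch.
PARAMS = {"min_quality": 20, "workers": 4}


# Workflow definition phase: the steps themselves, written once.
def filter_reads(reads, min_quality):
    """Sequential step: keep reads at or above the quality threshold."""
    return [r for r in reads if r["quality"] >= min_quality]


def count_bases(read):
    """Per-read step that is independent of other reads, so it can run in parallel."""
    return len(read["seq"])


def run_workflow(reads, params):
    """Run the sequential filter, then fan the per-read step out over a thread pool."""
    kept = filter_reads(reads, params["min_quality"])
    with ThreadPoolExecutor(max_workers=params["workers"]) as pool:
        lengths = list(pool.map(count_bases, kept))
    return sum(lengths)
```

Keeping `PARAMS` apart from the step functions means a threshold can be changed and the workflow rerun without touching the definitions, which is one plausible reading of why the two phases are separated.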
2009-01-01
Background: The rapid advancement of computer and information technology in recent years has resulted in the rise of e-learning technologies to enhance and complement traditional classroom teaching in many fields, including bioinformatics. This paper records the experience of implementing e-learning technology to support problem-based learning (PBL) in the teaching of two undergraduate bioinformatics classes in the National University of Singapore. Results: Survey results further established the efficiency and suitability of e-learning tools to supplement PBL in bioinformatics education. 63.16% of year three bioinformatics students showed a positive response regarding the usefulness of the Learning Activity Management System (LAMS) e-learning tool in guiding the learning and discussion process involved in PBL and in enhancing the learning experience by breaking down PBL activities into a sequential workflow. On the other hand, 89.81% of year two bioinformatics students indicated that their revision process was positively impacted by the use of LAMS for guiding the learning process, while 60.19% agreed that the breakdown of activities into a sequential step-by-step workflow by LAMS enhances the learning experience. Conclusion: We show that e-learning tools are useful for supplementing PBL in bioinformatics education. The results suggest that it is feasible to develop and adopt e-learning tools to supplement a variety of instructional strategies in the future. PMID:19958511
Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott
2011-06-01
Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will "become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 Codex Alimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops, are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth.
BioMAJ: a flexible framework for databanks synchronization and processing.
Filangi, Olivier; Beausse, Yoann; Assi, Anthony; Legrand, Ludovic; Larré, Jean-Marc; Martin, Véronique; Collin, Olivier; Caron, Christophe; Leroy, Hugues; Allouche, David
2008-08-15
Large- and medium-scale computational molecular biology projects require accurate bioinformatics software and numerous heterogeneous biological databanks, which are distributed around the world. BioMAJ provides a flexible, robust, fully automated environment for managing such massive amounts of data. The Java application enables automation of the data update cycle process and supervision of the locally mirrored data repository. We have developed workflows that handle some of the most commonly used bioinformatics databases. A set of scripts is also available for post-synchronization data treatment consisting of indexation or format conversion (for NCBI BLAST, SRS, EMBOSS, GCG, etc.). BioMAJ can be easily extended by personal homemade processing scripts. Source history can be kept via HTML reports containing statements of locally managed databanks. http://biomaj.genouest.org. BioMAJ is free and open software. It is freely available under the CeCILL version 2 license.
Bioinformatic pipelines in Python with Leaf
2013-01-01
Background: An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results: Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions: Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315
Wright, Victoria Ann; Vaughan, Brendan W; Laurent, Thomas; Lopez, Rodrigo; Brooksbank, Cath; Schneider, Maria Victoria
2010-11-01
Today's molecular life scientists are well educated in the emerging experimental tools of their trade, but when it comes to training on the myriad of resources and tools for dealing with biological data, a less ideal situation emerges. Often bioinformatics users receive no formal training on how to make the most of the bioinformatics resources and tools available in the public domain. The European Bioinformatics Institute, which is part of the European Molecular Biology Laboratory (EMBL-EBI), holds the world's most comprehensive collection of molecular data, and training the research community to exploit this information is embedded in the EBI's mission. We have evaluated eLearning, in parallel with face-to-face courses, as a means of training users of our data resources and tools. We anticipate that eLearning will become an increasingly important vehicle for delivering training to our growing user base, so we have undertaken an extensive review of Learning Content Management Systems (LCMSs). Here, we describe the process that we used, which considered the requirements of trainees, trainers and systems administrators, as well as taking into account our organizational values and needs. This review describes the literature survey, user discussions and scripted platform testing that we performed to narrow down our choice of platform from 36 to a single platform. We hope that it will serve as guidance for others who are seeking to incorporate eLearning into their bioinformatics training programmes.
Ergatis: a web interface and scalable software system for bioinformatics workflows
Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.
2010-01-01
Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634
Integrative workflows for metagenomic analysis
Ladoukakis, Efthymios; Kolisis, Fragiskos N.; Chatziioannou, Aristotelis A.
2014-01-01
The rapid evolution of sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques in order to potentially discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many GB of data being generated from each single sequencing experiment, rendering the management, or even the storage, of the data a critical bottleneck for the overall analytical endeavor. The complexity is further aggravated by the variety of processing steps available, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, yet non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, which require a high level of expertise for their proper application and for the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires large computational resources, making the use of cloud computing infrastructures the sole realistic solution. In this review article we discuss the different integrative bioinformatic solutions available that address the aforementioned issues, by critically assessing the available automated pipelines for data management, quality control, and annotation of metagenomic data, covering various major sequencing technologies and applications. PMID:25478562
BioShaDock: a community driven bioinformatics shared Docker-based tools registry
Moreews, François; Sallou, Olivier; Ménager, Hervé; Le Bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier
2015-01-01
Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the Elixir registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community. PMID:26913191
Evolving from bioinformatics in-the-small to bioinformatics in-the-large.
Parker, D Stott; Gorlick, Michael M; Lee, Christopher J
2003-01-01
We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small: constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples, and extract lessons from them showing how it has helped. These examples both suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.
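The abstract above points to the make utility as a model for defining and maintaining dependencies. As a rough illustration of that idea, and not of the authors' own systems, the sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to order targets so that prerequisites come first and to work out which targets become stale when one file changes. The file names in `DEPENDENCIES` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical targets: each key maps to the set of targets it depends on,
# in the spirit of "target: prerequisites" rules in a makefile.
DEPENDENCIES = {
    "reads.fastq": set(),
    "reference.fa": set(),
    "aligned.bam": {"reads.fastq", "reference.fa"},
    "variants.vcf": {"aligned.bam", "reference.fa"},
    "report.html": {"variants.vcf"},
}


def build_order(dependencies):
    """Return the targets in an order where every prerequisite comes first."""
    return list(TopologicalSorter(dependencies).static_order())


def stale_targets(dependencies, changed):
    """Return the changed target plus everything that depends on it, directly or not."""
    stale = {changed}
    # Walking in build order guarantees prerequisites are classified
    # before the targets that depend on them.
    for target in build_order(dependencies):
        if dependencies[target] & stale:
            stale.add(target)
    return stale
```

Under these assumptions, changing `aligned.bam` marks only the variant calls and the report for rebuilding, while changing `reference.fa` invalidates every derived target, which is the kind of explicit dependency metadata the abstract argues for.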
Interdisciplinary Introductory Course in Bioinformatics
ERIC Educational Resources Information Center
Kortsarts, Yana; Morris, Robert W.; Utell, Janine M.
2010-01-01
Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical and biophysical information. We present our experience in teaching an interdisciplinary course, Introduction to Bioinformatics, which was developed…
Development of a Web-Enabled Informatics Platform for Manipulation of Gene Expression Data
2004-12-01
genomic platforms such as metabolomics and proteomics, and to federated databases for knowledge management. A successful SBIR Phase I completed...measurements that require sophisticated bioinformatic platforms for data archival, management, integration, and analysis if researchers are to derive...web-enabled bioinformatic platform consisting of a Laboratory Information Management System (LIMS), an Analysis Information Management System (AIMS
Survey of Natural Language Processing Techniques in Bioinformatics.
Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling
2015-01-01
Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis
Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud
2003-01-01
Background: Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results: In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion: AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565
Scientific Workflow Management in Proteomics
de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus
2012-01-01
Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703
XML-based approaches for the integration of heterogeneous bio-molecular data.
Mesiti, Marco; Jiménez-Ruiz, Ernesto; Sanz, Ismael; Berlanga-Llavori, Rafael; Perlasca, Paolo; Valentini, Giorgio; Manset, David
2009-10-15
Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, bio-medical and bioinformatics research, but also raising new problems for their integration and computational processing. In this paper we survey the most interesting and novel approaches for the representation, integration and management of different kinds of biological data by exploiting XML and the related recommendations and approaches. Moreover, we present new cutting-edge approaches for the appropriate management of heterogeneous biological data represented through XML. XML has succeeded in the integration of heterogeneous biomolecular information, and has established itself as the syntactic glue for biological data sources. Nevertheless, a large variety of XML-based data formats have been proposed, which makes the effective integration of bioinformatics data schemes difficult. The adoption of a few semantic-rich standard formats is urgent to achieve a seamless integration of the current biological resources.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Chien-Chi
2015-08-03
Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed based on the following use cases and is specific to Illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen or co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance
Researchers take on challenges and opportunities to mine "Big Data" for answers to complex biological questions. Learn how bioinformatics uses advanced computing, mathematics, and technological platforms to store, manage, analyze, and understand data.
ERIC Educational Resources Information Center
Sutcliffe, Iain C.; Cummings, Stephen P.
2007-01-01
Bioinformatics has emerged as an important discipline within the biological sciences that allows scientists to decipher and manage the vast quantities of data (such as genome sequences) that are now available. Consequently, there is an obvious need to provide graduates in biosciences with generic, transferable skills in bioinformatics. We present…
Moore, Jason H
2007-11-01
Bioinformatics is an interdisciplinary field that blends computer science and biostatistics with biological and biomedical sciences such as biochemistry, cell biology, developmental biology, genetics, genomics, and physiology. An important goal of bioinformatics is to facilitate the management, analysis, and interpretation of data from biological experiments and observational studies. The goal of this review is to introduce some of the important concepts in bioinformatics that must be considered when planning and executing a modern biological research study. We review database resources as well as data mining software tools.
Biowep: a workflow enactment portal for bioinformatics applications.
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-03-08
The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these researchers to profit from new technologies is still missing. We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers.
The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.
Biowep: a workflow enactment portal for bioinformatics applications
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-01-01
Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these researchers to profit from new technologies is still missing. Results We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers.
The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563
OpenHelix: bioinformatics education outside of a different box.
Williams, Jennifer M; Mangan, Mary E; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C
2010-11-01
The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review.
OpenHelix: bioinformatics education outside of a different box
Mangan, Mary E.; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C.
2010-01-01
The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review. PMID:20798181
Bioinformatics for Exploration
NASA Technical Reports Server (NTRS)
Johnson, Kathy A.
2006-01-01
For the purpose of this paper, bioinformatics is defined as the application of computer technology to the management of biological information. It can be thought of as the science of developing computer databases and algorithms to facilitate and expedite biological research. This is a crosscutting capability that supports nearly all human health areas ranging from computational modeling, to pharmacodynamics research projects, to decision support systems within autonomous medical care. Bioinformatics serves to increase the efficiency and effectiveness of the life sciences research program. It provides data, information, and knowledge capture which further supports management of the bioastronautics research roadmap - identifying gaps that still remain and enabling the determination of which risks have been addressed.
Application of machine learning methods in bioinformatics
NASA Astrophysics Data System (ADS)
Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen
2018-05-01
Faced with the development of bioinformatics, high-throughput genomic technologies have enabled biology to enter the era of big data [1]. Bioinformatics is an interdisciplinary field, covering the acquisition, management, analysis, interpretation and application of biological information. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets [2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.
ERIC Educational Resources Information Center
Brown, James A. L.
2016-01-01
A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion,…
'Isotopo' a database application for facile analysis and management of mass isotopomer data.
Ahmed, Zeeshan; Zeeshan, Saman; Huber, Claudia; Hensel, Michael; Schomburg, Dietmar; Münch, Richard; Eylert, Eva; Eisenreich, Wolfgang; Dandekar, Thomas
2014-01-01
The composition of stable-isotope labelled isotopologues/isotopomers in metabolic products can be measured by mass spectrometry and supports the analysis of pathways and fluxes. As a prerequisite, the original mass spectra have to be processed, managed and stored to rapidly calculate, analyse and compare isotopomer enrichments to study, for instance, bacterial metabolism in infection. For such applications, we provide here the database application 'Isotopo'. This software package includes (i) a database to store and process isotopomer data, (ii) a parser to upload and translate different data formats for such data and (iii) an improved application to process and convert signal intensities from mass spectra of ¹³C-labelled metabolites such as tertbutyldimethylsilyl-derivatives of amino acids. Relative mass intensities and isotopomer distributions are calculated applying a partial least square method with iterative refinement for high precision data. The data output includes formats such as graphs for overall enrichments in amino acids. The package is user-friendly for easy and robust data management of multiple experiments. The 'Isotopo' software is available at the following web link (section Download): http://spp1316.uni-wuerzburg.de/bioinformatics/isotopo/. The package contains three additional files: software executable setup (installer), one data set file (discussed in this article) and one Excel file (which can be used to convert data from Excel to '.iso' format). The 'Isotopo' software is compatible only with the Microsoft Windows operating system. © The Author(s) 2014. Published by Oxford University Press.
An ontology-based framework for bioinformatics workflows.
Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer
2007-01-01
The proliferation of bioinformatics activities brings new challenges - how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and to provide interoperability among data and tools. This paper describes an effort toward these directions. It is based on combining research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.
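The automatic composition of tasks described in this abstract can be illustrated with a small sketch. It is not the authors' framework: it assumes each tool is annotated only with the data types it consumes and produces (a stand-in for ontology terms), and it uses a plain breadth-first forward search in place of a full AI planner. The tool names, the data type names and the `compose_workflow` function are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Tool(NamedTuple):
    name: str
    inputs: frozenset   # data types the tool consumes (hypothetical ontology terms)
    outputs: frozenset  # data types the tool produces


def compose_workflow(tools, available, goal):
    """Breadth-first forward search for a shortest tool sequence that
    produces `goal` starting from the `available` data types.
    Returns a list of tool names, or None if no sequence exists."""
    start = frozenset(available)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        have, plan = queue.popleft()
        if goal in have:
            return plan
        for tool in tools:
            # A tool is applicable when all its inputs are already available
            # and it would add at least one new data type.
            if tool.inputs <= have and not tool.outputs <= have:
                nxt = have | tool.outputs
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [tool.name]))
    return None


# Hypothetical tool annotations, for illustration only.
TOOLS = [
    Tool("assemble", frozenset({"reads"}), frozenset({"contigs"})),
    Tool("predict_genes", frozenset({"contigs"}), frozenset({"genes"})),
    Tool("annotate", frozenset({"genes", "ontology"}), frozenset({"annotated_genes"})),
]

plan = compose_workflow(TOOLS, {"reads", "ontology"}, "annotated_genes")
print(plan)
```

In an interactive setting, the same search could stop at each step and let the user pick among the applicable tools instead of taking the first shortest plan.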
Brown, James A L
2016-05-06
A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion, qualitative student-based module evaluation and the novelty, scientific validity and quality of written student reports. Bioinformatics is often the starting point for laboratory-based research projects, therefore high importance was placed on allowing students to individually develop and apply processes and methods of scientific research. Students led a bioinformatic inquiry-based project (within a framework of inquiry), discovering, justifying and exploring individually discovered research targets. Detailed assessable reports were produced, displaying data generated and the resources used. Mimicking research settings, undergraduates were divided into small collaborative groups, with distinctive central themes. The module was evaluated by assessing the quality and originality of the students' targets through reports, reflecting students' use and understanding of concepts and tools required to generate their data. Furthermore, evaluation of the bioinformatic module was assessed semi-quantitatively using pre- and post-module quizzes (a non-assessable activity, not contributing to their grade), which incorporated process- and content-specific questions (indicative of their use of the online tools). Qualitative assessment of the teaching intervention was performed using post-module surveys, exploring student satisfaction and other module specific elements. Overall, a positive experience was found, as was a post module increase in correct process-specific answers. In conclusion, an inquiry-based peer-assisted learning module increased students' engagement, practical bioinformatic skills and process-specific knowledge. 
© 2016 by The International Union of Biochemistry and Molecular Biology, 44:304-313, 2016.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chain, Patrick; Lo, Chien-Chi; Li, Po-E
EDGE bioinformatics was developed to help biologists process Next Generation Sequencing data (in the form of raw FASTQ files), even if they have little to no bioinformatics expertise. EDGE is a highly integrated and interactive web-based platform that is capable of running many of the standard analyses that biologists require for viral, bacterial/archaeal, and metagenomic samples. EDGE provides the following analytical workflows: quality trimming and host removal, assembly and annotation, comparisons against known references, taxonomy classification of reads and contigs, whole genome SNP-based phylogenetic analysis, and PCR analysis. EDGE provides an intuitive web-based interface for user input, allows users to visualize and interact with selected results (e.g. JBrowse genome browser), and generates a final detailed PDF report. Results in the form of tables, text files, graphic files, and PDFs can be downloaded. A user management system allows tracking of an individual’s EDGE runs, along with the ability to share, post publicly, delete, or archive their results.
Hobbie, Kevin A; Peterson, Elena S; Barton, Michael L; Waters, Katrina M; Anderson, Kim A
2012-08-01
Large collaborative centers are a common model for accomplishing integrated environmental health research. These centers often include various types of scientific domains (e.g., chemistry, biology, bioinformatics) that are integrated to solve some of the nation's key economic or public health concerns. The Superfund Research Center (SRP) at Oregon State University (OSU) is one such center established in 2008 to study the emerging health risks of polycyclic aromatic hydrocarbons while using new technologies both in the field and laboratory. With outside collaboration at remote institutions, success for the center as a whole depends on the ability to effectively integrate data across all research projects and support cores. Therefore, the OSU SRP center developed a system that integrates environmental monitoring data with analytical chemistry data and downstream bioinformatics and statistics to enable complete "source-to-outcome" data modeling and information management. This article describes the development of this integrated information management system that includes commercial software for operational laboratory management and sample management in addition to open-source custom-built software for bioinformatics and experimental data management.
Hobbie, Kevin A.; Peterson, Elena S.; Barton, Michael L.; Waters, Katrina M.; Anderson, Kim A.
2012-01-01
Large collaborative centers are a common model for accomplishing integrated environmental health research. These centers often include various types of scientific domains (e.g. chemistry, biology, bioinformatics) that are integrated to solve some of the nation’s key economic or public health concerns. The Superfund Research Center (SRP) at Oregon State University (OSU) is one such center established in 2008 to study the emerging health risks of polycyclic aromatic hydrocarbons while utilizing new technologies both in the field and laboratory. With outside collaboration at remote institutions, success for the center as a whole depends on the ability to effectively integrate data across all research projects and support cores. Therefore, the OSU SRP center developed a system that integrates environmental monitoring data with analytical chemistry data and downstream bioinformatics and statistics to enable complete ‘source to outcome’ data modeling and information management. This article describes the development of this integrated information management system that includes commercial software for operational laboratory management and sample management in addition to open source custom built software for bioinformatics and experimental data management. PMID:22651935
Protein Bioinformatics Databases and Resources
Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.
2017-01-01
Many publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231
Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas
2016-04-01
Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes due to systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station, however, due to multiple invocations for a large number of subtasks the full task requires a significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods as well as a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .
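The idea of running unmodified serial programs as parallel subtasks, with resubmission on failure, can be sketched on a single machine. The sketch below is not mpiWrapper and does not use MPI: it assumes a local thread pool in place of supercomputer nodes, and the function names `run_subtask` and `run_all`, the retry limit and the example commands are all hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_subtask(cmd, max_attempts=2):
    """Run one unmodified command-line program and resubmit it on failure.
    Returns (stdout, attempts_used), with stdout set to None if every
    attempt failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            done = subprocess.run(cmd, capture_output=True, text=True)
        except OSError:
            continue  # the program could not be started; try again
        if done.returncode == 0:
            return done.stdout.strip(), attempt
    return None, max_attempts


def run_all(commands, workers=4):
    """Distribute independent subtasks over a pool of worker threads.
    Results come back in the same order as `commands`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_subtask, commands))


# Stand-in subtasks: each one is a separate, serial process.
commands = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(5)]
results = run_all(commands)
print([out for out, _ in results])
```

Threads are sufficient here because each worker only waits on an external process; the actual computation happens in the child programs, which is also why the wrapped tools need no source changes.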
NASA Astrophysics Data System (ADS)
Symeonidis, Iphigenia Sofia
This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatics objectives rather than enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty in regards to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinformatics educational needs and goals as expressed by different authorities, five undergraduate bioinformatics degrees case-studies, educational implications of bioinformatics as a technoscience and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.
Bioinformatics: A History of Evolution "In Silico"
ERIC Educational Resources Information Center
Ondrej, Vladan; Dvorak, Petr
2012-01-01
Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…
Online Bioinformatics Tutorials | Office of Cancer Genomics
Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.
MAPI: towards the integrated exploitation of bioinformatics Web Services.
Ramirez, Sergio; Karlsson, Johan; Trelles, Oswaldo
2011-10-27
Bioinformatics is commonly featured as a well assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and heterogeneity complicate the integrated exploitation of such data processing capacity. To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for uniform representation of Web Services metadata descriptors including their management and invocation protocols of the services which they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed for the client have to be installed, and that the module functionality can be extended without the need for re-writing the software client. The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation with advanced features such as workflows composition and asynchronous services calls to multiple types of Web Services including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).
The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community
D'Elia, Domenica; Gisel, Andreas; Eriksson, Nils-Einar; Kossida, Sophia; Mattila, Kimmo; Klucar, Lubos; Bongcam-Rudloff, Erik
2009-01-01
The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has been working to promote collaborative development of bioinformatics services and tools to serve the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses such as high-throughput sequencing and data managing, text and data-mining, ontologies and Grid technologies. Papers selected for publication, in this supplement to BMC Bioinformatics, cover a broad range of the topics treated, providing also an overview of the main bioinformatics research fields that the EMBnet community is involved in. PMID:19534734
Virtual Bioinformatics Distance Learning Suite
ERIC Educational Resources Information Center
Tolvanen, Martti; Vihinen, Mauno
2004-01-01
Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…
[Integration of clinical and biological data in clinical practice using bioinformatics].
Coltell, Oscar; Arregui, María; Fabregat, Antonio; Portolés, Olga
2008-05-01
The aim of our work is to describe essential aspects of Medical Informatics, Bioinformatics and Biomedical Informatics that are used in biomedical research and clinical practice. These disciplines have emerged from the need to find new scientific and technical approaches to manage, store, analyze and report data generated in clinical practice, molecular biology and other medical specialties. They can also be useful for integrating research information generated in different areas of health care. Moreover, these disciplines are interdisciplinary and integrative, two key features not shared by other areas of medical knowledge. Finally, when Bioinformatics and Biomedical Informatics approaches are applied to medical investigation and practice, a new discipline, called Clinical Bioinformatics, emerges. The latter requires a specific training program to create a new professional profile. We have not been able to find a specific training program in Clinical Bioinformatics in Spain.
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses
Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T
2014-01-01
Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.
Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T
2014-06-01
Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.
de Brevern, Alexandre G.; Meyniel, Jean-Philippe; Fairhead, Cécile; Neuvéglise, Cécile; Malpertuy, Alain
2015-01-01
Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data is. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries. PMID:26125026
Omics Metadata Management Software v. 1 (OMMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks, to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible, extensible and easily installed and run by operators with general system administration and scripting language literacy.
Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat
2017-01-01
Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled ‘Bioinformatics in the Service of Biotechnology’. Students’ learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students’ difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students’ cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students’ scientific ‘toolbox’. For students, questions stemming from the ‘old world’ biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers’ prediction. Analysis of students’ affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher’s role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. PMID:26801769
Machluf, Yossy; Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat
2017-01-01
Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled 'Bioinformatics in the Service of Biotechnology'. Students' learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students' difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students' cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students' scientific 'toolbox'. For students, questions stemming from the 'old world' biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers' prediction. Analysis of students' affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher's role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. © The Author 2016. Published by Oxford University Press.
Design and Development of ChemInfoCloud: An Integrated Cloud Enabled Platform for Virtual Screening.
Karthikeyan, Muthukumarasamy; Pandit, Deepak; Bhavasar, Arvind; Vyas, Renu
2015-01-01
The power of cloud computing and distributed computing has been harnessed to handle the vast and heterogeneous data that must be processed in any virtual screening protocol. A cloud computing platform, ChemInfoCloud, was built and integrated with several chemoinformatics and bioinformatics tools. The robust engine performs the core chemoinformatics tasks of lead generation, lead optimisation and property prediction in a fast and efficient manner. It has also been provided with some bioinformatics functionalities, including sequence alignment, active site pose prediction and protein ligand docking. Text mining, NMR chemical shift (1H, 13C) prediction and reaction fingerprint generation modules for efficient lead discovery are also implemented in this platform. We have developed an integrated problem solving cloud environment for virtual screening studies that also provides workflow management, better usability and interaction with end users using container based virtualization, OpenVz.
Chen, Yi-Bu; Chattopadhyay, Ansuman; Bergen, Phillip; Gadd, Cynthia; Tannery, Nancy
2007-01-01
To bridge the gap between the rising information needs of biological and medical researchers and the rapidly growing number of online bioinformatics resources, we have created the Online Bioinformatics Resources Collection (OBRC) at the Health Sciences Library System (HSLS) at the University of Pittsburgh. The OBRC, containing 1542 major online bioinformatics databases and software tools, was constructed using the HSLS content management system built on the Zope Web application server. To enhance the output of search results, we further implemented the Vivísimo Clustering Engine, which automatically organizes the search results into categories created dynamically based on the textual information of the retrieved records. As the largest online collection of its kind and the only one with advanced search results clustering, OBRC is aimed at becoming a one-stop guided information gateway to the major bioinformatics databases and software tools on the Web. OBRC is available at the University of Pittsburgh's HSLS Web site (http://www.hsls.pitt.edu/guides/genetics/obrc).
Carving a niche: establishing bioinformatics collaborations
Lyon, Jennifer A.; Tennant, Michele R.; Messner, Kevin R.; Osterbur, David L.
2006-01-01
Objectives: The paper describes collaborations and partnerships developed between library bioinformatics programs and other bioinformatics-related units at four academic institutions. Methods: A call for information on bioinformatics partnerships was made via email to librarians who have participated in the National Center for Biotechnology Information's Advanced Workshop for Bioinformatics Information Specialists. Librarians from Harvard University, the University of Florida, the University of Minnesota, and Vanderbilt University responded and expressed willingness to contribute information on their institutions, programs, services, and collaborating partners. Similarities and differences in programs and collaborations were identified. Results: The four librarians have developed partnerships with other units on their campuses that can be categorized into the following areas: knowledge management, instruction, and electronic resource support. All primarily support freely accessible electronic resources, while other campus units deal with fee-based ones. These demarcations are apparent in resource provision as well as in subsequent support and instruction. Conclusions and Recommendations: Through environmental scanning and networking with colleagues, librarians who provide bioinformatics support can develop fruitful collaborations. Visibility is key to building collaborations, as is broad-based thinking in terms of potential partners. PMID:16888668
When cloud computing meets bioinformatics: a review.
Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong
2013-10-01
In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
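As a minimal illustration of the MapReduce model that this review introduces, the following sketch counts k-mers in sequencing reads using explicit map, shuffle and reduce steps. It is a single-process toy in plain Python, not code from the reviewed paper or from any MapReduce framework; the function names and the example reads are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every window of length k."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


# Hypothetical toy input: two short reads, k = 3.
counts = count_kmers(["ACGTAC", "GTACGT"], 3)
```

In a real deployment the map and reduce functions would run on separate workers and the shuffle would be handled by the framework; only the division of labour between the three steps carries over from this sketch.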
Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics
Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.
2012-01-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
Agents in bioinformatics, computational and systems biology.
Merelli, Emanuela; Armano, Giuliano; Cannata, Nicola; Corradini, Flavio; d'Inverno, Mark; Doms, Andreas; Lord, Phillip; Martin, Andrew; Milanesi, Luciano; Möller, Steffen; Schroeder, Michael; Luck, Michael
2007-01-01
The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meeting provided an opportunity for seeding collaborations between the agent and bioinformatics communities to develop a different (agent-based) approach to computational frameworks both for data analysis and management in bioinformatics and for systems modelling and simulation in computational and systems biology. The collaborations gave rise to applications and integrated tools that we summarize and discuss in the context of the state of the art in this area. We investigate future challenges and argue that the field should still be explored from many perspectives ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages to be used by information agents, and to the adoption of agents for computational grids.
Ramping up to the Biology Workbench: A Multi-Stage Approach to Bioinformatics Education
ERIC Educational Resources Information Center
Greene, Kathleen; Donovan, Sam
2005-01-01
In the process of designing and field-testing bioinformatics curriculum materials, we have adopted a three-stage, progressive model that emphasizes collaborative scientific inquiry. The elements of the model include: (1) context setting, (2) introduction to concepts, processes, and tools, and (3) development of competent use of technologically…
BioWarehouse: a bioinformatics database warehouse toolkit
Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D
2006-01-01
Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion: BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315
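The kind of cross-database question described in this abstract (what fraction of enzyme activities have no sequence) can be sketched as a single SQL query once the sources share one schema. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than MySQL or Oracle, and the table names, column names and rows are hypothetical toy data, not the BioWarehouse schema or its results.

```python
import sqlite3


def fraction_without_sequence(conn):
    """Return the fraction of enzyme activities with no linked sequence."""
    total, missing = conn.execute(
        """
        SELECT COUNT(*) AS total,
               SUM(CASE WHEN s.ec_number IS NULL THEN 1 ELSE 0 END) AS missing
        FROM enzyme_activity AS e
        LEFT JOIN (SELECT DISTINCT ec_number FROM protein_sequence) AS s
               ON s.ec_number = e.ec_number
        """
    ).fetchone()
    # An empty enzyme table yields total = 0 and missing = NULL.
    return missing / total if total else 0.0


# Hypothetical two-table warehouse: one table loaded from an enzyme
# nomenclature source, one from a sequence source.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
    CREATE TABLE protein_sequence (accession TEXT PRIMARY KEY, ec_number TEXT);
    INSERT INTO enzyme_activity VALUES
        ('1.1.1.1'), ('2.7.1.1'), ('3.4.21.4'), ('4.2.1.1');
    INSERT INTO protein_sequence VALUES
        ('P1', '1.1.1.1'), ('P2', '1.1.1.1'), ('P3', '4.2.1.1');
    """
)
fraction = fraction_without_sequence(conn)
```

The LEFT JOIN keeps every enzyme activity, and the DISTINCT subquery prevents an activity with several sequences from being counted more than once. With the toy rows above, two of the four activities have no sequence.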
Computational biology and bioinformatics in Nigeria.
Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi
2014-04-01
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.
Computational Biology and Bioinformatics in Nigeria
Fatumo, Segun A.; Adoga, Moses P.; Ojo, Opeolu O.; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi
2014-01-01
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries. PMID:24763310
Development of Bioinformatics Infrastructure for Genomics Research.
Mulder, Nicola J; Adebiyi, Ezekiel; Adebiyi, Marion; Adeyemi, Seun; Ahmed, Azza; Ahmed, Rehab; Akanle, Bola; Alibi, Mohamed; Armstrong, Don L; Aron, Shaun; Ashano, Efejiro; Baichoo, Shakuntala; Benkahla, Alia; Brown, David K; Chimusa, Emile R; Fadlelmola, Faisal M; Falola, Dare; Fatumo, Segun; Ghedira, Kais; Ghouila, Amel; Hazelhurst, Scott; Isewon, Itunuoluwa; Jung, Segun; Kassim, Samar Kamal; Kayondo, Jonathan K; Mbiyavanga, Mamana; Meintjes, Ayton; Mohammed, Somia; Mosaku, Abayomi; Moussa, Ahmed; Muhammd, Mustafa; Mungloo-Dilmohamud, Zahra; Nashiru, Oyekanmi; Odia, Trust; Okafor, Adaobi; Oladipo, Olaleye; Osamor, Victor; Oyelade, Jellili; Sadki, Khalid; Salifu, Samson Pandam; Soyemi, Jumoke; Panji, Sumir; Radouani, Fouzia; Souiai, Oussama; Tastan Bishop, Özlem
2017-06-01
Although pockets of bioinformatics excellence have developed in Africa, generally, large-scale genomic data analysis has been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet's role has evolved in response to changing needs from the consortium and the African bioinformatics community. H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. Various resources have been developed to address genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types are also being developed and dockerized to enable execution on multiple computing infrastructures. 
In addition, new tools have been developed for analysis of the uniquely divergent African data and for downstream interpretation of prioritized variants. To provide support for these and other bioinformatics queries, an online bioinformatics helpdesk backed by broad consortium expertise has been established. Further support is provided by means of various modes of bioinformatics training. For the past 4 years, the development of infrastructure support and human capacity through H3ABioNet has significantly contributed to the establishment of African scientific networks, data analysis facilities, and training programs. Here, we describe the infrastructure and how it has affected genomics and bioinformatics research in Africa. Copyright © 2017 World Heart Federation (Geneva). Published by Elsevier B.V. All rights reserved.
Welcome to health information science and systems.
Zhang, Yanchun
2013-01-01
Health Information Science and Systems is an exciting, new, multidisciplinary journal that aims to use technologies in computer science to assist in disease diagnosis, treatment, prediction and monitoring through the modeling, design, development, visualization, integration and management of health-related information. These computer-science technologies include information systems, web technologies, data mining, image processing, user interaction and interfaces, sensors and wireless networking, and are applicable to a wide range of health-related information, including medical data, biomedical data, bioinformatics data and public health data.
BioContainers: an open-source and community-driven framework for software standardization.
da Veiga Leprevost, Felipe; Grüning, Björn A; Alves Aflitos, Saulo; Röst, Hannes L; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I; Perez-Riverol, Yasset
2017-08-15
BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source Docker and rkt frameworks, which allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). The software is freely available at github.com/BioContainers/. Contact: yperez@ebi.ac.uk. © The Author(s) 2017. Published by Oxford University Press.
BioContainers: an open-source and community-driven framework for software standardization
da Veiga Leprevost, Felipe; Grüning, Björn A.; Alves Aflitos, Saulo; Röst, Hannes L.; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C.; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I.; Perez-Riverol, Yasset
2017-01-01
Motivation: BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source Docker and rkt frameworks, which allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability and Implementation: The software is freely available at github.com/BioContainers/. Contact: yperez@ebi.ac.uk PMID:28379341
Taking Bioinformatics to Systems Medicine.
van Kampen, Antoine H C; Moerland, Perry D
2016-01-01
Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.
MOWServ: a web client for integration of bioinformatic resources
Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J.; Claros, M. Gonzalo; Trelles, Oswaldo
2010-01-01
The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user’s tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/. PMID:20525794
Merelli, Ivan; Pérez-Sánchez, Horacio; Gesing, Sandra; D'Agostino, Daniele
2014-01-01
The explosion of data in both biomedical research and healthcare systems demands urgent solutions. In particular, research in the omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare, in addition, increasingly calls for tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to better understanding of diseases and development of better and personalized diagnostics and therapeutics. However, such progress is directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, for its annotation and integration and finally for inferring knowledge and making it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represent the starting point to win this challenge. PMID:25254202
An architecture for genomics analysis in a clinical setting using Galaxy and Docker
Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A
2017-01-01
Abstract Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Alongside the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. PMID:29048555
An architecture for genomics analysis in a clinical setting using Galaxy and Docker.
Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A; Rance, B
2017-11-01
Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Alongside the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. © The Author 2017. Published by Oxford University Press.
MOWServ: a web client for integration of bioinformatic resources.
Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J; Claros, M Gonzalo; Trelles, Oswaldo
2010-07-01
The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user's tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/.
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines
2011-01-01
Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing.
PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538
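The batching trade-off described in this abstract can be illustrated with a minimal sketch. It does not use the PaPy API; it relies only on the Python standard library, and the names `batched`, `run_pipeline`, `reverse_complement` and `gc_fraction` are hypothetical helpers chosen for this example. Each input item is passed through a chain of functions, and items are pulled from the input lazily, one batch at a time, with each batch evaluated on a small worker pool.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def run_pipeline(items, stages, batch_size=2, workers=2):
    """Lazily push items through `stages`, evaluating one batch at a time.

    Larger batches expose more parallelism; smaller batches keep fewer
    intermediate results in memory at once.
    """
    def apply_stages(item):
        for stage in stages:
            item = stage(item)
        return item

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batched(items, batch_size):
            # pool.map preserves the input order within each batch.
            yield from pool.map(apply_stages, batch)


def reverse_complement(seq):
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


results = list(
    run_pipeline(["ACGT", "GGCC", "ATAT"], [reverse_complement, gc_fraction])
)
print(results)
```

This sketch covers only a linear chain of stages; a directed acyclic graph with branching, as described for PaPy, would need an explicit graph structure on top of the same batching idea.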
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.
Cieślik, Marcin; Mura, Cameron
2011-02-25
Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing.
PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
Two interactive Bioinformatics courses at the Bielefeld University Bioinformatics Server.
Sczyrba, Alexander; Konermann, Susanne; Giegerich, Robert
2008-05-01
Conferences in computational biology continue to provide tutorials on classical and new methods in the field. This can be taken as an indicator that education is still a bottleneck in our field's process of becoming an established scientific discipline. Bielefeld University has been one of the early providers of bioinformatics education, both locally and via the internet. The Bielefeld Bioinformatics Server (BiBiServ) offers a variety of older and new materials. Here, we report on two online courses made available recently, one introductory and one on the advanced level: (i) SADR: Sequence Analysis with Distributed Resources (http://bibiserv.techfak.uni-bielefeld.de/sadr/) and (ii) ADP: Algebraic Dynamic Programming in Bioinformatics (http://bibiserv.techfak.uni-bielefeld.de/dpcourse/).
Ramharack, Pritika; Soliman, Mahmoud E S
2018-06-01
Originally developed for the analysis of biological sequences, bioinformatics has advanced into one of the most widely recognized domains in the scientific community. Despite this technological evolution, there is still an urgent need for nontoxic and efficient drugs. The onus now falls on the 'omics domain to meet this need by implementing bioinformatics techniques that will allow for the introduction of pioneering approaches in the rational drug design process. Here, we categorize an updated list of informatics tools and explore the capabilities of integrative bioinformatics in disease control. We believe that our review will serve as a comprehensive guide toward bioinformatics-oriented disease and drug discovery research. Copyright © 2018 Elsevier Ltd. All rights reserved.
Beck, Susan L; Eaton, Linda H; Echeverria, Christina; Mooney, Kathi H
2017-10-01
SymptomCare@Home, an integrated symptom monitoring and management system, was designed as part of randomized clinical trials to help patients with cancer who receive chemotherapy in ambulatory clinics and often experience significant symptoms at home. An iterative design process was informed by chronic disease management theory and features of assessment and clinical decision support systems used in other diseases. Key stakeholders participated in the design process: nurse scientists, clinical experts, bioinformatics experts, and computer programmers. Especially important was input from end users, patients, and nurse practitioners participating in a series of studies testing the system. The system includes both a patient and clinician interface and fully integrates two electronic subsystems: a telephone computer-linked interactive voice response system and a Web-based Decision Support-Symptom Management System. Key features include (1) daily symptom monitoring, (2) self-management coaching, (3) alerting, and (4) nurse practitioner follow-up. The nurse practitioner is distinctively positioned to provide assessment, education, support, and pharmacologic and nonpharmacologic interventions to intensify management of poorly controlled symptoms at home. SymptomCare@Home is a model for providing telehealth. The system facilitates using evidence-based guidelines as part of a comprehensive symptom management approach. The design process and system features can be applied to other diseases and conditions.
Development of an undergraduate bioinformatics degree program at a liberal arts college.
Bagga, Paramjeet S
2012-09-01
The highly interdisciplinary field of bioinformatics has emerged as a powerful modern science. There has been a great demand for undergraduate- and graduate-level trained bioinformaticists in industry as well as in academia. In order to address the need for trained bioinformaticists, its curriculum must be offered at the undergraduate level, especially at four-year colleges, where a majority of the United States gets its education. There are many challenges in developing an undergraduate-level bioinformatics program that needs to be carefully designed as a well-integrated and cohesive interdisciplinary curriculum that prepares the students for a wide variety of career options. This article describes the challenges of establishing a highly interdisciplinary undergraduate major, the development of an undergraduate bioinformatics degree program at Ramapo College of New Jersey, and lessons learned in the last 10 years during its management.
Pitassi, Claudio; Gonçalves, Antonio Augusto; Moreno Júnior, Valter de Assis
2014-01-01
The scope of this article is to identify and analyze the factors that influence the adoption of ICT tools in experiments with bioinformatics at the Brazilian Cancer Institute (INCA). It involves a descriptive and exploratory qualitative field study. Evidence was collected mainly based on in-depth interviews with the management team at the Research Center and the IT Division. The answers were analyzed using the categorical content method. The categories were selected from the scientific literature and consolidated in the Technology-Organization-Environment (TOE) framework created for this study. The model proposed made it possible to demonstrate how the factors selected impacted INCA's adoption of bioinformatics systems and tools, contributing to the investigation of two critical areas for the development of the health industry in Brazil, namely technological innovation and bioinformatics. Based on the evidence collected, a research question was posed: to what extent can the alignment of the factors related to the adoption of ICT tools in experiments with bioinformatics increase the innovation capacity of a Brazilian biopharmaceutical organization?
Relax with CouchDB--into the non-relational DBMS era of bioinformatics.
Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R
2012-07-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.
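The flexible, non-rigid schema mentioned above can be illustrated with a minimal sketch. It does not talk to a CouchDB server and does not reproduce geneSmash or drugBase; it only mimics, in plain Python, the idea of storing heterogeneous JSON documents and deriving an index from them with a map function (CouchDB itself defines such views as map/reduce functions). The records, field names and the helpers `map_drug_targets` and `build_view` are hypothetical and purely illustrative.

```python
import json

# Illustrative documents: records of different types and with different
# fields can sit side by side because no table schema is enforced.
documents = [
    {"_id": "gene:TP53", "type": "gene", "symbol": "TP53", "chromosome": "17"},
    {"_id": "gene:EGFR", "type": "gene", "symbol": "EGFR", "chromosome": "7",
     "aliases": ["ERBB1"]},
    {"_id": "drug:gefitinib", "type": "drug", "name": "gefitinib",
     "targets": ["EGFR"]},
]


def map_drug_targets(doc):
    """Emit one (gene symbol, drug name) row per target of a drug document."""
    if doc.get("type") == "drug":
        for symbol in doc.get("targets", []):
            yield symbol, doc["name"]


def build_view(docs, map_fn):
    """Apply a map function to every document and sort the rows by key."""
    return sorted(row for doc in docs for row in map_fn(doc))


view = build_view(documents, map_drug_targets)
print(view)

# Each document is plain JSON, so it survives a serialisation round trip.
round_trip = json.loads(json.dumps(documents[1]))
```

Adding a new field to one record (such as `aliases` above) needs no migration of the other records, which is the kind of flexibility the abstract attributes to non-relational designs.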
Deep learning in bioinformatics.
Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh
2017-09-01
In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad
Undergraduate life sciences education needs an overhaul, as clearly described in the National Research Council of the National Academies publication BIO 2010: Transforming Undergraduate Education for Future Research Biologists. Among BIO 2010's top recommendations is the need to involve students in working with real data and tools that reflect the nature of life sciences research in the 21st century. Education research studies support the importance of utilizing primary literature, designing and implementing experiments, and analyzing results in the context of a bona fide scientific question in cultivating the analytical skills necessary to become a scientist. Incorporating these basic scientific methodologies in undergraduate education leads to increased undergraduate and post-graduate retention in the sciences. Toward this end, many undergraduate teaching organizations offer training and suggestions for faculty to update and improve their teaching approaches to help students learn as scientists, through design and discovery (e.g., Council of Undergraduate Research [www.cur.org] and Project Kaleidoscope [www.pkal.org]). With the advent of genome sequencing and bioinformatics, many scientists now formulate biological questions and interpret research results in the context of genomic information. Just as the use of bioinformatic tools and databases changed the way scientists investigate problems, it must change how scientists teach to create new opportunities for students to gain experiences reflecting the influence of genomics, proteomics, and bioinformatics on modern life sciences research. Educators have responded by incorporating bioinformatics into diverse life science curricula. While these published exercises in, and guidelines for, bioinformatics curricula are helpful and inspirational, faculty new to the area of bioinformatics inevitably need training in the theoretical underpinnings of the algorithms.
Moreover, effectively integrating bioinformatics into courses or independent research projects requires infrastructure for organizing and assessing student work. Here, we present a new platform for faculty to keep current with the rapidly changing field of bioinformatics, the Integrated Microbial Genomes Annotation Collaboration Toolkit (IMG-ACT). It was developed by instructors from both research-intensive and predominately undergraduate institutions in collaboration with the Department of Energy-Joint Genome Institute (DOE-JGI) as a means to innovate and update undergraduate education and faculty development. The IMG-ACT program provides a cadre of tools, including access to a clearinghouse of genome sequences, bioinformatics databases, data storage, instructor course management, and student notebooks for organizing the results of their bioinformatic investigations. In the process, IMG-ACT makes it feasible to provide undergraduate research opportunities to a greater number and diversity of students, in contrast to the traditional mentor-to-student apprenticeship model for undergraduate research, which can be too expensive and time-consuming to provide for every undergraduate. The IMG-ACT serves as the hub for the network of faculty and students that use the system for microbial genome analysis. Open access of the IMG-ACT infrastructure to participating schools ensures that all types of higher education institutions can utilize it. With the infrastructure in place, faculty can focus their efforts on the pedagogy of bioinformatics, involvement of students in research, and use of this tool for their own research agenda. What the original faculty members of the IMG-ACT development team present here is an overview of how the IMG-ACT program has affected our development in terms of teaching and research with the hopes that it will inspire more faculty to get involved.
Bioinformatics in translational drug discovery.
Wooller, Sarah K; Benstead-Hume, Graeme; Chen, Xiangrong; Ali, Yusuf; Pearl, Frances M G
2017-08-31
Bioinformatics approaches are becoming ever more essential in translational drug discovery both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse 'big data' that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications. © 2017 The Author(s).
BIRCH: a user-oriented, locally-customizable, bioinformatics system.
Fristensky, Brian
2007-02-09
Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere.
BIRCH: A user-oriented, locally-customizable, bioinformatics system
Fristensky, Brian
2007-01-01
Background Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. Results BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. Conclusion BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere. PMID:17291351
Secure Encapsulation and Publication of Biological Services in the Cloud Computing Environment
Zhang, Weizhe; Wang, Xuehui; Lu, Bo; Kim, Tai-hoon
2013-01-01
Secure encapsulation and publication for bioinformatics software products based on web service are presented, and the basic function of biological information is realized in the cloud computing environment. In the encapsulation phase, the workflow and function of bioinformatics software are conducted, the encapsulation interfaces are designed, and the runtime interaction between users and computers is simulated. In the publication phase, the execution and management mechanisms and principles of the GRAM components are analyzed. The functions such as remote user job submission and job status query are implemented by using the GRAM components. The services of bioinformatics software are published to remote users. Finally the basic prototype system of the biological cloud is achieved. PMID:24078906
Secure encapsulation and publication of biological services in the cloud computing environment.
Zhang, Weizhe; Wang, Xuehui; Lu, Bo; Kim, Tai-hoon
2013-01-01
Secure encapsulation and publication for bioinformatics software products based on web service are presented, and the basic function of biological information is realized in the cloud computing environment. In the encapsulation phase, the workflow and function of bioinformatics software are conducted, the encapsulation interfaces are designed, and the runtime interaction between users and computers is simulated. In the publication phase, the execution and management mechanisms and principles of the GRAM components are analyzed. The functions such as remote user job submission and job status query are implemented by using the GRAM components. The services of bioinformatics software are published to remote users. Finally the basic prototype system of the biological cloud is achieved.
Planning bioinformatics workflows using an expert system.
Chen, Xiaoling; Chang, Jeffrey T
2017-04-15
Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability and Implementation: https://github.com/jefftc/changlab. Contact: jeffrey.t.chang@uth.tmc.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
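Backward chaining over a knowledge base of tool capabilities can be illustrated with a minimal sketch. This is not BETSY's code or rule format: the rule table, the data-type names and the function `plan` are hypothetical, and the sketch assumes each data type is produced by at most one tool and that the rules contain no cycles. Starting from the requested output, the planner looks up the rule that produces it and recursively plans whatever inputs are not yet available.

```python
# Hypothetical knowledge base: goal data type -> (tool name, required inputs).
RULES = {
    "trimmed_reads": ("trim_adapters", ["raw_reads"]),
    "aligned_reads": ("align", ["trimmed_reads", "reference_genome"]),
    "variant_calls": ("call_variants", ["aligned_reads", "reference_genome"]),
}


def plan(goal, available, rules=RULES, steps=None):
    """Return an ordered list of tool names that produces `goal`.

    Works backwards from the goal: if the goal is not already available,
    find the rule that produces it and first plan each of its inputs.
    """
    if steps is None:
        steps = []
    if goal in available:
        return steps
    if goal not in rules:
        raise ValueError(f"no rule produces {goal!r}")
    tool, inputs = rules[goal]
    for needed in inputs:
        plan(needed, available, rules, steps)
    if tool not in steps:  # avoid scheduling a shared step twice
        steps.append(tool)
    return steps


workflow = plan("variant_calls", {"raw_reads", "reference_genome"})
print(workflow)
```

Changing the set of available data or swapping a rule changes the generated workflow without editing any pipeline code, which is the property that makes this style of planning attractive for exploratory analyses.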
Planning bioinformatics workflows using an expert system
Chen, Xiaoling; Chang, Jeffrey T.
2017-01-01
Abstract Motivation: Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. Results: To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability and Implementation: https://github.com/jefftc/changlab Contact: jeffrey.t.chang@uth.tmc.edu PMID:28052928
Mulder, Nicola; Schwartz, Russell; Brazas, Michelle D; Brooksbank, Cath; Gaeta, Bruno; Morgan, Sarah L; Pauley, Mark A; Rosenwald, Anne; Rustici, Gabriella; Sierk, Michael; Warnow, Tandy; Welch, Lonnie
2018-02-01
Bioinformatics is recognized as part of the essential knowledge base of numerous career paths in biomedical research and healthcare. However, there is little agreement in the field over what that knowledge entails or how best to provide it. These disagreements are compounded by the wide range of populations in need of bioinformatics training, with divergent prior backgrounds and intended application areas. The Curriculum Task Force of the International Society of Computational Biology (ISCB) Education Committee has sought to provide a framework for training needs and curricula in terms of a set of bioinformatics core competencies that cut across many user personas and training programs. The initial competencies developed based on surveys of employers and training programs have since been refined through a multiyear process of community engagement. This report describes the current status of the competencies and presents a series of use cases illustrating how they are being applied in diverse training contexts. These use cases are intended to demonstrate how others can make use of the competencies and engage in the process of their continuing refinement and application. The report concludes with a consideration of remaining challenges and future plans.
Brooksbank, Cath; Morgan, Sarah L.; Rosenwald, Anne; Warnow, Tandy; Welch, Lonnie
2018-01-01
Bioinformatics is recognized as part of the essential knowledge base of numerous career paths in biomedical research and healthcare. However, there is little agreement in the field over what that knowledge entails or how best to provide it. These disagreements are compounded by the wide range of populations in need of bioinformatics training, with divergent prior backgrounds and intended application areas. The Curriculum Task Force of the International Society of Computational Biology (ISCB) Education Committee has sought to provide a framework for training needs and curricula in terms of a set of bioinformatics core competencies that cut across many user personas and training programs. The initial competencies developed based on surveys of employers and training programs have since been refined through a multiyear process of community engagement. This report describes the current status of the competencies and presents a series of use cases illustrating how they are being applied in diverse training contexts. These use cases are intended to demonstrate how others can make use of the competencies and engage in the process of their continuing refinement and application. The report concludes with a consideration of remaining challenges and future plans. PMID:29390004
The GMOD Drupal bioinformatic server framework.
Papanicolaou, Alexie; Heckel, David G
2010-12-15
Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com.
Ferraro Petrillo, Umberto; Roscigno, Gianluca; Cattaneo, Giuseppe; Giancarlo, Raffaele
2017-05-15
MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need to efficiently manage sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers versatility and efficiency. That is, it can handle collections of reads, with or without quality scores, as well as long genomic sequences while the existing routines concentrate mainly on NGS sequence data. Moreover, in the domain where a comparison is possible, the routines proposed here are faster than the available ones. In conclusion, FASTdoop is a much needed addition to Hadoop-BAM. The software and the datasets are available at http://www.di.unisa.it/FASTdoop/. Contact: umberto.ferraro@uniroma1.it. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
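Why sequence input needs special-purpose routines can be seen in a small sketch of split-aware FASTA reading. This is not FASTdoop's implementation or API. It assumes a common convention for block-split input (a record belongs to the split in which its header begins), and the function name `fasta_records_in_split` is hypothetical.

```python
def fasta_records_in_split(data: bytes, start: int, end: int):
    """Yield (header, sequence) for records whose '>' lies in [start, end).

    A record that starts inside the split is read to its end even if it
    runs past `end`; a record that started before `start` is skipped,
    because the previous split owns it.
    """
    pos = start
    # Advance to the first '>' that begins a line at or after `start`.
    while pos < len(data):
        if data[pos:pos + 1] == b">" and (pos == 0 or data[pos - 1:pos] == b"\n"):
            break
        pos += 1
    while pos < min(end, len(data)):
        nxt = data.find(b"\n>", pos)
        stop = len(data) if nxt == -1 else nxt + 1   # offset of the next header
        header, _, body = data[pos:stop].partition(b"\n")
        yield header[1:].decode(), body.replace(b"\n", b"").decode()
        pos = stop


data = b">a\nACGT\nAC\n>b\nGG\n>c\nTT\n"
first = list(fasta_records_in_split(data, 0, 12))
second = list(fasta_records_in_split(data, 12, len(data)))
print(first, second)
```

Each record is emitted by exactly one split, even though the byte boundary at offset 12 falls in the middle of record `b`. The same rule does not carry over directly to FASTQ, where `@` can also occur as a quality character.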
H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa
Mulder, Nicola J.; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M.; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C. Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu
2016-01-01
The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985
GOBLET: The Global Organisation for Bioinformatics Learning, Education and Training
Atwood, Teresa K.; Bongcam-Rudloff, Erik; Brazas, Michelle E.; Corpas, Manuel; Gaudet, Pascale; Lewitter, Fran; Mulder, Nicola; Palagi, Patricia M.; Schneider, Maria Victoria; van Gelder, Celia W. G.
2015-01-01
In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy—paradoxically, many are actually closing “niche” bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all. PMID:25856076
Stephens, Susie M; Chen, Jake Y; Davidson, Marcel G; Thomas, Shiby; Trute, Barry M
2005-01-01
As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html.
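A regular expression search that runs inside the database can be sketched as follows. The abstract describes Oracle Database 10g; this sketch swaps in SQLite through Python's standard `sqlite3` module, with a user-defined `REGEXP` function, purely as a stand-in. The table, protein names and sequences are invented toy data.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs SQLite's `X REGEXP Y` operator, which calls regexp(Y, X)."""
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE proteins (name TEXT, seq TEXT)")
conn.executemany(
    "INSERT INTO proteins VALUES (?, ?)",
    [("p1", "MKNGSAATLL"), ("p2", "MKNPSAATLL"), ("p3", "AANLTQ")],
)

# N-glycosylation motif N-{P}-[ST]-{P}, written as a regular expression.
motif = r"N[^P][ST][^P]"
hits = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM proteins WHERE seq REGEXP ? ORDER BY name", (motif,)
    )
]
print(hits)
```

The filter runs where the data live, so only matching rows leave the database, which is the data-movement argument the abstract makes.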
The GMOD Drupal Bioinformatic Server Framework
Papanicolaou, Alexie; Heckel, David G.
2010-01-01
Motivation: Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). Results: We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Conclusion: Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Availability and implementation: Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com Contact: alexie@butterflybase.org PMID:20971988
Scalability and Validation of Big Data Bioinformatics Software.
Yang, Andrian; Troup, Michael; Ho, Joshua W K
2017-01-01
This review examines two important aspects that are central to modern big data bioinformatics analysis - software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
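The multiple-executions idea behind metamorphic testing can be shown with a short sketch. It is an illustration under stated assumptions, not an example taken from the review: the program under test (`gc_fraction`) and the three metamorphic relations are chosen here for simplicity.

```python
def gc_fraction(seq):
    """Program under test: fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def check_metamorphic_relations(seq, tol=1e-12):
    """Run the program on related inputs and compare the outputs.

    No reference answer is needed: only relations between runs are checked.
    """
    base = gc_fraction(seq)
    # MR1: the reverse complement has the same GC fraction.
    assert abs(gc_fraction(reverse_complement(seq)) - base) <= tol
    # MR2: repeating the sequence does not change the fraction.
    assert abs(gc_fraction(seq * 3) - base) <= tol
    # MR3: appending a G can never lower the fraction.
    assert gc_fraction(seq + "G") >= base
    return True


print(check_metamorphic_relations("ACGTTGCAAG"))
```

The checks never require the true GC fraction of the input, which is why this style of testing suits programs whose correct output is hard to determine independently.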
ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis
Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas
2016-01-01
Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and are reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475
[Application of bioinformatics in researches of industrial biocatalysis].
Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao
2004-05-01
Industrial biocatalysis is currently attracting much attention as a way to rebuild or replace traditional production processes for chemicals and drugs. A key focus in industrial biocatalysis is the biocatalyst, which is usually a microbial enzyme. Recently, new bioinformatics technologies have played, and will continue to play, increasingly significant roles in industrial biocatalysis research in response to the waves of the genomic revolution. One key application of bioinformatics in biocatalysis is the discovery and identification of new biocatalysts through advanced DNA and protein sequence search, comparison and analysis in Internet databases using different algorithms and software. Unknown genes of microbial enzymes can also be readily obtained by designing primers on the basis of bioinformatics analyses. The other key application of bioinformatics in biocatalysis is the modification and improvement of existing industrial biocatalysts. Here, bioinformatics is of great importance in both rational design and directed evolution of microbial enzymes. Based on the successful prediction of the tertiary structures of enzymes using bioinformatics tools, the following experiments, i.e. site-directed mutagenesis, fusion protein construction, DNA family shuffling and saturation mutagenesis, are usually highly efficient. In all, bioinformatics will be an essential tool for both biologists and biological engineers in future research on industrial biocatalysis, because of its significant role in guiding and accelerating the discovery and/or improvement of novel biocatalysts.
Cognitive-behavioral stress management reverses anxiety-related leukocyte transcriptional dynamics
Antoni, Michael H.; Lutgendorf, Susan K.; Blomberg, Bonnie; Carver, Charles S.; Lechner, Suzanne; Diaz, Alain; Stagl, Jamie; Arevalo, Jesusa M.G.; Cole, Steven W.
2011-01-01
Background Chronic threat and anxiety are associated with pro-inflammatory transcriptional profiles in circulating leukocytes, but the causal direction of that relationship has not been established. This study tested whether a Cognitive-Behavioral Stress Management (CBSM) intervention targeting negative affect and cognition might counteract anxiety-related transcriptional alterations in people confronting a major medical threat. Methods 199 women undergoing primary treatment of Stage 0–III breast cancer were randomized to a 10-week CBSM protocol or an active control condition. 79 provided peripheral blood leukocyte samples for genome-wide transcriptional profiling and bioinformatic analyses at baseline, 6-, and 12-month follow-ups. Results Baseline negative affect was associated with > 50% differential expression of 201 leukocyte transcripts, including up-regulated expression of pro-inflammatory and metastasis-related genes. CBSM altered leukocyte expression of 91 genes by > 50% at follow-up (Group × Time interaction), including down-regulation of pro-inflammatory and metastasis-related genes and up-regulation of Type I interferon response genes. Promoter-based bioinformatic analyses implicated decreased activity of NF-κB/Rel and GATA family transcription factors and increased activity of Interferon Response Factors and the Glucocorticoid Receptor (GR) as potential mediators of CBSM-induced transcriptional alterations. Conclusions In early stage breast cancer patients, a 10-week CBSM intervention can reverse anxiety-related up-regulation of pro-inflammatory gene expression in circulating leukocytes. These findings clarify the molecular signaling pathways by which behavioral interventions can influence physical health and alter peripheral inflammatory processes that may reciprocally affect brain affective and cognitive processes. PMID:22088795
caCORE: a common infrastructure for cancer informatics.
Covitz, Peter A; Hartel, Frank; Schaefer, Carl; De Coronado, Sherri; Fragoso, Gilberto; Sahni, Himanso; Gustafson, Scott; Buetow, Kenneth H
2003-12-12
Sites with substantive bioinformatics operations are challenged to build data processing and delivery infrastructure that provides reliable access and enables data integration. Locally generated data must be processed and stored such that relationships to external data sources can be presented. Consistency and comparability across data sets requires annotation with controlled vocabularies and, further, metadata standards for data representation. Programmatic access to the processed data should be supported to ensure the maximum possible value is extracted. Confronted with these challenges at the National Cancer Institute Center for Bioinformatics, we decided to develop a robust infrastructure for data management and integration that supports advanced biomedical applications. We have developed an interconnected set of software and services called caCORE. Enterprise Vocabulary Services (EVS) provide controlled vocabulary, dictionary and thesaurus services. The Cancer Data Standards Repository (caDSR) provides a metadata registry for common data elements. Cancer Bioinformatics Infrastructure Objects (caBIO) implements an object-oriented model of the biomedical domain and provides Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. caCORE has been used to develop scientific applications that bring together data from distinct genomic and clinical science sources. caCORE downloads and web interfaces can be accessed from links on the caCORE web site (http://ncicb.nci.nih.gov/core). caBIO software is distributed under an open source license that permits unrestricted academic and commercial use. Vocabulary and metadata content in the EVS and caDSR, respectively, is similarly unrestricted, and is available through web applications and FTP downloads. 
http://ncicb.nci.nih.gov/core/publications contains links to the caBIO 1.0 class diagram and the caCORE 1.0 Technical Guide, which provide detailed information on the present caCORE architecture, data sources and APIs. Updated information appears on a regular basis on the caCORE web site (http://ncicb.nci.nih.gov/core).
Survey of MapReduce frame operation in bioinformatics.
Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke
2014-07-01
Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present applications based on the MapReduce framework that can be employed in next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as future work on parallel computing in bioinformatics. © The Author 2013. Published by Oxford University Press.
Microbial bioinformatics for food safety and production
Alkema, Wynand; Boekhorst, Jos; Wels, Michiel
2016-01-01
In the production of fermented foods, microbes play an important role. Optimization of fermentation processes or starter culture production traditionally was a trial-and-error approach inspired by expert knowledge of the fermentation process. Current developments in high-throughput ‘omics’ technologies allow developing more rational approaches to improve fermentation processes both from the food functionality as well as from the food safety perspective. Here, the authors thematically review typical bioinformatics techniques and approaches to improve various aspects of the microbial production of fermented food products and food safety. PMID:26082168
Identification of legionella effectors using bioinformatic approaches.
Segal, Gil
2013-01-01
Legionella pneumophila, the causative agent of Legionnaires' disease, actively manipulates host cell processes to establish a replication niche inside host cells. The establishment of its replication niche requires a functional Icm/Dot type IV secretion system which translocates about 300 effector proteins into host cells during infection. Many of these effectors were first identified as effector candidates by several bioinformatic approaches; these predicted effectors were later examined experimentally for translocation, and a large number of them were validated as effector proteins. Here, I summarize the bioinformatic approaches that were used to identify these effectors.
Murray-Rust, Peter; Mitchell, John BO; Rzepa, Henry S
2005-01-01
Chemical information is now seen as critical for most areas of life sciences. But unlike bioinformatics, where data is openly available and freely re-usable, most chemical information is closed and cannot be re-distributed without permission. This has led to a failure to adopt modern informatics and software techniques and therefore to a paucity of chemistry in bioinformatics. New technology, however, offers the hope of making chemical data (compounds and properties) free during the authoring process. We argue that the technology is already available; we require a collective agreement to enhance publication protocols. PMID:15941476
KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis.
Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H Werner; Küffner, Robert
2017-05-15
Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). Contact: robert.kueffner@helmholtz-muenchen.de. Supplementary data are available at Bioinformatics online.
USDA-ARS's Scientific Manuscript database
Scientific data integration and computational service discovery are challenges for the bioinformatic community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of scientific data between information resources difficu...
Learning Genetics through an Authentic Research Simulation in Bioinformatics
ERIC Educational Resources Information Center
Gelbart, Hadas; Yarden, Anat
2006-01-01
Following the rationale that learning is an active process of knowledge construction as well as enculturation into a community of experts, we developed a novel web-based learning environment in bioinformatics for high-school biology majors in Israel. The learning environment enables the learners to actively participate in a guided inquiry process…
A case study of tuning MapReduce for efficient Bioinformatics in the cloud
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Lizhen; Wang, Zhong; Yu, Weikuan
The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this paper, we aim to minimize the cloud computing cost spent on bioinformatics data analysis by optimizing the extracted significant Hadoop parameters. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our study case. Experimental results show that, with the fine-tuned parameters, we achieve a total of 4× speedup compared with the original performance (using the default settings). Finally, this paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.
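The k-mer counting study case maps naturally onto the MapReduce model, as the single-process sketch below suggests. It is not the authors' code and does not use Hadoop. The `num_reducers` argument is a hypothetical stand-in for the kind of job parameter that tuning would adjust; it changes how keys are distributed but not the result.

```python
from collections import defaultdict
from zlib import crc32


def map_phase(read, k):
    """Mapper: emit (k-mer, 1) for every window of length k in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def partition(kmer, num_reducers):
    """Assign a key to a reducer with a deterministic hash."""
    return crc32(kmer.encode()) % num_reducers


def run_job(reads, k, num_reducers=2):
    """Simulate map, shuffle and reduce in one process."""
    # Shuffle: group mapper output by reducer, then by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for read in reads:
        for kmer, one in map_phase(read, k):
            buckets[partition(kmer, num_reducers)][kmer].append(one)
    # Reduce: each reducer sums the values of the keys it owns.
    totals = {}
    for bucket in buckets:
        for kmer, ones in bucket.items():
            totals[kmer] = sum(ones)
    return totals


print(run_job(["ACGTACG"], k=3))
```

Because every occurrence of a given k-mer is routed to the same reducer, the counts are independent of the number of reducers; on a real cluster only the running time and cost would change.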
SIDECACHE: Information access, management and dissemination framework for web services.
Doderer, Mark S; Burkhardt, Cory; Robbins, Kay A
2011-06-14
Many bioinformatics algorithms and data sets are deployed using web services so that the results can be explored via the Internet and easily integrated into other tools and services. These services often include data from other sites that is accessed either dynamically or through file downloads. Developers of these services face several problems because of the dynamic nature of the information from the upstream services. Many publicly available repositories of bioinformatics data frequently update their information. When such an update occurs, the developers of the downstream service may also need to update. For file downloads, this process is typically performed manually followed by web service restart. Requests for information obtained by dynamic access of upstream sources are sometimes subject to rate restrictions. SideCache provides a framework for deploying web services that integrate information extracted from other databases and from web sources that are periodically updated. This situation occurs frequently in biotechnology where new information is being continuously generated and the latest information is important. SideCache provides several types of services including proxy access and rate control, local caching, and automatic web service updating. We have used the SideCache framework to automate the deployment and updating of a number of bioinformatics web services and tools that extract information from remote primary sources such as NCBI, NCIBI, and Ensembl. The SideCache framework also has been used to share research results through the use of a SideCache derived web service.
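Two of the services listed above, local caching and rate control, can be combined in a small wrapper, sketched below. This is not SideCache's API: the class `CachedUpstream` and its parameters are hypothetical, and the upstream source is represented by a plain function.

```python
import time


class CachedUpstream:
    """Wrap a fetch function with a local cache and a minimum request interval."""

    def __init__(self, fetch, ttl_seconds, min_interval,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache = {}             # key -> (expiry_time, value)
        self._last_request = None

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # fresh local copy: no upstream traffic
        # Rate control: wait until the minimum interval has elapsed.
        if self._last_request is not None:
            wait = self._min_interval - (now - self._last_request)
            if wait > 0:
                self._sleep(wait)
        value = self._fetch(key)
        self._last_request = self._clock()
        self._cache[key] = (self._last_request + self._ttl, value)
        return value


# Stand-in for a remote source; a real fetch function would issue a web request.
upstream = CachedUpstream(lambda gene: {"gene": gene}, ttl_seconds=3600, min_interval=0.5)
print(upstream.get("BRCA1"))
```

An expired entry is refetched on the next request, which approximates automatic updating when the upstream repository changes. The clock and sleep functions are injectable so that the behaviour can be checked without real delays.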
Jadhav, Pravin R; Neal, Lauren; Florian, Jeff; Chen, Ying; Naeger, Lisa; Robertson, Sarah; Soon, Guoxing; Birnkrant, Debra
2010-09-01
This article presents a prototype for an operational innovation in knowledge management (KM). These operational innovations are geared toward managing knowledge efficiently and accessing all available information by embracing advances in bioinformatics and allied fields. The specific components of the proposed KM system are (1) a database to archive hepatitis C virus (HCV) treatment data in a structured format and retrieve information in a query-capable manner and (2) an automated analysis tool to inform trial design elements for HCV drug development. The proposed framework is intended to benefit drug development by increasing efficiency of dose selection and improving the consistency of advice from US Food and Drug Administration (FDA). It is also hoped that the framework will encourage collaboration among FDA, industry, and academic scientists to guide the HCV drug development process using model-based quantitative analysis techniques.
Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets.
Rideout, Jai Ram; Chase, John H; Bolyen, Evan; Ackermann, Gail; González, Antonio; Knight, Rob; Caporaso, J Gregory
2016-06-13
Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. 
By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.
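The kind of check such a validator performs can be sketched for a generic tab-separated sample sheet. The rules below (a required `SampleID` column, identifiers limited to letters, digits and periods, no duplicate identifiers, consistent field counts) are illustrative assumptions and are not taken from Keemei or from the QIIME or SRGD specifications.

```python
import csv
import io
import re

# Assumed rule for this sketch: IDs contain only letters, digits and periods.
VALID_ID = re.compile(r"^[A-Za-z0-9.]+$")


def validate_table(text, id_column="SampleID"):
    """Return a list of human-readable problems found in a tab-separated table."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    if id_column not in header:
        return [f"missing required column {id_column!r}"]
    id_index = header.index(id_column)
    problems = []
    seen = set()
    # Data rows start on line 2 of the file.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}")
            continue
        sample_id = row[id_index]
        if not VALID_ID.match(sample_id):
            problems.append(f"line {line_no}: invalid sample ID {sample_id!r}")
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample ID {sample_id!r}")
        seen.add(sample_id)
    return problems


sheet = "SampleID\tTreatment\nS1\tcontrol\nS1\tdrug\nS 3\tdrug\nS4\n"
problems = validate_table(sheet)
for problem in problems:
    print(problem)
```

Reporting the line number with each problem mirrors the goal stated in the abstract: letting the person who entered the data find and correct the offending cell.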
Bioinformatics on the cloud computing platform Azure.
Shanahan, Hugh P; Owen, Anne M; Harrison, Andrew P
2014-01-01
We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. PMID:25050811
Best practices in bioinformatics training for life scientists.
Via, Allegra; Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K
2013-09-01
The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists. PMID:23803301
Silicon Era of Carbon-Based Life: Application of Genomics and Bioinformatics in Crop Stress Research
Li, Man-Wah; Qi, Xinpeng; Ni, Meng; Lam, Hon-Ming
2013-01-01
Abiotic and biotic stresses lead to massive reprogramming of different life processes and are the major limiting factors hampering crop productivity. Omics-based research platforms allow for a holistic and comprehensive survey on crop stress responses and hence may bring forth better crop improvement strategies. Since high-throughput approaches generate considerable amounts of data, bioinformatics tools will play an essential role in storing, retrieving, sharing, processing, and analyzing them. Genomic and functional genomic studies in crops still lag far behind similar studies in humans and other animals. In this review, we summarize some useful genomics and bioinformatics resources available to crop scientists. In addition, we also discuss the major challenges and advancements in the “-omics” studies, with an emphasis on their possible impacts on crop stress research and crop improvement. PMID:23759993
Stephens, Susie M.; Chen, Jake Y.; Davidson, Marcel G.; Thomas, Shiby; Trute, Barry M.
2005-01-01
As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html PMID:15608287
ERIC Educational Resources Information Center
Wefer, Stephen H.; Anderson, O. Roger
2008-01-01
Bioinformatics, merging biological data with computer science, is increasingly incorporated into school curricula at all levels. This case study of 10 secondary school students highlights student individual differences (especially the way they processed information and integrated procedural and analytical thought) and summarizes a variety of…
NASA Astrophysics Data System (ADS)
Kintsakis, Athanassios M.; Psomopoulos, Fotis E.; Symeonidis, Andreas L.; Mitkas, Pericles A.
Hermes introduces a new "describe once, run anywhere" paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.
Tong, Weida; Harris, Stephen C; Fang, Hong; Shi, Leming; Perkins, Roger; Goodsaid, Federico; Frueh, Felix W
2007-01-01
Pharmacogenomics (PGx) is identified in the FDA Critical Path document as a major opportunity for advancing medical product development and personalized medicine. An integrated bioinformatics infrastructure for use in FDA data review is crucial to realize the benefits of PGx for public health. We have developed an integrated bioinformatics tool, called ArrayTrack, for managing, analyzing and interpreting genomic and other biomarker data (e.g. proteomic and metabolomic data). ArrayTrack is a highly flexible and robust software platform that can evolve with technological advances and changing user needs. ArrayTrack is used in the routine review of genomic data submitted to the FDA; here, three hypothetical examples of its use in the Voluntary eXploratory Data Submission (VXDS) program are illustrated. © Published by Elsevier Ltd.
A Bioinformatics Facility for NASA
NASA Technical Reports Server (NTRS)
Schweighofer, Karl; Pohorille, Andrew
2006-01-01
Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.
NASA Astrophysics Data System (ADS)
Wefer, Stephen H.
The proliferation of bioinformatics in modern Biology marks a new revolution in science, which promises to influence science education at all levels. This thesis examined state standards for content that articulated bioinformatics, and explored secondary students' affective and cognitive perceptions of, and performance in, a bioinformatics mini-unit. The results are presented as three studies. The first study analyzed secondary science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics at the introductory high school biology level. The bioinformatics content of each state's Biology standards was categorized into nine areas and the prevalence of each area documented. The nine areas were: The Human Genome Project, Forensics, Evolution, Classification, Nucleotide Variations, Medicine, Computer Use, Agriculture/Food Technology, and Science Technology and Society/Socioscientific Issues (STS/SSI). Findings indicated a generally low representation of bioinformatics related content, which varied substantially across the different areas. Recommendations are made for reworking existing standards to incorporate bioinformatics and to facilitate the goal of promoting science literacy in this emerging new field among secondary school students. The second study examined thirty-two students' affective responses to, and content mastery of, a two-week bioinformatics mini-unit. The findings indicate that the students generally were positive relative to their interest level, the usefulness of the lessons, the difficulty level of the lessons, likeliness to engage in additional bioinformatics, and were overall successful on the assessments. A discussion of the results and significance is followed by suggestions for future research and implementation for transferability. 
The third study presents a case study of individual differences among ten secondary school students, whose cognitive and affective percepts were analyzed in relation to their experience in learning a bioinformatics mini-unit. There were distinct individual differences among the participants, especially in the way they processed information and integrated procedural and analytical thought during bioinformatics learning. These differences may provide insights into some of the specific needs of students that educators and curriculum designers should consider when designing bioinformatics learning experiences. Implications for teacher education and curriculum design are presented in addition to some suggestions for further research.
G2LC: Resources Autoscaling for Real Time Bioinformatics Applications in IaaS.
Hu, Rongdong; Liu, Guangming; Jiang, Jingfei; Wang, Lixin
2015-01-01
Cloud computing has started to change the way bioinformatics research is carried out. Researchers who have taken advantage of this technology can process larger amounts of data and speed up scientific discovery. The variability in data volume results in variable computing requirements. Therefore, bioinformatics researchers are pursuing more reliable and efficient methods for conducting sequencing analyses. This paper proposes an automated resource provisioning method, G2LC, for bioinformatics applications in IaaS. It enables an application to output its results in real time. Its main purpose is to guarantee application performance while improving resource utilization. Real sequence searching data from BLAST are used to evaluate the effectiveness of G2LC. Experimental results show that G2LC guarantees application performance while saving up to 20.14% of resources. PMID:26504488
78 FR 35936 - Statement of Organization, Functions, and Delegations of Authority
Federal Register 2010, 2011, 2012, 2013, 2014
2013-06-14
... to, laboratory information systems, quality management systems and bioinformatics; (3) ensures a safe working environment in NCIRD laboratories; and (4) collaborates effectively with other centers and offices...
Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan
2007-01-01
Modern video cards and game consoles typically have much better performance to price ratios than that of general purpose CPUs. The parallel processing capabilities of game hardware are well-suited for high throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.
Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho
2017-11-01
High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods provide vast amounts of data, which need to be analysed in several steps. Although the bioinformatics analysis may be performed using several public tools, many analytical pipelines allow too few options for the optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.
Jones, Bethan M; Edwards, Richard J; Skipp, Paul J; O'Connor, C David; Iglesias-Rodriguez, M Debora
2011-06-01
Emiliania huxleyi is a unicellular marine phytoplankton species known to play a significant role in global biogeochemistry. Through the dual roles of photosynthesis and production of calcium carbonate (calcification), carbon is transferred from the atmosphere to ocean sediments. Almost nothing is known about the molecular mechanisms that control calcification, a process that is tightly regulated within the cell. To initiate proteomic studies on this important and phylogenetically remote organism, we have devised efficient protein extraction protocols and developed a bioinformatics pipeline that allows the statistically robust assignment of proteins from MS/MS data using preexisting EST sequences. The bioinformatics tool, termed BUDAPEST (Bioinformatics Utility for Data Analysis of Proteomics using ESTs), is fully automated and was used to search against data generated from three strains. BUDAPEST increased the number of identifications over standard protein database searches from 37 to 99 proteins when data were amalgamated. Proteins involved in diverse cellular processes were uncovered. For example, experimental evidence was obtained for a novel type I polyketide synthase and for various photosystem components. The proteomic and bioinformatic approaches developed in this study are of wider applicability, particularly to the oceanographic community where genomic sequence data for species of interest are currently scarce.
BigDataScript: a scripting language for data pipelines.
Cingolani, Pablo; Sladek, Rob; Blanchette, Mathieu
2015-01-01
The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and provide robustness to various types of software and hardware failures as well as portability. We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code. BigDataScript is available under open-source license at http://pcingola.github.io/BigDataScript. Contact: pablo.e.cingolani@gmail.com © The Author 2014. Published by Oxford University Press. PMID:25189778
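The idea of lazy processing mentioned in this abstract can be illustrated with a make-style timestamp check: a step runs only when one of its outputs is missing or older than its inputs, so a restarted pipeline skips work that already completed. The sketch below is written in Python rather than BDS, and the helper names `needs_run` and `run_lazily` are hypothetical; it shows the general technique, not BDS semantics.

```python
import os
import tempfile


def needs_run(inputs, outputs):
    """True if any output is missing or older than the newest input."""
    if not all(os.path.exists(path) for path in outputs):
        return True
    newest_input = max((os.path.getmtime(p) for p in inputs),
                       default=float("-inf"))
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output < newest_input


def run_lazily(inputs, outputs, action):
    """Run the action only when its outputs are out of date; report whether it ran."""
    if needs_run(inputs, outputs):
        action()
        return True
    return False


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "reads.txt")
    dst = os.path.join(workdir, "reads.upper.txt")
    with open(src, "w") as handle:
        handle.write("acgt\n")
    os.utime(src, (1000, 1000))           # pretend the input is very old

    def to_upper():
        with open(src) as fin, open(dst, "w") as fout:
            fout.write(fin.read().upper())

    first = run_lazily([src], [dst], to_upper)    # output missing, so it runs
    second = run_lazily([src], [dst], to_upper)   # output up to date, skipped
    newer = os.path.getmtime(dst) + 10
    os.utime(src, (newer, newer))                 # simulate an updated input
    third = run_lazily([src], [dst], to_upper)    # output now stale, runs again

print(first, second, third)
```

After a failure, rerunning the whole pipeline under this rule repeats only the steps whose outputs are absent or stale, which is one way a pipeline can recover from errors without redoing finished work.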
Thiele, H.; Glandorf, J.; Koerting, G.; Reidegeld, K.; Blüggel, M.; Meyer, H.; Stephan, C.
2007-01-01
In today’s proteomics research, various techniques, instrumentation and bioinformatics tools are necessary to manage the large amount of heterogeneous data with an automatic quality control to produce reliable and comparable results. Therefore, a data-processing pipeline is mandatory for data validation and comparison in a data-warehousing system. The proteome bioinformatics platform ProteinScape has been proven to cover these needs. The reprocessing of HUPO BPP participants’ MS data was done within ProteinScape. The reprocessed information was transferred into the global data repository PRIDE. ProteinScape as a data-warehousing system covers two main aspects: archiving relevant data of the proteomics workflow and information extraction functionality (protein identification, quantification and generation of biological knowledge). As a strategy for automatic data validation, different protein search engines are integrated. Result analysis is performed using a decoy database search strategy, which allows the measurement of the false-positive identification rate. Peptide identifications across different workflows, different MS techniques, and different search engines are merged to obtain a quality-controlled protein list. The proteomics identifications database (PRIDE), as a public data repository, is an archiving system where data are finally stored and no longer changed by further processing steps. Data submission to PRIDE is open to proteomics laboratories generating protein and peptide identifications. An export tool has been developed for transferring all relevant HUPO BPP data from ProteinScape into PRIDE using the PRIDE.xml format. The EU-funded ProDac project will coordinate the development of software tools covering international standards for the representation of proteomics data. 
The implementation of data submission pipelines and systematic data collection in public standards–compliant repositories will cover all aspects, from the generation of MS data in each laboratory to the conversion of all the annotating information and identifications to a standardized format. Such datasets can be used in the course of publishing in scientific journals.
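The decoy database strategy mentioned in this abstract can be made concrete with a small calculation. The sketch below assumes separate target and decoy searches of equal size and the common estimate FDR ≈ (decoy hits above threshold) / (target hits above threshold); the abstract does not state which estimator ProteinScape uses, and the function names and scores are invented for illustration.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the false discovery rate among target hits scoring >= threshold."""
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    if targets == 0:
        return 0.0          # nothing accepted, so nothing falsely accepted
    return min(1.0, decoys / targets)


def threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Lowest observed target score whose estimated FDR is at or below max_fdr."""
    for candidate in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, candidate) <= max_fdr:
            return candidate
    return None


# Invented identification scores, for illustration only.
target_scores = [95, 90, 80, 70, 60, 50, 40, 30]
decoy_scores = [55, 35, 20, 10]

fdr_at_30 = estimate_fdr(target_scores, decoy_scores, 30)
cutoff = threshold_for_fdr(target_scores, decoy_scores, 0.2)
print(fdr_at_30, cutoff)
```

Because decoy sequences cannot be true identifications, the number of decoy hits passing a score cutoff approximates the number of false target hits passing the same cutoff.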
Menegidio, Fabiano B; Jabes, Daniela L; Costa de Oliveira, Regina; Nunes, Luiz R
2018-02-01
This manuscript introduces and describes Dugong, a Docker image based on Ubuntu 16.04, which automates installation of more than 3500 bioinformatics tools (along with their respective libraries and dependencies), in alternative computational environments. The software operates through a user-friendly XFCE4 graphic interface that allows software management and installation by users not fully familiarized with the Linux command line and provides the Jupyter Notebook to assist in the delivery and exchange of consistent and reproducible protocols and results across laboratories, assisting in the development of open science projects. Source code and instructions for local installation are available at https://github.com/DugongBioinformatics, under the MIT open source license. Contact: Luiz.nunes@ufabc.edu.br. © The Author (2017). Published by Oxford University Press.
Molecular dynamics simulations through GPU video games technologies
Loukatou, Styliani; Papageorgiou, Louis; Fakourelis, Paraskevas; Filntisi, Arianna; Polychronidou, Eleftheria; Bassis, Ioannis; Megalooikonomou, Vasileios; Makałowski, Wojciech; Vlachakis, Dimitrios; Kossida, Sophia
2016-01-01
Bioinformatics is the scientific field that focuses on the application of computer technology to the management of biological information. Over the years, bioinformatics applications have been used to store, process and integrate biological and genetic information, using a wide range of methodologies. One of the most de novo techniques used to understand the physical movements of atoms and molecules is molecular dynamics (MD). MD is an in silico method to simulate the physical motions of atoms and molecules under certain conditions. This has become a state strategic technique and now plays a key role in many areas of exact sciences, such as chemistry, biology, physics and medicine. Due to their complexity, MD calculations can require enormous amounts of computer memory and time, and therefore their execution has been a big problem. Despite the huge computational cost, molecular dynamics have been implemented using traditional computers with a central processing unit (CPU). Graphics processing unit (GPU) computing technology was first designed with the goal of improving video games, by rapidly creating and displaying images in a frame buffer such as screens. The hybrid GPU-CPU implementation, combined with parallel computing, is a novel technology to perform a wide range of calculations. GPUs have been proposed and used to accelerate many scientific computations including MD simulations. Herein, we describe the new methodologies developed initially as video games and how they are now applied in MD simulations. PMID:27525251
The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures.
Kikuchi, Norihiro; Kameyama, Akihiko; Nakaya, Shuuichi; Ito, Hiromi; Sato, Takashi; Shikanai, Toshihide; Takahashi, Yoriko; Narimatsu, Hisashi
2005-04-15
Bioinformatics resources for glycomics are very poor as compared with those for genomics and proteomics. The complexity of carbohydrate sequences makes it difficult to define a common language to represent them, and the development of bioinformatics tools for glycomics has not progressed. In this study, we developed a carbohydrate sequence markup language (CabosML), an XML description of carbohydrate structures. The language definition (XML Schema) and an experimental database of carbohydrate structures using an XML database management system are available at http://www.phoenix.hydra.mki.co.jp/CabosDemo.html. Contact: kikuchi@hydra.mki.co.jp.
Text mining and medicine: usefulness in respiratory diseases.
Piedra, David; Ferrer, Antoni; Gea, Joaquim
2014-03-01
It is increasingly common to have medical information in electronic format. This includes scientific articles as well as clinical management reviews, and even records from health institutions with patient data. However, traditional instruments, both individual and institutional, are of little use for selecting the most appropriate information in each case, either in the clinical or research field. So-called text or data «mining» enables this huge amount of information to be managed, extracting it from various sources using processing systems (filtration and curation), integrating it and permitting the generation of new knowledge. This review aims to provide an overview of text and data mining, and of the potential usefulness of this bioinformatic technique in the exercise of care in respiratory medicine and in research in the same field.
X-ray crystallography over the past decade for novel drug discovery - where are we heading next?
Zheng, Heping; Handing, Katarzyna B; Zimmerman, Matthew D; Shabalin, Ivan G; Almo, Steven C; Minor, Wladek
2015-01-01
Macromolecular X-ray crystallography has been the primary methodology for determining the three-dimensional structures of proteins, nucleic acids and viruses. Structural information has paved the way for structure-guided drug discovery and laid the foundations for structural bioinformatics. However, X-ray crystallography still has a few fundamental limitations, some of which may be overcome and complemented using emerging methods and technologies in other areas of structural biology. This review describes how structural knowledge gained from X-ray crystallography has been used to advance other biophysical methods for structure determination (and vice versa). This article also covers current practices for integrating data generated by other biochemical and biophysical methods with those obtained from X-ray crystallography. Finally, the authors articulate their vision about how a combination of structural and biochemical/biophysical methods may improve our understanding of biological processes and interactions. X-ray crystallography has been, and will continue to serve as, the central source of experimental structural biology data used in the discovery of new drugs. However, other structural biology techniques are useful not only to overcome the major limitation of X-ray crystallography, but also to provide complementary structural data that is useful in drug discovery. The use of recent advancements in biochemical, spectroscopy and bioinformatics methods may revolutionize drug discovery, albeit only when these data are combined and analyzed with effective data management systems. Accurate and complete data management is crucial for developing experimental procedures that are robust and reproducible.
EST-PAC a web package for EST annotation and protein sequence prediction
Strahm, Yvan; Powell, David; Lefèvre, Christophe
2006-01-01
With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC, a web-oriented multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequence from EST data and 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics. PMID:17147782
Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.
Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A
2005-04-07
Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. Availability: http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
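The idea of defining iteration and conditionals once and reusing them as generic components can be illustrated with a minimal sketch. This is not GPIPE or PISE code; the names `sequence`, `conditional` and `iterate`, and the `max_rounds` safety limit, are hypothetical and chosen only for illustration.

```python
def sequence(*steps):
    """Compose steps so that the output of each one feeds the next."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run


def conditional(predicate, if_true, if_false=lambda data: data):
    """Apply if_true when predicate(data) holds, otherwise if_false (identity by default)."""
    def run(data):
        return if_true(data) if predicate(data) else if_false(data)
    return run


def iterate(step, until, max_rounds=100):
    """Repeat step until the stop condition holds or max_rounds (an assumed guard) is reached."""
    def run(data):
        rounds = 0
        while not until(data) and rounds < max_rounds:
            data = step(data)
            rounds += 1
        return data
    return run


# Toy workflow on a sequence string: upper-case it, trim to 4 letters if longer,
# then pad with "N" until it has at least 6 letters.
pipeline = sequence(
    str.upper,
    conditional(lambda s: len(s) > 4, lambda s: s[:4]),
    iterate(lambda s: s + "N", until=lambda s: len(s) >= 6),
)
```

Because each operator returns an ordinary callable, operators can be nested and parameterized freely, which is the property the abstract attributes to properly defined workflow operators.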
High-throughput bioinformatics with the Cyrille2 pipeline system
Fiers, Mark WEJ; van der Burgt, Ate; Datema, Erwin; de Groot, Joost CW; van Ham, Roeland CHJ
2008-01-01
Background: Modern omics research involves the application of high-throughput technologies that generate vast volumes of data. These data need to be pre-processed, analyzed and integrated with existing knowledge through the use of diverse sets of software tools, models and databases. The analyses are often interdependent and chained together to form complex workflows or pipelines. Given the volume of the data used and the multitude of computational resources available, specialized pipeline software is required to make high-throughput analysis of large-scale omics datasets feasible. Results: We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web-based graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system, tracks what data enters the system and determines what jobs must be scheduled for execution; and 3) the Executor, which searches for scheduled jobs and executes these on a compute cluster. Conclusion: The Cyrille2 system is an extensible, modular system, implementing the stated requirements. Cyrille2 enables easy creation and execution of high-throughput, flexible bioinformatics pipelines. PMID:18269742
Challenges of Identifying Clinically Actionable Genetic Variants for Precision Medicine
2016-01-01
Advances in genomic medicine have the potential to change the way we treat human disease, but translating these advances into reality for improving healthcare outcomes depends essentially on our ability to discover disease- and/or drug-associated clinically actionable genetic mutations. Integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a big data infrastructure can provide an efficient and effective way to identify clinically actionable genetic variants for personalized treatments and reduce healthcare costs. We review bioinformatics processing of next-generation sequencing (NGS) data, bioinformatics infrastructures for implementing precision medicine, and bioinformatics approaches for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs. PMID:27195526
BOWS (bioinformatics open web services) to centralize bioinformatics tools in web services.
Velloso, Henrique; Vialle, Ricardo A; Ortega, J Miguel
2015-06-02
Bioinformaticians face a range of difficulties in getting locally installed tools running and producing results; they would greatly benefit from a system that could centralize most of the tools behind an easy interface for input and output. Web services, due to their universal nature and widely known interface, constitute a very good option to achieve this goal. Bioinformatics open web services (BOWS) is a system based on generic web services produced to allow programmatic access to applications running on high-performance computing (HPC) clusters. BOWS mediates access to registered tools by providing front-end and back-end web services. Programmers can install applications written in any programming language on HPC clusters and use the back-end service to check for new jobs and their parameters, and then to send the results to BOWS. Programs running on ordinary computers consume the BOWS front-end service to submit new processes and read results. BOWS compiles Java clients, which encapsulate the front-end web service requests, and automatically creates a web page that lists the registered applications and clients. Applications registered with BOWS can be accessed from virtually any programming language through web services, or using standard Java clients. The back-end can run on HPC clusters, allowing bioinformaticians to remotely run applications with high processing demands directly from their machines.
Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT
Wang, Dantong; Fong, Simon; Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Wong, Kelvin K. L.
2017-01-01
Outlier detection in the mining of bioinformatics data streams has received significant attention from research communities in recent years. Two questions pose a dilemma: how to distinguish noise from a genuine exception, and whether to discard an exceptional value or to devise an extra decision path that accommodates it. In this paper, we propose a novel algorithm called ODR with incrementally Optimized Very Fast Decision Tree (ODR-ioVFDT) for handling outliers in the course of continuous data learning. A tolerance threshold is set using an adaptive interquartile-range-based identification method. It is then used to judge whether a data item of exceptional value should be included for training. This differs from traditional outlier detection/removal approaches, in which detection and removal are two separate steps in processing the data. The proposed algorithm is tested on datasets from five bioinformatics scenarios, comparing the performance of our model with that of other models without ODR. The results show that ODR-ioVFDT performs better in classification accuracy, kappa statistics, and time consumption. Applied to bioinformatics streaming data processing for detecting and quantifying information on life phenomena, states, characters, variables and components of the organism, ODR-ioVFDT can help to diagnose and treat disease more effectively. PMID:28230161
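The interquartile-range gate mentioned in this abstract can be illustrated with a small streaming sketch. It is not the ODR-ioVFDT algorithm itself: the class name `IqrGate`, the sliding window, the warm-up length and the multiplier `k = 1.5` (Tukey's conventional choice) are all assumptions made for illustration, and the decision-tree learner is omitted entirely.

```python
from collections import deque
import statistics


class IqrGate:
    """Hypothetical streaming filter that rejects values outside an IQR-based band."""

    def __init__(self, window=50, k=1.5, warmup=8):
        self.values = deque(maxlen=window)  # most recent accepted values
        self.k = k                          # assumed tolerance multiplier
        self.warmup = warmup                # accept everything until this many values are held

    def accept(self, x):
        """Return True if x should be passed on for training, False if treated as an outlier."""
        if len(self.values) >= self.warmup:
            q1, _, q3 = statistics.quantiles(self.values, n=4)
            iqr = q3 - q1
            low, high = q1 - self.k * iqr, q3 + self.k * iqr
            if not (low <= x <= high):
                return False  # rejected values do not update the window
        self.values.append(x)
        return True


def filter_stream(stream, **kwargs):
    """Return the values of stream that pass the gate, in their original order."""
    gate = IqrGate(**kwargs)
    return [x for x in stream if gate.accept(x)]


readings = [10, 11, 9, 10, 12, 11, 10, 9, 500, 10, 11]
kept = filter_stream(readings, window=50, k=1.5, warmup=8)
```

The thresholds are recomputed from the accepted values on every call, so the band adapts as the stream drifts; in this sketch a rejected value is simply dropped, whereas the abstract describes a decision between discarding it and accommodating it.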
Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas
2018-06-21
The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.
Novel approaches for bioinformatic analysis of salivary RNA sequencing data for development.
Kaczor-Urbanowicz, Karolina Elzbieta; Kim, Yong; Li, Feng; Galeev, Timur; Kitchen, Rob R; Gerstein, Mark; Koyano, Kikuye; Jeong, Sung-Hee; Wang, Xiaoyan; Elashoff, David; Kang, So Young; Kim, Su Mi; Kim, Kyoung; Kim, Sung; Chia, David; Xiao, Xinshu; Rozowsky, Joel; Wong, David T W
2018-01-01
Analysis of RNA sequencing (RNA-Seq) data in human saliva is challenging. Lack of standardization and unification of the bioinformatic procedures undermines saliva's diagnostic potential. Thus, it motivated us to perform this study. We applied principal pipelines for bioinformatic analysis of small RNA-Seq data of saliva of 98 healthy Korean volunteers including either direct or indirect mapping of the reads to the human genome using Bowtie1. Analysis of alignments to exogenous genomes by another pipeline revealed that almost all of the reads map to bacterial genomes. Thus, salivary exRNA has fundamental properties that warrant the design of unique additional steps while performing the bioinformatic analysis. Our pipelines can serve as potential guidelines for processing of RNA-Seq data of human saliva. Processing and analysis results of the experimental data generated by the exceRpt (v4.6.3) small RNA-seq pipeline (github.gersteinlab.org/exceRpt) are available from exRNA atlas (exrna-atlas.org). Alignment to exogenous genomes and their quantification results were used in this paper for the analyses of small RNAs of exogenous origin. Contact: dtww@ucla.edu.
An overview of topic modeling and its current applications in bioinformatics.
Liu, Lin; Tang, Lin; Dong, Wen; Yao, Shaowen; Zhou, Wei
2016-01-01
With the rapid accumulation of biological datasets, machine learning methods designed to automate data analysis are urgently needed. In recent years, so-called topic models that originated from the field of natural language processing have been receiving much attention in bioinformatics because of their interpretability. Our aim was to review the application and development of topic models for bioinformatics. This paper starts with the description of a topic model, with a focus on the understanding of topic modeling. A general outline is provided on how to build an application in a topic model and how to develop a topic model. Meanwhile, the literature on application of topic models to biological data was searched and analyzed in depth. According to the types of models and the analogy between the concept of document-topic-word and a biological object (as well as the tasks of a topic model), we categorized the related studies and provided an outlook on the use of topic models for the development of bioinformatics applications. Topic modeling is a useful method (in contrast to the traditional means of data reduction in bioinformatics) and enhances researchers' ability to interpret biological information. Nevertheless, due to the lack of topic models optimized for specific biological data, the studies on topic modeling in biological data still have a long and challenging road ahead. We believe that topic models are a promising method for various applications in bioinformatics research.
Parallel computing in genomic research: advances and applications
Ocaña, Kary; de Oliveira, Daniel
2015-01-01
Today’s genomic experiments have to process the so-called “biological big data” that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic literature review that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that scientists can consider when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. PMID:26604801
Pérès, Sabine; Felicori, Liza; Rialle, Stéphanie; Jobard, Elodie; Molina, Franck
2010-01-01
Motivation: In the available databases, biological processes are described from molecular and cellular points of view, but these descriptions are represented with text annotations that make it difficult to handle them for computation. Consequently, there is an obvious need for formal descriptions of biological processes. Results: We present a formalism that uses the BioΨ concepts to model biological processes from molecular details to networks. This computational approach, based on elementary bricks of actions, allows us to calculate on biological functions (e.g. process comparison, mapping structure–function relationships, etc.). We illustrate its application with two examples: the functional comparison of proteases and the functional description of the glycolysis network. This computational approach is compatible with detailed biological knowledge and can be applied to different kinds of systems of simulation. Availability: www.sysdiag.cnrs.fr/publications/supplementary-materials/BioPsi_Manager/ Contact: sabine.peres@sysdiag.cnrs.fr; franck.molina@sysdiag.cnrs.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20448138
Mathematics and evolutionary biology make bioinformatics education comprehensible.
Jungck, John R; Weisstein, Anton E
2013-09-01
The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes, the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software, the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a 'two-culture' problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses.
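As a concrete instance of tree enumeration, the standard counting result for fully resolved (binary) trees on n labelled taxa is (2n-5)!! unrooted trees for n >= 3 and (2n-3)!! rooted trees for n >= 2. The sketch below computes these counts; it is a generic illustration rather than one of the BioQUEST tools, and the function names are hypothetical.

```python
def odd_double_factorial(k):
    """Product k * (k-2) * (k-4) * ... * 1 for odd k >= 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_binary_trees(n_taxa):
    """Number of distinct unrooted binary trees on n labelled taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    return odd_double_factorial(2 * n_taxa - 5)


def rooted_binary_trees(n_taxa):
    """Number of distinct rooted binary trees on n labelled taxa: (2n - 3)!!"""
    if n_taxa < 2:
        raise ValueError("a rooted binary tree needs at least 2 taxa")
    return odd_double_factorial(2 * n_taxa - 3)


for n in (4, 5, 10):
    print(n, unrooted_binary_trees(n), rooted_binary_trees(n))
# prints:
# 4 3 15
# 5 15 105
# 10 2027025 34459425
```

The rapid growth of these counts is why exhaustive search over tree space becomes infeasible beyond a handful of taxa, and why tree construction relies on heuristics.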
González-Nilo, Fernando; Pérez-Acle, Tomás; Guínez-Molinos, Sergio; Geraldo, Daniela A; Sandoval, Claudia; Yévenes, Alejandro; Santos, Leonardo S; Laurie, V Felipe; Mendoza, Hegaly; Cachau, Raúl E
2011-01-01
After the progress made during the genomics era, bioinformatics was tasked with supporting the flow of information generated by nanobiotechnology efforts. This challenge requires adapting classical bioinformatic and computational chemistry tools to store, standardize, analyze, and visualize nanobiotechnological information. Thus, old and new bioinformatic and computational chemistry tools have been merged into a new sub-discipline: nanoinformatics. This review takes a second look at the development of this new and exciting area as seen from the perspective of the evolution of nanobiotechnology applied to the life sciences. The knowledge obtained at the nano-scale level implies answers to new questions and the development of new concepts in different fields. The rapid convergence of technologies around nanobiotechnologies has spun off collaborative networks and web platforms created for sharing and discussing the knowledge generated in nanobiotechnology. The implementation of new database schemes suitable for storing, processing and integrating the physical, chemical, and biological properties of nanoparticles will be a key element in achieving the promises of this convergent field. In this work, we review some applications of nanobiotechnology to the life sciences that generate new requirements for diverse scientific fields, such as bioinformatics and computational chemistry.
Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.
Krampis, Konstantinos; Booth, Tim; Chapman, Brad; Tiwari, Bela; Bicak, Mesude; Field, Dawn; Nelson, Karen E
2012-03-19
A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. 
An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them. PMID:22429538
Mining semantic networks of bioinformatics e-resources from the literature
2011-01-01
Background: There have been a number of recent efforts (e.g. BioCatalogue, BioMoby) to systematically catalogue bioinformatics tools, services and datasets. These efforts rely on manual curation, making it difficult to cope with the huge influx of various electronic resources that have been provided by the bioinformatics community. We present a text mining approach that utilises the literature to automatically extract descriptions and semantically profile bioinformatics resources to make them available for resource discovery and exploration through semantic networks that contain related resources. Results: The method identifies the mentions of resources in the literature and assigns a set of co-occurring terminological entities (descriptors) to represent them. We have processed 2,691 full-text bioinformatics articles and extracted profiles of 12,452 resources containing associated descriptors with binary and tf*idf weights. Since such representations are typically sparse (on average 13.77 features per resource), we used lexical kernel metrics to identify semantically related resources via descriptor smoothing. Resources are then clustered or linked into semantic networks, providing the users (bioinformaticians, curators and service/tool crawlers) with a possibility to explore algorithms, tools, services and datasets based on their relatedness. Manual exploration of links between a set of 18 well-known bioinformatics resources suggests that the method was able to identify and group semantically related entities. Conclusions: The results have shown that the method can reconstruct interesting functional links between resources (e.g. linking data types and algorithms), in particular when tf*idf-like weights are used for profiling.
This demonstrates the potential of combining literature mining and simple lexical kernel methods to model relatedness between resource descriptors, in particular when there are few features, thus potentially improving the resource description, discovery and exploration process. The resource profiles are available at http://gnode1.mib.man.ac.uk/bioinf/semnets.html PMID:21388573
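The tf*idf profiling step can be sketched as follows. The resource names and descriptor lists are made-up toy data, the idf form log(N/df) is one common variant rather than necessarily the one used by the authors, and plain cosine similarity stands in for the lexical kernel metrics with descriptor smoothing, which are not reproduced here.

```python
import math
from collections import Counter


def tfidf_profiles(mentions):
    """Map each resource to {descriptor: tf * idf}.

    mentions: dict of resource name -> list of descriptor terms found near its mentions.
    """
    n_resources = len(mentions)
    doc_freq = Counter()
    for terms in mentions.values():
        doc_freq.update(set(terms))  # count each descriptor once per resource
    profiles = {}
    for name, terms in mentions.items():
        term_freq = Counter(terms)
        profiles[name] = {
            term: count * math.log(n_resources / doc_freq[term])
            for term, count in term_freq.items()
        }
    return profiles


def cosine(p, q):
    """Cosine similarity between two sparse descriptor profiles."""
    dot = sum(p[t] * q[t] for t in set(p) & set(q))
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy input, not data from the study.
toy_mentions = {
    "BLAST": ["alignment", "sequence", "search"],
    "ClustalW": ["alignment", "sequence", "multiple"],
    "KEGG": ["pathway", "database", "sequence"],
}
profiles = tfidf_profiles(toy_mentions)
```

In this toy input the descriptor "sequence" occurs with every resource, so its weight is zero and it contributes nothing to relatedness, while the rarer shared descriptor "alignment" links BLAST and ClustalW. Down-weighting ubiquitous descriptors in this way is the general motivation for tf*idf-like weights over binary ones.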
Margaria, Tiziana; Kubczak, Christian; Steffen, Bernhard
2008-04-25
With Bio-jETI, we introduce a service platform for interdisciplinary work on biological application domains and illustrate its use in a concrete application concerning statistical data processing in R and xcms for an LC/MS analysis of FAAH gene knockout. Bio-jETI uses the jABC environment for service-oriented modeling and design as a graphical process modeling tool and the jETI service integration technology for remote tool execution. As a service definition and provisioning platform, Bio-jETI has the potential to become a core technology in interdisciplinary service orchestration and technology transfer. Domain experts, like biologists not trained in computer science, directly define complex service orchestrations as process models and use efficient and complex bioinformatics tools in a simple and intuitive way.
Cellular automata and its applications in protein bioinformatics.
Xiao, Xuan; Wang, Pu; Chou, Kuo-Chen
2011-09-01
With the explosion of protein sequences generated in the postgenomic era, it is highly desirable to develop high-throughput tools for rapidly and reliably identifying various attributes of uncharacterized proteins based on their sequence information alone. The knowledge thus obtained can help us timely utilize these newly found protein sequences for both basic research and drug discovery. Many bioinformatics tools have been developed by means of machine learning methods. This review is focused on the applications of a new kind of science (cellular automata) in protein bioinformatics. A cellular automaton (CA) is an open, flexible and discrete dynamic model that holds enormous potentials in modeling complex systems, in spite of the simplicity of the model itself. Researchers, scientists and practitioners from different fields have utilized cellular automata for visualizing protein sequences, investigating their evolution processes, and predicting their various attributes. Owing to its impressive power, intuitiveness and relative simplicity, the CA approach has great potential for use as a tool for bioinformatics.
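As a concrete picture of the model class, the following is a minimal sketch of an elementary (one-dimensional, binary, nearest-neighbour) cellular automaton. The abstract does not say how a protein sequence is encoded as an initial state, so the binary rows used here are purely hypothetical inputs.

```python
def step(cells, rule):
    """Apply one update of an elementary CA with wrap-around boundaries.

    `rule` is the Wolfram rule number (0-255); bit k of the rule gives the new
    state for the neighbourhood whose (left, centre, right) bits encode k.
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        out.append((rule >> index) & 1)
    return out

def run(cells, rule, steps):
    """Return the list of successive states, starting with the initial one."""
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history
```

Stacking the rows returned by `run` yields the kind of two-dimensional image that makes CA evolution easy to inspect visually.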
Genomics and breeding in food crops
USDA-ARS's Scientific Manuscript database
Plant biology is in the midst of a revolution. The generation of tremendous volumes of sequence information introduces new technical challenges into plant biology and agriculture. The relatively new field of bioinformatics addresses these challenges by utilizing efficient data management strategies;...
Controlling for confounding variables in MS-omics protocol: why modularity matters.
Smith, Rob; Ventura, Dan; Prince, John T
2014-09-01
As the field of bioinformatics research continues to grow, more and more novel techniques are proposed to meet new challenges and to improve upon solutions to long-standing problems. These include data processing techniques and wet lab protocol techniques. Although the literature is consistently thorough in experimental detail and variable-controlling rigor for wet lab protocol techniques, bioinformatics techniques tend to be less thoroughly described and less rigorously controlled. As the validation or rejection of hypotheses rests on the experiment's ability to isolate and measure a variable of interest, we stress the importance of reducing confounding variables in bioinformatics techniques during mass spectrometry experimentation.
Corwin, John; Silberschatz, Avi; Miller, Perry L; Marenco, Luis
2007-01-01
Data sparsity and schema evolution issues affecting clinical informatics and bioinformatics communities have led to the adoption of vertical or object-attribute-value-based database schemas to overcome limitations posed when using conventional relational database technology. This paper explores these issues and discusses why biomedical data are difficult to model using conventional relational techniques. The authors propose a solution to these obstacles based on a relational database engine using a sparse, column-store architecture. The authors provide benchmarks comparing the performance of queries and schema-modification operations using three different strategies: (1) the standard conventional relational design; (2) past approaches used by biomedical informatics researchers; and (3) their sparse, column-store architecture. The performance results show that their architecture is a promising technique for storing and processing many types of data that are not handled well by the other two semantic data models.
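The object-attribute-value layout mentioned in this abstract stores one row per (entity, attribute, value) triple, so only attributes that are actually recorded take up space. The sketch below shows that idea in plain Python; the patient identifiers, attribute names, and helper functions are hypothetical and do not represent the authors' column-store engine.

```python
from collections import defaultdict

def pivot_eav(rows):
    """Turn (entity, attribute, value) triples into one sparse record per entity."""
    records = defaultdict(dict)
    for entity, attribute, value in rows:
        records[entity][attribute] = value
    return dict(records)

def select(records, attribute, predicate):
    """Return the sorted ids of entities that have `attribute` and satisfy `predicate`."""
    return sorted(
        entity
        for entity, record in records.items()
        if attribute in record and predicate(record[attribute])
    )

# Hypothetical sparse clinical data: most attributes are missing for most entities.
eav_rows = [
    ("patient1", "age", 54),
    ("patient1", "hba1c", 7.2),
    ("patient2", "age", 61),
    ("patient3", "hba1c", 5.4),
]
records = pivot_eav(eav_rows)
```

Adding a new attribute requires no schema change in this layout, only new rows, which is the schema-evolution advantage the abstract refers to; the cost is that attribute-centred queries must first reassemble records.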
Bioinformatics in protein kinases regulatory network and drug discovery.
Chen, Qingfeng; Luo, Haiqiong; Zhang, Chengqi; Chen, Yi-Ping Phoebe
2015-04-01
Protein kinases have been implicated in a number of diseases, as kinases participate in many of the processes that control cell growth, movement and death. Deregulated kinase activity and knowledge of the resulting disorders are of great clinical interest for drug discovery. The most critical issue is the development of safe and efficient disease diagnosis and treatment at lower cost and in less time. It is critical to develop innovative approaches that aim at the root cause of a disease, not just its symptoms. Bioinformatics, including genetic, genomic, mathematical and computational technologies, has become the most promising option for effective drug discovery, and has shown its potential in the early stages of drug-target identification and target validation. It is essential that these aspects are understood and integrated into new methods used in drug discovery for diseases arising from deregulated kinase activity. This article reviews bioinformatics techniques for protein kinase data management and analysis, kinase pathways and drug targets, and describes their potential application in the pharmaceutical industry.
The Web as an educational tool for/in learning/teaching bioinformatics statistics.
Oliver, J; Pisano, M E; Alonso, T; Roca, P
2005-12-01
Statistics provides an essential tool in bioinformatics for interpreting the results of a database search and for managing the enormous amounts of information produced by genomics, proteomics and metabolomics. The goal of this project was to develop a software tool, as simple as possible, to demonstrate the use of bioinformatics statistics. Computer Simulation Methods (CSMs) developed using Microsoft Excel were chosen for their broad range of applications, immediate and easy formula calculation, immediate testing and easy graphical representation, and their general use and acceptance by the scientific community. The result of these endeavours is a set of utilities which can be accessed from the following URL: http://gmein.uib.es/bioinformatica/statistics. When tested on students who had previously taken courses taught with traditional statistical methods, the overall consensus was that Web-based instruction had numerous advantages, but that traditional methods with manual calculations were also needed for their theory and practice. Once the basic statistical formulas had been mastered, Excel spreadsheets and graphics proved very useful for trying many parameters rapidly without having to perform tedious calculations. CSMs will be of great importance for the training of students and professionals in the field of bioinformatics, and for upcoming applications in self-learning and continuing education.
Data mining in bioinformatics using Weka.
Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H
2004-10-12
The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, all common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. http://www.cs.waikato.ac.nz/ml/weka.
X-ray crystallography over the past decade for novel drug discovery – where are we heading next?
Zheng, Heping; Handing, Katarzyna B; Zimmerman, Matthew D; Shabalin, Ivan G; Almo, Steven C; Minor, Wladek
2015-01-01
Introduction Macromolecular X-ray crystallography has been the primary methodology for determining the three-dimensional structures of proteins, nucleic acids and viruses. Structural information has paved the way for structure-guided drug discovery and laid the foundations for structural bioinformatics. However, X-ray crystallography still has a few fundamental limitations, some of which may be overcome and complemented using emerging methods and technologies in other areas of structural biology. Areas covered This review describes how structural knowledge gained from X-ray crystallography has been used to advance other biophysical methods for structure determination (and vice versa). This article also covers current practices for integrating data generated by other biochemical and biophysical methods with those obtained from X-ray crystallography. Finally, the authors articulate their vision about how a combination of structural and biochemical/biophysical methods may improve our understanding of biological processes and interactions. Expert opinion X-ray crystallography has been, and will continue to serve as, the central source of experimental structural biology data used in the discovery of new drugs. However, other structural biology techniques are useful not only to overcome the major limitation of X-ray crystallography, but also to provide complementary structural data that is useful in drug discovery. The use of recent advancements in biochemical, spectroscopy and bioinformatics methods may revolutionize drug discovery, albeit only when these data are combined and analyzed with effective data management systems. Accurate and complete data management is crucial for developing experimental procedures that are robust and reproducible. PMID:26177814
Introduction to bioinformatics.
Can, Tolga
2014-01-01
Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
Bioinformatic approaches to augment study of epithelial-to-mesenchymal transition in lung cancer
Beck, Tim N.; Chikwem, Adaeze J.; Solanki, Nehal R.
2014-01-01
Bioinformatic approaches are intended to provide systems level insight into the complex biological processes that underlie serious diseases such as cancer. In this review we describe current bioinformatic resources, and illustrate how they have been used to study a clinically important example: epithelial-to-mesenchymal transition (EMT) in lung cancer. Lung cancer is the leading cause of cancer-related deaths and is often diagnosed at advanced stages, leading to limited therapeutic success. While EMT is essential during development and wound healing, pathological reactivation of this program by cancer cells contributes to metastasis and drug resistance, both major causes of death from lung cancer. Challenges of studying EMT include its transient nature, its molecular and phenotypic heterogeneity, and the complicated networks of rewired signaling cascades. Given the biology of lung cancer and the role of EMT, it is critical to better align the two in order to advance the impact of precision oncology. This task relies heavily on the application of bioinformatic resources. Besides summarizing recent work in this area, we use four EMT-associated genes, TGF-β (TGFB1), NEDD9/HEF1, β-catenin (CTNNB1) and E-cadherin (CDH1), as exemplars to demonstrate the current capacities and limitations of probing bioinformatic resources to inform hypothesis-driven studies with therapeutic goals. PMID:25096367
Margaria, Tiziana; Kubczak, Christian; Steffen, Bernhard
2008-01-01
Background With Bio-jETI, we introduce a service platform for interdisciplinary work on biological application domains and illustrate its use in a concrete application concerning statistical data processing in R and xcms for an LC/MS analysis of FAAH gene knockout. Methods Bio-jETI uses the jABC environment for service-oriented modeling and design as a graphical process modeling tool and the jETI service integration technology for remote tool execution. Conclusions As a service definition and provisioning platform, Bio-jETI has the potential to become a core technology in interdisciplinary service orchestration and technology transfer. Domain experts, like biologists not trained in computer science, directly define complex service orchestrations as process models and use efficient and complex bioinformatics tools in a simple and intuitive way. PMID:18460173
Tao, Yuan; Liu, Juan
2005-01-01
The Internet has already shrunk the world in which we work and live, giving rise to the concept of an Earth Village in which people can communicate and collaborate even when thousands of miles apart. This paper describes a prototype, akin to an Earth Lab for bioinformatics, that uses a Web services framework to build a network architecture for bioinformatics research. It is intended to let biologists worldwide easily implement large, complex processes and effectively share and access computing resources and data, regardless of how heterogeneous the data formats are and how decentralized and distributed the resources are around the world. A small, simplified example scenario is then given to demonstrate the prototype.
The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis
Rampp, Markus; Soddemann, Thomas; Lederer, Hermann
2006-01-01
We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980
S3DB core: a framework for RDF generation and management in bioinformatics infrastructures
2010-01-01
Background Biomedical research is set to greatly benefit from the use of semantic web technologies in the design of computational infrastructure. However, beyond well defined research initiatives, substantial issues of data heterogeneity, source distribution, and privacy currently stand in the way of the personalization of Medicine. Results A computational framework for bioinformatic infrastructure was designed to deal with the heterogeneous data sources and the sensitive mixture of public and private data that characterizes the biomedical domain. This framework consists of a logical model built with semantic web tools, coupled with a Markov process that propagates user operator states. An accompanying open source prototype was developed to meet a series of applications that range from collaborative multi-institution data acquisition efforts to data analysis applications that need to quickly traverse complex data structures. This report describes the two abstractions underlying the S3DB-based infrastructure, logical and numerical, and discusses its generality beyond the immediate confines of existing implementations. Conclusions The emergence of the "web as a computer" requires a formal model for the different functionalities involved in reading and writing to it. The S3DB core model proposed was found to address the design criteria of biomedical computational infrastructure, such as those supporting large scale multi-investigator research, clinical trials, and molecular epidemiology. PMID:20646315
In the loop: promoter–enhancer interactions and bioinformatics
Mora, Antonio; Sandve, Geir Kjetil; Gabrielsen, Odd Stokke
2016-01-01
Enhancer–promoter regulation is a fundamental mechanism underlying differential transcriptional regulation. Spatial chromatin organization brings remote enhancers in contact with target promoters in cis to regulate gene expression. There is considerable evidence for promoter–enhancer interactions (PEIs). In the recent years, genome-wide analyses have identified signatures and mapped novel enhancers; however, being able to precisely identify their target gene(s) requires massive biological and bioinformatics efforts. In this review, we give a short overview of the chromatin landscape and transcriptional regulation. We discuss some key concepts and problems related to chromatin interaction detection technologies, and emerging knowledge from genome-wide chromatin interaction data sets. Then, we critically review different types of bioinformatics analysis methods and tools related to representation and visualization of PEI data, raw data processing and PEI prediction. Lastly, we provide specific examples of how PEIs have been used to elucidate a functional role of non-coding single-nucleotide polymorphisms. The topic is at the forefront of epigenetic research, and by highlighting some future bioinformatics challenges in the field, this review provides a comprehensive background for future PEI studies. PMID:26586731
p3d--Python module for structural bioinformatics.
Fufezan, Christian; Specht, Michael
2009-08-21
High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge-based approaches. The development of such tools requires a robust interface that gives easy access to the structural data. For this, the Python scripting language is the optimal choice, since its philosophy is to write understandable source code. p3d is an object-oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three-dimensional protein structure files (PDB files). p3d's strength arises from the combination of (a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, (b) set theory and (c) functions that allow (a) and (b) to be combined and that use human-readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.
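The combination of spatial lookup and set theory that this abstract describes can be sketched in plain Python. This is not p3d's API: the `atoms` dictionary, the helper names, and the brute-force distance test (standing in for the BSP tree) are all assumptions made for illustration.

```python
import math

def within(atoms, centre, radius):
    """Return the set of atom ids whose coordinates lie within `radius` of `centre`.

    Brute-force scan; a BSP tree would answer the same query without visiting every atom.
    """
    return {
        atom_id
        for atom_id, (_, xyz) in atoms.items()
        if math.dist(xyz, centre) <= radius
    }

def of_element(atoms, element):
    """Return the set of atom ids of the given chemical element."""
    return {atom_id for atom_id, (elem, _) in atoms.items() if elem == element}

# Hypothetical atoms: id -> (element, (x, y, z)).
atoms = {
    1: ("N", (0.0, 0.0, 0.0)),
    2: ("C", (1.5, 0.0, 0.0)),
    3: ("O", (2.0, 1.0, 0.0)),
    4: ("O", (9.0, 9.0, 9.0)),
}
near_origin = within(atoms, (0.0, 0.0, 0.0), 3.0)
# Set intersection combines the spatial and the chemical selection.
oxygens_near_origin = near_origin & of_element(atoms, "O")
```

Because every query returns a set, selections compose with the usual `&`, `|` and `-` operators, which is what keeps such queries close to natural language.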
Eckart, J Dana; Sobral, Bruno W S
2003-01-01
The emergent needs of the bioinformatics community challenge current information systems. The pace of biological data generation far outstrips Moore's Law. Therefore, a gap continues to widen between the capability to produce biological (molecular and cell) data sets and the capability to manage and analyze these data sets. As a result, Federal investments in large data set generation produce diminishing returns in terms of the community's capability to understand biology and to leverage that understanding to make scientific and technological advances that improve society. We are building an open framework to address various data management issues including data and tool interoperability, nomenclature and data communication standardization, and database integration. PathPort, short for Pathogen Portal, employs a generic, web-services based framework to deal with some of the problems identified by the bioinformatics community. The motivating research goal of a scalable system to provide data management and analysis for key pathosystems, especially relating to molecular data, has resulted in a generic framework using two major components. On the server-side, we employ web-services. On the client-side, a Java application called ToolBus acts as a client-side "bus" for contacting data and tools and viewing results through a single, consistent user interface.
producing the highest quality data. We are organized into 6 functional units: the laboratory, bioinformatics and IT, project management, technology evaluation, statistical genetics, and administration. Our staff is made up of talented and dedicated people who uphold the best standards to ensure excellent quality
Scalable computing for evolutionary genomics.
Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert
2012-01-01
Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. 
Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
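The "poor man's parallelization" described in this chapter abstract, running whole programs side by side as separate processes, can be sketched as follows. The thread pool here is only a stand-in for a job scheduler, and the jobs are placeholder Python one-liners rather than real tools such as BLAST or PAML; both are assumptions for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_job(command):
    """Run one whole program as a separate OS process and return its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()

def run_all(commands, max_workers=4):
    """Run up to `max_workers` programs at once; results keep the input order."""
    # The threads only wait on child processes; the work happens in those processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))

# Stand-in jobs: each launches a fresh Python interpreter that squares a number.
commands = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(4)]
```

Calling `run_all(commands)` returns one output string per job. Nothing in the legacy program has to change, which is why this approach suits existing software.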
BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Pafilis, Evangelos; Theodosiou, Theodosios; Schneider, Reinhard; Satagopam, Venkata P; Ouzounis, Christos A; Eliopoulos, Aristides G; Promponas, Vasilis J; Iliopoulos, Ioannis
2014-11-15
The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. Contact: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be. Supplementary data are available at Bioinformatics online.
Bioinformatics: indispensable, yet hidden in plain sight?
Bartlett, Andrew; Penders, Bart; Lewis, Jamie
2017-06-21
Bioinformatics has multitudinous identities, organisational alignments and disciplinary links. This variety allows bioinformaticians and bioinformatic work to contribute to much (if not most) of life science research in profound ways. The multitude of bioinformatic work also translates into a multitude of credit-distribution arrangements, apparently dismissing that work. We report on the epistemic and social arrangements that characterise the relationship between bioinformatics and life science. We describe, in sociological terms, the character, power and future of bioinformatic work. The character of bioinformatic work is such that its cultural, institutional and technical structures allow for it to be black-boxed easily. The result is that bioinformatic expertise and contributions travel easily and quickly, yet remain largely uncredited. The power of bioinformatic work is shaped by its dependency on life science work, which combined with the black-boxed character of bioinformatic expertise further contributes to situating bioinformatics on the periphery of the life sciences. Finally, the imagined futures of bioinformatic work suggest that bioinformatics will become ever more indispensable without necessarily becoming more visible, forcing bioinformaticians into difficult professional and career choices. Bioinformatic expertise and labour is epistemically central but often institutionally peripheral. In part, this is a result of the ways in which the character, power distribution and potential futures of bioinformatics are constituted. However, alternative paths can be imagined.
NMRPro: an integrated web component for interactive processing and visualization of NMR spectra.
Mohamed, Ahmed; Nguyen, Canh Hao; Mamitsuka, Hiroshi
2016-07-01
The popularity of NMR spectroscopy in metabolomics and natural products research has driven the development of an array of NMR spectral analysis tools and databases. In particular, web applications have recently become widely used because they are platform-independent and easy to extend through reusable web components. Currently available web applications provide analysis of NMR spectra. However, they still lack the necessary processing and interactive visualization functionalities. To overcome these limitations, we present NMRPro, a web component that can be easily incorporated into current web applications, enabling easy-to-use online interactive processing and visualization. NMRPro integrates server-side processing with client-side interactive visualization through three parts: a Python package to efficiently process large NMR datasets on the server-side, a Django App managing server-client interaction, and SpecdrawJS for client-side interactive visualization. Demo and installation instructions are available at http://mamitsukalab.org/tools/nmrpro/. Contact: mohamed@kuicr.kyoto-u.ac.jp. Supplementary data are available at Bioinformatics online.
The growing need for microservices in bioinformatics.
Williams, Christopher L; Sica, Jeffrey C; Killen, Robert T; Balis, Ulysses G J
2016-01-01
Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Bioinformatics relies on a nimble IT framework which can adapt to changing requirements. This communication aims to present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics.
Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting.
The growing need for microservices in bioinformatics
Williams, Christopher L.; Sica, Jeffrey C.; Killen, Robert T.; Balis, Ulysses G. J.
2016-01-01
Objective: Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Context: Bioinformatics relies on a nimble IT framework which can adapt to changing requirements.
Aims: To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Conclusions: Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting. PMID:27994937
The Enzyme Portal: a case study in applying user-centred design methods in bioinformatics.
de Matos, Paula; Cham, Jennifer A; Cao, Hong; Alcántara, Rafael; Rowland, Francis; Lopez, Rodrigo; Steinbeck, Christoph
2013-03-20
User-centred design (UCD) is a type of user interface design in which the needs and desires of users are taken into account at each stage of the design process for a service or product; often for software applications and websites. Its goal is to facilitate the design of software that is both useful and easy to use. To achieve this, you must characterise users' requirements, design suitable interactions to meet their needs, and test your designs using prototypes and real life scenarios. For bioinformatics, there is little practical information available regarding how to carry out UCD in practice. To address this we describe a complete, multi-stage UCD process used for creating a new bioinformatics resource for integrating enzyme information, called the Enzyme Portal (http://www.ebi.ac.uk/enzymeportal). This freely-available service mines and displays data about proteins with enzymatic activity from public repositories via a single search, and includes biochemical reactions, biological pathways, small molecule chemistry, disease information, 3D protein structures and relevant scientific literature. We employed several UCD techniques, including: persona development, interviews, 'canvas sort' card sorting, user workflows, usability testing and others. Our hope is that this case study will motivate the reader to apply similar UCD approaches to their own software design for bioinformatics. Indeed, we found the benefits included more effective decision-making for design ideas and technologies; enhanced team-working and communication; cost effectiveness; and ultimately a service that more closely meets the needs of our target audience.
A review of estimation of distribution algorithms in bioinformatics
Armañanzas, Rubén; Inza, Iñaki; Santana, Roberto; Saeys, Yvan; Flores, Jose Luis; Lozano, Jose Antonio; Peer, Yves Van de; Blanco, Rosa; Robles, Víctor; Bielza, Concha; Larrañaga, Pedro
2008-01-01
Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems across a broad range of bioinformatics applications. Genetic algorithms, the most well-known and representative evolutionary search technique, have been the subject of the major part of such applications. Estimation of distribution algorithms (EDAs) offer a novel evolutionary paradigm that constitutes a natural and attractive alternative to genetic algorithms. They make use of a probabilistic model, learnt from the promising solutions, to guide the search process. In this paper, we set out a basic taxonomy of EDA techniques, underlining the nature and complexity of the probabilistic model of each EDA variant. We review a set of innovative works that make use of EDA techniques to solve challenging bioinformatics problems, emphasizing the EDA paradigm's potential for further research in this domain. PMID:18822112
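The core EDA loop described in this abstract (sample candidates from a probabilistic model, keep the promising ones, re-estimate the model from them) can be illustrated with a minimal sketch. The code below is not taken from the reviewed paper: it assumes the simplest EDA variant, a univariate marginal model over bit strings, and a toy OneMax fitness (the count of 1 bits). The function name `umda_onemax`, its parameters, and the clamping bounds 0.05 and 0.95 are all illustrative choices.

```python
import random


def umda_onemax(n_bits=20, pop_size=100, n_select=30, generations=50, seed=0):
    """Univariate-marginal EDA sketch on the toy OneMax problem.

    The probabilistic model is one independent Bernoulli probability per bit
    (an assumption made for brevity; richer EDAs model dependencies).
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: every bit is equally likely 0 or 1
    best = None
    for _ in range(generations):
        # 1. Sample a population from the current probabilistic model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # 2. Select the promising solutions (fitness = number of 1 bits).
        population.sort(key=sum, reverse=True)
        selected = population[:n_select]
        if best is None or sum(selected[0]) > sum(best):
            best = selected[0]
        # 3. Re-estimate the model from the selected solutions. Probabilities
        #    are clamped so that no bit value becomes impossible to sample.
        probs = [
            min(max(sum(ind[i] for ind in selected) / n_select, 0.05), 0.95)
            for i in range(n_bits)
        ]
    return best, probs


if __name__ == "__main__":
    best, probs = umda_onemax()
    print("best fitness:", sum(best))
```

Unlike a genetic algorithm, there is no crossover or mutation operator here; all variation comes from sampling the learnt model, which is the distinction the abstract draws.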
Bioinformatics Education—Perspectives and Challenges out of Africa
Adebiyi, Ezekiel F.; Alzohairy, Ahmed M.; Everett, Dean; Ghedira, Kais; Ghouila, Amel; Kumuthini, Judit; Mulder, Nicola J.; Panji, Sumir; Patterton, Hugh-G.
2015-01-01
The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of virtually every field in the life sciences. This has placed a scientific premium on the availability of skilled bioinformaticians, a qualification that is extremely scarce on the African continent. The reasons for this are numerous, although the absence of a skilled bioinformatician at academic institutions to initiate a training process and build sustained capacity seems to be a common African shortcoming. This dearth of bioinformatics expertise has had a knock-on effect on the establishment of many modern high-throughput projects at African institutes, including the comprehensive and systematic analysis of genomes from African populations, which are among the most genetically diverse anywhere on the planet. Recent funding initiatives from the National Institutes of Health and the Wellcome Trust are aimed at ameliorating this shortcoming. In this paper, we discuss the problems that have limited the establishment of the bioinformatics field in Africa, as well as propose specific actions that will help with the education and training of bioinformaticians on the continent. This is an absolute requirement in anticipation of a boom in high-throughput approaches to human health issues unique to data from African populations. PMID:24990350
Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics
Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas
2014-01-01
Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of the large number of newly discovered proteins. The existing classification results are unsatisfactory due to the very large number of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727
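For readers less familiar with the performance measures named in this abstract, the sketch below computes them from the four cells of a binary confusion matrix using their standard definitions. It is an illustration only, not the authors' evaluation code; the function name `classification_metrics` and the example counts are hypothetical. Note that sensitivity and recall are the same quantity, and that the F-measure is taken here to be the balanced F1 score.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. Ratios with an empty denominator
    are reported as 0.0 (a convention chosen for this sketch).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts: 40 TP, 10 FP, 45 TN, 5 FN.
    for name, value in classification_metrics(40, 10, 45, 5).items():
        print(f"{name}: {value:.3f}")
```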
Tools and collaborative environments for bioinformatics research
Giugno, Rosalba; Pulvirenti, Alfredo
2011-01-01
Advanced research requires intensive interaction among a multitude of actors, often possessing different expertise and usually working at a distance from each other. The field of collaborative research aims to establish suitable models and technologies to properly support these interactions. In this article, we first present the reasons for the interest of bioinformatics in this context and suggest some research domains that could benefit from collaborative research. We then review the principles and some of the most relevant applications of social networking, with special attention to networks supporting scientific collaboration, and highlight some critical issues, such as identification of users and standardization of formats. We then introduce some systems for collaborative document creation, including wiki systems and tools for ontology development, and review some of the most interesting biological wikis. We also review the principles of Collaborative Development Environments for software and show some examples in bioinformatics. Finally, we present the principles and some examples of Learning Management Systems. In conclusion, we outline some of the goals to be achieved in the short term for the exploitation of these technologies. PMID:21984743
A Survey of Bioinformatics Database and Software Usage through Mining the Literature.
Duck, Geraint; Nenadic, Goran; Filannino, Michele; Brass, Andy; Robertson, David L; Stevens, Robert
2016-01-01
Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.
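The two imbalance figures quoted in this abstract (the share of usage held by the top 5% of resource names, and the fraction of names mentioned only once) are simple to compute once mentions have been extracted. The sketch below shows one way to do so; it is not the authors' pipeline, and the function name `usage_concentration` and the toy input are hypothetical.

```python
import math
from collections import Counter


def usage_concentration(mentions, top_fraction=0.05):
    """Return (share of mentions held by the top names, share of singleton names).

    `mentions` is an iterable of resource names, one entry per mention.
    """
    counts = Counter(mentions)
    if not counts:
        raise ValueError("no mentions supplied")
    ranked = sorted(counts.values(), reverse=True)
    # Number of names that make up the top fraction (at least one name).
    k = max(1, math.ceil(len(ranked) * top_fraction))
    top_share = sum(ranked[:k]) / sum(ranked)
    singleton_share = sum(1 for c in ranked if c == 1) / len(ranked)
    return top_share, singleton_share


if __name__ == "__main__":
    # Toy data only; the names reuse examples from the abstract.
    toy = ["BLAST"] * 6 + ["R"] * 2 + ["GO", "SWISS-PROT"]
    print(usage_concentration(toy))
```

With only four distinct names in the toy input, the top 5% rounds up to a single name, so the first value is that name's share of all mentions.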
Ramakumar, Adarsh; Subramanian, Uma; Prasanna, Pataje G S
2015-11-01
High-throughput individual diagnostic dose assessment is essential for medical management of radiation-exposed subjects after a mass casualty. Cytogenetic assays such as the Dicentric Chromosome Assay (DCA) are recognized as the gold standard by international regulatory authorities. DCA is a multi-step and multi-day bioassay. DCA, as described in the IAEA manual, can be used to assess dose up to 4-6 weeks post-exposure quite accurately, but throughput is still a major issue and automation is essential. The throughput is limited, both in terms of sample preparation as well as analysis of chromosome aberrations. Thus, there is a need to design and develop novel solutions that could utilize extensive laboratory automation for sample preparation, and bioinformatics approaches for chromosome-aberration analysis to overcome throughput issues. We have transitioned the bench-based cytogenetic DCA to a coherent process performing high-throughput automated biodosimetry for individual dose assessment ensuring quality control (QC) and quality assurance (QA) aspects in accordance with international harmonized protocols. A Laboratory Information Management System (LIMS) is designed, implemented and adapted to manage increased sample processing capacity, develop and maintain standard operating procedures (SOP) for robotic instruments, avoid data transcription errors during processing, and automate analysis of chromosome-aberrations using an image analysis platform. Our efforts described in this paper intend to bridge the current technological gaps and enhance the potential application of DCA for a dose-based stratification of subjects following a mass casualty. This paper describes one such potential integrated automated laboratory system and functional evolution of the classical DCA towards increasing critically needed throughput. Published by Elsevier B.V.
Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan
2015-01-01
Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450
TRIENNIAL LACTATION SYMPOSIUM: Nutrigenomics in livestock: Systems biology meets nutrition.
Loor, J J; Vailati-Riboni, M; McCann, J C; Zhou, Z; Bionaz, M
2015-12-01
The advent of high-throughput technologies to study an animal's genome, proteome, and metabolome (i.e., "omics" tools) constituted a setback to the use of reductionism in livestock research. More recent development of "next-generation sequencing" tools was instrumental in allowing in-depth studies of the microbiome in the rumen and other sections of the gastrointestinal tract. Omics, along with bioinformatics, constitutes the foundation of modern systems biology, a field of study widely used in model organisms (e.g., rodents, yeast, humans) to enhance understanding of the complex biological interactions occurring within cells and tissues at the gene, protein, and metabolite level. Application of systems biology concepts is ideal for the study of interactions between nutrition and physiological state with tissue and cell metabolism and function during key life stages of livestock species, including the transition from pregnancy to lactation, in utero development, or postnatal growth. Modern bioinformatic tools capable of discerning functional outcomes and biologically meaningful networks complement the ever-increasing ability to generate large molecular, microbial, and metabolite data sets. Simultaneous visualization of the complex intertissue adaptations to physiological state and nutrition can now be discerned. Studies to understand the linkages between the microbiome and the absorptive epithelium using the integrative approach are emerging. We present examples of new knowledge generated through the application of functional analyses of transcriptomic, proteomic, and metabolomic data sets encompassing nutritional management of dairy cows, pigs, and poultry. Published work to date underscores that the integrative approach across and within tissues may prove useful for fine-tuning nutritional management of livestock. An important goal during this process is to uncover key molecular players involved in the organismal adaptations to nutrition.
Karim, Md Rezaul; Michel, Audrey; Zappa, Achille; Baranov, Pavel; Sahay, Ratnesh; Rebholz-Schuhmann, Dietrich
2017-04-16
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community. © The Author 2017. Published by Oxford University Press.
Navigating the changing learning landscape: perspective from bioinformatics.ca
Ouellette, B. F. Francis
2013-01-01
With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468
Towards a career in bioinformatics
2009-01-01
The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and a student symposium, the Singapore Symposium on Computational Biology (SYMBIO), were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508
Towards a career in bioinformatics.
Ranganathan, Shoba
2009-12-03
The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and a student symposium, the Singapore Symposium on Computational Biology (SYMBIO), were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010.
Omics Metadata Management Software (OMMS)
Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo
2015-01-01
Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically-dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. Availability: The OMMS can be obtained at http://omms.sandia.gov. PMID:26124554
Expanding the horizons of microRNA bioinformatics.
Huntley, Rachael P; Kramarz, Barbara; Sawford, Tony; Umrao, Zara; Kalea, Anastasia Z; Acquaah, Vanessa; Martin, Maria-Jesus; Mayr, Manuel; Lovering, Ruth C
2018-06-05
MicroRNA regulation of key biological and developmental pathways is a rapidly expanding area of research, accompanied by vast amounts of experimental data. This data, however, is not widely available in bioinformatic resources, making it difficult for researchers to find and analyse microRNA-related experimental data and define further research projects. We are addressing this problem by providing two new bioinformatics datasets that contain experimentally verified functional information for mammalian microRNAs involved in cardiovascular-relevant, and other, processes. To date, our resource provides over 3,900 Gene Ontology annotations associated with almost 500 miRNAs from human, mouse and rat and over 2,200 experimentally validated miRNA:target interactions. We illustrate how this resource can be used to create miRNA-focused interaction networks with a biological context using the known biological role of miRNAs and the mRNAs they regulate, enabling discovery of associations between gene products, biological pathways and, ultimately, diseases. This data will be crucial in advancing the field of microRNA bioinformatics and will establish consistent datasets for reproducible functional analysis of microRNAs across all biological research areas. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Better bioinformatics through usability analysis.
Bolchini, Davide; Finkelstein, Anthony; Perrone, Vito; Nagl, Sylvia
2009-02-01
Improving the usability of bioinformatics resources enables researchers to find, interact with, share, compare and manipulate important information more effectively and efficiently. It thus enables researchers to gain improved insights into biological processes with the potential, ultimately, of yielding new scientific results. Usability 'barriers' can pose significant obstacles to a satisfactory user experience and force researchers to spend unnecessary time and effort to complete their tasks. The number of online biological databases available is growing and there is an expanding community of diverse users. In this context there is an increasing need to ensure the highest standards of usability. Using 'state-of-the-art' usability evaluation methods, we have identified and characterized a sample of usability issues potentially relevant to web bioinformatics resources, in general. These specifically concern the design of the navigation and search mechanisms available to the user. The usability issues we have discovered in our substantial case studies are undermining the ability of users to find the information they need in their daily research activities. In addition to characterizing these issues, specific recommendations for improvements are proposed leveraging proven practices from web and usability engineering. The methods and approach we exemplify can be readily adopted by the developers of bioinformatics resources.
Bruder, Katherine; Malki, Kema; Cooper, Alexandria; Sible, Emily; Shapiro, Jason W.; Watkins, Siobhan C.; Putonti, Catherine
2016-01-01
Advances in bioinformatics and sequencing technologies have allowed for the analysis of complex microbial communities at an unprecedented rate. While much focus is often placed on the cellular members of these communities, viruses play a pivotal role, particularly bacteria-infecting viruses (bacteriophages); phages mediate global biogeochemical processes and drive microbial evolution through bacterial grazing and horizontal gene transfer. Despite their importance and ubiquity in nature, very little is known about the diversity and structure of viral communities. Though the need for culture-based methods for viral identification has been somewhat circumvented through metagenomic techniques, the analysis of metaviromic data is marred with many unique issues. In this review, we examine the current bioinformatic approaches for metavirome analyses and the inherent challenges facing the field as illustrated by the ongoing efforts in the exploration of freshwater phage populations. PMID:27375355
A review of bioinformatic methods for forensic DNA analyses.
Liu, Yao-Yuan; Harbison, SallyAnn
2018-03-01
Short tandem repeats, single nucleotide polymorphisms, and whole mitochondrial analyses are three classes of markers which will play an important role in the future of forensic DNA typing. The arrival of massively parallel sequencing platforms in forensic science reveals new information such as insights into the complexity and variability of the markers that were previously unseen, along with amounts of data too immense for analyses by manual means. Along with the sequencing chemistries employed, bioinformatic methods are required to process and interpret this new and extensive data. As more is learnt about the use of these new technologies for forensic applications, development and standardization of efficient, favourable tools for each stage of data processing is being carried out, and faster, more accurate methods that improve on the original approaches have been developed. As forensic laboratories search for the optimal pipeline of tools, sequencer manufacturers have incorporated pipelines into sequencer software to make analyses convenient. This review explores the current state of bioinformatic methods and tools used for the analyses of forensic markers sequenced on the massively parallel sequencing (MPS) platforms currently most widely used. Copyright © 2017 Elsevier B.V. All rights reserved.
Bioinformatics workflows and web services in systems biology made easy for experimentalists.
Jimenez, Rafael C; Corpas, Manuel
2013-01-01
Workflows are useful to perform data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge of programming and web services. However, the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.
Discovery of 100K SNP array and its utilization in sugarcane
USDA-ARS's Scientific Manuscript database
Next generation sequencing (NGS) enables us to identify thousands of single nucleotide polymorphism (SNP) markers for genotyping and fingerprinting. However, the process requires very precise bioinformatics analysis and filtering. High throughput SNP array with predefined genomic location co...
The Enzyme Portal: a case study in applying user-centred design methods in bioinformatics
2013-01-01
User-centred design (UCD) is a type of user interface design in which the needs and desires of users are taken into account at each stage of the design process for a service or product; often for software applications and websites. Its goal is to facilitate the design of software that is both useful and easy to use. To achieve this, you must characterise users’ requirements, design suitable interactions to meet their needs, and test your designs using prototypes and real life scenarios. For bioinformatics, there is little practical information available regarding how to carry out UCD in practice. To address this we describe a complete, multi-stage UCD process used for creating a new bioinformatics resource for integrating enzyme information, called the Enzyme Portal (http://www.ebi.ac.uk/enzymeportal). This freely-available service mines and displays data about proteins with enzymatic activity from public repositories via a single search, and includes biochemical reactions, biological pathways, small molecule chemistry, disease information, 3D protein structures and relevant scientific literature. We employed several UCD techniques, including: persona development, interviews, ‘canvas sort’ card sorting, user workflows, usability testing and others. Our hope is that this case study will motivate the reader to apply similar UCD approaches to their own software design for bioinformatics. Indeed, we found the benefits included more effective decision-making for design ideas and technologies; enhanced team-working and communication; cost effectiveness; and ultimately a service that more closely meets the needs of our target audience. PMID:23514033
CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.
Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian
2017-04-27
The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
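The distinction between unique, shared and core CDSs can be illustrated with a minimal sketch. It assumes the coding sequences have already been grouped into clusters across genomes; the cluster ids and genome labels are hypothetical, and this is not CloVR-Comparative's own code.

```python
def classify_clusters(presence, genomes):
    """Split CDS clusters into core, shared and unique sets.

    presence: dict mapping a cluster id to the set of genomes that contain it.
    genomes:  all genome names taking part in the comparison.
    """
    all_genomes = set(genomes)
    result = {"core": set(), "shared": set(), "unique": set()}
    for cluster, members in presence.items():
        members = set(members) & all_genomes
        if not members:
            continue  # cluster not present in any compared genome
        if members == all_genomes:
            result["core"].add(cluster)      # found in every genome
        elif len(members) == 1:
            result["unique"].add(cluster)    # found in exactly one genome
        else:
            result["shared"].add(cluster)    # found in some, but not all
    return result


# Hypothetical toy input: three genomes, three CDS clusters.
genomes = ["K12", "O157", "CFT073"]
presence = {
    "cluster_1": {"K12", "O157", "CFT073"},
    "cluster_2": {"K12", "O157"},
    "cluster_3": {"CFT073"},
}
groups = classify_clusters(presence, genomes)
```

In this sketch a cluster counts as core only when it occurs in every compared genome, so adding a genome can only shrink the core set.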
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu
2016-06-01
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. © 2016 WILEY PERIODICALS, INC.
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T.; van Oven, Mannis; Wallace, Douglas C.; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F.; Attimonelli, Marcella; Zuchner, Stephan
2016-01-01
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and disease. MSeqDR-LSDB is a locus specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar-compliant variant annotations. PhenoTips is used for phenotypic data submission on de-identified patients using human phenotype ontology terminology. Development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. PMID:26919060
Thiele, Herbert; Glandorf, Jörg; Hufnagel, Peter
2010-05-27
With the large variety of Proteomics workflows, as well as the large variety of instruments and data-analysis software available, researchers today face major challenges validating and comparing their Proteomics data. Here we present a new generation of the ProteinScape bioinformatics platform, now enabling researchers to manage Proteomics data from the generation and data warehousing to a central data repository with a strong focus on the improved accuracy, reproducibility and comparability demanded by many researchers in the field. It addresses scientists' current needs in proteomics identification, quantification and validation. But producing large protein lists is not the end point in Proteomics, where one ultimately aims to answer specific questions about the biological condition or disease model of the analyzed sample. In this context, a new tool has been developed at the Spanish Centro Nacional de Biotecnologia Proteomics Facility termed PIKE (Protein information and Knowledge Extractor) that allows researchers to control, filter and access specific information from genomics and proteomic databases, to understand the role and relationships of the proteins identified in the experiments. Additionally, an EU funded project, ProDac, has coordinated systematic data collection in public standards-compliant repositories like PRIDE. This will cover all aspects from generating MS data in the laboratory, assembling the whole annotation information and storing it together with identifications in a standardised format.
2012-01-01
Background MicroRNAs (miRNAs) are noncoding RNAs that direct post-transcriptional regulation of protein coding genes. Recent studies have shown miRNAs are important for controlling many biological processes, including nervous system development, and are highly conserved across species. Given their importance, computational tools are necessary for analysis, interpretation and integration of high-throughput (HTP) miRNA data in an increasing number of model species. The Bioinformatics Resource Manager (BRM) v2.3 is a software environment for data management, mining, integration and functional annotation of HTP biological data. In this study, we report recent updates to BRM for miRNA data analysis and cross-species comparisons across datasets. Results BRM v2.3 has the capability to query predicted miRNA targets from multiple databases, retrieve potential regulatory miRNAs for known genes, integrate experimentally derived miRNA and mRNA datasets, perform ortholog mapping across species, and retrieve annotation and cross-reference identifiers for an expanded number of species. Here we use BRM to show that developmental exposure of zebrafish to 30 uM nicotine from 6–48 hours post fertilization (hpf) results in behavioral hyperactivity in larval zebrafish and alteration of putative miRNA gene targets in whole embryos at developmental stages that encompass early neurogenesis. We show typical workflows for using BRM to integrate experimental zebrafish miRNA and mRNA microarray datasets with example retrievals for zebrafish, including pathway annotation and mapping to human ortholog. Functional analysis of differentially regulated (p<0.05) gene targets in BRM indicates that nicotine exposure disrupts genes involved in neurogenesis, possibly through misregulation of nicotine-sensitive miRNAs. 
Conclusions BRM provides the ability to mine complex data for identification of candidate miRNAs or pathways that drive phenotypic outcome and, therefore, is a useful hypothesis generation tool for systems biology. The miRNA workflow in BRM allows for efficient processing of multiple miRNA and mRNA datasets in a single software environment with the added capability to interact with public data sources and visual analytic tools for HTP data analysis at a systems level. BRM is developed using Java™ and other open-source technologies for free distribution (http://www.sysbio.org/dataresources/brm.stm). PMID:23174015
USDA-ARS's Scientific Manuscript database
Throughout the history of American sugarbeet production, research has proceeded hand-in-hand with the emergence of new diseases, and sugarbeet scientists have used the technologies available to improve disease management and crop yield in the face of the emerging disease pressures. Many traditional...
BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatia, Karan; Wang, Zhong
Next Generation sequencing is producing ever larger data sizes with a growth rate outpacing Moore's Law. The data deluge has made many of the current sequence analysis tools obsolete because they do not scale with data. Here we present BioPig, a collection of cloud computing tools to scale data analysis and management. Pig is a flexible data scripting language that uses Apache's Hadoop data structure and map reduce framework to process very large data files in parallel and combine the results. BioPig extends Pig with sequence analysis capability. We will show the performance of BioPig on a variety of bioinformatics tasks, including screening sequence contaminants, Illumina QA/QC, and gene discovery from metagenome data sets using the Rumen metagenome as an example.
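The map reduce idea behind this kind of tool can be sketched in plain Python. The example below counts k-mers: each read is processed independently (the map step) and the partial counts are then merged (the reduce step). It is only an illustration of the pattern; the function names are hypothetical and none of this is BioPig's or Pig's actual API.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k):
    """Map step: count the k-mers of a single read, independently of other reads."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step: merge two partial k-mer counts into one."""
    return left + right


def count_kmers(reads, k):
    """Run the map step over every read, then fold the partial results together."""
    partials = (map_kmers(read, k) for read in reads)
    return reduce(reduce_counts, partials, Counter())


# Hypothetical toy reads; real inputs would be far larger and distributed.
reads = ["ACGTAC", "GTACGT"]
totals = count_kmers(reads, 3)
```

Because the map step touches one read at a time and the merge is associative, the same logic can in principle be spread across many workers, which is the property a Hadoop-style framework exploits.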
Wren, Jonathan D
2016-09-01
To analyze the relative proportion of bioinformatics papers and their non-bioinformatics counterparts in the top 20 most cited papers annually for the past two decades. When defining bioinformatics papers as encompassing both those that provide software for data analysis or methods underlying data analysis software, we find that over the past two decades, more than a third (34%) of the most cited papers in science were bioinformatics papers, which is approximately a 31-fold enrichment relative to the total number of bioinformatics papers published. More than half of the most cited papers during this span were bioinformatics papers. Yet, the average 5-year JIF of top 20 bioinformatics papers was 7.7, whereas the average JIF for top 20 non-bioinformatics papers was 25.8, significantly higher (P < 4.5 × 10(-29)). The 20-year trend in the average JIF between the two groups suggests the gap does not appear to be significantly narrowing. For a sampling of the journals producing top papers, bioinformatics journals tended to have higher Gini coefficients, suggesting that development of novel bioinformatics resources may be somewhat 'hit or miss'. That is, relative to other fields, bioinformatics produces some programs that are extremely widely adopted and cited, yet there are fewer of intermediate success. jdwren@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
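For readers unfamiliar with the Gini coefficient mentioned above, the following is a minimal sketch of how it could be computed for a list of citation counts, using the standard rank-weighted formula on sorted values. The citation numbers are hypothetical and the abstract does not state which exact estimator the study used.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0.0 means perfectly even values; values near 1.0 mean a few items
    hold almost all of the total (a 'hit or miss' distribution).
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical citation counts for papers in two journals.
even_journal = [5, 5, 5, 5]
skewed_journal = [0, 0, 0, 10]
g_even = gini(even_journal)
g_skewed = gini(skewed_journal)
```

With this estimator, a journal in which a single paper collects all citations approaches (n - 1) / n rather than exactly 1, which matters when comparing journals with few papers.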
PipelineDog: a simple and flexible graphic pipeline construction and maintenance tool.
Zhou, Anbo; Zhang, Yeting; Sun, Yazhou; Xing, Jinchuan
2018-05-01
Analysis pipelines are an essential part of bioinformatics research, and ad hoc pipelines are frequently created by researchers for prototyping and proof-of-concept purposes. However, most existing pipeline management systems or workflow engines are too complex for rapid prototyping or learning the pipeline concept. A lightweight, user-friendly and flexible solution is thus desirable. In this study, we developed a new pipeline construction and maintenance tool, PipelineDog. This is a web-based integrated development environment with a modern web graphical user interface. It offers cross-platform compatibility, project management capabilities, code formatting and error checking functions and an online repository. It uses an easy-to-read/write script system that encourages code reuse. With the online repository, it also encourages sharing of pipelines, which enhances analysis reproducibility and accountability. For most users, PipelineDog requires no software installation. Overall, this web application provides a way to rapidly create and easily manage pipelines. PipelineDog web app is freely available at http://web.pipeline.dog. The command line version is available at http://www.npmjs.com/package/pipelinedog and online repository at http://repo.pipeline.dog. ysun@kean.edu or xing@biology.rutgers.edu or ysun@diagnoa.com. Supplementary data are available at Bioinformatics online.
Unipro UGENE: a unified bioinformatics toolkit.
Okonechnikov, Konstantin; Golosova, Olga; Fursov, Mikhail
2012-04-15
Unipro UGENE is a multiplatform open-source software with the main goal of assisting molecular biologists without much expertise in bioinformatics to manage, analyze and visualize their data. UGENE integrates widely used bioinformatics tools within a common user interface. The toolkit supports multiple biological data formats and allows the retrieval of data from remote data sources. It provides visualization modules for biological objects such as annotated genome sequences, Next Generation Sequencing (NGS) assembly data, multiple sequence alignments, phylogenetic trees and 3D structures. Most of the integrated algorithms are tuned for maximum performance by the usage of multithreading and special processor instructions. UGENE includes a visual environment for creating reusable workflows that can be launched on local resources or in a High Performance Computing (HPC) environment. UGENE is written in C++ using the Qt framework. The built-in plugin system and structured UGENE API make it possible to extend the toolkit with new functionality. UGENE binaries are freely available for MS Windows, Linux and Mac OS X at http://ugene.unipro.ru/download.html. UGENE code is licensed under the GPLv2; the information about the code licensing and copyright of integrated tools can be found in the LICENSE.3rd_party file provided with the source bundle.
Bioinformatics in proteomics: application, terminology, and pitfalls.
Wiemer, Jan C; Prokudin, Alexander
2004-01-01
Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge amounts of data. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating overly complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
Orozco, Allan; Morera, Jessica; Jiménez, Sergio; Boza, Ricardo
2013-09-01
Today, Bioinformatics has become a scientific discipline with great relevance for the Molecular Biosciences and for the Omics sciences in general. Although developed countries have progressed with large strides in Bioinformatics education and research, in other regions, such as Central America, the advances have occurred in a gradual way and with little support from academia, either at the undergraduate or graduate level. To address this problem, the University of Costa Rica's Medical School, a regional leader in Bioinformatics in Central America, has been conducting a series of Bioinformatics workshops, seminars and courses, leading to the creation of the region's first Bioinformatics Master's Degree. The recent creation of the Central American Bioinformatics Network (BioCANET), associated with the deployment of a supporting computational infrastructure (HPC Cluster) devoted to providing computing support for Molecular Biology in the region, is providing a foundational stone for the development of Bioinformatics in the area. Central American bioinformaticians have participated in the creation of, and co-founded, the Iberoamerican Bioinformatics Society (SOIBIO). In this article, we review the most recent activities in education and research in Bioinformatics from several regional institutions. These activities have resulted in further advances for Molecular Medicine, Agriculture and Biodiversity research in Costa Rica and the rest of the Central American countries. Finally, we provide summary information on the first Central America Bioinformatics International Congress, as well as the creation of the first Bioinformatics company spun off from academia in Central America and the Caribbean (Indromics Bioinformatics).
Schönbach, Christian; Verma, Chandra; Bond, Peter J; Ranganathan, Shoba
2016-12-22
The International Conference on Bioinformatics (InCoB) has been publishing peer-reviewed conference papers in BMC Bioinformatics since 2006. Of the 44 articles accepted for publication in supplement issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics and BMC Systems Biology, 24 articles with a bioinformatics or systems biology focus are reviewed in this editorial. InCoB2017 is scheduled to be held in Shenzhen, China, September 20-22, 2017.
Workflows for microarray data processing in the Kepler environment.
Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark
2012-05-17
Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. 
These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
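As an illustration of the kind of GFF file processing step such a workflow might wrap, here is a minimal Python sketch that parses the nine tab-separated GFF columns and keeps features whose score reaches a threshold. The record type, function names, sample lines and threshold are hypothetical; this is not code from the Kepler workflows described above.

```python
from typing import NamedTuple


class GffRecord(NamedTuple):
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    score: float
    strand: str
    frame: str
    attributes: str


def parse_gff(lines):
    """Yield one GffRecord per data line, skipping blank and comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        # A '.' score means "no value"; represent it as NaN.
        score = float("nan") if cols[5] == "." else float(cols[5])
        yield GffRecord(cols[0], cols[1], cols[2], int(cols[3]),
                        int(cols[4]), score, cols[6], cols[7], cols[8])


def filter_by_score(records, threshold):
    """Keep records scoring at or above threshold (NaN scores never pass)."""
    return [r for r in records if r.score >= threshold]


# Hypothetical probe-level lines, with the score column holding an enrichment value.
example = [
    "##gff-version 3",
    "chr1\tarray\tprobe\t100\t150\t1.8\t.\t.\tID=p1",
    "chr1\tarray\tprobe\t200\t250\t0.2\t.\t.\tID=p2",
    "chr2\tarray\tprobe\t300\t350\t.\t.\t.\tID=p3",
]
enriched = filter_by_score(parse_gff(example), 1.0)
```

Keeping the parser as a generator lets a downstream step stream through large files without loading them whole, which fits the local-resource, script-like style of pipeline the abstract describes.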
Is there room for ethics within bioinformatics education?
Taneri, Bahar
2011-07-01
When bioinformatics education is considered, several issues are addressed. At the undergraduate level, the main issue revolves around conveying information from two main and different fields: biology and computer science. At the graduate level, the main issue is bridging the gap between biology students and computer science students. However, there is an educational component that is rarely addressed within the context of bioinformatics education: the ethics component. Here, a different perspective is provided on bioinformatics education, and the current status of ethics is analyzed within the existing bioinformatics programs. Analysis of the existing undergraduate and graduate programs, in both Europe and the United States, reveals the minimal attention given to ethics within bioinformatics education. Given that bioinformaticians speedily and effectively shape the biomedical sciences and hence their implications for society, here redesigning of the bioinformatics curricula is suggested in order to integrate the necessary ethics education. Unique ethical problems awaiting bioinformaticians and bioinformatics ethics as a separate field of study are discussed. In addition, a template for an "Ethics in Bioinformatics" course is provided.
Cancer Bioinformatics for Updating Anticancer Drug Developments and Personalized Therapeutics.
Lu, Da-Yong; Qu, Rong-Xin; Lu, Ting-Ren; Wu, Hong-Ying
2017-01-01
Over the last two to three decades, the world has witnessed rapid progress in biomarker and bioinformatics technologies. Cancer bioinformatics is one such important omics branch for experimental and clinical studies and applications. Like other biological techniques or systems, bioinformatics techniques will be widely used, but they are not presently omnipotent. Despite great popularity and improvements, cancer bioinformatics has its own limitations and shortcomings at this stage of technical advancement. This article offers a panorama of bioinformatics in cancer research and clinical therapeutic applications, including possible advantages and limitations relating to cancer therapeutics. Many beneficial capabilities and outcomes are described. A successful new era for cancer bioinformatics awaits if we adhere to scientific studies of cancer bioinformatics in malignant-origin mining, medical verification and clinical diagnostic applications. Cancer bioinformatics is of great significance in disease diagnosis and therapeutic prediction. Many creative ideas and future perspectives are highlighted. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Chapter 16: text mining for translational bioinformatics.
Cohen, K Bretonnel; Hunter, Lawrence E
2013-04-01
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
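A rule-based approach and its corpus-based evaluation can be sketched in a few lines of Python. The example below looks up terms from a small lexicon in a sentence and scores the result against a gold-standard annotation; the lexicon, sentence and gold set are hypothetical, and real systems use far richer rules.

```python
import re


def extract_mentions(text, lexicon):
    """Rule-based step: return the lexicon terms that occur as whole words in text."""
    found = set()
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


def precision_recall(predicted, gold):
    """Score predicted mentions against a gold-standard annotation set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gene lexicon; CAT is a real gene symbol that collides with an English word.
lexicon = {"BRCA1", "TP53", "CAT"}
sentence = "Mutations in BRCA1 were studied in a cat model."
gold = {"BRCA1"}  # hypothetical human annotation for this sentence

predicted = extract_mentions(sentence, lexicon)
precision, recall = precision_recall(predicted, gold)
```

The case-insensitive rule finds the true mention but also flags the animal "cat" as the gene CAT, so recall is perfect while precision drops to one half. This is a small instance of the lexical ambiguity the abstract describes.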
Suh, K. Stephen; Sarojini, Sreeja; Youssif, Maher; Nalley, Kip; Milinovikj, Natasha; Elloumi, Fathi; Russell, Steven; Pecora, Andrew; Schecter, Elyssa; Goy, Andre
2013-01-01
Personalized medicine promises patient-tailored treatments that enhance patient care and decrease overall treatment costs by focusing on genetics and “-omics” data obtained from patient biospecimens and records to guide therapy choices that generate good clinical outcomes. The approach relies on diagnostic and prognostic use of novel biomarkers discovered through combinations of tissue banking, bioinformatics, and electronic medical records (EMRs). The analytical power of bioinformatic platforms combined with patient clinical data from EMRs can reveal potential biomarkers and clinical phenotypes that allow researchers to develop experimental strategies using selected patient biospecimens stored in tissue banks. For cancer, high-quality biospecimens collected at diagnosis, first relapse, and various treatment stages provide crucial resources for study designs. To enlarge biospecimen collections, patient education regarding the value of specimen donation is vital. One approach for increasing consent is to offer publicly available illustrations and game-like engagements demonstrating how wider sample availability facilitates development of novel therapies. The critical value of tissue bank samples, bioinformatics, and EMR in the early stages of the biomarker discovery process for personalized medicine is often overlooked. The data obtained also require cross-disciplinary collaborations to translate experimental results into clinical practice and diagnostic and prognostic use in personalized medicine. PMID:23818899
Federation in genomics pipelines: techniques and challenges.
Chaterji, Somali; Koo, Jinkyu; Li, Ninghui; Meyer, Folker; Grama, Ananth; Bagchi, Saurabh
2017-08-29
Federation is a popular concept in building distributed cyberinfrastructures, whereby computational resources are provided by multiple organizations through a unified portal, decreasing the complexity of moving data back and forth among multiple organizations. Federation has been used in bioinformatics only to a limited extent, namely, federation of datastores, e.g. SBGrid Consortium for structural biology and Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, with fast-increasing data volumes and increasing processing requirements. A prime example, and one that we discuss here, is in genomics and metagenomics. It is critical that the processing of the data be done without having to transport the data across large network distances. We exemplify our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. Currently, it is hosted completely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Being a widely used resource, we have to move toward federation without disrupting 50 K annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in federating such infrastructures. It is hoped that our manuscript can serve to spur greater federation of bioinformatics infrastructures by showing the steps involved, and thus, allow them to scale to support larger user bases. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari
2014-01-01
Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curtis, Darren S.; Peterson, Elena S.; Oehmen, Chris S.
2008-05-04
This work presents the ScalaBLAST Web Application (SWA), a web based application implemented using the PHP script language, MySQL DBMS, and Apache web server under a GNU/Linux platform. SWA is an application built as part of the Data Intensive Computer for Complex Biological Systems (DICCBS) project at the Pacific Northwest National Laboratory (PNNL). SWA delivers accelerated throughput of bioinformatics analysis via high-performance computing through a convenient, easy-to-use web interface. This approach greatly enhances emerging fields of study in biology such as ontology-based homology, and multiple whole genome comparisons which, in the absence of a tool like SWA, require a heroic effort to overcome the computational bottleneck associated with genome analysis. The current version of SWA includes a user account management system, a web based user interface, and a backend process that generates the files necessary for the Internet scientific community to submit a ScalaBLAST parallel processing job on a dedicated cluster.
Analyzing large scale genomic data on the cloud with Sparkhit
Huang, Liren; Krüger, Jan
2018-01-01
Motivation: The increasing amount of next-generation sequencing data poses a fundamental challenge on large scale genomic analytics. Existing tools use different distributed computational platforms to scale-out bioinformatics workloads. However, the scalability of these tools is not efficient. Moreover, they have heavy run time overheads when pre-processing large amounts of data. To address these limitations, we have developed Sparkhit: a distributed bioinformatics framework built on top of the Apache Spark platform. Results: Sparkhit integrates a variety of analytical methods. It is implemented in the Spark extended MapReduce model. It runs 92–157 times faster than MetaSpark on metagenomic fragment recruitment and 18–32 times faster than Crossbow on data pre-processing. We analyzed 100 terabytes of data across four genomic projects in the cloud in 21 h, which includes the run times of cluster deployment and data downloading. Furthermore, our application on the entire Human Microbiome Project shotgun sequencing data was completed in 2 h, presenting an approach to easily associate large amounts of public datasets with reference data. Availability and implementation: Sparkhit is freely available at: https://rhinempi.github.io/sparkhit/. Contact: asczyrba@cebitec.uni-bielefeld.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29253074
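As a rough illustration of the MapReduce pattern that the abstract above refers to, the following sketch counts k-mers across a handful of reads in three phases (map, shuffle, reduce). It is plain single-process Python, not Spark and not Sparkhit's code; the function names, the k-mer counting task, and the toy reads are all assumptions chosen for illustration only.

```python
from collections import defaultdict


def map_read(read, k):
    """Map phase: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle and reduce in sequence on an in-memory list of reads."""
    pairs = [pair for read in reads for pair in map_read(read, k)]
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    # Hypothetical toy reads, not real sequencing data.
    print(count_kmers(["ACGT", "CGTA"], 2))
```

In a distributed framework the map and reduce functions would run on separate workers and the shuffle would move data across the network; here all three run in one process so the data flow is easy to follow.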
Rein, Diane C.
2006-01-01
Setting: Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. Program Components: After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. Evaluation Mechanisms: The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Next Steps/Future Directions: Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making. PMID:16888666
Rein, Diane C
2006-07-01
Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making.
Prospects and limitations of full-text index structures in genome analysis
Vyverman, Michaël; De Baets, Bernard; Fack, Veerle; Dawyndt, Peter
2012-01-01
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared. PMID:22584621
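To make the idea of a full-text index concrete, here is a minimal sketch of one of the simplest such structures, a suffix array, with a binary search for exact pattern matches. It is an illustrative assumption rather than anything taken from the article: the construction is the naive sort-all-suffixes method, which is far too slow and memory-hungry for genome-scale text, and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in lexicographic order.

    Naive construction: sorts the suffixes directly, which is fine for short
    strings but not for whole genomes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions at which pattern occurs in text."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


if __name__ == "__main__":
    genome = "GATTACATTA"  # toy string, not real sequence data
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "TTA"))
```

The memory-time trade-off mentioned in the abstract is visible even here: the index stores one integer per text position, in exchange for lookups that need only a logarithmic number of string comparisons.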
Chiu, Kuo Ping; Wong, Chee-Hong; Chen, Qiongyu; Ariyaratne, Pramila; Ooi, Hong Sain; Wei, Chia-Lin; Sung, Wing-Kin Ken; Ruan, Yijun
2006-08-25
We recently developed the Paired End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads, and correctly yet efficiently map PETs to reference genome sequences. To accommodate and streamline data analysis of the large volume PET sequences generated from each PET experiment, an automated PET data process pipeline is desirable. We designed an integrated computation program package, PET-Tool, to automatically process PET sequences and map them to the genome sequences. The Tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the Project Manager module for data organization. The performance of PET-Tool was evaluated through the analyses of 2.7 million PET sequences. It was demonstrated that PET-Tool is accurate and efficient in extracting PET sequences and removing artifacts from large volume dataset. Using optimized mapping criteria, over 70% of quality PET sequences were mapped specifically to the genome sequences. With a 2.4 GHz LINUX machine, it takes approximately six hours to process one million PETs from extraction to mapping. The speed, accuracy, and comprehensiveness have proved that PET-Tool is an important and useful component in PET experiments, and can be extended to accommodate other related analyses of paired-end sequences. The Tool also provides user-friendly functions for data quality check and system for multi-layer data management.
Wightman, Bruce; Hark, Amy T
2012-01-01
The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this study, we deliberately integrated bioinformatics instruction at multiple course levels into an existing biology curriculum. Students in an introductory biology course, intermediate lab courses, and advanced project-oriented courses all participated in new course components designed to sequentially introduce bioinformatics skills and knowledge, as well as computational approaches that are common to many bioinformatics applications. In each course, bioinformatics learning was embedded in an existing disciplinary instructional sequence, as opposed to having a single course where all bioinformatics learning occurs. We designed direct and indirect assessment tools to follow student progress through the course sequence. Our data show significant gains in both student confidence and ability in bioinformatics during individual courses and as course level increases. Despite evidence of substantial student learning in both bioinformatics and mathematics, students were skeptical about the link between learning bioinformatics and learning mathematics. While our approach resulted in substantial learning gains, student "buy-in" and engagement might be better in longer project-based activities that demand application of skills to research problems. Nevertheless, in situations where a concentrated focus on project-oriented bioinformatics is not possible or desirable, our approach of integrating multiple smaller components into an existing curriculum provides an alternative. Copyright © 2012 Wiley Periodicals, Inc.
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.
New Methodology for Measuring Semantic Functional Similarity Based on Bidirectional Integration
ERIC Educational Resources Information Center
Jeong, Jong Cheol
2013-01-01
1.2 billion users in Facebook, 17 million articles in Wikipedia, and 190 million tweets per day have demanded significant increase of information processing through Internet in recent years. Similarly life sciences and bioinformatics also have faced issues of processing Big data due to the explosion of publicly available genomic information…
Detecting distant homologies on protozoans metabolic pathways using scientific workflows.
da Cruz, Sérgio Manuel Serra; Batista, Vanessa; Silva, Edno; Tosta, Frederico; Vilela, Clarissa; Cuadrat, Rafael; Tschoeke, Diogo; Dávila, Alberto M R; Campos, Maria Luiza Machado; Mattoso, Marta
2010-01-01
Bioinformatics experiments are typically composed of programs in pipelines manipulating an enormous quantity of data. An interesting approach for managing those experiments is through workflow management systems (WfMS). In this work we discuss WfMS features to support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We show a case study detecting distant homologies on trypanosomatids metabolic pathways. Our results reinforce the benefits of WfMS over script languages and point out challenges to WfMS in distributed environments.
Creating databases for biological information: an introduction.
Stein, Lincoln
2002-08-01
The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, and relational databases, as well as ACeDB. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system.
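The contrast between flat files and a relational database can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the strain catalog, its column names, and the gene labels are invented for the example, and SQLite stands in for whichever database management system a project would actually choose.

```python
import sqlite3

# Hypothetical strain catalog rows: (strain_id, gene, phenotype).
ROWS = [
    ("S001", "geneA", "slow growth"),
    ("S002", "geneB", "wild type"),
    ("S003", "geneA", "sterile"),
]


def flat_file_lookup(lines, gene):
    """Flat-file approach: scan every tab-delimited line for a matching gene."""
    hits = []
    for line in lines:
        strain_id, line_gene, _phenotype = line.rstrip("\n").split("\t")
        if line_gene == gene:
            hits.append(strain_id)
    return hits


def build_database(rows):
    """Relational approach: load the rows into an in-memory table with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE strain ("
        "strain_id TEXT PRIMARY KEY, gene TEXT NOT NULL, phenotype TEXT)"
    )
    # The index lets the engine find rows by gene without scanning the table.
    conn.execute("CREATE INDEX idx_strain_gene ON strain (gene)")
    conn.executemany("INSERT INTO strain VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def relational_lookup(conn, gene):
    """Query strains by gene using a parameterized SQL statement."""
    cursor = conn.execute(
        "SELECT strain_id FROM strain WHERE gene = ? ORDER BY strain_id", (gene,)
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    lines = ["\t".join(row) + "\n" for row in ROWS]
    print(flat_file_lookup(lines, "geneA"))
    print(relational_lookup(build_database(ROWS), "geneA"))
```

Both lookups return the same strains; the difference is that the flat-file version reads every record on each query, whereas the indexed table can answer without a full scan and also enforces the uniqueness of strain identifiers.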
Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers
Brazas, Michelle D.; Ouellette, B. F. Francis
2016-01-01
Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression. PMID:27281025
Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers.
Brazas, Michelle D; Ouellette, B F Francis
2016-06-01
Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression.
Bioinformatics research in the Asia Pacific: a 2007 update.
Ranganathan, Shoba; Gribskov, Michael; Tan, Tin Wee
2008-01-01
We provide a 2007 update on the bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB) bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2007 Conference was organized as the 6th annual conference of the Asia-Pacific Bioinformatics Network, on Aug. 27-30, 2007 at Hong Kong, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea) and New Delhi (India). Besides a scientific meeting at Hong Kong, satellite events organized are a pre-conference training workshop at Hanoi, Vietnam and a post-conference workshop at Nansha, China. This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. We have organized the papers into thematic areas, highlighting the growing contribution of research excellence from this region, to global bioinformatics endeavours.
Big data for big questions: it is time for data analysts to act
Moscato, Pablo
2015-01-01
Pablo Moscato speaks to Francesca Lake, Managing Editor. Australian Research Council Future Fellow Prof. Pablo Moscato was born in 1964 in La Plata, Argentina. Obtaining his B.Sc. in Physics at University of La Plata, his PhD was defended at UNICAMP, Brazil. While at the California Institute of Technology Concurrent Computation Program he developed, in collaboration with Michael Norman, the first application of a methodology later called ‘memetic algorithms’, which is now widely used internationally. He is the founding co-director of the Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-based Medicine (CIBM) (2006–present) and the founding director of the Newcastle Bioinformatics Initiative (2002–2006) of The University of Newcastle (Australia). He is also Chief Investigator of the Australian Research Council Centre in Bioinformatics. He is one of Australia's most cited computer scientists. Over the past 7 years, he has introduced a unifying hallmark of cancer progression based on the changes of information theory quantifiers, and developed a novel mathematical model and an associated solution procedure based on combinatorial optimization techniques to identify drug combinations for cancer therapeutics. In addition, he has identified proteomic signatures to predict the clinical symptoms of Alzheimer's disease, among other ‘firsts’. He is a member of the Editorial Board of Future Science OA. PMID:28031895
Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.
Almugbel, Reem; Hung, Ling-Hong; Hu, Jiaming; Almutairy, Abeer; Ortogero, Nicole; Tamta, Yashaswi; Yeung, Ka Yee
2018-01-01
Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. 
genipe: an automated genome-wide imputation pipeline with automatic reporting and statistical tools.
Lemieux Perreault, Louis-Philippe; Legault, Marc-André; Asselin, Géraldine; Dubé, Marie-Pierre
2016-12-01
Genotype imputation is now commonly performed following genome-wide genotyping experiments. Imputation increases the density of analyzed genotypes in the dataset, enabling fine-mapping across the genome. However, the process of imputation using the most recent publicly available reference datasets can require considerable computation power and the management of hundreds of large intermediate files. We have developed genipe, a complete genome-wide imputation pipeline which includes automatic reporting, imputed data indexing and management, and a suite of statistical tests for imputed data commonly used in genetic epidemiology (Sequence Kernel Association Test, Cox proportional hazards for survival analysis, and linear mixed models for repeated measurements in longitudinal studies). The genipe package is an open source Python software and is freely available for non-commercial use (CC BY-NC 4.0) at https://github.com/pgxcentre/genipe. Documentation and tutorials are available at http://pgxcentre.github.io/genipe. Contact: louis-philippe.lemieux.perreault@statgen.org or marie-pierre.dube@statgen.org. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Microarray R-based analysis of complex lysate experiments with MIRACLE
List, Markus; Block, Ines; Pedersen, Marlene Lemvig; Christiansen, Helle; Schmidt, Steffen; Thomassen, Mads; Tan, Qihua; Baumbach, Jan; Mollenhauer, Jan
2014-01-01
Motivation: Reverse-phase protein arrays (RPPAs) allow sensitive quantification of relative protein abundance in thousands of samples in parallel. Typical challenges involved in this technology are antibody selection, sample preparation and optimization of staining conditions. The issue of combining effective sample management and data analysis, however, has been widely neglected. Results: This motivated us to develop MIRACLE, a comprehensive and user-friendly web application bridging the gap between spotting and array analysis by conveniently keeping track of sample information. Data processing includes correction of staining bias, estimation of protein concentration from response curves, normalization for total protein amount per sample and statistical evaluation. Established analysis methods have been integrated with MIRACLE, offering experimental scientists an end-to-end solution for sample management and for carrying out data analysis. In addition, experienced users have the possibility to export data to R for more complex analyses. MIRACLE thus has the potential to further spread utilization of RPPAs as an emerging technology for high-throughput protein analysis. Availability: Project URL: http://www.nanocan.org/miracle/ Contact: mlist@health.sdu.dk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161257
Microarray R-based analysis of complex lysate experiments with MIRACLE.
List, Markus; Block, Ines; Pedersen, Marlene Lemvig; Christiansen, Helle; Schmidt, Steffen; Thomassen, Mads; Tan, Qihua; Baumbach, Jan; Mollenhauer, Jan
2014-09-01
Reverse-phase protein arrays (RPPAs) allow sensitive quantification of relative protein abundance in thousands of samples in parallel. Typical challenges involved in this technology are antibody selection, sample preparation and optimization of staining conditions. The issue of combining effective sample management and data analysis, however, has been widely neglected. This motivated us to develop MIRACLE, a comprehensive and user-friendly web application bridging the gap between spotting and array analysis by conveniently keeping track of sample information. Data processing includes correction of staining bias, estimation of protein concentration from response curves, normalization for total protein amount per sample and statistical evaluation. Established analysis methods have been integrated with MIRACLE, offering experimental scientists an end-to-end solution for sample management and for carrying out data analysis. In addition, experienced users have the possibility to export data to R for more complex analyses. MIRACLE thus has the potential to further spread utilization of RPPAs as an emerging technology for high-throughput protein analysis. Project URL: http://www.nanocan.org/miracle/. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Emerging strengths in Asia Pacific bioinformatics.
Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee
2008-12-12
The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20-23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts.
Emerging strengths in Asia Pacific bioinformatics
Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee
2008-01-01
The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20–23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts. PMID:19091008
Extending Asia Pacific bioinformatics into new realms in the "-omics" era.
Ranganathan, Shoba; Eisenhaber, Frank; Tong, Joo Chuan; Tan, Tin Wee
2009-12-03
The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation dating back to 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 7-11, 2009 at Biopolis, Singapore. Besides bringing together scientists from the field of bioinformatics in this region, InCoB has actively engaged clinicians and researchers from the area of systems biology, to facilitate greater synergy between these two groups. InCoB2009 followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India), Hong Kong and Taipei (Taiwan), with InCoB2010 scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. The Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and symposia on Clinical Bioinformatics (CBAS), the Singapore Symposium on Computational Biology (SYMBIO) and training tutorials were scheduled prior to the scientific meeting, and provided ample opportunity for in-depth learning and special interest meetings for educators, clinicians and students. We provide a brief overview of the peer-reviewed bioinformatics manuscripts accepted for publication in this supplement, grouped into thematic areas. In order to facilitate scientific reproducibility and accountability, we have, for the first time, introduced minimum information criteria for our publications, including compliance with a Minimum Information about a Bioinformatics Investigation (MIABi). As the regional research expertise in bioinformatics matures, we have delineated a minimum set of bioinformatics skills required for addressing the computational challenges of the "-omics" era.
Comparative BioInformatics and Computational Toxicology
Reflecting the numerous changes in the field since the publication of the previous edition, this third edition of Developmental Toxicology focuses on the mechanisms of developmental toxicity and incorporates current technologies for testing in the risk assessment process.
Food Safety in the Age of Next Generation Sequencing, Bioinformatics, and Open Data Access.
Taboada, Eduardo N; Graham, Morag R; Carriço, João A; Van Domselaar, Gary
2017-01-01
Public health labs and food regulatory agencies globally are embracing whole genome sequencing (WGS) as a revolutionary new method that is positioned to replace numerous existing diagnostic and microbial typing technologies with a single new target: the microbial draft genome. The ability to cheaply generate large amounts of microbial genome sequence data, combined with emerging policies of food regulatory and public health institutions making their microbial sequences increasingly available and public, has served to open up the field to the general scientific community. This open data access policy shift has resulted in a proliferation of data being deposited into sequence repositories and of novel bioinformatics software designed to analyze these vast datasets. There also has been a more recent drive for improved data sharing to achieve more effective global surveillance, public health and food safety. Such developments have heightened the need for enhanced analytical systems in order to process and interpret this new type of data in a timely fashion. In this review we outline the emergence of genomics, bioinformatics and open data in the context of food safety. We also survey major efforts to translate genomics and bioinformatics technologies out of the research lab and into routine use in modern food safety labs. We conclude by discussing the challenges and opportunities that remain, including those expected to play a major role in the future of food safety science.
Zoukhri, Driss; Rawe, Ian; Singh, Mabi; Brown, Ashley; Kublin, Claire L; Dawson, Kevin; Haddon, William F; White, Earl L; Hanley, Kathleen M; Tusé, Daniel; Malyj, Wasyl; Papas, Athena
2012-03-01
The purpose of the current study was to determine if saliva contains biomarkers that can be used as diagnostic tools for Sjögren's syndrome (SjS). Twenty-seven SjS patients and 27 age-matched healthy controls were recruited for these studies. Unstimulated glandular saliva was collected from the Wharton's duct using a suction device. Two µl of saliva were processed for mass spectrometry analyses on a prOTOF 2000 matrix-assisted laser desorption/ionization orthogonal time of flight (MALDI O-TOF) mass spectrometer. Raw data were analyzed using bioinformatic tools to identify biomarkers. MALDI O-TOF MS analyses of saliva samples were highly reproducible and the mass spectra generated were very rich in peptides and peptide fragments in the 750-7,500 Da range. Data analysis using bioinformatic tools resulted in several classification models being built and several biomarkers identified. One model based on 7 putative biomarkers yielded a sensitivity of 97.5%, specificity of 97.8% and an accuracy of 97.6%. One biomarker was present only in SjS samples and was identified as a proteolytic peptide originating from human basic salivary proline-rich protein 3 precursor. We conclude that salivary biomarkers detected by high-resolution mass spectrometry coupled with powerful bioinformatic tools offer the potential to serve as diagnostic/prognostic tools for SjS.
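For readers less familiar with the three figures quoted in this abstract, the following is a minimal sketch of how sensitivity, specificity and accuracy are conventionally derived from a classifier's confusion counts. The counts used here are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not say how its own figures were computed (for example, whether they are cross-validation averages).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's outcomes (all values hypothetical)."""
    true_pos: int   # patients correctly classified as patients
    false_neg: int  # patients misclassified as controls
    true_neg: int   # controls correctly classified as controls
    false_pos: int  # controls misclassified as patients


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of actual positives that were detected."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of actual negatives that were correctly rejected."""
    return c.true_neg / (c.true_neg + c.false_pos)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples that were classified correctly."""
    total = c.true_pos + c.false_neg + c.true_neg + c.false_pos
    return (c.true_pos + c.true_neg) / total


# Hypothetical example counts, NOT taken from the study above.
counts = ConfusionCounts(true_pos=9, false_neg=1, true_neg=8, false_pos=2)

print(f"sensitivity = {sensitivity(counts):.1%}")  # 90.0%
print(f"specificity = {specificity(counts):.1%}")  # 80.0%
print(f"accuracy    = {accuracy(counts):.1%}")     # 85.0%
```

Sensitivity and specificity are each computed within one class, so they remain meaningful when patient and control groups differ in size, whereas accuracy depends on the class balance.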
Field of genes: using Apache Kafka as a bioinformatic data repository.
Lawlor, Brendan; Lynch, Richard; Mac Aogáin, Micheál; Walsh, Paul
2018-04-01
Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is National Center for Biotechnology Information's (NCBI's) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI's RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data.
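The idea of lightly structured records stored in a partitioned log and read in parallel can be illustrated with a small in-memory sketch. This is not the Apache Kafka API and not the authors' implementation: `PartitionedLog`, `partition_for`, the CRC32-based key hashing and the record identifiers are all assumptions introduced here for illustration.

```python
import zlib
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (key, value), e.g. (sequence id, sequence text)


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedLog:
    """Toy append-only log split into independent partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record; return the (partition, offset) where it was stored."""
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int = 0) -> Iterator[Record]:
        """Stream one partition's records in append order, starting at offset."""
        yield from self.partitions[partition][offset:]


# Hypothetical records; identifiers and sequences are made up.
log = PartitionedLog(num_partitions=3)
for i in range(6):
    log.append(f"record-{i}", "ACGT" * (i + 1))

# Each partition could be handed to a separate worker and streamed independently.
for p in range(3):
    print(p, [key for key, _ in log.read(p)])
```

Because records with the same key always land in the same partition and each partition keeps its append order, consumers can be added per partition without coordinating with one another, which is the property that lets this style of storage scale with the number of compute nodes.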
Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-01-01
Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313
NASA Astrophysics Data System (ADS)
Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.
2017-11-01
A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH. For the first time, we have obtained NEP dimers in solution, analyzed their structure, and compared the models obtained using the molecular tectonics method with the spatial protein structure that we predicted using bioinformatics methods. The results of the study provide new insight into the structural features of nuclear export protein NS2 (NEP) of influenza virus A, which is very important for the development of viral infection.
Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-06-01
Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.
Bioinformatics goes back to the future.
Miller, Crispin J; Attwood, Teresa K
2003-02-01
The need to turn raw data into knowledge has led the bioinformatics field to focus increasingly on the manipulation of information. By drawing parallels with both cryptography and artificial intelligence, we can develop an understanding of the changes that are occurring in bioinformatics, and how these changes are likely to influence the bioinformatics job market.
ERIC Educational Resources Information Center
Inlow, Jennifer K.; Miller, Paige; Pittman, Bethany
2007-01-01
We describe two bioinformatics exercises intended for use in a computer laboratory setting in an upper-level undergraduate biochemistry course. To introduce students to bioinformatics, the exercises incorporate several commonly used bioinformatics tools, including BLAST, that are freely available online. The exercises build upon the students'…
ERIC Educational Resources Information Center
Miskowski, Jennifer A.; Howard, David R.; Abler, Michael L.; Grunwald, Sandra K.
2007-01-01
Over the past 10 years, there has been a technical revolution in the life sciences leading to the emergence of a new discipline called bioinformatics. In response, bioinformatics-related topics have been incorporated into various undergraduate courses along with the development of new courses solely focused on bioinformatics. This report describes…
Component-Based Approach for Educating Students in Bioinformatics
ERIC Educational Resources Information Center
Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.
2009-01-01
There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…
ERIC Educational Resources Information Center
Shachak, Aviv; Ophir, Ron; Rubin, Eitan
2005-01-01
The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of…
ERIC Educational Resources Information Center
Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D. Blaine; Langeland, James A.
2009-01-01
Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option.…
Lawlor, Brendan; Walsh, Paul
2015-01-01
There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054
Integrating grant-funded research into the undergraduate biology curriculum using IMG-ACT.
Ditty, Jayna L; Williams, Kayla M; Keller, Megan M; Chen, Grischa Y; Liu, Xianxian; Parales, Rebecca E
2013-01-01
It has become clear in current scientific pedagogy that the immersion of students in the scientific process in terms of designing, implementing, and analyzing experiments is imperative for their education; as such, it has been our goal to model this active learning process in the classroom and laboratory in the context of a genuine scientific question. Toward this objective, the National Science Foundation funded a collaborative research grant between a primarily undergraduate institution and a research-intensive institution to study the chemotactic responses of the bacterium Pseudomonas putida F1. As part of the project, a new Bioinformatics course was developed in which undergraduates annotate relevant regions of the P. putida F1 genome using Integrated Microbial Genomes Annotation Collaboration Toolkit, a bioinformatics interface specifically developed for undergraduate programs by the Department of Energy Joint Genome Institute. Based on annotations of putative chemotaxis genes in P. putida F1 and comparative genomics studies, undergraduate students from both institutions developed functional genomics research projects that evolved from the annotations. The purpose of this study is to describe the nature of the NSF grant, the development of the Bioinformatics lecture and wet laboratory course, and how undergraduate student involvement in the project that was initiated in the classroom has served as a springboard for independent undergraduate research projects. Copyright © 2012 International Union of Biochemistry and Molecular Biology, Inc.
Lourenço, Anália; Ferreira, Andreia; Veiga, Nuno; Machado, Idalina; Pereira, Maria Olivia; Azevedo, Nuno F
2012-01-01
Consortia of microorganisms, commonly known as biofilms, are attracting much attention from the scientific community due to their impact on human activity. As biofilm research grows to be a data-intensive discipline, the need for suitable bioinformatics approaches becomes compelling to manage and validate individual experiments, and also execute inter-laboratory large-scale comparisons. However, biofilm data is widespread across ad hoc, non-standardized individual files and, thus, data interchange among researchers, or any attempt at cross-laboratory experimentation or analysis, is hardly possible or even attempted. This paper presents BiofOmics, the first publicly accessible Web platform specialized in the management and analysis of data derived from biofilm high-throughput studies. The aim is to promote data interchange across laboratories, implementing collaborative experiments, and enable the development of bioinformatics tools in support of the processing and analysis of the increasing volumes of experimental biofilm data that are being generated. BiofOmics' data deposition facility enforces data structuring and standardization, supported by controlled vocabulary. Researchers are responsible for the description of the experiments, their results and conclusions. BiofOmics' curators interact with submitters only to enforce data structuring and the use of controlled vocabulary. Then, BiofOmics' search facility makes publicly available the profile and data associated with a submitted study so that any researcher can profit from these standardization efforts to compare similar studies, generate new hypotheses to be tested or even extend the conditions experimented in the study. BiofOmics' novelty lies in its support to standardized data deposition, the availability of computerizable data files and the free-of-charge dissemination of biofilm studies across the community. Hopefully, this will open promising research possibilities, namely the comparison of results between different laboratories, the reproducibility of methods within and between laboratories, and the development of guidelines and standardized protocols for biofilm formation operating procedures and analytical methods.
Bioinformatics core competencies for undergraduate life sciences education.
Wilson Sayres, Melissa A; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G; Smith, Todd M; Triplett, Eric W; Williams, Jason J; Dinsdale, Elizabeth; Morgan, William R; Burnette, James M; Donovan, Samuel S; Drew, Jennifer C; Elgin, Sarah C R; Fowlks, Edison R; Galindo-Gonzalez, Sebastian; Goodman, Anya L; Grandgenett, Nealy F; Goller, Carlos C; Jungck, John R; Newman, Jeffrey D; Pearson, William; Ryder, Elizabeth F; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C; Toro-Martínez, Arlín; Welch, Lonnie R; Wright, Robin; Barone, Lindsay; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C; Pauley, Mark A
2018-01-01
Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent's degree of training, time since degree earned, and/or the Carnegie Classification of the respondent's institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.
Bioinformatics core competencies for undergraduate life sciences education
Wilson Sayres, Melissa A.; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G.; Smith, Todd M.; Triplett, Eric W.; Williams, Jason J.; Dinsdale, Elizabeth; Morgan, William R.; Burnette, James M.; Donovan, Samuel S.; Drew, Jennifer C.; Elgin, Sarah C. R.; Fowlks, Edison R.; Galindo-Gonzalez, Sebastian; Goodman, Anya L.; Grandgenett, Nealy F.; Goller, Carlos C.; Jungck, John R.; Newman, Jeffrey D.; Pearson, William; Ryder, Elizabeth F.; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C.; Toro-Martínez, Arlín; Welch, Lonnie R.; Wright, Robin; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C.
2018-01-01
Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent’s degree of training, time since degree earned, and/or the Carnegie Classification of the respondent’s institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula. PMID:29870542
The ELIXIR channel in F1000Research.
Blomberg, Niklas; Oliveira, Arlindo; Mons, Barend; Persson, Bengt; Jonassen, Inge
2015-01-01
ELIXIR, the European life science infrastructure for biological information, is a unique initiative to consolidate Europe's national centres, services, and core bioinformatics resources into a single, coordinated infrastructure. ELIXIR brings together Europe's major life-science data archives and connects these with national bioinformatics infrastructures - the ELIXIR Nodes. This editorial introduces the ELIXIR channel in F1000Research; the aim of the channel is to collect and present ELIXIR's scientific and operational output, engage with the broad life science community and encourage discussion on proposed infrastructure solutions. Submissions will be assessed by the ELIXIR channel Advisory Board to ensure they are relevant to ELIXIR community, and subjected to F1000Research open peer review process.
The ELIXIR channel in F1000Research
Blomberg, Niklas; Oliveira, Arlindo; Mons, Barend; Persson, Bengt; Jonassen, Inge
2016-01-01
ELIXIR, the European life science infrastructure for biological information, is a unique initiative to consolidate Europe’s national centres, services, and core bioinformatics resources into a single, coordinated infrastructure. ELIXIR brings together Europe’s major life-science data archives and connects these with national bioinformatics infrastructures - the ELIXIR Nodes. This editorial introduces the ELIXIR channel in F1000Research; the aim of the channel is to collect and present ELIXIR’s scientific and operational output, engage with the broad life science community and encourage discussion on proposed infrastructure solutions. Submissions will be assessed by the ELIXIR channel Advisory Board to ensure they are relevant to ELIXIR community, and subjected to F1000Research open peer review process. PMID:26913192
Report on the EMBER Project--A European Multimedia Bioinformatics Educational Resource
ERIC Educational Resources Information Center
Attwood, Terri K.; Selimas, Ioannis; Buis, Rob; Altenburg, Ruud; Herzog, Robert; Ledent, Valerie; Ghita, Viorica; Fernandes, Pedro; Marques, Isabel; Brugman, Marc
2005-01-01
EMBER was a European project aiming to develop bioinformatics teaching materials on the Web and CD-ROM to help address the recognised skills shortage in bioinformatics. The project grew out of pilot work on the development of an interactive web-based bioinformatics tutorial and the desire to repackage that resource with the help of a professional…
National Plant Genome Initiative
2005-01-01
lines that do not require vernalization to flower. The capacity of temperate cereals like wheat and barley to generate spring forms through...the potential to modify flowering time of different cereals for specific climates. 10 Progress Reported in 2004 • Bioinformatics The NPGI...developing an open source genome annotation pipeline as well as tools to present and manage information about natural variation in cereal varieties
A Bioinformatic Approach to Inter Functional Interactions within Protein Sequences
2009-02-23
AFOSR/AOARD Reference Number: USAFAOGA07: FA4869-07-1-4050 AFOSR/AOARD Program Manager: Hiroshi Motoda, Ph.D. Period of...Conference on Knowledge Discovery and Data Mining.) In a separate study we have applied our approaches to the problem of whole genome alignment. We have...SIGKDD Conference on Knowledge Discovery and Data Mining Attached. Interactions: Please list: (a) Participation/presentations at meetings
The 2017 Bioinformatics Open Source Conference (BOSC)
Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather
2017-01-01
The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC (http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year’s theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest. PMID:29118973
The 2017 Bioinformatics Open Source Conference (BOSC).
Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather
2017-01-01
The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC (http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year's theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest.
Rising Strengths Hong Kong SAR in Bioinformatics.
Chakraborty, Chiranjib; George Priya Doss, C; Zhu, Hailong; Agoramoorthy, Govindasamy
2017-06-01
Hong Kong's bioinformatics sector is attaining new heights in combination with its economic boom and the predominance of the working-age group in its population. Factors such as a knowledge-based and free-market economy have contributed towards a prominent position on the world map of bioinformatics. In this review, we have considered the educational measures, landmark research activities, the achievements of bioinformatics companies and the role of the Hong Kong government in the establishment of bioinformatics as a strength. However, several hurdles remain. New government policies will assist computational biologists to overcome these hurdles and further raise the profile of the field. There is a high expectation that bioinformatics in Hong Kong will be a promising area for the next generation.
An overview of bioinformatics methods for modeling biological pathways in yeast
Hou, Jie; Acharya, Lipi; Zhu, Dongxiao
2016-01-01
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein–protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed. PMID:26476430
Bioinformatic perspectives on NRPS/PKS megasynthases: advances and challenges.
Jenke-Kodama, Holger; Dittmann, Elke
2009-07-01
The increased understanding of both fundamental principles and mechanistic variations of NRPS/PKS megasynthases along with the unprecedented availability of microbial sequences has inspired a number of in silico studies of both enzyme families. The insights that can be extracted from these analyses go far beyond a rough classification of data and have turned bioinformatics into a frontier field of natural products research. As databases are flooded with NRPS/PKS gene sequences from microbial genomes and metagenomes, increasingly reliable structural prediction methods can help to uncover hidden treasures. Already, phylogenetic analyses have revealed that NRPS/PKS pathways should not simply be regarded as enzyme complexes specifically evolved to produce a selected natural product. Rather, they represent a collection of genetic options, allowing biosynthetic pathways to be shuffled in a process of perpetual chemical innovation, and pathway diversification in nature can give impulses for specificities, protein interactions and genetic engineering of libraries of novel peptides and polyketides. The successful translation of the knowledge obtained from bioinformatic dissection of NRPS/PKS megasynthases into new techniques for drug discovery and design remains a challenge for the future.
bioNerDS: exploring bioinformatics’ database and software use through literature mining
2013-01-01
Background Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While scholarly literature surveys can provide some insights, large-scale computer-based approaches to identify mentions of bioinformatics databases and software from primary literature would automate systematic cataloguing, facilitate the monitoring of usage, and provide the foundations for the recovery of computational methods for analysing biological data, with the long-term aim of identifying best/common practice in different areas of biology. Results We have developed bioNerDS, a named entity recogniser for the recovery of bioinformatics databases and software from primary literature. We identify such entities with an F-measure ranging from 63% to 91% at the mention level and 63-78% at the document level, depending on corpus. Not attaining a higher F-measure is mostly due to high ambiguity in resource naming, which is compounded by the on-going introduction of new resources. To demonstrate the software, we applied bioNerDS to full-text articles from BMC Bioinformatics and Genome Biology. General mention patterns reflect the remit of these journals, highlighting BMC Bioinformatics’s emphasis on new tools and Genome Biology’s greater emphasis on data analysis. The data also illustrates some shifts in resource usage: for example, the past decade has seen R and the Gene Ontology join BLAST and GenBank as the main components in bioinformatics processing. Conclusions We demonstrate the feasibility of automatically identifying resource names on a large scale from the scientific literature and show that the generated data can be used for exploration of bioinformatics database and software usage. For example, our results help to investigate the rate of change in resource usage and corroborate the suspicion that a vast majority of resources are created, but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/. PMID:23768135
A case study for cloud based high throughput analysis of NGS data using the globus genomics system
Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; ...
2015-01-01
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high throughput tool for proteomics based biomarker discovery. Multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification and indexing; and high dimensional peak differential analysis with concurrent statistical tests based on the false discovery rate (FDR). "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets to identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The presented web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254
A case study for cloud based high throughput analysis of NGS data using the globus genomics system
Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; Rodriguez, Alex; Madduri, Ravi; Dave, Utpal; Lacinski, Lukasz; Foster, Ian; Gusev, Yuriy; Madhavan, Subha
2014-01-01
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research. PMID:26925205
AutoAssemblyD: a graphical user interface system for several genome assemblers.
Veras, Adonney Allan de Oliveira; de Sá, Pablo Henrique Caracciolo Gomes; Azevedo, Vasco; Silva, Artur; Ramos, Rommel Thiago Jucá
2013-01-01
Next-generation sequencing technologies have increased the amount of biological data generated. Thus, bioinformatics has become important because new methods and algorithms are necessary to manipulate and process such data. However, certain challenges have emerged, such as genome assembly using short reads and high-throughput platforms. In this context, several algorithms have been developed, such as Velvet, Abyss, Euler-SR, Mira, Edna, Maq, SHRiMP, Newbler, ALLPATHS, Bowtie and BWA. However, most such assemblers do not have a graphical interface, which makes their use difficult for users without computing experience given the complexity of the assembler syntax. Thus, to make the operation of such assemblers accessible to users without a computing background, we developed AutoAssemblyD, which is a graphical tool for genome assembly submission and remote management by multiple assemblers through XML templates. AutoAssemblyD is freely available at https://sourceforge.net/projects/autoassemblyd. It requires Sun JDK 6 or higher.
Bioinformatics education dissemination with an evolutionary problem solving perspective.
Jungck, John R; Donovan, Samuel S; Weisstein, Anton E; Khiripet, Noppadon; Everse, Stephen J
2010-11-01
Bioinformatics is central to biology education in the 21st century. With the generation of terabytes of data per day, the application of computer-based tools to stored and distributed data is fundamentally changing research and its application to problems in medicine, agriculture, conservation and forensics. In light of this 'information revolution,' undergraduate biology curricula must be redesigned to prepare the next generation of informed citizens as well as those who will pursue careers in the life sciences. The BEDROCK initiative (Bioinformatics Education Dissemination: Reaching Out, Connecting and Knitting together) has fostered an international community of bioinformatics educators. The initiative's goals are to: (i) Identify and support faculty who can take leadership roles in bioinformatics education; (ii) Highlight and distribute innovative approaches to incorporating evolutionary bioinformatics data and techniques throughout undergraduate education; (iii) Establish mechanisms for the broad dissemination of bioinformatics resource materials and teaching models; (iv) Emphasize phylogenetic thinking and problem solving; and (v) Develop and publish new software tools to help students develop and test evolutionary hypotheses. Since 2002, BEDROCK has offered more than 50 faculty workshops around the world, published many resources and supported an environment for developing and sharing bioinformatics education approaches. The BEDROCK initiative builds on the established pedagogical philosophy and academic community of the BioQUEST Curriculum Consortium to assemble the diverse intellectual and human resources required to sustain an international reform effort in undergraduate bioinformatics education.
Creating databases for biological information: an introduction.
Stein, Lincoln
2013-06-01
The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, relational databases, and NoSQL databases. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system. Copyright 2013 by John Wiley & Sons, Inc.
Bioinformatics education in India.
Kulkarni-Kale, Urmila; Sawant, Sangeeta; Chavan, Vishwas
2010-11-01
An account of bioinformatics education in India is presented along with future prospects. Establishment of BTIS network by Department of Biotechnology (DBT), Government of India in the 1980s had been a systematic effort in the development of bioinformatics infrastructure in India to provide services to scientific community. Advances in the field of bioinformatics underpinned the need for well-trained professionals with skills in information technology and biotechnology. As a result, programmes for capacity building in terms of human resource development were initiated. Educational programmes gradually evolved from the organisation of short-term workshops to the institution of formal diploma/degree programmes. A case study of the Master's degree course offered at the Bioinformatics Centre, University of Pune is discussed. Currently, many universities and institutes are offering bioinformatics courses at different levels with variations in the course contents and degree of detailing. BioInformatics National Certification (BINC) examination initiated in 2005 by DBT provides a common yardstick to assess the knowledge and skill sets of students passing out of various institutions. The potential for broadening the scope of bioinformatics to transform it into a data intensive discovery discipline is discussed. This necessitates introduction of amendments in the existing curricula to accommodate the upcoming developments.
Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh
2015-12-01
New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.
PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.
Djokic-Petrovic, Marija; Cvjetkovic, Vladimir; Yang, Jeremy; Zivanovic, Marko; Wild, David J
2017-09-20
There is a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources. PIBAS FedSPARQL is a web-based query builder and result set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm based on the processing of named entities to be used in Vector Space Model and Cosine Similarity Measures. To our knowledge, PIBAS FedSPARQL was unique among the systems that we found in that it allows detection of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF, EMBL-EBI, and one local initiative called CPCTAS, as well as additional user-specified data sources. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users have the ability to choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of data to enhance query results. The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics. Also, the novel "similar data items detection" algorithm can be particularly useful for suggesting new data sources and cost optimization for new experiments. PIBAS FedSPARQL can be expanded with new topics, subtopics and templates on demand, rendering information retrieval more robust.
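The abstract above names the Vector Space Model and cosine similarity as the basis for detecting similar data items. The following is a minimal sketch of that general measure only, not the PIBAS FedSPARQL implementation: it assumes plain whitespace tokens stand in for the named entities the authors extract, and the function names `term_vector` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector from lower-cased whitespace tokens.

    Assumption: simple tokens replace the named entities used in the paper.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)
```

Two descriptions attached to different URIs could then be flagged as candidates for the same item when their similarity exceeds a chosen threshold; the threshold itself is not stated in the abstract.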
A Web-based assessment of bioinformatics end-user support services at US universities.
Messersmith, Donna J; Benson, Dennis A; Geer, Renata C
2006-07-01
This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group.
Bioinformatics Goes to School—New Avenues for Teaching Contemporary Biology
Wood, Louisa; Gebhardt, Philipp
2013-01-01
Since 2010, the European Molecular Biology Laboratory's (EMBL) Heidelberg laboratory and the European Bioinformatics Institute (EMBL-EBI) have jointly run bioinformatics training courses developed specifically for secondary school science teachers within Europe and EMBL member states. These courses focus on introducing bioinformatics, databases, and data-intensive biology, allowing participants to explore resources and providing classroom-ready materials to support them in sharing this new knowledge with their students. In this article, we chart our progress made in creating and running three bioinformatics training courses, including how the course resources are received by participants and how these, and bioinformatics in general, are subsequently used in the classroom. We assess the strengths and challenges of our approach, and share what we have learned through our interactions with European science teachers. PMID:23785266
The 2016 Bioinformatics Open Source Conference (BOSC).
Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather
2016-01-01
Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science.
Bioinformatics clouds for big data manipulation.
Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang
2012-11-28
As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.
Graphics processing units in bioinformatics, computational biology and systems biology.
Nobile, Marco S; Cazzaniga, Paolo; Tangherloni, Andrea; Besozzi, Daniela
2017-09-01
Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), therefore limiting their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software, and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks in the use of these parallel architectures. The complete list of GPU-powered tools here reviewed is available at http://bit.ly/gputools. © The Author 2016. Published by Oxford University Press.
Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.
Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis; Krampis, Konstantinos
2017-08-01
Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and, from a user perspective, running the pipelines is as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. © The Authors 2017. Published by Oxford University Press.
Field of genes: using Apache Kafka as a bioinformatic data repository
Lynch, Richard; Walsh, Paul
2018-01-01
Abstract Background Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is National Center for Biotechnology Information's (NCBI’s) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI’s RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. Results The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Conclusions Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data. PMID:29635394
Bio-Docklets: virtualization containers for single-step execution of NGS pipelines
Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis
2017-01-01
Abstract Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and, from a user perspective, running the pipelines is as simple as running a single bioinformatics tool. This is achieved using a “meta-script” that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. PMID:28854616
Cloud-based adaptive exon prediction for DNA analysis.
Putluri, Srinivasareddy; Zia Ur Rahman, Md; Fathima, Shaik Yasmeen
2018-02-01
Cloud computing offers significant research and economic benefits to healthcare organisations. Cloud services provide a safe place for storing and managing large amounts of sensitive data. Under the conventional flow of gene information, gene sequence laboratories send out raw and inferred information via the Internet to several sequence libraries. DNA sequencing storage costs will be minimised by the use of cloud services. In this study, the authors put forward a novel genomic informatics system using Amazon Cloud Services, where genomic sequence information is stored and accessed for processing. True identification of exon regions in a DNA sequence is a key task in bioinformatics, which helps in disease identification and drug design. The three-base periodicity property of exons forms the basis of all exon identification techniques. Adaptive signal processing techniques are found to be promising in comparison with several other methods. Several adaptive exon predictors (AEPs) are developed using variable normalised least mean square and its maximum normalised variants to reduce computational complexity. Finally, performance evaluation of various AEPs is done based on measures such as sensitivity, specificity and precision using various standard genomic datasets taken from the National Center for Biotechnology Information genomic sequence database.
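The three-base periodicity property mentioned in this abstract can be illustrated with a small sketch. Note that this is the classic Fourier-based measure of period-3 content, not the adaptive normalised least mean square predictors the authors develop; the function name `period3_power` is hypothetical, and in practice the measure would be evaluated over a sliding window along the sequence.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four base indicators.

    For each base, the sequence is treated as a 0/1 indicator signal and its
    discrete Fourier coefficient at one cycle per three positions is computed.
    Large values suggest codon-like (period-3) structure.
    """
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # Sum the complex exponential only at positions where the base occurs.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total
```

A sequence with a strictly repeating three-base pattern gives a large value, whereas a homopolymer whose length is a multiple of three gives a value close to zero, because its contributions cancel over each full period.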
Bioinformatics clouds for big data manipulation
2012-01-01
Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475
The 2016 Bioinformatics Open Source Conference (BOSC)
Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather
2016-01-01
Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC (http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science. PMID:27781083
Schönbach, Christian; Li, Jinyan; Ma, Lan; Horton, Paul; Sjaugi, Muhammad Farhan; Ranganathan, Shoba
2018-01-19
The 16th International Conference on Bioinformatics (InCoB) was held at Tsinghua University, Shenzhen from September 20 to 22, 2017. The annual conference of the Asia-Pacific Bioinformatics Network featured six keynotes, two invited talks, a panel discussion on big data driven bioinformatics and precision medicine, and 66 oral presentations of accepted research articles or posters. Fifty-seven articles comprising a topic assortment of algorithms, biomolecular networks, cancer and disease informatics, drug-target interactions and drug efficacy, gene regulation and expression, imaging, immunoinformatics, metagenomics, next generation sequencing for genomics and transcriptomics, ontologies, post-translational modification, and structural bioinformatics are the subject of this editorial for the InCoB2017 supplement issues in BMC Genomics, BMC Bioinformatics, BMC Systems Biology and BMC Medical Genomics. New Delhi will be the location of InCoB2018, scheduled for September 26-28, 2018.
The 2015 Bioinformatics Open Source Conference (BOSC 2015).
Harris, Nomi L; Cock, Peter J A; Lapp, Hilmar; Chapman, Brad; Davey, Rob; Fields, Christopher; Hokamp, Karsten; Munoz-Torres, Monica
2016-02-01
The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule.
GBA manager: an online tool for querying low-complexity regions in proteins.
Bandyopadhyay, Nirmalya; Kahveci, Tamer
2010-01-01
Abstract We developed GBA Manager, an online tool that facilitates use of the Graph-Based Algorithm (GBA) we proposed in our earlier work. GBA identifies the low-complexity regions (LCRs) of protein sequences. GBA exploits a similarity matrix, such as BLOSUM62, to compute the complexity of the subsequences of the input protein sequence. It uses a graph-based algorithm to accurately compute the regions that have low complexities. GBA Manager is a user-friendly web service that enables online querying of protein sequences using GBA. In addition to the querying capabilities of the existing GBA algorithm, GBA Manager computes the p-values of the LCRs identified. The p-value gives an estimate of the probability that the region appears by chance. GBA Manager presents the output in three different understandable formats. GBA Manager is freely accessible at http://bioinformatics.cise.ufl.edu/GBA/GBA.htm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ranjan, Priya; Yin, Tongming; Zhang, Xinye
2009-11-01
Quantitative trait locus (QTL) studies are an integral part of plant research and are used to characterize the genetic basis of phenotypic variation observed in structured populations and inform marker-assisted breeding efforts. These QTL intervals can span large physical regions on a chromosome comprising hundreds of genes, thereby hampering candidate gene identification. Genome history, evolution, and expression evidence can be used to narrow the genes in the interval to a smaller list that is manageable for detailed downstream functional genomics characterization. Our primary motivation for the present study was to address the need for a research methodology that identifies candidate genes within a broad QTL interval. Here we present a bioinformatics-based approach for subdividing candidate genes within QTL intervals into alternate groups of high probability candidates. Application of this approach in the context of studying cell wall traits, specifically lignin content and S/G ratios of stem and root in Populus plants, resulted in manageable sets of genes of both known and putative cell wall biosynthetic function. These results provide a roadmap for future experimental work leading to identification of new genes controlling cell wall recalcitrance and, ultimately, in the utility of plant biomass as an energy feedstock.
Baldwin, Ransom L; Li, Robert W; Jia, Yankai; Li, Cong-Jun
2018-01-01
The purpose of this study was to evaluate the effects of butyrate infusion on rumen epithelial transcriptome. Next-generation sequencing (NGS) and bioinformatics are used to accelerate our understanding of regulation in rumen epithelial transcriptome of cattle in the dry period induced by butyrate infusion at the level of the whole transcriptome. Butyrate, as an essential element of nutrients, is a histone deacetylase (HDAC) inhibitor that can alter histone acetylation and methylation, and plays a prominent role in regulating genomic activities influencing rumen nutrition utilization and function. Ruminal infusion of butyrate began after the 0-hour sampling (baseline controls) and continued for 168 hours at a rate of 5.0 L/day of a 2.5 M solution as a continuous infusion. Following the 168-hour infusion, the infusion was stopped, and cows were maintained on the basal lactation ration for an additional 168 hours for sampling. Rumen epithelial samples were serially collected via biopsy through rumen fistulae at 0-, 24-, 72-, and 168-hour (D1, D3, D7) and 168-hour post-infusion (D14). In comparison with pre-infusion at 0 hours, a total of 3513 genes were identified to be impacted in the rumen epithelium by butyrate infusion at least once at different sampling time points at a stringent cutoff of false discovery rate (FDR) < 0.01. The maximal effect of butyrate was observed at day 7. Among these impacted genes, 117 genes were responsive consistently from day 1 to day 14, and another 42 genes remained responsive through day 7. Temporal effects induced by butyrate infusion indicate that the transcriptomic alterations are very dynamic. Gene ontology (GO) enrichment analysis revealed that in the early stage of rumen butyrate infusion (on day 1 and day 3 of butyrate infusion), the transcriptomic effects in the rumen epithelium were involved with mitotic cell cycle process, cell cycle process, and regulation of cell cycle.
Bioinformatic analysis of cellular functions, canonical pathways, and upstream regulators of the impacted genes points to the potential mechanisms of butyrate-induced gene expression regulation in rumen epithelium. The introduction of transcriptomic and bioinformatic technologies to study nutrigenomics in farm animals presents a new prospect for studying multiple levels of biological information to better understand the whole-animal response to nutrition, physiological state, and their interactions. The nutrigenomics approach may eventually lead to more precise and more effective management of feed resource utilization. PMID:29785087
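The FDR < 0.01 cutoff used in this study can be illustrated with a short sketch of one common FDR-controlling procedure. The abstract does not say which procedure was applied, so the choice of Benjamini-Hochberg here is an assumption, and the p-values are made-up toy numbers rather than data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the original p-value ranking.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values for five hypothetical genes (not from the study).
pvals = [0.0001, 0.003, 0.019, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.01 for q in qvals]
print(significant)
```

With these toy values only the first two genes pass the 0.01 threshold, which shows how a stringent FDR cutoff keeps fewer genes than an unadjusted p < 0.05 filter would.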
Fang, Xiang; Li, Ning-qiu; Fu, Xiao-zhe; Li, Kai-bin; Lin, Qiang; Liu, Li-hui; Shi, Cun-bin; Wu, Shu-qin
2015-07-01
As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform has significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogens based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules, including genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on outer membrane protein A of Aeromonas hydrophila, and the changes of system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogens has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platforms for other subjects.
Buying in to bioinformatics: an introduction to commercial sequence analysis software
2015-01-01
Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. PMID:25183247
Buying in to bioinformatics: an introduction to commercial sequence analysis software.
Smith, David Roy
2015-07-01
Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. © The Author 2014. Published by Oxford University Press.
InCoB2012 Conference: from biological data to knowledge to technological breakthroughs
2012-01-01
Ten years ago when Asia-Pacific Bioinformatics Network held the first International Conference on Bioinformatics (InCoB) in Bangkok its theme was North-South Networking. At that time InCoB aimed to provide biologists and bioinformatics researchers in the Asia-Pacific region a forum to meet, interact with, and disseminate knowledge about the burgeoning field of bioinformatics. Meanwhile InCoB has evolved into a major regional bioinformatics conference that attracts not only talented and established scientists from the region but increasingly also from East Asia, North America and Europe. Since 2006 InCoB yielded 114 articles in BMC Bioinformatics supplement issues that have been cited nearly 1,000 times to date. In part, these developments reflect the success of bioinformatics education and continuous efforts to integrate and utilize bioinformatics in biotechnology and biosciences in the Asia-Pacific region. A cross-section of research leading from biological data to knowledge and to technological applications, the InCoB2012 theme, is introduced in this editorial. Other highlights included sessions organized by the Pan-Asian Pacific Genome Initiative and a Machine Learning in Immunology competition. InCoB2013 is scheduled for September 18-21, 2013 at Suzhou, China. PMID:23281929
Iourov, Ivan Y; Vorsanova, Svetlana G; Voinova, Victoria Y; Yurov, Yuri B
2015-01-01
In contrast to other autism spectrum disorders, chromosome abnormalities are rare in Asperger syndrome (AS) or high-functioning autism. Consequently, AS was occasionally subjected to classical positional cloning. Here, we report on a case of AS associated with a deletion of the short arm of chromosome 3. Further in silico analysis has identified a candidate gene for AS and has suggested a therapeutic strategy for manifestations of the chromosome rearrangement. Using array comparative genomic hybridization, an interstitial deletion of 3p22.1p21.31 (~2.5 Mb in size) in a child with Asperger's syndrome, seborrheic dermatitis and chronic pancreatitis was detected. An original bioinformatic approach to the prioritization of candidate genes/processes identified CCK (cholecystokinin) as a candidate gene for AS. In addition to processes associated with deleted genes, bioinformatic analysis of the CCK gene interactome indicated that zinc deficiency might be a pathogenic mechanism in this case. This suggestion was supported by plasma zinc concentration measurements. The increase of zinc intake produced a rise in zinc plasma concentration and an improvement in the patient's condition. Our study supports previous linkage findings and suggests a new candidate gene in AS. Moreover, bioinformatic analysis identified the pathogenic mechanism, which was used to propose a therapeutic strategy for manifestations of the deletion. The relative success of this strategy allows us to speculate that therapeutic or dietary normalization of metabolic processes altered by a chromosome imbalance or genomic copy number variations may be a way of treating at least a small proportion of cases of these presumably incurable genetic conditions.
Translational bioinformatics: linking the molecular world to the clinical world.
Altman, R B
2012-06-01
Translational bioinformatics represents the union of translational medicine and bioinformatics. Translational medicine moves basic biological discoveries from the research bench into the patient-care setting and uses clinical observations to inform basic biology. It focuses on patient care, including the creation of new diagnostics, prognostics, prevention strategies, and therapies based on biological discoveries. Bioinformatics involves algorithms to represent, store, and analyze basic biological data, including DNA sequence, RNA expression, and protein and small-molecule abundance within cells. Translational bioinformatics spans these two fields; it involves the development of algorithms to analyze basic molecular and cellular data with an explicit goal of affecting clinical care.
Importance of databases of nucleic acids for bioinformatic analysis focused to genomics
NASA Astrophysics Data System (ADS)
Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.
2016-08-01
Recently, bioinformatics has become a new field of science, indispensable in the analysis of the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins, and intergenic regions, including the entire genomes of some species. The analysis of this information requires computer programs, which have been renewed through the use of new mathematical methods and the introduction of artificial intelligence, in addition to the constant creation of supercomputing units designed to withstand the heavy workload of sequence analysis. However, innovation is still needed on platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.
A bioinformatics roadmap for the human vaccines project.
Scheuermann, Richard H; Sinkovits, Robert S; Schenkelberg, Theodore; Koff, Wayne C
2017-06-01
Biomedical research has become a data intensive science in which high throughput experimentation is producing comprehensive data about biological systems at an ever-increasing pace. The Human Vaccines Project is a new public-private partnership, with the goal of accelerating development of improved vaccines and immunotherapies for global infectious diseases and cancers by decoding the human immune system. To achieve its mission, the Project is developing a Bioinformatics Hub as an open-source, multidisciplinary effort with the overarching goal of providing an enabling infrastructure to support the data processing, analysis and knowledge extraction procedures required to translate high throughput, high complexity human immunology research data into biomedical knowledge, to determine the core principles driving specific and durable protective immune responses.
Ontology-based, Tissue MicroArray oriented, image centered tissue bank
Viti, Federica; Merelli, Ivan; Caprera, Andrea; Lazzari, Barbara; Stella, Alessandra; Milanesi, Luciano
2008-01-01
Background The Tissue MicroArray technique is becoming increasingly important in pathology for the validation of experimental data from transcriptomic analysis. This approach produces many images which need to be properly managed, if possible with an infrastructure able to support tissue sharing between institutes. Moreover, the available frameworks oriented to Tissue MicroArray provide good storage for clinical patient, sample treatment and block construction information, but their utility is limited by the lack of data integration with biomolecular information. Results In this work we propose a Tissue MicroArray web oriented system to support researchers in managing bio-samples and, through the use of ontologies, to enable tissue sharing aimed at the design of Tissue MicroArray experiments and results evaluation. Indeed, our system provides ontological description both for pre-analysis tissue images and for post-process analysis image results, which is crucial for information exchange. Moreover, by working with well-defined terms, it is possible to query web resources for literature articles to integrate both pathology and bioinformatics data. Conclusions Using this system, users associate an ontology-based description with each image uploaded into the database and also integrate results with the ontological description of biosequences identified in every tissue. Moreover, it is possible to integrate the ontological description provided by the user with a fully compliant gene ontology definition, enabling statistical studies about correlation between the analyzed pathology and the most commonly related biological processes. PMID:18460177
Pastur-Romay, Lucas Antón; Cedrón, Francisco; Pazos, Alejandro; Porto-Pazos, Ana Belén
2016-08-11
Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs, and their usefulness in Pharmacology and Bioinformatics are presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure-Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron-Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods.
Pastur-Romay, Lucas Antón; Cedrón, Francisco; Pazos, Alejandro; Porto-Pazos, Ana Belén
2016-01-01
Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs, and their usefulness in Pharmacology and Bioinformatics are presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure–Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron–Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods. PMID:27529225
A Web-based assessment of bioinformatics end-user support services at US universities
Messersmith, Donna J.; Benson, Dennis A.; Geer, Renata C.
2006-01-01
Objectives: This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Methods: Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Results: Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. Conclusions: This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group. PMID:16888663
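The counts reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
reviewed = 239
with_workshops = 72
without_workshops = 167
by_location = {"libraries": 15, "bioinformatics centers": 38, "other facilities": 35}
centers_identified = 115

# The two groups of universities should account for every reviewed site.
groups_add_up = with_workshops + without_workshops == reviewed

# Percentages as reported in the abstract (30% and 70%).
share_with = round(100 * with_workshops / reviewed)
share_without = round(100 * without_workshops / reviewed)

# Centers that did not offer workshops, as a fraction of all centers found.
centers_without = centers_identified - by_location["bioinformatics centers"]
fraction_centers_without = centers_without / centers_identified

# Workshop locations summed over the three kinds of unit.
location_total = sum(by_location.values())

print(groups_add_up, share_with, share_without)
print(centers_without, round(fraction_centers_without, 2), location_total)
```

The location counts sum to 88, which is more than the 72 universities offering workshops. That would be consistent with some universities hosting workshops in more than one kind of unit, although the abstract does not say so explicitly. The 77 of 115 centers without workshops matches the stated "two-thirds".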
LXtoo: an integrated live Linux distribution for the bioinformatics community
2012-01-01
Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics, which limit their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356
LXtoo: an integrated live Linux distribution for the bioinformatics community.
Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu
2012-07-19
Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics, which limit their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.
Expanding roles in a library-based bioinformatics service program: a case study
Li, Meng; Chen, Yi-Bu; Clintworth, William A
2013-01-01
Question: How can a library-based bioinformatics support program be implemented and expanded to continuously support the growing and changing needs of the research community? Setting: A program at a health sciences library serving a large academic medical center with a strong research focus is described. Methods: The bioinformatics service program was established at the Norris Medical Library in 2005. As part of program development, the library assessed users' bioinformatics needs, acquired additional funds, established and expanded service offerings, and explored additional roles in promoting on-campus collaboration. Results: Personnel and software have increased along with the number of registered software users and use of the provided services. Conclusion: With strategic efforts and persistent advocacy within the broader university environment, library-based bioinformatics service programs can become a key part of an institution's comprehensive solution to researchers' ever-increasing bioinformatics needs. PMID:24163602
4273π: Bioinformatics education on low cost ARM hardware
2013-01-01
Background Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. Results We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012–2013. Conclusions 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost. PMID:23937194
4273π: bioinformatics education on low cost ARM hardware.
Barker, Daniel; Ferrier, David Ek; Holland, Peter Wh; Mitchell, John Bo; Plaisier, Heleen; Ritchie, Michael G; Smart, Steven D
2013-08-12
Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012-2013. 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.
A decade of Web Server updates at the Bioinformatics Links Directory: 2003-2012.
Brazas, Michelle D; Yim, David; Yeung, Winston; Ouellette, B F Francis
2012-07-01
The 2012 Bioinformatics Links Directory update marks the 10th special Web Server issue from Nucleic Acids Research. Beginning with content from their 2003 publication, the Bioinformatics Links Directory in collaboration with Nucleic Acids Research has compiled and published a comprehensive list of freely accessible, online tools, databases and resource materials for the bioinformatics and life science research communities. The past decade has exhibited significant growth and change in the types of tools, databases and resources being put forth, reflecting both technology changes and the nature of research over that time. With the addition of 90 web server tools and 12 updates from the July 2012 Web Server issue of Nucleic Acids Research, the Bioinformatics Links Directory at http://bioinformatics.ca/links_directory/ now contains an impressive 134 resources, 455 databases and 1205 web server tools, mirroring the continued activity and efforts of our field.
How Can We Use Bioinformatics to Predict Which Agents Will Cause Birth Defects?
The availability of genomic sequences from a growing number of human and model organisms has provided an explosion of data, information, and knowledge regarding biological systems and disease processes. High-throughput technologies such as DNA and protein microarray biochips are ...
Systems analysis of arrestin pathway functions.
Maudsley, Stuart; Siddiqui, Sana; Martin, Bronwen
2013-01-01
To fully appreciate the diversity and specificity of complex cellular signaling events, such as arrestin-mediated signaling from G protein-coupled receptor activation, a complex systems-level investigation currently appears to be the best option. A rational combination of transcriptomics, proteomics, and interactomics, all coherently integrated with applied next-generation bioinformatics, is vital for the future understanding of the development, translation, and expression of GPCR-mediated arrestin signaling events in physiological contexts. Through a more nuanced, systems-level appreciation of arrestin-mediated signaling, the creation of arrestin-specific molecular response "signatures" should be made simple and ultimately amenable to drug discovery processes. Arrestin-based signaling paradigms possess important aspects, such as their specific temporal kinetics and ability to strongly affect transcriptional activity, that make them an ideal test bed for next-generation drug discovery bioinformatic approaches such as multi-parallel dose-response analysis, data texturization, and latent semantic indexing-based natural language data processing and feature extraction. Copyright © 2013 Elsevier Inc. All rights reserved.
Molgenis-impute: imputation pipeline in a box.
Kanterakis, Alexandros; Deelen, Patrick; van Dijk, Freerk; Byelas, Heorhiy; Dijkstra, Martijn; Swertz, Morris A
2015-08-19
Genotype imputation is an important procedure in current genomic analysis such as genome-wide association studies, meta-analyses and fine mapping. Although high quality tools are available that perform the steps of this process, considerable effort and expertise is required to set up and run a best practice imputation pipeline, particularly for larger genotype datasets, where imputation has to scale out in parallel on computer clusters. Here we present MOLGENIS-impute, an 'imputation in a box' solution that seamlessly and transparently automates the set up and running of all the steps of the imputation process. These steps include genome build liftover (liftovering), genotype phasing with SHAPEIT2, quality control, sample and chromosomal chunking/merging, and imputation with IMPUTE2. MOLGENIS-impute builds on MOLGENIS-compute, a simple pipeline management platform for submission and monitoring of bioinformatics tasks in High Performance Computing (HPC) environments like local/cloud servers, clusters and grids. All the required tools, data and scripts are downloaded and installed in a single step. Researchers with diverse backgrounds and expertise have tested MOLGENIS-impute on different locations and imputed over 30,000 samples so far using the 1,000 Genomes Project and new Genome of the Netherlands data as the imputation reference. The tests have been performed on PBS/SGE clusters, cloud VMs and in a grid HPC environment. MOLGENIS-impute gives priority to the ease of setting up, configuring and running an imputation. It has minimal dependencies and wraps the pipeline in a simple command line interface, without sacrificing flexibility to adapt or limiting the options of underlying imputation tools. It does not require knowledge of a workflow system or programming, and is targeted at researchers who just want to apply best practices in imputation via simple commands. 
It is built on the MOLGENIS compute workflow framework to enable customization with additional computational steps or it can be included in other bioinformatics pipelines. It is available as open source from: https://github.com/molgenis/molgenis-imputation.
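The chromosomal chunking step mentioned in the abstract above can be illustrated with a minimal sketch. This is not MOLGENIS-impute code: the function name `chunk_chromosome`, the 1-based inclusive coordinate convention, and the 5 Mb chunk size are assumptions chosen only to show how a chromosome might be split into regions that can be processed in parallel.

```python
from typing import List, Tuple


def chunk_chromosome(length: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split the 1-based inclusive range [1, length] into consecutive chunks.

    Hypothetical helper for illustration; the real pipeline may use
    different boundaries, overlaps, or buffer regions.
    """
    if length < 1 or chunk_size < 1:
        raise ValueError("length and chunk_size must be positive")
    chunks = []
    start = 1
    while start <= length:
        # The last chunk is truncated at the end of the chromosome.
        end = min(start + chunk_size - 1, length)
        chunks.append((start, end))
        start = end + 1
    return chunks


if __name__ == "__main__":
    # Illustrative sizes only: a 12 Mb region cut into 5 Mb chunks.
    for start, end in chunk_chromosome(12_000_000, 5_000_000):
        print(f"chunk {start}-{end}")
```

Each (start, end) pair could then be handed to a separate cluster job, with the per-chunk results merged afterwards, which is the general shape of the chunking/merging step the abstract describes.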
The 2015 Bioinformatics Open Source Conference (BOSC 2015)
Harris, Nomi L.; Cock, Peter J. A.; Lapp, Hilmar
2016-01-01
The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included “Data Science;” “Standards and Interoperability;” “Open Science and Reproducibility;” “Translational Bioinformatics;” “Visualization;” and “Bioinformatics Open Source Project Updates”. In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled “Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community,” that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule. PMID:26914653
Oulas, Anastasis; Minadakis, George; Zachariou, Margarita; Sokratous, Kleitos; Bourdakou, Marilena M; Spyrou, George M
2017-11-27
Systems Bioinformatics is a relatively new approach, which lies in the intersection of systems biology and classical bioinformatics. It focuses on integrating information across different levels using a bottom-up approach as in systems biology with a data-driven top-down approach as in bioinformatics. The advent of omics technologies has provided the stepping-stone for the emergence of Systems Bioinformatics. These technologies provide a spectrum of information ranging from genomics, transcriptomics and proteomics to epigenomics, pharmacogenomics, metagenomics and metabolomics. Systems Bioinformatics is the framework in which systems approaches are applied to such data, setting the level of resolution as well as the boundary of the system of interest and studying the emerging properties of the system as a whole rather than the sum of the properties derived from the system's individual components. A key approach in Systems Bioinformatics is the construction of multiple networks representing each level of the omics spectrum and their integration in a layered network that exchanges information within and between layers. Here, we provide evidence on how Systems Bioinformatics enhances computational therapeutics and diagnostics, hence paving the way to precision medicine. The aim of this review is to familiarize the reader with the emerging field of Systems Bioinformatics and to provide a comprehensive overview of its current state-of-the-art methods and technologies. Moreover, we provide examples of success stories and case studies that utilize such methods and tools to significantly advance research in the fields of systems biology and systems medicine. © The Author 2017. Published by Oxford University Press.
Pathway mapping and development of disease-specific biomarkers: protein-based network biomarkers
Chen, Hao; Zhu, Zhitu; Zhu, Yichun; Wang, Jian; Mei, Yunqing; Cheng, Yunfeng
2015-01-01
It is known that a disease is rarely a consequence of an abnormality of a single gene, but reflects the interactions of various processes in a complex network. Annotated molecular networks offer new opportunities to understand diseases within a systems biology framework and provide an excellent substrate for network-based identification of biomarkers. The network biomarkers and dynamic network biomarkers (DNBs) represent new types of biomarkers with protein–protein or gene–gene interactions that can be monitored and evaluated at different stages and time-points during development of disease. Clinical bioinformatics as a new way to combine clinical measurements and signs with human tissue-generated bioinformatics is crucial to translate biomarkers into clinical application, validate the disease specificity, and understand the role of biomarkers in clinical settings. In this article, the recent advances and developments on network biomarkers and DNBs are comprehensively reviewed. How network biomarkers contribute to a better understanding of the molecular mechanisms of diseases, the advantages and constraints of network biomarkers for clinical application, clinical bioinformatics as a bridge to the development of disease-specific, stage-specific, severity-specific and therapy-predictive biomarkers, and the potential of network biomarkers are also discussed. PMID:25560835
An overview of bioinformatics methods for modeling biological pathways in yeast.
Hou, Jie; Acharya, Lipi; Zhu, Dongxiao; Cheng, Jianlin
2016-03-01
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein-protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Genomics pipelines and data integration: challenges and opportunities in the research setting
Davis-Turak, Jeremy; Courtney, Sean M.; Hazard, E. Starr; Glen, W. Bailey; da Silveira, Willian; Wesselman, Timothy; Harbin, Larry P.; Wolf, Bethany J.; Chung, Dongjun; Hardiman, Gary
2017-01-01
Introduction: The emergence and mass utilization of high-throughput (HT) technologies, including sequencing technologies (genomics) and mass spectrometry (proteomics, metabolomics, lipids), has allowed geneticists, biologists, and biostatisticians to bridge the gap between genotype and phenotype on a massive scale. These new technologies have brought rapid advances in our understanding of cell biology, evolutionary history, microbial environments, and are increasingly providing new insights and applications towards clinical care and personalized medicine. Areas covered: The very success of this industry also translates into daunting big data challenges for researchers and institutions that extend beyond the traditional academic focus of algorithms and tools. The main obstacles revolve around analysis provenance, data management of massive datasets, ease of use of software, interpretability and reproducibility of results. Expert commentary: The authors review the challenges associated with implementing bioinformatics best practices in a large-scale setting, and highlight the opportunity for establishing bioinformatics pipelines that incorporate data tracking and auditing, enabling greater consistency and reproducibility for basic research, translational or clinical settings. PMID:28092471
Community-driven computational biology with Debian Linux.
Möller, Steffen; Krabbenhöft, Hajo Nils; Tille, Andreas; Paleino, David; Williams, Alan; Wolstencroft, Katy; Goble, Carole; Holland, Richard; Belhachemi, Dominique; Plessy, Charles
2010-12-21
The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments. The Debian Med initiative provides ready and coherent software packages for medical informatics and bioinformatics. These packages can be used together in Taverna workflows via the UseCase plugin to manage execution on local or remote machines. If such packages are available in cloud computing environments, the underlying hardware and the analysis pipelines can be shared along with the software. Debian Med closes the gap between developers and users. It provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology. For geographically distributed teams it can ensure they are working on the same versions of tools, in the same conditions. This contributes to the world-wide networking of researchers.
Genomics pipelines and data integration: challenges and opportunities in the research setting.
Davis-Turak, Jeremy; Courtney, Sean M; Hazard, E Starr; Glen, W Bailey; da Silveira, Willian A; Wesselman, Timothy; Harbin, Larry P; Wolf, Bethany J; Chung, Dongjun; Hardiman, Gary
2017-03-01
The emergence and mass utilization of high-throughput (HT) technologies, including sequencing technologies (genomics) and mass spectrometry (proteomics, metabolomics, lipids), has allowed geneticists, biologists, and biostatisticians to bridge the gap between genotype and phenotype on a massive scale. These new technologies have brought rapid advances in our understanding of cell biology, evolutionary history, microbial environments, and are increasingly providing new insights and applications towards clinical care and personalized medicine. Areas covered: The very success of this industry also translates into daunting big data challenges for researchers and institutions that extend beyond the traditional academic focus of algorithms and tools. The main obstacles revolve around analysis provenance, data management of massive datasets, ease of use of software, interpretability and reproducibility of results. Expert commentary: The authors review the challenges associated with implementing bioinformatics best practices in a large-scale setting, and highlight the opportunity for establishing bioinformatics pipelines that incorporate data tracking and auditing, enabling greater consistency and reproducibility for basic research, translational or clinical settings.
The impact of next-generation sequencing on genomics
Zhang, Jun; Chiodini, Rod; Badr, Ahmed; Zhang, Genfa
2011-01-01
This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. But, the massive data produced by NGS also presents a significant challenge for data storage, analyses, and management solutions. Advanced bioinformatic tools are essential for the successful application of NGS technology. As evidenced throughout this review, NGS technologies will have a striking impact on genomic research and the entire biological field. With its ability to tackle the unsolved challenges unconquered by previous genomic technologies, NGS is likely to unravel the complexity of the human genome in terms of genetic variations, some of which may be confined to susceptible loci for some common human conditions. The impact of NGS technologies on genomics will be far reaching and likely change the field for years to come. PMID:21477781
A vision for collaborative training infrastructure for bioinformatics.
Williams, Jason J; Teal, Tracy K
2017-01-01
In biology, a missing link connecting data generation and data-driven discovery is the training that prepares researchers to effectively manage and analyze data. National and international cyberinfrastructure along with evolving private sector resources place biologists and students within reach of the tools needed for data-intensive biology, but training is still required to make effective use of them. In this concept paper, we review a number of opportunities and challenges that can inform the creation of a national bioinformatics training infrastructure capable of servicing the large number of emerging and existing life scientists. While college curricula are slower to adapt, grassroots startup-spirited organizations, such as Software and Data Carpentry, have made impressive inroads in training on the best practices of software use, development, and data analysis. Given the transformative potential of biology and medicine as full-fledged data sciences, more support is needed to organize, amplify, and assess these efforts and their impacts. © 2016 New York Academy of Sciences.
Influenza research database: an integrated bioinformatics resource for influenza virus research
USDA-ARS's Scientific Manuscript database
The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics, an...
Rapid Development of Bioinformatics Education in China
ERIC Educational Resources Information Center
Zhong, Yang; Zhang, Xiaoyan; Ma, Jian; Zhang, Liang
2003-01-01
As the Human Genome Project experiences remarkable success and a flood of biological data is produced, bioinformatics becomes a very "hot" cross-disciplinary field, yet experienced bioinformaticians are urgently needed worldwide. This paper summarises the rapid development of bioinformatics education in China, especially related…
Santos, Eliane Macedo Sobrinho; Santos, Hércules Otacílio; Dos Santos Dias, Ivoneth; Santos, Sérgio Henrique; Batista de Paula, Alfredo Maurício; Feltenberger, John David; Sena Guimarães, André Luiz; Farias, Lucyana Conceição
2016-01-01
The pathogenesis of odontogenic tumors is not well known. It is important to identify genetic deregulations and molecular alterations. This study aimed to investigate, through bioinformatic analysis, the possible genes involved in the pathogenesis of ameloblastoma (AM) and keratocystic odontogenic tumor (KCOT). Genes involved in the pathogenesis of AM and KCOT were identified in GeneCards. The gene list was expanded, and the gene interaction network was mapped using the STRING software. The "weighted number of links" (WNL) was calculated to identify "leader genes" (highest WNL). Genes were ranked by the K-means method and the Kruskal-Wallis test was used (P<0.001). The total interactions score (TIS) was also calculated using all interaction data generated by the STRING database, in order to achieve global connectivity for each gene. The topological and ontological analyses were performed using Cytoscape software and the BinGO plugin. Literature review data were used to corroborate the bioinformatics data. CDK1 was identified as the leader gene for AM. In the KCOT group, the leader genes were PCNA and TP53. Both tumors exhibit a power law behavior. Our topological analysis suggested leader genes possibly important in the pathogenesis of AM and KCOT, by clustering coefficient calculated for both odontogenic tumors (0.028 for AM, zero for KCOT). The results obtained in the scatter diagram suggest an important relationship of these genes with the molecular processes involved in AM and KCOT. Ontological analysis for both AM and KCOT demonstrated different mechanisms. Bioinformatics analyses were confirmed through literature review. These results may suggest the involvement of promising genes for a better understanding of the pathogenesis of AM and KCOT.
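The "weighted number of links" idea in the abstract above can be sketched as follows. The abstract does not give the formula, so this assumes WNL is the sum of the interaction scores over all links that touch a gene. The gene names and scores are invented placeholders, not data from the study, and the simple sort used here stands in for the K-means ranking the authors report.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (gene_a, gene_b, interaction score)


def weighted_number_of_links(edges: Iterable[Edge]) -> Dict[str, float]:
    """Sum the scores of all links touching each gene (assumed WNL definition)."""
    wnl: Dict[str, float] = defaultdict(float)
    for gene_a, gene_b, score in edges:
        wnl[gene_a] += score
        wnl[gene_b] += score
    return dict(wnl)


def leader_genes(wnl: Dict[str, float], top: int = 1) -> List[str]:
    """Return the genes with the highest WNL; ties are broken alphabetically."""
    return sorted(wnl, key=lambda gene: (-wnl[gene], gene))[:top]


if __name__ == "__main__":
    # Placeholder network; scores chosen only for illustration.
    toy_edges = [
        ("GENE_A", "GENE_B", 0.5),
        ("GENE_A", "GENE_C", 0.25),
        ("GENE_B", "GENE_C", 0.5),
        ("GENE_B", "GENE_D", 0.25),
    ]
    scores = weighted_number_of_links(toy_edges)
    print(scores)
    print(leader_genes(scores))
```

In a real analysis the edge list would come from an interaction database export rather than being typed in, and a clustering step would separate leader genes from the rest instead of taking a fixed top-N.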
ReGaTE: Registration of Galaxy Tools in Elixir
Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé
2017-01-01
Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. Findings: We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. Conclusions: ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on GitHub at https://github.com/C3BI-pasteur-fr/ReGaTE. PMID:28402416
2012-01-01
Background: Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results: In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions: Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941
El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter
2012-01-01
Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.
Atlas - a data warehouse for integrative bioinformatics.
Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis
2005-02-21
We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. 
First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/
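The warehouse approach described above, relational models per data type queried through SQL, can be illustrated with a small in-memory sketch. This is not the Atlas schema or API: the table names, column names, and placeholder rows are assumptions made for illustration, and Python's standard `sqlite3` module stands in for the C++, Java, and Perl APIs the abstract mentions.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a toy warehouse with genes, interactions, and annotations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
        CREATE TABLE interaction (
            gene_a INTEGER NOT NULL REFERENCES gene(gene_id),
            gene_b INTEGER NOT NULL REFERENCES gene(gene_id),
            source TEXT NOT NULL
        );
        CREATE TABLE annotation (
            gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
            term TEXT NOT NULL
        );
        """
    )
    # Placeholder rows; not taken from any real database.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [(1, "geneA"), (2, "geneB"), (3, "geneC")],
    )
    conn.executemany(
        "INSERT INTO interaction VALUES (?, ?, ?)",
        [(1, 2, "demo_source"), (2, 3, "demo_source")],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?)",
        [(2, "term:kinase"), (3, "term:transport")],
    )
    return conn


def annotated_partners(conn: sqlite3.Connection, symbol: str) -> List[Tuple[str, str]]:
    """Return (partner symbol, annotation term) pairs for a gene's interactors.

    Interactions are treated as undirected, so the gene may appear in
    either column of the interaction table.
    """
    rows = conn.execute(
        """
        SELECT p.symbol, a.term
        FROM gene g
        JOIN interaction i ON g.gene_id IN (i.gene_a, i.gene_b)
        JOIN gene p ON p.gene_id IN (i.gene_a, i.gene_b)
                   AND p.gene_id != g.gene_id
        JOIN annotation a ON a.gene_id = p.gene_id
        WHERE g.symbol = ?
        ORDER BY p.symbol, a.term
        """,
        (symbol,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(annotated_partners(conn, "geneB"))
```

The single join across interaction and annotation tables shows the kind of cross-source question a common relational model makes easy; a retrieval API would typically wrap queries like this one behind a function call.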
Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics
ERIC Educational Resources Information Center
Zhang, Xiaorong
2009-01-01
This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…
BioStar: an online question & answer resource for the bioinformatics community
USDA-ARS's Scientific Manuscript database
Although the era of big data has produced many bioinformatics tools and databases, using them effectively often requires specialized knowledge. Many groups lack bioinformatics expertise, and frequently find that software documentation is inadequate and local colleagues may be overburdened or unfamil...
Honts, Jerry E.
2003-01-01
Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489
Cloning and bioinformatic analysis of lovastatin biosynthesis regulatory gene lovE.
Huang, Xin; Li, Hao-ming
2009-08-05
Lovastatin is an effective drug for treatment of hyperlipidemia. This study aimed to clone the lovastatin biosynthesis regulatory gene lovE and analyze the structure and function of its encoded protein. Based on the lovastatin synthase gene sequence from GenBank, primers were designed to amplify and clone the lovastatin biosynthesis regulatory gene lovE from Aspergillus terreus genomic DNA. Bioinformatic analysis of lovE and its encoded amino acid sequence was performed using internet resources and software such as DNAMAN. The target fragment lovE, almost 1500 bp in length, was amplified from Aspergillus terreus genomic DNA, and the secondary and three-dimensional structures of the LovE protein were predicted. In the lovastatin biosynthesis process, lovE is a regulatory gene and the LovE protein is a GAL4-like transcription factor.
Data mining in newt-omics, the repository for omics data from the newt.
Looso, Mario; Braun, Thomas
2015-01-01
Salamanders are an excellent model organism to study regenerative processes due to their unique ability to regenerate lost appendages or organs. Straightforward bioinformatics tools to analyze and take advantage of the growing number of "omics" studies performed in salamanders have so far been lacking. To overcome this limitation, we have generated a comprehensive data repository for the red-spotted newt Notophthalmus viridescens, named newt-omics, merging omics-style datasets on the transcriptome and proteome level, including expression values and annotations. The resource is freely available via a user-friendly Web-based graphical user interface (http://newt-omics.mpi-bn.mpg.de) that allows access and queries to the database without prior bioinformatics expertise. The repository is updated regularly, incorporating newly published datasets from omics technologies.
SoS Notebook: An Interactive Multi-Language Data Analysis Environment.
Peng, Bo; Wang, Gao; Ma, Jun; Leong, Man Chong; Wakefield, Chris; Melott, James; Chiu, Yulun; Du, Di; Weinstein, John N
2018-05-22
Complex bioinformatic data analysis workflows involving multiple scripts in different languages can be difficult to consolidate, share, and reproduce. An environment that streamlines the entire process of data collection, analysis, visualization and reporting of such multi-language analyses is currently lacking. We developed Script of Scripts (SoS) Notebook, a web-based notebook environment that allows the use of multiple scripting languages in a single notebook, with data flowing freely within and across languages. SoS Notebook enables researchers to perform sophisticated bioinformatic analysis using the most suitable tools for different parts of the workflow, without the limitations of a particular language or complications of cross-language communications. SoS Notebook is hosted at http://vatlab.github.io/SoS/ and is distributed under a BSD license.
Establishing a master's degree programme in bioinformatics: challenges and opportunities.
Sahinidis, N V; Harandi, M T; Heath, M T; Murphy, L; Snir, M; Wheeler, R P; Zukoski, C F
2005-12-01
The development of the Bioinformatics MS degree program at the University of Illinois, the challenges and opportunities associated with such a process, and the current structure of the program are described. This program has departed from earlier University practice in significant ways. Despite the existence of several interdisciplinary programs at the University, a few of which grant degrees, this is the first interdisciplinary program that grants degrees and formally recognises departmental specialisation areas. The program, which is not owned by any particular department but by the Graduate College itself, is operated in a franchise-like fashion via several departmental concentrations. With four different colleges and many more departments involved in establishing and operating the program, the logistics of the operation are of considerable complexity but result in significant interactions across the entire campus.
Achieving High Performance with FPGA-Based Computing
Herbordt, Martin C.; VanCourt, Tom; Gu, Yongfeng; Sukhwani, Bharat; Conti, Al; Model, Josh; DiSabello, Doug
2011-01-01
Numerous application areas, including bioinformatics and computational biology, demand increasing amounts of processing capability. In many cases, the computation cores and data types are suited to field-programmable gate arrays. The challenge is identifying the design techniques that can extract high performance potential from the FPGA fabric. PMID:21603088
The next generation of training for Arabidopsis researchers: bioinformatics and quantitative biology
USDA-ARS's Scientific Manuscript database
It has been more than 50 years since Arabidopsis (Arabidopsis thaliana) was first introduced as a model organism to understand basic processes in plant biology. A well-organized scientific community has used this small reference plant species to make numerous fundamental plant biology discoveries (P...
The Jukes-Cantor Model of Molecular Evolution
ERIC Educational Resources Information Center
Erickson, Keith
2010-01-01
The material in this module introduces students to some of the mathematical tools used to examine molecular evolution. This topic is standard fare in many mathematical biology or bioinformatics classes, but could also be suitable for classes in linear algebra or probability. While coursework in matrix algebra, Markov processes, Monte Carlo…
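As one illustration of the kind of calculation this topic involves, the Jukes-Cantor model corrects the observed proportion p of differing sites between two aligned sequences to an estimated number of substitutions per site, d = -(3/4) ln(1 - (4/3) p). The following is a minimal Python sketch of that standard formula; the function names and example sequences are hypothetical and are not taken from the module itself.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def jukes_cantor_distance(p):
    """Jukes-Cantor estimate of substitutions per site from proportion p.

    The correction is only defined for 0 <= p < 0.75, because 0.75 is the
    expected proportion of differences between two random sequences.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical example: two aligned 10-base sequences with 2 differences.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
d = jukes_cantor_distance(p)
print(f"p = {p:.2f}, Jukes-Cantor d = {d:.4f}")
```

The corrected distance is always at least as large as p, since some sites will have changed more than once or changed back.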
Integrating PCR Theory and Bioinformatics into a Research-oriented Primer Design Exercise
ERIC Educational Resources Information Center
Robertson, Amber L.; Phillips, Allison R.
2008-01-01
Polymerase chain reaction (PCR) is a conceptually difficult technique that embodies many fundamental biological processes. Traditionally, students have struggled to analyze PCR results due to an incomplete understanding of the biological concepts (theory) of DNA replication and strand complementarity. Here we describe the design of a novel…
Secretome profiles of immortalized dental follicle cells using iTRAQ-based proteomic analysis.
Dou, Lei; Wu, Yan; Yan, Qifang; Wang, Jinhua; Zhang, Yan; Ji, Ping
2017-08-04
Secretomes produced by mesenchymal stromal cells (MSCs) are considered to have therapeutic potential. However, harvesting enough primary MSCs from tissue is time-consuming and costly, which has impeded the application of MSC secretomes. This study aimed to immortalize MSCs and compare the secretome profiles of immortalized and original MSCs. Human dental follicle cells (DFCs) were isolated and immortalized using pMPH86. The secretome profile of immortalized DFCs (iDFCs) was investigated and compared using iTRAQ labeling combined with mass spectrometry (MS) quantitative proteomics. The MS data were analyzed using ProteinPilot™ software, and bioinformatic analysis of the identified proteins was then performed. A total of 2092 secreted proteins were detected in conditioned media of iDFCs. Compared with primary DFCs, 253 differentially expressed proteins were found in the iDFC secretome (142 up-regulated and 111 down-regulated). Intensive bioinformatic analysis revealed that the majority of secreted proteins were involved in cellular process, metabolic process, biological regulation, cellular component organization or biogenesis, immune system process, developmental process, response to stimulus and signaling. The proteomic profile of the cell secretome was not largely affected by immortalization with this piggyBac immortalization system. The secretome of iDFCs may be a good substitute for that of primary DFCs in regenerative medicine.
The StratusLab cloud distribution: Use-cases and support for scientific applications
NASA Astrophysics Data System (ADS)
Floros, E.
2012-04-01
The StratusLab project is integrating an open cloud software distribution that enables organizations to set up and provide their own private or public IaaS (Infrastructure as a Service) computing clouds. The StratusLab distribution capitalizes on popular infrastructure virtualization solutions like KVM, the OpenNebula virtual machine manager, the Claudia service manager and the SlipStream deployment platform, which are further enhanced and expanded with additional components developed within the project. The StratusLab distribution covers the core aspects of a cloud IaaS architecture, namely Computing (life-cycle management of virtual machines), Storage, Appliance management and Networking. The resulting software stack provides a packaged turn-key solution for deploying cloud computing services. The cloud computing infrastructures deployed using StratusLab can support a wide range of scientific and business use cases. Grid computing has been the primary use case pursued by the project, and for this reason the initial priority has been support for the deployment and operation of fully virtualized production-level grid sites, a goal that has already been achieved by operating such a site as part of EGI's (European Grid Initiative) pan-European grid infrastructure. In this area the project is currently working to provide non-trivial capabilities like elastic and autonomic management of grid site resources. Although grid computing has been the motivating paradigm, StratusLab's cloud distribution can support a wider range of use cases. To this end, we have developed and currently provide support for setting up general-purpose computing solutions like Hadoop, MPI and Torque clusters. As for scientific applications, the project is collaborating closely with the bioinformatics community in order to prepare VM appliances and deploy optimized services for bioinformatics applications.
In a similar manner, additional scientific disciplines like Earth Science can take advantage of StratusLab cloud solutions. Interested users are welcome to join StratusLab's user community by getting access to the reference cloud services deployed by the project and offered to the public.
Exploiting graphics processing units for computational biology and bioinformatics.
Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H
2010-09-01
Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.
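To make the benchmark problem concrete, the all-pairs distance calculation can be sketched as a plain CPU reference in Python. This is not the authors' CUDA code: the Euclidean metric, the function name and the sample points are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def all_pairs_distance(points):
    """Return an n x n matrix of Euclidean distances between all points.

    Each (i, j) entry is independent of every other entry, which is what
    makes this computation a natural fit for data-parallel hardware.
    """
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            # The matrix is symmetric, so each pair is computed only once.
            dist[i][j] = d
            dist[j][i] = d
    return dist


# Hypothetical dataset: three instances with two features each.
sample = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for row in all_pairs_distance(sample):
    print(row)
```

The work grows quadratically with the number of instances, and no pair depends on the result of another, so a GPU version could in principle assign one pair (or one tile of pairs) to each thread.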
Evaluating an Inquiry-Based Bioinformatics Course Using Q Methodology
ERIC Educational Resources Information Center
Ramlo, Susan E.; McConnell, David; Duan, Zhong-Hui; Moore, Francisco B.
2008-01-01
Faculty at a Midwestern metropolitan public university recently developed a course on bioinformatics that emphasized collaboration and inquiry. Bioinformatics, essentially the application of computational tools to biological data, is inherently interdisciplinary. Thus part of the challenge of creating this course was serving the needs and…
Katayama, Toshiaki; Arakawa, Kazuharu; Nakao, Mitsuteru; Ono, Keiichiro; Aoki-Kinoshita, Kiyoko F; Yamamoto, Yasunori; Yamaguchi, Atsuko; Kawashima, Shuichi; Chun, Hong-Woo; Aerts, Jan; Aranda, Bruno; Barboza, Lord Hendrix; Bonnal, Raoul Jp; Bruskiewich, Richard; Bryne, Jan C; Fernández, José M; Funahashi, Akira; Gordon, Paul Mk; Goto, Naohisa; Groscurth, Andreas; Gutteridge, Alex; Holland, Richard; Kano, Yoshinobu; Kawas, Edward A; Kerhornou, Arnaud; Kibukawa, Eri; Kinjo, Akira R; Kuhn, Michael; Lapp, Hilmar; Lehvaslaiho, Heikki; Nakamura, Hiroyuki; Nakamura, Yasukazu; Nishizawa, Tatsuya; Nobata, Chikashi; Noguchi, Tamotsu; Oinn, Thomas M; Okamoto, Shinobu; Owen, Stuart; Pafilis, Evangelos; Pocock, Matthew; Prins, Pjotr; Ranzinger, René; Reisinger, Florian; Salwinski, Lukasz; Schreiber, Mark; Senger, Martin; Shigemoto, Yasumasa; Standley, Daron M; Sugawara, Hideaki; Tashiro, Toshiyuki; Trelles, Oswaldo; Vos, Rutger A; Wilkinson, Mark D; York, William; Zmasek, Christian M; Asai, Kiyoshi; Takagi, Toshihisa
2010-08-21
Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies. PMID:20727200
Generative Topic Modeling in Image Data Mining and Bioinformatics Studies
ERIC Educational Resources Information Center
Chen, Xin
2012-01-01
Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval, computer vision and bioinformatics. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…
A Portable Bioinformatics Course for Upper-Division Undergraduate Curriculum in Sciences
ERIC Educational Resources Information Center
Floriano, Wely B.
2008-01-01
This article discusses the challenges that bioinformatics education is facing and describes a bioinformatics course that is successfully taught at the California State Polytechnic University, Pomona, to the fourth year undergraduate students in biological sciences, chemistry, and computer science. Information on lecture and computer practice…
Incorporating a Collaborative Web-Based Virtual Laboratory in an Undergraduate Bioinformatics Course
ERIC Educational Resources Information Center
Weisman, David
2010-01-01
Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a…
A Mathematical Optimization Problem in Bioinformatics
ERIC Educational Resources Information Center
Heyer, Laurie J.
2008-01-01
This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…
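A minimal Python sketch of global sequence alignment by dynamic programming (the Needleman-Wunsch recurrence) shows how the optimization problem can be solved. The scoring values (match +1, mismatch -1, gap -1), the function name and the example sequences are assumptions for illustration and are not taken from the article.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_alignment("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The table has (n+1) x (m+1) cells and each cell is filled in constant time, so the running time is proportional to the product of the sequence lengths. When several alignments tie for the best score, this traceback returns only one of them.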
Biology in 'silico': The Bioinformatics Revolution.
ERIC Educational Resources Information Center
Bloom, Mark
2001-01-01
Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and characterizes it as the Swiss Army knife of genetics, with many different uses in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…
ERIC Educational Resources Information Center
Rowe, Laura
2017-01-01
An introductory bioinformatics laboratory experiment focused on protein analysis has been developed that is suitable for undergraduate students in introductory biochemistry courses. The laboratory experiment is designed to be potentially used as a "stand-alone" activity in which students are introduced to basic bioinformatics tools and…
A Summer Program Designed to Educate College Students for Careers in Bioinformatics
ERIC Educational Resources Information Center
Krilowicz, Beverly; Johnston, Wendie; Sharp, Sandra B.; Warter-Perez, Nancy; Momand, Jamil
2007-01-01
A summer program was created for undergraduates and graduate students that teaches bioinformatics concepts, offers skills in professional development, and provides research opportunities in academic and industrial institutions. We estimate that 34 of 38 graduates (89%) are in a career trajectory that will use bioinformatics. Evidence from…
Assessment of a Bioinformatics across Life Science Curricula Initiative
ERIC Educational Resources Information Center
Howard, David R.; Miskowski, Jennifer A.; Grunwald, Sandra K.; Abler, Michael L.
2007-01-01
At the University of Wisconsin-La Crosse, we have undertaken a program to integrate the study of bioinformatics across the undergraduate life science curricula. Our efforts have included incorporating bioinformatics exercises into courses in the biology, microbiology, and chemistry departments, as well as coordinating the efforts of faculty within…
Computer Programming and Biomolecular Structure Studies: A Step beyond Internet Bioinformatics
ERIC Educational Resources Information Center
Likic, Vladimir A.
2006-01-01
This article describes the experience of teaching structural bioinformatics to third year undergraduate students in a subject titled "Biomolecular Structure and Bioinformatics." Students were introduced to computer programming and used this knowledge in a practical application as an alternative to the well established Internet bioinformatics…
Teaching Bioinformatics and Neuroinformatics by Using Free Web-Based Tools
ERIC Educational Resources Information Center
Grisham, William; Schottler, Natalie A.; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson
2010-01-01
This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with…
Stephan, Christian; Hamacher, Michael; Blüggel, Martin; Körting, Gerhard; Chamrad, Daniel; Scheer, Christian; Marcus, Katrin; Reidegeld, Kai A; Lohaus, Christiane; Schäfer, Heike; Martens, Lennart; Jones, Philip; Müller, Michael; Auyeung, Kevin; Taylor, Chris; Binz, Pierre-Alain; Thiele, Herbert; Parkinson, David; Meyer, Helmut E; Apweiler, Rolf
2005-09-01
The Bioinformatics Committee of the HUPO Brain Proteome Project (HUPO BPP) meets regularly to execute the post-lab analyses of the data produced in the HUPO BPP pilot studies. On July 7, 2005 the members came together for the 5th time at the European Bioinformatics Institute (EBI) in Hinxton, UK, hosted by Rolf Apweiler. As a main result, the parameter set of the semi-automated data re-analysis of MS/MS spectra has been elaborated and the subsequent work steps have been defined.
Park, Hyun-Seok
2012-12-01
Whereas a vast amount of new information on bioinformatics is made available to the public through patents, only a small set of patents are cited in academic papers. A detailed analysis of registered bioinformatics patents, using the existing patent search system, can provide valuable information links between science and technology. However, it is extremely difficult to select keywords to capture bioinformatics patents, reflecting the convergence of several underlying technologies. No single word or even several words are sufficient to identify such patents. The analysis of patent subclasses can provide valuable information. In this paper, I did a preliminary study of the current status of bioinformatics patents and their International Patent Classification (IPC) groups registered in the Korea Intellectual Property Rights Information Service (KIPRIS) database.
GLAD: a system for developing and deploying large-scale bioinformatics grid.
Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong
2005-03-01
Grid computing is used to solve large-scale bioinformatics problems involving gigabyte-scale databases by distributing the computation across multiple platforms. Until now, developing bioinformatics grid applications has required tedious design and implementation of the component algorithms and parallelization techniques for different classes of problems, as well as access to remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware that exploits task-based parallelism. Two bioinformatics benchmark applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.
BIAS: Bioinformatics Integrated Application Software.
Finak, G; Godin, N; Hallett, M; Pepin, F; Rajabi, Z; Srivastava, V; Tang, Z
2005-04-15
We introduce a development platform especially tailored to bioinformatics research and software development. BIAS (Bioinformatics Integrated Application Software) provides the tools necessary for carrying out integrative bioinformatics research requiring multiple datasets and analysis tools. It follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system and supports standards and data-exchange protocols common to bioinformatics. BIAS is an open-source project and is freely available to all interested users at http://www.mcb.mcgill.ca/~bias/. This website also contains a paper with a more detailed description of BIAS and a sample implementation of a Bayesian network approach for the simultaneous prediction of gene regulation events and of mRNA expression from combinations of gene regulation events.
Brusniak, Mi-Youn; Bodenmiller, Bernd; Campbell, David; Cooke, Kelly; Eddes, James; Garbutt, Andrew; Lau, Hollis; Letarte, Simon; Mueller, Lukas N; Sharma, Vagisha; Vitek, Olga; Zhang, Ning; Aebersold, Ruedi; Watts, Julian D
2008-01-01
Background: Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics. Results: We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, and statistical algorithms, originally developed for microarray data analyses, appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.
Conclusion: The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field. PMID:19087345
Whiley, Phillip J.; Parsons, Michael T.; Leary, Jennifer; Tucker, Kathy; Warwick, Linda; Dopita, Belinda; Thorne, Heather; Lakhani, Sunil R.; Goldgar, David E.; Brown, Melissa A.; Spurdle, Amanda B.
2014-01-01
Rare exonic, non-truncating variants in known cancer susceptibility genes such as BRCA1 and BRCA2 are problematic for genetic counseling and clinical management of relevant families. This study used multifactorial likelihood analysis and/or bioinformatically-directed mRNA assays to assess pathogenicity of 19 BRCA1 or BRCA2 variants identified following patient referral to clinical genetic services. Two variants were considered to be pathogenic (Class 5). BRCA1:c.4484G>C (p.Arg1495Thr) was shown to result in aberrant mRNA transcripts predicted to encode truncated proteins. The BRCA1:c.122A>G (p.His41Arg) RING-domain variant was found from multifactorial likelihood analysis to have a posterior probability of pathogenicity of 0.995, a result consistent with existing protein functional assay data indicating lost BARD1 binding and ubiquitin ligase activity. Of the remaining variants, seven were determined to be not clinically significant (Class 1), nine were likely not pathogenic (Class 2), and one was uncertain (Class 3). These results have implications for genetic counseling and medical management of families carrying these specific variants. They also provide additional multifactorial likelihood variant classifications as reference to evaluate the sensitivity and specificity of bioinformatic prediction tools and/or functional assay data in future studies. PMID:24489791
MEMOSys: Bioinformatics platform for genome-scale metabolic models
2011-01-01
Background: Recent advances in genomic sequencing have enabled the use of genome sequencing in standard biological and biotechnological research projects. The challenge is how to integrate the large amount of data in order to gain novel biological insights. One way to leverage sequence data is to use genome-scale metabolic models. We have therefore designed and implemented a bioinformatics platform which supports the development of such metabolic models. Results: MEMOSys (MEtabolic MOdel research and development System) is a versatile platform for the management, storage, and development of genome-scale metabolic models. It supports the development of new models by providing a built-in version control system which offers access to the complete developmental history. Moreover, the integrated web board, the authorization system, and the definition of user roles allow collaborations across departments and institutions. Research on existing models is facilitated by a search system, references to external databases, and a feature-rich comparison mechanism. MEMOSys provides customizable data exchange mechanisms using the SBML format to enable analysis in external tools. The web application is based on the Java EE framework and offers an intuitive user interface. It currently contains six annotated microbial metabolic models. Conclusions: We have developed a web-based system designed to provide researchers a novel application facilitating the management and development of metabolic models. The system is freely available at http://www.icbi.at/MEMOSys. PMID:21276275
BamTools: a C++ API and toolkit for analyzing and managing BAM files.
Barnett, Derek W; Garrison, Erik K; Quinlan, Aaron R; Strömberg, Michael P; Marth, Gabor T
2011-06-15
Analysis of genomic sequencing data requires efficient, easy-to-use access to alignment results and flexible data management tools (e.g. filtering, merging, sorting, etc.). However, the enormous amount of data produced by current sequencing technologies is typically stored in compressed, binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research. We introduce a software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first C++ API publicly available for BAM file support as well as a command-line toolkit. BamTools was written in C++, and is supported on Linux, Mac OSX and MS Windows. Source code and documentation are freely available at http://github.org/pezmaster31/bamtools.
Scaling bioinformatics applications on HPC.
Mikailov, Mike; Luo, Fu-Jyh; Barkley, Stuart; Valleru, Lohit; Whitney, Stephen; Liu, Zhichao; Thakkar, Shraddha; Tong, Weida; Petrick, Nicholas
2017-12-28
Recent breakthroughs in molecular biology and next generation sequencing technologies have led to the exponential growth of the sequence databases. Researchers use BLAST for processing these sequences. However, traditional software parallelization techniques (threads, message passing interface) applied in newer versions of BLAST are not adequate for processing these sequences in a timely manner. A new method for array job parallelization has been developed which offers O(T) theoretical speed-up in comparison to multi-threading and MPI techniques. Here T is the number of array job tasks. (The number of CPUs that will be used to complete the job equals the product of T multiplied by the number of CPUs used by a single task.) The approach is based on segmentation of both input datasets to the BLAST process, combining partial solutions published earlier (Dhanker and Gupta, Int J Comput Sci Inf Technol_5:4818-4820, 2014), (Grant et al., Bioinformatics_18:765-766, 2002), (Mathog, Bioinformatics_19:1865-1866, 2003). It is accordingly referred to as a "dual segmentation" method. In order to implement the new method, the BLAST source code was modified to allow the researcher to pass to the program the number of records (effective number of sequences) in the original database. The team also developed methods to manage and consolidate the large number of partial results that get produced. Dual segmentation allows for massive parallelization, which lifts the scaling ceiling in exciting ways. BLAST jobs that hitherto failed or slogged inefficiently to completion now finish with speeds that characteristically reduce wallclock time from 27 days on 40 CPUs to a single day using 4104 tasks, each task utilizing eight CPUs and taking less than 7 minutes to complete. The massive increase in the number of tasks when running an analysis job with dual segmentation reduces the size, scope and execution time of each task.
Besides significant speed of completion, additional benefits include fine-grained checkpointing and increased flexibility of job submission. "Trickling in" a swarm of individual small tasks tempers competition for CPU time in the shared HPC environment, and jobs submitted during quiet periods can complete in extraordinarily short time frames. The smaller task size also allows the use of older and less powerful hardware. The CDRH workhorse cluster was commissioned in 2010, yet its eight-core CPUs with only 24GB RAM work well in 2017 for these dual segmentation jobs. Finally, these techniques are excitingly friendly to budget conscious scientific research organizations where probabilistic algorithms such as BLAST might discourage attempts at greater certainty because single runs represent a major resource drain. If a job that used to take 24 days can now be completed in less than an hour or on a space available basis (which is the case at CDRH), repeated runs for more exhaustive analyses can be usefully contemplated.
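The task arithmetic behind dual segmentation can be sketched as follows. This is an illustrative sketch only, not the authors' modified BLAST: the function names, the row-major mapping from an array-job task index to a (query segment, database segment) pair, and the 72 x 57 split are all assumptions, chosen so that the task count matches the 4104 tasks quoted above.

```python
from typing import List, Tuple


def split_evenly(n_records: int, n_segments: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) record ranges that cover 0..n_records."""
    base, extra = divmod(n_records, n_segments)
    ranges, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments receive one additional record.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def task_to_segments(task_id: int, n_db_segments: int) -> Tuple[int, int]:
    """Map a task index to (query_segment, db_segment) in row-major order."""
    return divmod(task_id, n_db_segments)


def total_cpus(n_tasks: int, cpus_per_task: int) -> int:
    """CPUs engaged if every task runs at once: T multiplied by CPUs per task."""
    return n_tasks * cpus_per_task


# Hypothetical split: 72 query segments x 57 database segments.
N_QUERY_SEGMENTS, N_DB_SEGMENTS = 72, 57
n_tasks = N_QUERY_SEGMENTS * N_DB_SEGMENTS
```

Because every task searches one query segment against one database segment, the number of tasks grows as the product of the two segment counts, which is what shrinks the size and run time of each individual task.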
HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Novikov, A.; Poyda, A.; Tertychnyy, I.; Wenaus, T.
2017-10-01
PanDA, the Production and Distributed Analysis Workload Management System, has been developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of the projects to use PanDA beyond HEP and Grid has drawn attention from other compute-intensive sciences such as bioinformatics. Recent advances in Next Generation Genome Sequencing (NGS) technology have led to increasing streams of sequencing data that need to be processed, analysed and made available to bioinformaticians worldwide. Analysis of genome sequencing data using the popular software pipeline PALEOMIX can take a month even when run on a powerful computing resource. In this paper we describe the adaptation of the PALEOMIX pipeline to run on a distributed computing environment powered by PanDA. To run the pipeline, we split the input files into chunks, which are run separately on different nodes as separate inputs for PALEOMIX, and finally merge the output files; this is very similar to what is done by ATLAS to process and simulate data. We dramatically decreased the total walltime through automation of job (re)submission and brokering within PanDA. Using software tools developed initially for HEP and Grid can reduce payload execution time for mammoth DNA samples from weeks to days.
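The split, process and merge pattern described above can be sketched in a few lines. This is a minimal local sketch under stated assumptions: worker threads stand in for distributed nodes, `toy_payload` is a hypothetical stand-in for a per-chunk pipeline run, and no PanDA or PALEOMIX interface is used or implied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_chunks(records: List[str], chunk_size: int) -> List[List[str]]:
    """Cut the input into consecutive chunks of at most chunk_size records."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def run_scatter_gather(
    records: List[str],
    chunk_size: int,
    process_chunk: Callable[[List[str]], List[str]],
    max_workers: int = 4,
) -> List[str]:
    """Process each chunk independently, then merge outputs in input order."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the submitted chunks.
        partial_outputs = list(pool.map(process_chunk, chunks))
    merged: List[str] = []
    for part in partial_outputs:
        merged.extend(part)
    return merged


def toy_payload(chunk: List[str]) -> List[str]:
    """Hypothetical per-chunk job; here it only upper-cases each read."""
    return [read.upper() for read in chunk]
```

The pattern only pays off when chunks can be processed independently, so that the merge step is a simple concatenation of per-chunk outputs.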
The Bioperl Toolkit: Perl Modules for the Life Sciences
Stajich, Jason E.; Block, David; Boulez, Kris; Brenner, Steven E.; Chervitz, Stephen A.; Dagdigian, Chris; Fuellen, Georg; Gilbert, James G.R.; Korf, Ian; Lapp, Hilmar; Lehväslaiho, Heikki; Matsalla, Chad; Mungall, Chris J.; Osborne, Brian I.; Pocock, Matthew R.; Schattner, Peter; Senger, Martin; Stein, Lincoln D.; Stupka, Elia; Wilkinson, Mark D.; Birney, Ewan
2002-01-01
The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort. [Supplemental material is available online at www.genome.org. Bioperl is available as open-source software free of charge and is licensed under the Perl Artistic License (http://www.perl.com/pub/a/language/misc/Artistic.html). It is available for download at http://www.bioperl.org. Support inquiries should be addressed to bioperl-l@bioperl.org.] PMID:12368254
Pan, Feng; You, Jinwei; Liu, Yuan; Qiu, Xuefeng; Yu, Wen; Ma, Jiehua; Pan, Lianjun; Zhang, Aixia; Zhang, Qipeng
2016-12-01
To better understand the molecular aetiology of type 2 diabetes mellitus-associated erectile dysfunction (T2DMED) and to provide candidates for further study of its diagnosis and treatment, this study was designed to investigate differentially expressed microRNAs (miRNAs) in the corpus cavernosum (CC) of mice with T2DMED using GeneChip array techniques (Affymetrix miRNA 4.0 Array) and to predict target genes and signalling pathways regulated by these miRNAs based on bioinformatic analysis using TargetScan, the DIANA web platform and DAVID. In the initial screening, 21 miRNAs appeared distinctly expressed in the T2DMED group (fold change ≥ 3, p ≤ 0.01). Among them, the differential expression of miR-18a, miR-206, miR-122, and miR-133 was confirmed by qRT-PCR (p < 0.05 and FDR < 5%). According to bioinformatic analysis, the four miRNAs were speculated to play potential roles in the mechanisms of T2DMED via regulating 28 different genes and several pathways, including apoptosis, fibrosis, eNOS/cGMP/PKG, and vascular smooth muscle contraction processes, which mainly focused on influencing the functions of the endothelium and smooth muscle in the CC. IGF-1, as one of the target genes, was verified to decrease in the CCs of T2DMED animals via ELISA and was confirmed as the target of miR-18a or miR-206 via luciferase assay. Finally, these four miRNAs deserve further confirmation as biomarkers of T2DMED in larger studies. Additionally, miR-18a and/or miR-206 may provide new preventive/therapeutic targets for ED management by targeting IGF-1.
Chavan, Shweta S; Bauer, Michael A; Peterson, Erich A; Heuck, Christoph J; Johann, Donald J
2013-01-01
Transcriptome analysis by microarrays has produced important advances in biomedicine. For instance in multiple myeloma (MM), microarray approaches led to the development of an effective disease subtyping via cluster assignment, and a 70 gene risk score. Both enabled an improved molecular understanding of MM, and have provided prognostic information for the purposes of clinical management. Many researchers are now transitioning to Next Generation Sequencing (NGS) approaches and RNA-seq in particular, due to its discovery-based nature, improved sensitivity, and dynamic range. Additionally, RNA-seq allows for the analysis of gene isoforms, splice variants, and novel gene fusions. Given the voluminous amounts of historical microarray data, there is now a need to associate and integrate microarray and RNA-seq data via advanced bioinformatic approaches. Custom software was developed following a model-view-controller (MVC) approach to integrate Affymetrix probe set-IDs, and gene annotation information from a variety of sources. The tool/approach employs an assortment of strategies to integrate, cross reference, and associate microarray and RNA-seq datasets. Output from a variety of transcriptome reconstruction and quantitation tools (e.g., Cufflinks) can be directly integrated, and/or associated with Affymetrix probe set data, as well as necessary gene identifiers and/or symbols from a diversity of sources. Strategies are employed to maximize the annotation and cross referencing process. Custom gene sets (e.g., MM 70 risk score (GEP-70)) can be specified, and the tool can be directly assimilated into an RNA-seq pipeline. A novel bioinformatic approach to aid in the facilitation of both annotation and association of historic microarray data, in conjunction with richer RNA-seq data, is now assisting with the study of MM cancer biology.
Bioinformatics in High School Biology Curricula: A Study of State Science Standards
ERIC Educational Resources Information Center
Wefer, Stephen H.; Sheppard, Keith
2008-01-01
The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics…
ERIC Educational Resources Information Center
Zhang, Xiaorong
2011-01-01
We incorporated a bioinformatics component into the freshman biology course that allows students to explore cystic fibrosis (CF), a common genetic disorder, using bioinformatics tools and skills. Students learn about CF through searching genetic databases, analyzing genetic sequences, and observing the three-dimensional structures of proteins…
ERIC Educational Resources Information Center
Vincent, Antony T.; Bourbonnais, Yves; Brouard, Jean-Simon; Deveau, Hélène; Droit, Arnaud; Gagné, Stéphane M.; Guertin, Michel; Lemieux, Claude; Rathier, Louis; Charette, Steve J.; Lagüe, Patrick
2018-01-01
A recent scientific discipline, bioinformatics, defined as using informatics for the study of biological problems, is now a requirement for the study of biological sciences. Bioinformatics has become such a powerful and popular discipline that several academic institutions have created programs in this field, allowing students to become…
ERIC Educational Resources Information Center
Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari
2014-01-01
Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…
XML schemas for common bioinformatic data types and their application in workflow systems
Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert
2006-01-01
Background Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios. PMID:17087823
Vignettes: diverse library staff offering diverse bioinformatics services*
Osterbur, David L.; Alpi, Kristine; Canevari, Catharine; Corley, Pamela M.; Devare, Medha; Gaedeke, Nicola; Jacobs, Donna K.; Kirlew, Peter; Ohles, Janet A.; Vaughan, K.T.L.; Wang, Lili; Wu, Yongchun; Geer, Renata C.
2006-01-01
Objectives: The paper gives examples of the bioinformatics services provided in a variety of different libraries by librarians with a broad range of educational background and training. Methods: Two investigators sent an email inquiry to attendees of the “National Center for Biotechnology Information's (NCBI) Introduction to Molecular Biology Information Resources” or “NCBI Advanced Workshop for Bioinformatics Information Specialists (NAWBIS)” courses. The thirty-five-item questionnaire addressed areas such as educational background, library setting, types and numbers of users served, and bioinformatics training and support services provided. Answers were compiled into program vignettes. Discussion: The bioinformatics support services addressed in the paper are based in libraries with academic and clinical settings. Services have been established through different means: in collaboration with biology faculty as part of formal courses, through teaching workshops in the library, through one-on-one consultations, and by other methods. Librarians with backgrounds from art history to doctoral degrees in genetics have worked to establish these programs. Conclusion: Successful bioinformatics support programs can be established in libraries in a variety of different settings and by staff with a variety of different backgrounds and approaches. PMID:16888664
Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D Blaine; Langeland, James A
2009-01-01
Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option. Furthermore, we believe that a true interdisciplinary science experience would be best served by introduction of bioinformatics modules within existing courses in biology and chemistry and other complementary departments. To that end, with support from the Howard Hughes Medical Institute, we have developed over a dozen independent bioinformatics modules for our students that are incorporated into courses ranging from general chemistry and biology, advanced specialty courses, and classes in complementary disciplines such as computer science, mathematics, and physics. These activities have largely promoted active learning in our classrooms and have enhanced student understanding of course materials. Herein, we describe our program, the activities we have developed, and assessment of our endeavors in this area. Copyright © 2009 International Union of Biochemistry and Molecular Biology, Inc.
Generalized Centroid Estimators in Bioinformatics
Hamada, Michiaki; Kiryu, Hisanori; Iwasaki, Wataru; Asai, Kiyoshi
2011-01-01
In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit commonly used accuracy measures (e.g. sensitivity, PPV, MCC and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only gives a useful framework for designing MEA-based estimators but is also highly extendable and sheds new light on many problems in bioinformatics. PMID:21365017
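One simple instance of the MEA principle on a binary space can be sketched as follows. The abstract gives no formulas, so everything here is an assumption made for illustration: the gain is taken to be gamma times the number of true positives plus the number of true negatives, each position may be predicted independently (no structural constraints), and the marginal probabilities are supplied by the caller. Under those assumptions the expected gain is maximised by predicting 1 wherever the marginal exceeds 1/(gamma + 1), and the brute-force helper below checks that claim on small inputs.

```python
from itertools import product
from typing import List, Sequence


def expected_gain(pred: Sequence[int], marginals: Sequence[float], gamma: float) -> float:
    """Expected value of gamma * TP + TN given per-position marginals."""
    gain = 0.0
    for y, p in zip(pred, marginals):
        # Predicting 1 earns gamma with probability p; predicting 0 earns 1
        # with probability (1 - p).
        gain += gamma * p if y == 1 else (1.0 - p)
    return gain


def threshold_estimator(marginals: Sequence[float], gamma: float) -> List[int]:
    """Predict 1 where the marginal probability exceeds 1 / (gamma + 1)."""
    threshold = 1.0 / (gamma + 1.0)
    return [1 if p > threshold else 0 for p in marginals]


def brute_force_best(marginals: Sequence[float], gamma: float) -> List[int]:
    """Exhaustively search all binary vectors (only feasible for tiny inputs)."""
    candidates = product([0, 1], repeat=len(marginals))
    return list(max(candidates, key=lambda y: expected_gain(y, marginals, gamma)))
```

Raising gamma lowers the threshold and so trades specificity for sensitivity, which is how a single family of estimators can be tuned toward different accuracy measures.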
Development of a cloud-based Bioinformatics Training Platform.
Revote, Jerico; Watson-Haigh, Nathan S; Quenette, Steve; Bethwaite, Blair; McGrath, Annette; Shang, Catherine A
2017-05-01
The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. © The Author 2016. Published by Oxford University Press.
Development of a cloud-based Bioinformatics Training Platform
Revote, Jerico; Watson-Haigh, Nathan S.; Quenette, Steve; Bethwaite, Blair; McGrath, Annette
2017-01-01
The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. PMID:27084333
Design of high-performance parallelized gene predictors in MATLAB.
Rivard, Sylvain Robert; Mailloux, Jean-Gabriel; Beguenane, Rachid; Bui, Hung Tien
2012-04-10
This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
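For readers unfamiliar with Goertzel's algorithm, the following sketch shows how it evaluates a single DFT bin without computing a full FFT. It is written in Python rather than MATLAB and does not reproduce the paper's parallel designs. The use of base indicator sequences and of the bin at one third of the window length (a period-3 measure) is an assumption about how such a gene predictor is typically set up; the abstract names only Goertzel's algorithm and FFTs. The `dft_power` helper is included solely as a reference to check the recurrence against.

```python
import cmath
import math
from typing import Sequence


def goertzel_power(samples: Sequence[float], k: int) -> float:
    """Squared magnitude of DFT bin k, computed with the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def dft_power(samples: Sequence[float], k: int) -> float:
    """Reference value: squared magnitude of bin k from the DFT definition."""
    n = len(samples)
    total = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return abs(total) ** 2


def period3_power(window: str) -> float:
    """Sum of bin N/3 power over the four base indicator sequences."""
    n = len(window)
    if n == 0 or n % 3 != 0:
        raise ValueError("window length must be a positive multiple of 3")
    k = n // 3
    return sum(
        goertzel_power([1.0 if b == base else 0.0 for b in window], k)
        for base in "ACGT"
    )
```

Because each window and each base are processed independently, the outer loops over windows are the natural place to apply the kind of parallelism the abstract describes.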
Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius
2016-01-01
The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418
Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J
2016-09-01
The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.
Bioinformatics in the orphan crops.
Armstead, Ian; Huang, Lin; Ravagnani, Adriana; Robson, Paul; Ougham, Helen
2009-11-01
Orphan crops are those which are grown as food, animal feed or other crops of some importance in agriculture, but which have not yet received the investment of research effort or funding required to develop significant public bioinformatics resources. Where an orphan crop is related to a well-characterised model plant species, comparative genomics and bioinformatics can often, though not always, be exploited to assist research and crop improvement. This review addresses some challenges and opportunities presented by bioinformatics in the orphan crops, using three examples: forage grasses from the genera Lolium and Festuca, forage legumes and the second generation energy crop Miscanthus.
EPIGEN-Brazil Initiative resources: a Latin American imputation panel and the Scientific Workflow.
Magalhães, Wagner C S; Araujo, Nathalia M; Leal, Thiago P; Araujo, Gilderlanio S; Viriato, Paula J S; Kehdy, Fernanda S; Costa, Gustavo N; Barreto, Mauricio L; Horta, Bernardo L; Lima-Costa, Maria Fernanda; Pereira, Alexandre C; Tarazona-Santos, Eduardo; Rodrigues, Maíra R
2018-06-14
EPIGEN-Brazil is one of the largest Latin American initiatives at the interface of human genomics, public health, and computational biology. Here, we present two resources to address two challenges to the global dissemination of precision medicine and the development of the bioinformatics know-how to support it. To address the underrepresentation of non-European individuals in human genome diversity studies, we present the EPIGEN-5M+1KGP imputation panel, the fusion of the public 1000 Genomes Project (1KGP) Phase 3 imputation panel with haplotypes derived from the EPIGEN-5M data set (a product of the genotyping of 4.3 million SNPs in 265 admixed individuals from the EPIGEN-Brazil Initiative). When we imputed a target SNPs data set (6487 admixed individuals genotyped for 2.2 million SNPs from the EPIGEN-Brazil project) with the EPIGEN-5M+1KGP panel, we gained 140,452 more SNPs in total than when using the 1KGP Phase 3 panel alone and 788,873 additional high confidence SNPs (info score ≥ 0.8). Thus, the major effect of the inclusion of the EPIGEN-5M data set in this new imputation panel is not only to gain more SNPs but also to improve the quality of imputation. To address the lack of transparency and reproducibility of bioinformatics protocols, we present a conceptual Scientific Workflow in the form of a website that models the scientific process (by including publications, flowcharts, masterscripts, documents, and bioinformatics protocols), making it accessible and interactive. Its applicability is shown in the context of the development of our EPIGEN-5M+1KGP imputation panel. The Scientific Workflow also serves as a repository of bioinformatics resources. © 2018 Magalhães et al.; Published by Cold Spring Harbor Laboratory Press.
ReGaTE: Registration of Galaxy Tools in Elixir.
Doppelt-Azeroual, Olivia; Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé
2017-06-01
Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE . © The Author 2017. Published by Oxford University Press.
USDA-ARS's Scientific Manuscript database
One important mechanism plants use to cope with salinity is keeping the cytosolic Na+ concentration low by sequestering Na+ in vacuoles, a process facilitated by Na+/H+ exchangers (NHX). There are eight NHX genes (NHX1 through NHX8) identified and characterized in Arabidopsis. Bioinformatic analysis...
SoMART, a web server for miRNA, tasiRNA and target gene analysis in Solanaceae plants
USDA-ARS's Scientific Manuscript database
Plant micro(mi)RNAs and trans-acting small interfering (tasi)RNAs mediate posttranscriptional silencing of genes and play important roles in a variety of biological processes. Although bioinformatics prediction and small (s)RNA cloning are the key approaches used for identification of miRNAs, tasiRN...
Metabolizing Data in the Cloud.
Warth, Benedikt; Levin, Nadine; Rinehart, Duane; Teijaro, John; Benton, H Paul; Siuzdak, Gary
2017-06-01
Cloud-based bioinformatic platforms address the fundamental demands of creating a flexible scientific environment, facilitating data processing and general accessibility independent of a country's affluence. These platforms have a multitude of advantages as demonstrated by omics technologies, helping to support both government and scientific mandates of a more open environment. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Microarray Tool Provides Pathway and GO Term Analysis.
Koch, Martin; Royer, Hans-Dieter; Wiese, Michael
2011-12-01
Analysis of gene expression profiles is no longer exclusively a task for bioinformatic experts. However, gaining statistically significant results is challenging and requires both biological knowledge and computational know-how. Here we present a novel, user-friendly microarray reporting tool called maRt. The software provides access to bioinformatic resources, such as gene ontology terms and biological pathways, through the DAVID and BioMart web services. Results are summarized in structured HTML reports, each presenting a different layer of information. In these reports, contents from diverse sources are integrated and interlinked. To speed up processing, maRt takes advantage of the multi-core technology of modern desktop computers by using parallel processing. Since the software is built upon an RCP infrastructure, it might serve as a starting point for developers aiming to integrate novel R-based applications. Installer, documentation and various kinds of tutorials are available under the LGPL license at the website of our institute http://www.pharma.uni-bonn.de/www/mart. This software is free for academic use. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan
2007-03-08
The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to http://www.symbiomatics.org). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000-2005 ("recent") and 1990-1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science.
Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan
2007-01-01
Background: The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to ). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. Results: This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000–2005 ("recent") and 1990–1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. Conclusion: We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science. PMID:17430562
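The bigram analysis described in the abstract above can be illustrated with a minimal sketch. It only shows the general idea of counting adjacent word pairs across a set of texts; the function name `bigram_counts` and the tokenisation rule (lowercased alphabetic runs) are assumptions made for illustration and are not taken from the SYMBIOmatics study.

```python
import re
from collections import Counter


def bigram_counts(texts):
    """Count adjacent word pairs (bigrams) over an iterable of strings.

    Tokenisation is an illustrative assumption: lowercase the text and
    keep runs of ASCII letters only.
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        # zip(words, words[1:]) yields each pair of neighbouring words
        counts.update(zip(words, words[1:]))
    return counts


sample = [
    "Text mining of microarray experiments",
    "Text mining tools for ontologies",
]
counts = bigram_counts(sample)
```

Comparing such counts between two time windows (for example, the "recent" and "past" periods) would be one way to flag emerging topics, under the assumption that the corpora are of comparable size or are normalised first.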
Unsupervised learning of natural languages
Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon
2005-01-01
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics. PMID:16087885
Unsupervised learning of natural languages.
Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon
2005-08-16
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.
Long Non-coding RNAs and Their Biological Roles in Plants
Liu, Xue; Hao, Lili; Li, Dayong; Zhu, Lihuang; Hu, Songnian
2015-01-01
With the development of genomics and bioinformatics, especially the extensive applications of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, lncRNAs have come to be regarded as important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in a few species, our current knowledge of their biological functions is still limited. Here, we have summarized recent studies on their identification, characteristics, classification, bioinformatics, resources, and current exploration of their biological functions in plants. PMID:25936895
Current challenges in genome annotation through structural biology and bioinformatics.
Furnham, Nicholas; de Beer, Tjaart A P; Thornton, Janet M
2012-10-01
With the huge volume of genomic sequences being generated from high-throughput sequencing projects, the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component of this process is to use experimentally determined structures, which provide a means to detect homology that is not discernible from just the sequence and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry. Copyright © 2012. Published by Elsevier Ltd.
Li, Chen; Shen, Weixing; Shen, Sheng; Ai, Zhilong
2013-12-01
To explore the molecular mechanisms of cholangiocarcinoma (CC), microarray technology was used to find biomarkers for early detection and diagnosis. The gene expression profiles from 6 patients with CC and 5 normal controls were downloaded from Gene Expression Omnibus and compared. As a result, 204 differentially co-expressed genes (DCGs) in CC patients compared to normal controls were identified using a computational bioinformatics analysis. These genes were mainly involved in coenzyme metabolic process, peptidase activity and oxidation reduction. A regulatory network was constructed by mapping the DCGs to known regulation data. Four transcription factors, FOXC1, ZIC2, NKX2-2 and GCGR, were hub nodes in the network. In conclusion, this study provides a set of targets useful for future investigations into molecular biomarker studies. Copyright © 2013 Elsevier Ltd. All rights reserved.
Cloud-based adaptive exon prediction for DNA analysis
Putluri, Srinivasareddy; Fathima, Shaik Yasmeen
2018-01-01
Cloud computing offers significant research and economic benefits to healthcare organisations. Cloud services provide a safe place for storing and managing large amounts of such sensitive data. Under the conventional flow of gene information, gene sequence laboratories send out raw and inferred information via the Internet to several sequence libraries. DNA sequencing storage costs will be minimised by the use of cloud services. In this study, the authors put forward a novel genomic informatics system using Amazon Cloud Services, where genomic sequence information is stored and accessed for processing. True identification of exon regions in a DNA sequence is a key task in bioinformatics, which helps in disease identification and drug design. The three-base periodicity property of exons forms the basis of all exon identification techniques. Adaptive signal processing techniques have been found to be promising in comparison with several other methods. Several adaptive exon predictors (AEPs) are developed using variable normalised least mean square and its maximum normalised variants to reduce computational complexity. Finally, performance evaluation of various AEPs is done based on measures such as sensitivity, specificity and precision using various standard genomic datasets taken from the National Center for Biotechnology Information genomic sequence database. PMID:29515813
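The three-base periodicity property mentioned in the abstract above can be sketched with a simple spectral measure. This is not the adaptive (normalised least mean square) predictor of the cited study; it is a plain Fourier-based illustration, and the function name `period3_power` and its normalisation by sequence length are assumptions chosen for the example.

```python
import cmath


def period3_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over the four base
    indicator sequences and divided by the sequence length.

    Illustrative measure only: high values suggest a period-3 pattern,
    as is often seen in protein-coding regions.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    # Complex exponential for one step at frequency 1/3
    w = cmath.exp(-2j * cmath.pi / 3)
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at frequency 1/3 of the 0/1 indicator for this base
        coeff = sum(w ** (i % 3) for i, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total / n


periodic = period3_power("ATG" * 30)   # strongly period-3 sequence
flat = period3_power("A" * 90)         # no period-3 structure
```

In practice such a measure would be evaluated over a sliding window along the sequence; the window length is a further choice not specified here.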
Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio
2014-01-01
Data processing, management and visualization are central and critical components of a state-of-the-art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of output files from various proteomics search engines, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries and open-source frameworks, and we also give information on some of the freely available applications that make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. PMID:23467006
Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio
2014-01-01
Data processing, management and visualization are central and critical components of a state-of-the-art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of output files from various proteomics search engines, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries and open-source frameworks, and we also give information on some of the freely available applications that make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. Copyright © 2013 Elsevier B.V. All rights reserved.
Dalpé, Gratien; Joly, Yann
2014-09-01
Healthcare-related bioinformatics databases are increasingly offering the possibility to maintain, organize, and distribute DNA sequencing data. Different national and international institutions are currently hosting such databases that offer researchers website platforms where they can obtain sequencing data on which they can perform different types of analysis. Until recently, this process remained mostly one-dimensional, with most analysis concentrated on a limited amount of data. However, newer genome sequencing technology is producing a huge amount of data that current computer facilities are unable to handle. An alternative approach has been to start adopting cloud computing services for combining the information embedded in genomic and model system biology data, patient healthcare records, and clinical trials' data. In this new technological paradigm, researchers use virtual space and computing power from existing commercial or not-for-profit cloud service providers to access, store, and analyze data via different application programming interfaces. Cloud services are an alternative to the need of larger data storage; however, they raise different ethical, legal, and social issues. The purpose of this Commentary is to summarize how cloud computing can contribute to bioinformatics-based drug discovery and to highlight some of the outstanding legal, ethical, and social issues that are inherent in the use of cloud services. © 2014 Wiley Periodicals, Inc.
Tripathi, Anita; Goswami, Kavita; Sanan-Mishra, Neeti
2015-01-01
microRNAs (miRs) are a class of 21–24 nucleotide long non-coding RNAs responsible for regulating the expression of associated genes mainly by cleavage or translational inhibition of the target transcripts. With this characteristic of silencing, miRs act as an important component in the regulation of plant responses to various stress conditions. In recent years, with drastic changes in environmental and soil conditions, different types of stress have emerged as a major challenge for plant growth and productivity. The identification and profiling of miRs has itself been a challenge for researchers, given their small size and the large number of probable sequences in the genome. Application of computational approaches has expedited the process of identification of miRs and their expression profiling in different conditions. The development of High-Throughput Sequencing (HTS) techniques has facilitated access to the global profiles of the miRs for understanding their mode of action in plants. The introduction of various bioinformatics databases and tools has revolutionized the study of miRs and other small RNAs. This review focuses on the role of bioinformatics approaches in the identification and study of the regulatory roles of plant miRs in the adaptive response to stresses. PMID:26578966
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Xia; Liu, Siwen; Chongqing Key Laboratory of Neurobiology, Chongqing Medical University, Chongqing 400016
2015-11-15
Background: Borna disease virus (BDV) is a neurotropic RNA virus persistently infecting mammalian hosts including humans. Lysine acetylation (Kac) is a key protein post-translational modification (PTM). The unexpectedly broad regulatory scope of Kac led us to profile the entire acetylome upon BDV infection. Methods: The acetylome was profiled through stable isotope labeling for cell culture (SILAC)-based quantitative proteomics. The quantifiable proteome was annotated using bioinformatics. Results: We identified and quantified 791 Kac sites in 473 Kac proteins in human BDV Hu-H1-infected and non-infected oligodendroglial (OL) cells. Bioinformatic analysis revealed that BDV infection alters the acetylation of metabolic proteins, membrane-associated proteins and transmembrane transporter activity, and affects the acetylation of several lysine acetyltransferases (KAT). Conclusions: Upon BDV persistence the OL acetylome is manipulated towards higher energy and transporter levels necessary for shuttling BDV proteins to and from nuclear replication sites. - Highlights: • We used SILAC-based proteomics to analyze the acetylome of BDV infected OL cells. • We quantified 791 Kac sites in 473 proteins. • Bioinformatic analysis revealed altered acetylation of metabolic proteins et al. • BDV manipulates the OL acetylome towards higher energy and transporter levels. • BDV infection is associated with enriched phosphate-associated metabolic processes.
Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Švedas, Vytas
2014-01-01
The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure–function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. PMID:24852248
Application of bioinformatics tools and databases in microbial dehalogenation research (a review).
Satpathy, R; Konkimalla, V B; Ratha, J
2015-01-01
Microbial dehalogenation is a biochemical process in which halogenated substances are converted enzymatically into their non-halogenated form. Microorganisms have a wide range of organohalogen degradation abilities, both explicit and non-specific in nature. Most of these halogenated organic compounds are pollutants that need to be remediated; therefore, current approaches explore the potential of microbes at a molecular level for effective biodegradation of these substances. Several microorganisms with dehalogenation activity have been identified and characterized. In this respect, bioinformatics plays a key role in gaining deeper knowledge of dehalogenation. To facilitate data mining, many tools have been developed to annotate these data from databases. Therefore, once a microorganism is discovered, one can predict its genes and proteins and perform sequence analysis, structural modelling, metabolic pathway analysis, biodegradation studies and so on. This review highlights bioinformatics approaches and describes the application of various databases and specific tools in microbial dehalogenation research, with special focus on dehalogenase enzymes. Attempts have also been made to decipher some recent applications of in silico modeling methods that comprise gene finding, protein modelling, Quantitative Structure Biodegradability Relationship (QSBR) study and reconstruction of metabolic pathways employed in the dehalogenation research area.
Bourqui, Romain; Benchimol, William; Gaspin, Christine; Sirand-Pugnet, Pascal; Uricaru, Raluca; Dutour, Isabelle
2015-01-01
The revolution in high-throughput sequencing technologies has enabled the acquisition of gigabytes of RNA sequences in many different conditions and has highlighted an unexpected number of small RNAs (sRNAs) in bacteria. Ongoing exploitation of these data enables numerous applications for investigating bacterial transacting sRNA-mediated regulation networks. Focusing on sRNAs that regulate mRNA translation in trans, recent works have noted several sRNA-based regulatory pathways that are essential for key cellular processes. Although the number of known bacterial sRNAs is increasing, the experimental validation of their interactions with mRNA targets remains challenging and involves expensive and time-consuming experimental strategies. Hence, bioinformatics is crucial for selecting and prioritizing candidates before designing any experimental work. However, current software for target prediction produces a prohibitive number of candidates because of the lack of biological knowledge regarding the rules governing sRNA–mRNA interactions. Therefore, there is a real need to develop new approaches to help biologists focus on the most promising predicted sRNA–mRNA interactions. In this perspective, this review aims at presenting the advantages of mixing bioinformatics and visualization approaches for analyzing predicted sRNA-mediated regulatory bacterial networks. PMID:25477348
Thébault, Patricia; Bourqui, Romain; Benchimol, William; Gaspin, Christine; Sirand-Pugnet, Pascal; Uricaru, Raluca; Dutour, Isabelle
2015-09-01
The revolution in high-throughput sequencing technologies has enabled the acquisition of gigabytes of RNA sequences in many different conditions and has highlighted an unexpected number of small RNAs (sRNAs) in bacteria. Ongoing exploitation of these data enables numerous applications for investigating bacterial transacting sRNA-mediated regulation networks. Focusing on sRNAs that regulate mRNA translation in trans, recent works have noted several sRNA-based regulatory pathways that are essential for key cellular processes. Although the number of known bacterial sRNAs is increasing, the experimental validation of their interactions with mRNA targets remains challenging and involves expensive and time-consuming experimental strategies. Hence, bioinformatics is crucial for selecting and prioritizing candidates before designing any experimental work. However, current software for target prediction produces a prohibitive number of candidates because of the lack of biological knowledge regarding the rules governing sRNA-mRNA interactions. Therefore, there is a real need to develop new approaches to help biologists focus on the most promising predicted sRNA-mRNA interactions. In this perspective, this review aims at presenting the advantages of mixing bioinformatics and visualization approaches for analyzing predicted sRNA-mediated regulatory bacterial networks. © The Author 2014. Published by Oxford University Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kalra, Rajkumar S.; Wadhwa, Renu
2015-02-27
Epithelial membrane antigen (EMA or MUC1) is a heavily glycosylated, type I transmembrane glycoprotein commonly expressed by epithelial cells of duct organs. It has been shown to be aberrantly glycosylated in several diseases including cancer. Protein sequence based annotation and analysis of the glycosylation profile of glycoproteins by robust computational and comprehensive algorithms provides possible insights into the mechanism(s) of anomalous glycosylation. In the present report, using a number of bioinformatics applications, we studied EMA/MUC1 and explored its trans-membrane structural domain sequence that is widely subjected to glycosylation. Exploration of different extracellular motifs led to prediction of N- and O-linked glycosylation target sites. Based on the putative O-linked target sites, glycosylated moieties and pathways were envisaged. Furthermore, protein network analysis demonstrated physical interaction of EMA with a number of proteins and confirmed its functional involvement in cell growth and proliferation pathways. Gene Ontology analysis suggested an involvement of EMA in a number of functions including signal transduction, protein binding, processing and transport along with glycosylation. Thus, the present study explored the potential of a bioinformatics prediction approach in analyzing glycosylation, co-expression and interaction patterns of the EMA/MUC1 glycoprotein.
ERIC Educational Resources Information Center
Wightman, Bruce; Hark, Amy T.
2012-01-01
The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this…
Bioinformatics in Middle East Program Curricula--A Focus on the Arabian Gulf
ERIC Educational Resources Information Center
Loucif, Samia
2014-01-01
The purpose of this paper is to investigate the inclusion of bioinformatics in program curricula in the Middle East, focusing on educational institutions in the Arabian Gulf. Bioinformatics is a multidisciplinary field which has emerged in response to the need for efficient data storage and retrieval, and accurate and fast computational and…
The S-Star Trial Bioinformatics Course: An On-line Learning Success
ERIC Educational Resources Information Center
Lim, Yun Ping; Hoog, Jan-Olov; Gardner, Phyllis; Ranganathan, Shoba; Andersson, Siv; Subbiah, Subramanian; Tan, Tin Wee; Hide, Winston; Weiss, Anthony S.
2003-01-01
The S-Star Trial Bioinformatics on-line course (www.s-star.org) is a global experiment in bioinformatics distance education. Six universities from five continents have participated in this project. One hundred and fifty students participated in the first trial course of which 96 followed through the entire course and 70 fulfilled the overall…
NOBLAST and JAMBLAST: New Options for BLAST and a Java Application Manager for BLAST results.
Lagnel, Jacques; Tsigenopoulos, Costas S; Iliopoulos, Ioannis
2009-03-15
NOBLAST (New Options for BLAST) is an open source program that provides a new user-friendly tabular output format for various NCBI BLAST programs (Blastn, Blastp, Blastx, Tblastn, Tblastx, Mega BLAST and Psi BLAST) without any use of a parser, and provides E-value correction when a segmented BLAST database is used. JAMBLAST, using the NOBLAST output, allows the user to manage, view and filter the BLAST hits using a number of selection criteria. A distribution package of NOBLAST and JAMBLAST, including a detailed installation procedure, is freely available from http://sourceforge.net/projects/JAMBLAST/ and http://sourceforge.net/projects/NOBLAST. Supplementary data are available at Bioinformatics online.
Microsoft Biology Initiative: .NET Bioinformatics Platform and Tools
Diaz Acosta, B.
2011-01-01
The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative is comprised of two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework—initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, documentation and training materials are freely downloadable from http://research.microsoft.com/bio. MBT is a collection of tools that enable biology and bioinformatics researchers to be more productive in making scientific discoveries.
Bioinformatics in high school biology curricula: a study of state science standards.
Wefer, Stephen H; Sheppard, Keith
2008-01-01
The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students.
Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan
2017-01-01
Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.
Cheng, Gong; Zhang, Guocai; Xu, Liang
2017-01-01
Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317
Agyei, Dominic; Tsopmo, Apollinaire; Udenigwe, Chibuike C
2018-06-01
There are emerging advancements in the strategies used for the discovery and development of food-derived bioactive peptides because of their multiple food and health applications. Bioinformatics and peptidomics are two computational and analytical techniques that have the potential to speed up the development of bioactive peptides from bench to market. Structure-activity relationships observed in peptides form the basis for bioinformatics and in silico prediction of bioactive sequences encrypted in food proteins. Peptidomics, on the other hand, relies on "hyphenated" (liquid chromatography-mass spectrometry-based) techniques for the detection, profiling, and quantitation of peptides. Together, bioinformatics and peptidomics approaches provide a low-cost and effective means of predicting, profiling, and screening bioactive protein hydrolysates and peptides from food. This article discusses the basis, strengths, and limitations of bioinformatics and peptidomics approaches currently used for the discovery and analysis of food-derived bioactive peptides.
Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.
Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru
2016-09-29
Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.
Schneider, Maria Victoria; Griffin, Philippa C; Tyagi, Sonika; Flannery, Madison; Dayalan, Saravanan; Gladman, Simon; Watson-Haigh, Nathan; Bayer, Philipp E; Charleston, Michael; Cooke, Ira; Cook, Rob; Edwards, Richard J; Edwards, David; Gorse, Dominique; McConville, Malcolm; Powell, David; Wilkins, Marc R; Lonie, Andrew
2017-06-30
EMBL Australia Bioinformatics Resource (EMBL-ABR) is a developing national research infrastructure, providing bioinformatics resources and support to life science and biomedical researchers in Australia. EMBL-ABR comprises 10 geographically distributed national nodes with one coordinating hub, with current funding provided through Bioplatforms Australia and the University of Melbourne for its initial 2-year development phase. The EMBL-ABR mission is to: (1) increase Australia's capacity in bioinformatics and data sciences; (2) contribute to the development of training in bioinformatics skills; (3) showcase Australian data sets at an international level and (4) enable engagement in international programs. The activities of EMBL-ABR are focussed in six key areas, aligning with comparable international initiatives such as ELIXIR, CyVerse and NIH Commons. These key areas (Tools, Data, Standards, Platforms, Compute and Training) are described in this article. © The Author 2017. Published by Oxford University Press.
Bioinformatics in High School Biology Curricula: A Study of State Science Standards
Sheppard, Keith
2008-01-01
The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students. PMID:18316818
Oluwagbemi, Olugbenga O; Adewumi, Adewole; Esuruoso, Abimbola
2012-01-01
Computational biology and bioinformatics are gradually gaining ground in Africa and other developing nations of the world. However, in these countries, some of the challenges of computational biology and bioinformatics education are inadequate infrastructure and a lack of readily available complementary and motivational tools to support learning as well as research. This has discouraged many promising undergraduates, postgraduates and researchers from aspiring to undertake future study in these fields. In this paper, we develop and describe MACBenAbim (Multi-platform Mobile Application for Computational Biology and Bioinformatics), a flexible, user-friendly tool to search for, define and describe the meanings of key terms in computational biology and bioinformatics, thus expanding the frontiers of knowledge of the users. This tool can also visualize results in a mobile multi-platform context. MACBenAbim is available from the authors for non-commercial purposes.
GeneFisher-P: variations of GeneFisher as processes in Bio-jETI
Lamprecht, Anna-Lena; Margaria, Tiziana; Steffen, Bernhard; Sczyrba, Alexander; Hartmeier, Sven; Giegerich, Robert
2008-01-01
Background: PCR primer design is an everyday, but not trivial task requiring state-of-the-art software. We describe the popular tool GeneFisher and explain its recent restructuring using workflow techniques. We apply a service-oriented approach to model and implement GeneFisher-P, a process-based version of the GeneFisher web application, as a part of the Bio-jETI platform for service modeling and execution. We show how to introduce a flexible process layer to meet the growing demand for improved user-friendliness and flexibility. Results: Within Bio-jETI, we model the process using the jABC framework, a mature model-driven, service-oriented process definition platform. We encapsulate remote legacy tools and integrate web services using jETI, an extension of the jABC for seamless integration of remote resources as basic services, ready to be used in the process. Some of the basic services used by GeneFisher are in fact already provided as individual web services at BiBiServ and can be directly accessed. Others are legacy programs, and are made available to Bio-jETI via the jETI technology. The full power of service-based process orientation is required when more bioinformatics tools, available as web services or via jETI, lead to easy extensions or variations of the basic process. This concerns for instance variations of data retrieval or alignment tools as provided by the European Bioinformatics Institute (EBI). Conclusions: The resulting service- and process-oriented GeneFisher-P demonstrates how basic services from heterogeneous sources can be easily orchestrated in the Bio-jETI platform and lead to a flexible family of specialized processes tailored to specific tasks. PMID:18460174
No-boundary thinking in bioinformatics research
2013-01-01
Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT). PMID:24192339
ERIC Educational Resources Information Center
Barker, Daniel; Alderson, Rosanna G.; McDonagh, James L.; Plaisier, Heleen; Comrie, Muriel M.; Duncan, Leigh; Muirhead, Gavin T. P.; Sweeney, Stuart D.
2015-01-01
Background: Bioinformatics--the use of computers in biology--is of major and increasing importance to biological sciences and medicine. We conducted a preliminary investigation of the value of bringing practical, university-level bioinformatics education to the school level. We conducted voluntary activities for pupils at two schools in Scotland…
The Air Force In Silico -- Computational Biology in 2025
2007-11-01
and chromosome) these new fields are commonly referred to as “~omics.” Proteomics, transcriptomics, metabolomics, epigenomics, physiomics... Bioinformatics, 2006, http://journal.imbio.de/ http://www-bm.ipk-gatersleben.de/stable/php/journal/articles/pdf/jib-22.pdf (accessed 30 September...Chirino, G. Tansley and I. Dryden, “The implications for Bioinformatics of integration across physical scales,” Journal of Integrative Bioinformatics
Online Tools for Bioinformatics Analyses in Nutrition Sciences
Malkaram, Sridhar A.; Hassan, Yousef I.; Zempleni, Janos
2012-01-01
Recent advances in “omics” research have resulted in the creation of large datasets that were generated by consortiums and centers, small datasets that were generated by individual investigators, and bioinformatics tools for mining these datasets. It is important for nutrition laboratories to take full advantage of the analysis tools to interrogate datasets for information relevant to genomics, epigenomics, transcriptomics, proteomics, and metabolomics. This review provides guidance regarding bioinformatics resources that are currently available in the public domain, with the intent to provide a starting point for investigators who want to take advantage of the opportunities provided by the bioinformatics field. PMID:22983844
Biotool2Web: creating simple Web interfaces for bioinformatics applications.
Shahid, Mohammad; Alam, Intikhab; Fuellen, Georg
2006-01-01
Currently there are many bioinformatics applications being developed, but there is no easy way to publish them on the World Wide Web. We have developed a Perl script, called Biotool2Web, which makes the task of creating web interfaces for simple ('home-made') bioinformatics applications quick and easy. Biotool2Web uses an XML document containing the parameters to run the tool on the Web, and generates the corresponding HTML and common gateway interface (CGI) files ready to be published on a web server. This tool is available for download at URL http://www.uni-muenster.de/Bioinformatics/services/biotool2web/. Contact: Georg Fuellen (fuellen@alum.mit.edu).
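The XML-to-HTML step described in the Biotool2Web abstract can be sketched roughly as follows. This is a Python illustration only, not the Perl script itself: the XML layout (`tool`, `param`, and the `cgi`, `type`, `label`, `default` attributes), the example tool name and the function `build_form` are all hypothetical, since the abstract does not give the actual parameter format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical parameter document; the real Biotool2Web format is not
# described in the abstract.
EXAMPLE_XML = """
<tool name="revcomp" cgi="/cgi-bin/revcomp.cgi">
  <param name="sequence" type="textarea" label="DNA sequence"/>
  <param name="minlen" type="text" label="Minimum length" default="20"/>
</tool>
"""


def build_form(xml_text):
    """Turn a tool description into an HTML form that posts to its CGI script."""
    tool = ET.fromstring(xml_text.strip())
    action = escape(tool.get("cgi"), quote=True)
    lines = ['<form method="post" action="%s">' % action]
    for p in tool.findall("param"):
        name = escape(p.get("name"), quote=True)
        label = escape(p.get("label", p.get("name")))
        default = escape(p.get("default", ""), quote=True)
        lines.append('  <label for="%s">%s</label>' % (name, label))
        if p.get("type") == "textarea":
            # Multi-line input, e.g. for pasted sequences.
            lines.append('  <textarea id="%s" name="%s">%s</textarea>'
                         % (name, name, default))
        else:
            # Everything else is rendered as a single-line text field.
            lines.append('  <input type="text" id="%s" name="%s" value="%s">'
                         % (name, name, default))
    tool_name = escape(tool.get("name"), quote=True)
    lines.append('  <input type="submit" value="Run %s">' % tool_name)
    lines.append("</form>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_form(EXAMPLE_XML))
```

Keeping the parameters in a declarative document means the same description could in principle drive both the form and the server-side argument handling, which appears to be the design idea the abstract describes.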
India's Computational Biology Growth and Challenges.
Chakraborty, Chiranjib; Bandyopadhyay, Sanghamitra; Agoramoorthy, Govindasamy
2016-09-01
India's computational science is growing swiftly due to the rapid expansion of internet and information technology services. The bioinformatics sector of India has been transforming rapidly by creating a competitive position in the global bioinformatics market. Bioinformatics is widely used across India to address a wide range of biological issues. Recently, computational researchers and biologists have been collaborating in projects such as database development, sequence analysis, genomic prospects and algorithm generation. In this paper, we present the Indian computational biology scenario, highlighting bioinformatics-related educational activities, manpower development, the internet boom, the service industry, research activities, conferences and training undertaken by the corporate and government sectors. Nonetheless, this new field of science faces many challenges.
2005-01-01
The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of Gagne's Conditions of Learning instructional design theory. This theory, although first published in the early 1970s, is still fundamental in instructional design and instructional technology. First, top-level as well as prerequisite learning objectives for a microarray analysis workshop and a primer design workshop were defined. Then a hierarchy of objectives for each workshop was created. Hands-on tutorials were designed to meet these objectives. Finally, events of learning proposed by Gagne's theory were incorporated into the hands-on tutorials. The resultant manuals were tested on a small number of trainees, revised, and applied in 1-day bioinformatics workshops. Based on this experience and on observations made during the workshops, we conclude that Gagne's Conditions of Learning instructional design theory provides a useful framework for developing bioinformatics training, but may not be optimal as a method for teaching it. PMID:16220141
XML schemas for common bioinformatic data types and their application in workflow systems.
Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert
2006-11-06
Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data; therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net. The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
Weisman, David
2010-01-01
Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a face-to-face lecture course with a web-based virtual laboratory presents new opportunities for collaborative learning of the conceptual material, and for fostering peer support of technical bioinformatics questions. To explore this combination, an in-person lecture-only undergraduate bioinformatics course was augmented with a remote web-based laboratory, and tested with a large class. This study hypothesized that the collaborative virtual lab would foster active learning and peer support, and tested this hypothesis by conducting a student survey near the end of the semester. Respondents broadly reported strong benefits from the online laboratory, and strong benefits from peer-provided technical support. In comparison with traditional in-person teaching labs, students preferred the virtual lab by a factor of two. Key aspects of the course architecture and design are described to encourage further experimentation in teaching collaborative online bioinformatics laboratories. Copyright © 2010 International Union of Biochemistry and Molecular Biology, Inc.
Revealing biological information using data structuring and automated learning.
Mohorianu, Irina; Moulton, Vincent
2010-11-01
The intermediary steps between a biological hypothesis, concretized in the input data, and meaningful results, validated using biological experiments, commonly employ bioinformatics tools. Starting with storage of the data and ending with a statistical analysis of the significance of the results, every step in a bioinformatics analysis has been intensively studied and the resulting methods and models patented. This review summarizes the bioinformatics patents that have been developed mainly for the study of genes, and points out the universal applicability of bioinformatics methods to other related studies such as RNA interference. More specifically, we overview the steps undertaken in the majority of bioinformatics analyses, highlighting, for each, various approaches that have been developed to reveal details from different perspectives. First we consider data warehousing, the first task that has to be performed efficiently, optimizing the structure of the database, in order to facilitate both the subsequent steps and the retrieval of information. Next, we review data mining, which occupies the central part of most bioinformatics analyses, presenting patents concerning differential expression, unsupervised and supervised learning. Last, we discuss how networks of interactions of genes or other players in the cell may be created, which help draw biological conclusions and have been described in several patents.
BioMake: a GNU make-compatible utility for declarative workflow management.
Holmes, Ian H; Mungall, Christopher J
2017-11-01
The Unix 'make' program is widely used in bioinformatics pipelines, but suffers from problems that limit its application to large analysis datasets. These include reliance on file modification times to determine whether a target is stale, lack of support for parallel execution on clusters, and restricted flexibility to extend the underlying logic program. We present BioMake, a make-like utility that is compatible with most features of GNU Make and adds support for popular cluster-based job-queue engines, MD5 signatures as an alternative to timestamps, and logic programming extensions in Prolog. BioMake is available for MacOSX and Linux systems from https://github.com/evoldoers/biomake under the BSD3 license. The only dependency is SWI-Prolog (version 7), available from http://www.swi-prolog.org/. Contact: ihholmes+biomake@gmail.com or cmungall+biomake@gmail.com. Supplementary information: Feature table comparing BioMake to similar tools. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
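The idea of using MD5 signatures instead of timestamps to decide whether a target is stale can be illustrated with a minimal sketch. This is not BioMake's own code; the function names (`md5_of`, `is_stale`, `record_signatures`) and the JSON signature file are assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def is_stale(target, deps, sig_file):
    """Decide whether `target` must be rebuilt from `deps`.

    The target is stale if it does not exist, or if any dependency's
    current MD5 differs from the digest recorded after the last build.
    File modification times are never consulted.
    """
    if not Path(target).exists():
        return True
    sig_path = Path(sig_file)
    recorded = json.loads(sig_path.read_text()) if sig_path.exists() else {}
    return any(recorded.get(str(d)) != md5_of(d) for d in deps)


def record_signatures(deps, sig_file):
    """Store the current MD5 of every dependency after a successful build."""
    signatures = {str(d): md5_of(d) for d in deps}
    Path(sig_file).write_text(json.dumps(signatures))
```

With content signatures, touching or re-copying an input without changing its bytes does not trigger a rebuild, whereas a timestamp-based check would treat the target as out of date.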
Community-driven computational biology with Debian Linux
2010-01-01
Background The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments. Results The Debian Med initiative provides ready and coherent software packages for medical informatics and bioinformatics. These packages can be used together in Taverna workflows via the UseCase plugin to manage execution on local or remote machines. If such packages are available in cloud computing environments, the underlying hardware and the analysis pipelines can be shared along with the software. Conclusions Debian Med closes the gap between developers and users. It provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology. For geographically distributed teams it can ensure they are working on the same versions of tools, in the same conditions. This contributes to the world-wide networking of researchers. PMID:21210984
Java bioinformatics analysis web services for multiple sequence alignment--JABAWS:MSA.
Troshin, Peter V; Procter, James B; Barton, Geoffrey J
2011-07-15
JABAWS is a web services framework that simplifies the deployment of web services for bioinformatics. JABAWS:MSA provides services for five multiple sequence alignment (MSA) methods (Probcons, T-coffee, Muscle, Mafft and ClustalW), and is the system employed by the Jalview multiple sequence analysis workbench since version 2.6. A fully functional, easy to set up server is provided as a Virtual Appliance (VA), which can be run on most operating systems that support a virtualization environment such as VMware or Oracle VirtualBox. JABAWS is also distributed as a Web Application aRchive (WAR) and can be configured to run on a single computer and/or a cluster managed by Grid Engine, LSF or other queuing systems that support DRMAA. JABAWS:MSA provides clients full access to each application's parameters, allows administrators to specify named parameter preset combinations and execution limits for each application through simple configuration files. The JABAWS command-line client allows integration of JABAWS services into conventional scripts. JABAWS is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws.
Blau, Ashley; Brown, Alison; Mahanta, Lisa; Amr, Sami S.
2016-02-26
The Translational Genomics Core (TGC) at Partners Personalized Medicine (PPM) serves as a fee-for-service core laboratory for Partners Healthcare researchers, providing access to technology platforms and analysis pipelines for genomic, transcriptomic, and epigenomic research projects. The interaction of the TGC with various components of PPM provides it with a unique infrastructure that allows for greater IT and bioinformatics opportunities, such as sample tracking and data analysis. The following article describes some of the unique opportunities available to an academic research core operating within PPM, such as the ability to develop analysis pipelines with a dedicated bioinformatics team and maintain a flexible Laboratory Information Management System (LIMS) with the support of an internal IT team, as well as the operational challenges encountered to respond to emerging technologies, diverse investigator needs, and high staff turnover. In addition, the implementation and operational role of the TGC in the Partners Biobank genotyping project of over 25,000 samples is presented as an example of core activities working with other components of PPM. PMID:26927185
ESAP plus: a web-based server for EST-SSR marker development.
Ponyared, Piyarat; Ponsawat, Jiradej; Tongsima, Sissades; Seresangtakul, Pusadee; Akkasaeng, Chutipong; Tantisuwichwong, Nathpapat
2016-12-22
Simple sequence repeats (SSRs) have become widely used as molecular markers in plant genetic studies due to their abundance, high allelic variation at each locus and simplicity to analyze using conventional PCR amplification. To study plants with unknown genome sequence, SSR markers from Expressed Sequence Tags (ESTs), which can be obtained from the plant mRNA (converted to cDNA), must be utilized. With the advent of high-throughput sequencing technology, huge amounts of EST sequence data have been generated and are now accessible from many public databases. However, SSR marker identification from a large in-house or public EST collection requires a computational pipeline that makes use of several standard bioinformatic tools to design high quality EST-SSR primers. Some of these computational tools are not user-friendly and must be tightly integrated with reference genomic databases. A web-based bioinformatic pipeline, called EST Analysis Pipeline Plus (ESAP Plus), was constructed to assist researchers in developing SSR markers from a large EST collection. ESAP Plus incorporates several bioinformatic scripts and some useful standard software tools necessary for the four main procedures of EST-SSR marker development, namely 1) pre-processing, 2) clustering and assembly, 3) SSR mining and 4) SSR primer design. The proposed pipeline also provides two alternative steps for reducing EST redundancy and identifying SSR loci. Using public sugarcane ESTs, ESAP Plus automatically executed the aforementioned computational pipeline via a simple web user interface, which was implemented using standard PHP, HTML, CSS and JavaScript. With ESAP Plus, users can upload raw EST data and choose various filtering options and parameters to analyze each of the four main procedures through this web interface. All input EST data and their predicted SSR results will be stored in the ESAP Plus MySQL database. Users will be notified via e-mail when the automatic process is completed and they can download all the results through the web interface. ESAP Plus is a comprehensive and convenient web-based bioinformatic tool for SSR marker development. ESAP Plus offers all necessary EST-SSR development processes with various adjustable options that users can easily use to identify SSR markers from a large EST collection. With a familiar web interface, users can upload the raw EST using the data submission page and visualize/download the corresponding EST-SSR information from within ESAP Plus. ESAP Plus can handle considerably large EST datasets. This EST-SSR discovery tool can be accessed directly from: http://gbp.kku.ac.th/esap_plus/.
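The SSR mining step (procedure 3 above) can be illustrated with a small regular-expression sketch. This is not the ESAP Plus implementation, whose underlying tools and thresholds are not given in the abstract; the function name `find_ssrs` and the minimum repeat counts (6 for di-, 5 for tri-, 4 for tetranucleotide motifs) are assumptions chosen for the example.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in `seq`.

    `min_repeats` maps motif length to the minimum number of tandem
    copies required; the defaults below are illustrative assumptions.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated
            # (e.g. "AA" or "ATAT"); those belong to a smaller k.
            if any(motif == motif[:j] * (k // j)
                   for j in range(1, k) if k % j == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    est = "GGC" + "AT" * 7 + "CCGTT" + "GAA" * 5 + "TC"
    for start, motif, count in find_ssrs(est):
        print(start, motif, count)
```

In a full pipeline the reported coordinates would then be passed to the primer design step so that primers can be placed in the sequence flanking each repeat.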
Rigbolt, Kristoffer T. G.; Vanselow, Jens T.; Blagoev, Blagoy
2011-08-01
Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiments. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data we developed the Graphical Proteomics Data Explorer (GProX). The program requires no special bioinformatics training, as all functions of GProX are accessible within its user-friendly graphical interface, which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests for e.g. GO terms and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data are available and most analysis functions in GProX create customizable high quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net. PMID:21602510
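A feature enrichment test of the kind mentioned above (for example, for GO terms) is commonly computed as a one-sided hypergeometric test. The sketch below assumes that formulation; the abstract does not state which test GProX uses, and the function name `hypergeom_enrichment_p` and its argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided enrichment p-value P(X >= k).

    N: proteins in the background set
    K: background proteins annotated with the term
    n: proteins in the selected cluster
    k: selected proteins annotated with the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities of seeing k or more
    # annotated proteins in a random draw of n from the background.
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / total


if __name__ == "__main__":
    # Hypothetical numbers: 8 of 40 clustered proteins carry a term
    # that annotates 50 of 2000 background proteins.
    print(hypergeom_enrichment_p(8, 40, 50, 2000))
```

When many terms are tested at once, the resulting p-values would normally be corrected for multiple testing before interpretation.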
Miotto, Riccardo; Glicksberg, Benjamin S.; Morgan, Joseph W.; Dudley, Joel T.
2017-01-01
Monitoring and modeling biomedical, health care and wellness data from individuals and converging data on a population scale have tremendous potential to improve understanding of the transition from the healthy state of human physiology to the disease setting. Wellness monitoring devices and companion software applications capable of generating alerts and sharing data with health care providers or social networks are now available. The accessibility and clinical utility of such data for disease or wellness research are currently limited. Designing methods for streaming data capture, real-time data aggregation, machine learning, predictive analytics and visualization solutions to integrate wellness or health monitoring data elements with the electronic medical records (EMRs) maintained by health care providers permits better utilization. Integration of population-scale biomedical, health care and wellness data would help to stratify patients for active health management and to understand clinically asymptomatic patients and underlying illness trajectories. In this article, we discuss various health-monitoring devices, their ability to capture the unique state of health represented in a patient and their application in individualized diagnostics, prognosis, clinical or wellness intervention. We also discuss examples of translational bioinformatics approaches to integrating patient-generated data with existing EMRs, personal health records, patient portals and clinical data repositories. Briefly, translational bioinformatics methods, tools and resources are at the center of these advances in implementing real-time biomedical and health care analytics in the clinical setting. Furthermore, these advances are poised to play a significant role in clinical decision-making and implementation of data-driven medicine and wellness care. PMID:26876889
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oh, J; Deasy, J; Kerns, S
Purpose: We investigated whether integration of machine learning and bioinformatics techniques on genome-wide association study (GWAS) data can improve the performance of predictive models in predicting the risk of developing radiation-induced late rectal bleeding and erectile dysfunction in prostate cancer patients. Methods: We analyzed a GWAS dataset generated from 385 prostate cancer patients treated with radiotherapy. Using genotype information from these patients, we designed a machine learning-based predictive model of late radiation-induced toxicities: rectal bleeding and erectile dysfunction. The model building process was performed using 2/3 of samples (training) and the predictive model was tested with 1/3 of samples (validation). To identify important single nucleotide polymorphisms (SNPs), we computed the SNP importance score, resulting from our random forest regression model. We performed gene ontology (GO) enrichment analysis for nearby genes of the important SNPs. Results: After univariate analysis on the training dataset, we filtered out many SNPs with p>0.001, resulting in 749 and 367 SNPs that were used in the model building process for rectal bleeding and erectile dysfunction, respectively. On the validation dataset, our random forest regression model achieved an area under the curve (AUC) of 0.70 and 0.62 for rectal bleeding and erectile dysfunction, respectively. We performed GO enrichment analysis for the top 25%, 50%, 75%, and 100% SNPs out of the SNPs selected in the univariate analysis. When we used the top 50% SNPs, more plausible biological processes were obtained for both toxicities. An additional test with the top 50% SNPs improved predictive power with AUC=0.71 and 0.65 for rectal bleeding and erectile dysfunction. A better performance was achieved with AUC=0.67 when age and androgen deprivation therapy were added to the model for erectile dysfunction. Conclusion: Our approach that combines machine learning and bioinformatics techniques enabled designing better models and identifying more plausible biological processes associated with the outcomes.
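The validation AUC values quoted in this abstract summarise how well predicted risk scores rank patients who developed a toxicity above those who did not. As a minimal sketch of that metric only (not the authors' random forest pipeline), the rank-based definition of AUC can be written in plain Python. The function name `roc_auc` and the toy scores and labels are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: predicted risk scores and observed outcomes (1 = toxicity).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 0, 1, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients; a sort-based formulation would be preferable for larger data.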
The Topology Prediction of Membrane Proteins: A Web-Based Tutorial.
Kandemir-Cavas, Cagin; Cavas, Levent; Alyuruk, Hakan
2018-06-01
There is a great need for the development of educational materials that transfer current bioinformatics knowledge to undergraduate students in bioscience departments. This study aims to prepare an example in silico laboratory tutorial on the topology prediction of membrane proteins by bioinformatics tools. This laboratory tutorial is prepared for biochemistry lessons at bioscience departments (biology, chemistry, biochemistry, molecular biology and genetics, and faculty of medicine). The tutorial is intended for students who have not yet taken a bioinformatics course or who have already taken an introductory bioinformatics course. The tutorial is based on step-by-step explanations with illustrations. It can be applied under supervision of an instructor in the lessons, or it can be used as a self-study guide by students. In the tutorial, membrane-spanning regions and α-helices of membrane proteins were predicted by internet-based bioinformatics tools. According to the results obtained from these tools, the algorithms and parameters used affected the accuracy of prediction. The importance of this laboratory tutorial lies in the fact that it provides an introduction to bioinformatics and also demonstrates an in silico laboratory application to students in the natural sciences. The presented example education material is easily applicable in all departments that have an internet connection. This study presents an alternative education material to the students in biochemistry laboratories in addition to classical laboratory experiments.
FOUNTAIN: A JAVA open-source package to assist large sequencing projects
Buerstedde, Jean-Marie; Prill, Florian
2001-01-01
Background Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. FOUNTAIN is open source and largely platform and database independent, and we hope that it will be improved and extended in a community effort. PMID:11591214
Bott, O J; Ammenwerth, E; Brigl, B; Knaup, P; Lang, E; Pilgram, R; Pfeifer, B; Ruderich, F; Wolff, A C; Haux, R; Kulikowski, C
2005-01-01
To review recent research efforts in the field of ubiquitous computing in health care. To identify current research trends and further challenges for medical informatics. Analysis of the contents of the Yearbook of Medical Informatics 2005 of the International Medical Informatics Association (IMIA). The Yearbook of Medical Informatics 2005 includes 34 original papers selected from 22 peer-reviewed scientific journals related to several distinct research areas: health and clinical management, patient records, health information systems, medical signal processing and biomedical imaging, decision support, knowledge representation and management, education and consumer informatics as well as bioinformatics. A special section on ubiquitous health care systems is devoted to recent developments in the application of ubiquitous computing in health care. Besides additional synoptical reviews of each of the sections the Yearbook includes invited reviews concerning E-Health strategies, primary care informatics and wearable healthcare. Several publications demonstrate the potential of ubiquitous computing to enhance effectiveness of health services delivery and organization. But ubiquitous computing is also a societal challenge, caused by the surrounding but unobtrusive character of this technology. Contributions from nearly all of the established sub-disciplines of medical informatics are demanded to turn the visions of this promising new research field into reality.
Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian
2011-08-30
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.
BamTools: a C++ API and toolkit for analyzing and managing BAM files
Barnett, Derek W.; Garrison, Erik K.; Quinlan, Aaron R.; Strömberg, Michael P.; Marth, Gabor T.
2011-01-01
Motivation: Analysis of genomic sequencing data requires efficient, easy-to-use access to alignment results and flexible data management tools (e.g. filtering, merging, sorting, etc.). However, the enormous amount of data produced by current sequencing technologies is typically stored in compressed, binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research. Results: We introduce a software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first C++ API publicly available for BAM file support as well as a command-line toolkit. Availability: BamTools was written in C++, and is supported on Linux, Mac OSX and MS Windows. Source code and documentation are freely available at http://github.org/pezmaster31/bamtools. Contact: barnetde@bc.edu PMID:21493652
A System for Information Management in BioMedical Studies—SIMBioMS
Krestyaninova, Maria; Zarins, Andris; Viksna, Juris; Kurbatova, Natalja; Rucevskis, Peteris; Neogi, Sudeshna Guha; Gostev, Mike; Perheentupa, Teemu; Knuuttila, Juha; Barrett, Amy; Lappalainen, Ilkka; Rung, Johan; Podnieks, Karlis; Sarkans, Ugis; McCarthy, Mark I; Brazma, Alvis
2009-01-01
Summary: SIMBioMS is a web-based open source software system for managing data and information in biomedical studies. It provides a solution for the collection, storage, management and retrieval of information about research subjects and biomedical samples, as well as experimental data obtained using a range of high-throughput technologies, including gene expression, genotyping, proteomics and metabonomics. The system can easily be customized and has proven to be successful in several large-scale multi-site collaborative projects. It is compatible with emerging functional genomics data standards and provides data import and export in accepted standard formats. Protocols for transferring data to durable archives at the European Bioinformatics Institute have been implemented. Availability: The source code, documentation and initialization scripts are available at http://simbioms.org. Contact: support@simbioms.org; mariak@ebi.ac.uk PMID:19633095
Droit, Arnaud; Hunter, Joanna M; Rouleau, Michèle; Ethier, Chantal; Picard-Cloutier, Aude; Bourgais, David; Poirier, Guy G
2007-01-01
Background In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5. PMID:18093328
G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.
Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H
2009-01-01
Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others. Most of the current graph indexing methods focus on subgraph query processing, i.e. determining the set of database graphs that contains the query graph, and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions have (i) high computational complexity and (ii) non-trivial difficulty to be indexed in a graph database. Our objective is to bridge graph kernel function and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our method of similarity measurement builds upon local features extracted from each node and their neighboring nodes in graphs. A hash table is utilized to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and for fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Most importantly, the new similarity measurement and the index structure are scalable to large databases, with smaller indexing size, faster indexing construction time, and faster query processing time as compared to state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.
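The general idea described in this abstract (local node features, a hash table keyed by those features, and a kernel evaluated through the table) can be illustrated with a deliberately simplified Python sketch. This is not the published G-hash algorithm: the feature definition (node label plus counts of neighbour labels), the count-product kernel, and all function and variable names here are assumptions chosen for illustration.

```python
from collections import Counter


def local_features(graph):
    """graph maps node id -> (label, list of neighbour ids).
    Each node contributes one hashable feature: its own label plus
    the sorted counts of its neighbours' labels."""
    feats = Counter()
    for label, neighbours in graph.values():
        neighbour_labels = Counter(graph[n][0] for n in neighbours)
        feats[(label, tuple(sorted(neighbour_labels.items())))] += 1
    return feats


def build_index(graphs):
    """Hash table: feature -> {graph id: number of nodes with that feature}."""
    index = {}
    for gid, graph in graphs.items():
        for feat, count in local_features(graph).items():
            index.setdefault(feat, {})[gid] = count
    return index


def kernel_scores(query, index):
    """Similarity of the query to every indexed graph, computed as the sum
    over shared features of the product of feature counts. Only graphs that
    share at least one feature with the query are ever touched."""
    scores = Counter()
    for feat, q_count in local_features(query).items():
        for gid, g_count in index.get(feat, {}).items():
            scores[gid] += q_count * g_count
    return scores


# Hypothetical toy database of two small labelled graphs (heavy atoms only).
toy_graphs = {
    "ethanol": {0: ("C", [1]), 1: ("C", [0, 2]), 2: ("O", [1])},
    "ethane": {0: ("C", [1]), 1: ("C", [0])},
}
toy_index = build_index(toy_graphs)
toy_scores = kernel_scores(toy_graphs["ethanol"], toy_index)
print(toy_scores.most_common())
```

Because the query only looks up its own features in the hash table, the cost of a similarity query depends on the number of matching entries rather than on the size of the whole database, which is the property the abstract emphasises.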
PanDA for ATLAS distributed computing in the next decade
NASA Astrophysics Data System (ADS)
Barreiro Megino, F. H.; De, K.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Padolski, S.; Panitkin, S.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The Production and Distributed Analysis (PanDA) system has been developed to meet ATLAS production and analysis requirements for a data-driven workload management system capable of operating at the Large Hadron Collider (LHC) data processing scale. Heterogeneous resources used by the ATLAS experiment are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, dozens of scientific applications are supported, while data processing requires more than a few billion hours of computing usage per year. PanDA performed very well over the last decade including the LHC Run 1 data taking period. However, it was decided to upgrade the whole system concurrently with the LHC’s first long shutdown in order to cope with rapidly changing computing infrastructure. After two years of reengineering efforts, PanDA has embedded capabilities for fully dynamic and flexible workload management. The static batch job paradigm was discarded in favor of a more automated and scalable model. Workloads are dynamically tailored for optimal usage of resources, with the brokerage taking network traffic and forecasts into account. Computing resources are partitioned based on dynamic knowledge of their status and characteristics. The pilot has been re-factored around a plugin structure for easier development and deployment. Bookkeeping is handled with both coarse and fine granularities for efficient utilization of pledged or opportunistic resources. An in-house security mechanism authenticates the pilot and data management services in off-grid environments such as volunteer computing and private local clusters. The PanDA monitor has been extensively optimized for performance and extended with analytics to provide aggregated summaries of the system as well as drill-down to operational details. 
Many other challenges are also planned or have recently been addressed, and PanDA has been adopted by non-LHC experiments, for example by bioinformatics groups successfully running Paleomix (microbial genome and metagenome) payloads on supercomputers. In this paper we will focus on the new and planned features that are most important to the next decade of distributed computing workload management.
ERIC Educational Resources Information Center
Taylor, D. Leland; Campbell, A. Malcolm; Heyer, Laurie J.
2013-01-01
Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken into fragments and sequenced, producing millions of "reads." A computer algorithm pieces these reads together in the genome assembly process. PHAST is a set of online modules…
MeDICi Software Superglue for Data Analysis Pipelines
Ian Gorton
2017-12-09
The Middleware for Data-Intensive Computing (MeDICi) Integration Framework is an integrated middleware platform developed to solve data analysis and processing needs of scientists across many domains. MeDICi is scalable, easily modified, and robust to multiple languages, protocols, and hardware platforms, and in use today by PNNL scientists for bioinformatics, power grid failure analysis, and text analysis.
Comparative Modeling of Proteins: A Method for Engaging Students' Interest in Bioinformatics Tools
ERIC Educational Resources Information Center
Badotti, Fernanda; Barbosa, Alan Sales; Reis, André Luiz Martins; do Valle, Ítalo Faria; Ambrósio, Lara; Bitar, Mainá
2014-01-01
The huge increase in data being produced in the genomic era has produced a need to incorporate computers into the research process. Sequence generation, its subsequent storage, interpretation, and analysis are now entirely computer-dependent tasks. Universities from all over the world have been challenged to seek a way of encouraging students to…
From Trace Evidence to Bioinformatics: Putting Bryophytes into Molecular Biology Education
ERIC Educational Resources Information Center
Fuselier, Linda; Bougary, Azhar; Malott, Michelle
2011-01-01
Students benefit most from their science education when they participate fully in the process of science in the context of real-world problems. We describe a student-directed open-inquiry lab experience that has no predetermined outcomes and requires students to engage in all components of scientific inquiry from posing a question through…
ERIC Educational Resources Information Center
Nehm, Ross H.; Budd, Ann F.
2006-01-01
NMITA is a reef coral biodiversity database that we use to introduce students to the expansive realm of bioinformatics beyond genetics. We introduce a series of lessons that have students use this database, thereby accessing real data that can be used to test hypotheses about biodiversity and evolution while targeting the "National Science …
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. Finally, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
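A Bayesian FDR rule of the kind mentioned in this abstract is commonly implemented by rejecting the hypotheses with the smallest posterior null probabilities for as long as the average of those probabilities stays at or below a target level. The sketch below shows only that thresholding step under stated assumptions: the posterior probabilities are taken as given (the abstract's model for the test statistics is not reproduced), and the function name `bayesian_fdr_reject` and the toy values are hypothetical.

```python
def bayesian_fdr_reject(post_null, alpha=0.05):
    """Return the indices of rejected null hypotheses.

    post_null[i] is the posterior probability that hypothesis i is null.
    Hypotheses are added in order of increasing post_null while the mean
    posterior null probability of the rejected set (the estimated Bayesian
    FDR) does not exceed alpha."""
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    rejected = []
    running_sum = 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += post_null[i]
        # The running mean is non-decreasing, so the first violation is final.
        if running_sum / rank > alpha:
            break
        rejected.append(i)
    return sorted(rejected)


# Hypothetical posterior null probabilities for six genes.
toy_post_null = [0.01, 0.50, 0.03, 0.90, 0.08, 0.20]
toy_rejected = bayesian_fdr_reject(toy_post_null, alpha=0.10)
print(toy_rejected)
```

Note that a hypothesis with an individual posterior null probability above alpha (0.20 in the toy data) can still be rejected, because the rule controls the average over the rejected set rather than each probability separately.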
Tools for visually exploring biological networks.
Suderman, Matthew; Hallett, Michael
2007-10-15
Many tools exist for visually exploring biological networks including well-known examples such as Cytoscape, VisANT, Pathway Studio and Patika. These systems play a key role in the development of integrative biology, systems biology and integrative bioinformatics. The trend in the development of these tools is to go beyond 'static' representations of cellular state, towards a more dynamic model of cellular processes through the incorporation of gene expression data, subcellular localization information and time-dependent behavior. We provide a comprehensive review of the relative advantages and disadvantages of existing systems with two goals in mind: to aid researchers in efficiently identifying the appropriate existing tools for data visualization; to describe the necessary and realistic goals for the next generation of visualization tools. In view of the first goal, we provide in the Supplementary Material a systematic comparison of more than 35 existing tools in terms of over 25 different features. Supplementary data are available at Bioinformatics online.
Long non-coding RNAs and their biological roles in plants.
Liu, Xue; Hao, Lili; Li, Dayong; Zhu, Lihuang; Hu, Songnian
2015-06-01
With the development of genomics and bioinformatics, especially the extensive applications of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, the lncRNAs have been considered as important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in a few species, our current knowledge of their biological functions is still limited. Here, we have summarized recent studies on their identification, characteristics, classification, bioinformatics, resources, and current exploration of their biological functions in plants. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.
Bioinformatics/biostatistics: microarray analysis.
Eichler, Gabriel S
2012-01-01
The quantity and complexity of the molecular-level data generated in both research and clinical settings require the use of sophisticated, powerful computational interpretation techniques. It is for this reason that bioinformatic analysis of complex molecular profiling data has become a fundamental technology in the development of personalized medicine. This chapter provides a high-level overview of the field of bioinformatics and outlines several classic bioinformatic approaches. The highlighted approaches can be aptly applied to nearly any sort of high-dimensional genomic, proteomic, or metabolomic experiments. Reviewed technologies in this chapter include traditional clustering analysis, the Gene Expression Dynamics Inspector (GEDI), GoMiner, Gene Set Enrichment Analysis (GSEA), and the Learner of Functional Enrichment (LeFE).
Integer Linear Programming in Computational Biology
NASA Astrophysics Data System (ADS)
Althaus, Ernst; Klau, Gunnar W.; Kohlbacher, Oliver; Lenhof, Hans-Peter; Reinert, Knut
Computational molecular biology (bioinformatics) is a young research field that is rich in NP-hard optimization problems. The problem instances encountered are often huge and comprise thousands of variables. Since their introduction into the field of bioinformatics in 1997, integer linear programming (ILP) techniques have been successfully applied to many optimization problems. These approaches have added much momentum to development and progress in related areas. In particular, ILP-based approaches have become a standard optimization technique in bioinformatics. In this review, we present applications of ILP-based techniques developed by members and former members of Kurt Mehlhorn’s group. These techniques were introduced to bioinformatics in a series of papers and popularized by demonstration of their effectiveness and potential.
Bioinformatics projects supporting life-sciences learning in high schools.
Marques, Isabel; Almeida, Paulo; Alves, Renato; Dias, Maria João; Godinho, Ana; Pereira-Leal, José B
2014-01-01
The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called "Bioinformatics@school." It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools.
The Interactions Between Clinical Informatics and Bioinformatics
Altman, Russ B.
2000-01-01
For the past decade, Stanford Medical Informatics has combined clinical informatics and bioinformatics research and training in an explicit way. The interest in applying informatics techniques to both clinical problems and problems in basic science can be traced to the Dendral project in the 1960s. Having bioinformatics and clinical informatics in the same academic unit is still somewhat unusual and can lead to clashes of clinical and basic science cultures. Nevertheless, the benefits of this organization have recently become clear, as the landscape of academic medicine in the next decades has begun to emerge. The author provides examples of technology transfer between clinical informatics and bioinformatics that illustrate how they complement each other. PMID:10984462
Revote, Jerico; Suchecki, Radosław; Tyagi, Sonika; Corley, Susan M.; Shang, Catherine A.; McGrath, Annette
2017-01-01
There is a clear demand for hands-on bioinformatics training. The development of bioinformatics workshop content is both time-consuming and expensive. Therefore, enabling trainers to develop bioinformatics workshops in a way that facilitates reuse is becoming increasingly important. The most widespread practice for sharing workshop content is through making PDF, PowerPoint and Word documents available online. While this effort is to be commended, such content is usually not easy to reuse or repurpose and does not capture all the information required for a third party to rerun a workshop. We present an open, collaborative framework for developing and maintaining reusable and shareable hands-on training workshop content. PMID:26984618
Web tools for predictive toxicology model building.
Jeliazkova, Nina
2012-07-01
The development and use of web tools in chemistry already has more than 15 years of history. Powered by advances in Internet technologies, the current generation of web systems is starting to expand into areas traditionally served by desktop applications. The web platforms integrate data storage, cheminformatics and data analysis tools. The ease of use and the collaborative potential of the web are compelling, despite the challenges. The topic of this review is a set of recently published web tools that facilitate predictive toxicology model building. The focus is on software platforms offering web access to chemical structure-based methods, although some of the frameworks could also provide bioinformatics or hybrid data analysis functionalities. A number of historical and current developments are cited. In order to provide comparable assessment, the following characteristics are considered: support for workflows, descriptor calculations, visualization, modeling algorithms, data management and data sharing capabilities, availability of GUI or programmatic access and implementation details. The success of the Web is largely due to its highly decentralized, yet sufficiently interoperable model for information access. The expected future convergence between cheminformatics and bioinformatics databases provides new challenges toward management and analysis of large data sets. The web tools in predictive toxicology will likely continue to evolve toward the right mix of flexibility, performance, scalability, interoperability, sets of unique features offered, friendly user interfaces, programmatic access for advanced users, platform independence, results reproducibility, curation and crowdsourcing utilities, collaborative sharing and secure access.
Towards barcode markers in Fungi: an intron map of Ascomycota mitochondria.
Santamaria, Monica; Vicario, Saverio; Pappadà, Graziano; Scioscia, Gaetano; Scazzocchio, Claudio; Saccone, Cecilia
2009-06-16
A standardized and cost-effective molecular identification system is now an urgent need for Fungi owing to their wide involvement in human life quality. In particular, the potential use of mitochondrial DNA species markers has been taken into account. Unfortunately, a serious difficulty in the PCR and bioinformatic surveys is due to the presence of mobile introns in almost all the fungal mitochondrial genes. The aim of this work is to verify the incidence of this phenomenon in Ascomycota, testing, at the same time, a new bioinformatic tool for extracting and managing sequence database annotations, in order to identify the mitochondrial gene regions where introns are missing so as to propose them as species markers. The general trend towards a large occurrence of introns in the mitochondrial genome of Fungi has been confirmed in Ascomycota by an extensive bioinformatic analysis, performed on all the entries concerning 11 mitochondrial protein coding genes and 2 mitochondrial rRNA (ribosomal RNA) specifying genes, belonging to this phylum, available in public nucleotide sequence databases. A new query approach has been developed to effectively retrieve the intron information included in these entries. When the new query-based approach was compared with a BLAST-based procedure, with the aim of designing a faithful Ascomycota mitochondrial intron map, the query-based approach was clearly the more accurate. Within this map, despite the large pervasiveness of introns, it is possible to distinguish specific regions comprised in several genes, including the full NADH dehydrogenase subunit 6 (ND6) gene, which could be considered as barcode candidates for Ascomycota due to their paucity of introns and to their length, above 400 bp, comparable to the lower end size of the length range of barcodes successfully used in animals. The development of the new query system described here would answer the pressing requirement to drastically improve the bioinformatics support to the DNA Barcode Initiative. The large scale investigation of Ascomycota mitochondrial introns performed through this tool, which allows intron-rich sequences to be excluded from the search for barcode candidates, could be the first step towards a mitochondrial barcoding strategy for these organisms, similar to the standard approach employed in metazoans.
Extracting patterns of database and software usage from the bioinformatics literature
Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L.; Stevens, Robert
2014-01-01
Motivation: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. Results: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. Availability and implementation: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/ Contact: robert.stevens@manchester.ac.uk PMID:25161253
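A network based on resource co-occurrence can be sketched as follows. This is an illustration only, not the authors' scripts (those are at the URL given above): it assumes each article has already been reduced to a list of resource names, and the example mentions are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs):
    """Count how often each pair of resources appears in the same document."""
    edges = Counter()
    for resources in docs:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(resources)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical per-article resource mentions (illustrative, not extracted data).
docs = [
    ["ClustalW", "PhyML", "GenBank"],
    ["MUSCLE", "PhyML"],
    ["ClustalW", "PhyML"],
    ["ClustalW", "MrBayes", "GenBank"],
]
network = cooccurrence_network(docs)
for (a, b), weight in network.most_common(3):
    print(a, b, weight)
```

Edge weights are plain co-mention counts here; comparing networks built from articles of different years would give the "snapshots through time" the abstract refers to.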
Mello, Luciane V.; Tregilgas, Luke; Cowley, Gwen; Gupta, Anshul; Makki, Fatima; Jhutty, Anjeet; Shanmugasundram, Achchuthan
2017-01-01
Teaching bioinformatics is a longstanding challenge for educators who need to demonstrate to students how skills developed in the classroom may be applied to real world research. This study employed an action research methodology which utilised student–staff partnership and peer-learning. It was centred on the experiences of peer-facilitators, students who had previously taken a postgraduate bioinformatics module, and had applied knowledge and skills gained from it to their own research. It aimed to demonstrate to peer-receivers, current students, how bioinformatics could be used in their own research while developing peer-facilitators’ teaching and mentoring skills. This student-centred approach was well received by the peer-receivers, who claimed to have gained improved understanding of bioinformatics and its relevance to research. Equally, peer-facilitators also developed a better understanding of the subject and appreciated that the activity was a rare and invaluable opportunity to develop their teaching and mentoring skills, enhancing their employability. PMID:29098185
E-Learning as a new tool in bioinformatics teaching
Saravanan, Vijayakumar; Shanmughavel, Piramanayagam
2007-01-01
In recent years, virtual learning has grown rapidly. Universities, colleges, and secondary schools now deliver training and education over the internet. In addition, the resources available on the WWW are vast, and students find the various techniques employed in bioinformatics increasingly complex to implement. Here, we discuss the importance of e-learning in developing and delivering an educational system for bioinformatics. PMID:18292800
Teaching bioinformatics and neuroinformatics by using free web-based tools.
Grisham, William; Schottler, Natalie A; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson
2010-01-01
The purpose of this completely computer-based module is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with anatomy (Mouse Brain Library), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and gene expression analyses (University of California, Santa Cruz Genome Browser, National Center for Biotechnology Information's Entrez Gene, and the Allen Brain Atlas), and information resources (PubMed). Instructors can use these various websites in concert to teach genetics from the phenotypic level to the molecular level, aspects of neuroanatomy and histology, statistics, quantitative trait locus analysis, and molecular biology (including in situ hybridization and microarray analysis), and to introduce bioinformatic resources. Students use these resources to discover 1) the region(s) of chromosome(s) influencing the phenotypic trait, 2) a list of candidate genes, narrowed by expression data, 3) the in situ pattern of a given gene in the region of interest, 4) the nucleotide sequence of the candidate gene, and 5) articles describing the gene. Teaching materials such as a detailed student/instructor's manual, PowerPoints, sample exams, and links to free Web resources can be found at http://mdcune.psych.ucla.edu/modules/bioinformatics.
Kovarik, Dina N.; Patterson, Davis G.; Cohen, Carolyn; Sanders, Elizabeth A.; Peterson, Karen A.; Porter, Sandra G.; Chowning, Jeanne Ting
2013-01-01
We investigated the effects of our Bio-ITEST teacher professional development model and bioinformatics curricula on cognitive traits (awareness, engagement, self-efficacy, and relevance) in high school teachers and students that are known to accompany a developing interest in science, technology, engineering, and mathematics (STEM) careers. The program included best practices in adult education and diverse resources to empower teachers to integrate STEM career information into their classrooms. The introductory unit, Using Bioinformatics: Genetic Testing, uses bioinformatics to teach basic concepts in genetics and molecular biology, and the advanced unit, Using Bioinformatics: Genetic Research, utilizes bioinformatics to study evolution and support student research with DNA barcoding. Pre–post surveys demonstrated significant growth (n = 24) among teachers in their preparation to teach the curricula and infuse career awareness into their classes, and these gains were sustained through the end of the academic year. Introductory unit students (n = 289) showed significant gains in awareness, relevance, and self-efficacy. While these students did not show significant gains in engagement, advanced unit students (n = 41) showed gains in all four cognitive areas. Lessons learned during Bio-ITEST are explored in the context of recommendations for other programs that wish to increase student interest in STEM careers. PMID:24006393
NASA Astrophysics Data System (ADS)
Balqis, Widodo, Lukiati, Betty; Amin, Mohamad
2017-05-01
One way to improve the quality of learning in the Plant Metabolism course in the Department of Biology, State University of Malang, is to develop teaching materials. This research evaluates the need for bioinformatics-based teaching material in the Plant Metabolism course using the Analyze, Design, Develop, Implement, and Evaluate (ADDIE) development model. Data were collected through questionnaires distributed to students in the Plant Metabolism course of the Department of Biology, University of Malang, and through analysis of the semester lecture plan (RPS). The learning outcomes of the course show that it does not yet integrate bioinformatics. All respondents stated that plant metabolism books do not include bioinformatics and fail to explain the metabolism of chemical compounds from local Indonesian plants. Respondents thought that bioinformatics can provide examples, explain secondary metabolite analysis techniques, and support discussion of potential medicinal compounds from local plants. As many as 65% of the respondents said that the existing metabolism book could not be used to understand secondary metabolism in the plant metabolism lectures. Therefore, developing bioinformatics-based teaching materials for plant metabolism is important to improve understanding of the lecture material.
Dimensionality Reduction in Big Data with Nonnegative Matrix Factorization
2017-06-20
applications of data mining, signal processing, computer vision, bioinformatics, etc. Fundamentally, NMF has two main purposes. First, it reduces...shape of the function becomes more spherical because $\partial^2 g / \partial y_i^2 = 1, \forall i$, and $g(y)$ is convex. This part aims to make the post-processing parts more...maxStop = 0 for each thread of computation */; 3 /*Re-scaling variables*/; 4 $Q = H / \sqrt{\operatorname{diag}(H)\,\operatorname{diag}(H)^T}$; $q = h / \sqrt{\operatorname{diag}(H)}$; 5 /*Solving NQP: minimizing f(x
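The re-scaling step in the excerpt can be illustrated with a small sketch. It reads the garbled line as an elementwise division of H by the square root of the outer product of its diagonal with itself, and of h by the square root of that diagonal. That reading is an assumption, chosen because it gives the rescaled matrix a unit diagonal, matching the statement that every second derivative equals 1. The function name and the example matrix are hypothetical.

```python
import math


def rescale(H, h):
    """Return Q = H / sqrt(d d^T) and q = h / sqrt(d), with d = diag(H).

    Assumes H is square with strictly positive diagonal entries.
    """
    n = len(H)
    d = [H[i][i] for i in range(n)]
    Q = [[H[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    q = [h[i] / math.sqrt(d[i]) for i in range(n)]
    return Q, q


# Hypothetical 2x2 example: the diagonal of Q becomes exactly 1.
H = [[4.0, 2.0], [2.0, 9.0]]
h = [2.0, -3.0]
Q, q = rescale(H, h)
print(Q, q)
```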
Bioimage informatics for experimental biology
Swedlow, Jason R.; Goldberg, Ilya G.; Eliceiri, Kevin W.
2012-01-01
Over the last twenty years there have been great advances in light microscopy, with the result that multi-dimensional imaging has driven a revolution in modern biology. New approaches to data acquisition are reported frequently, and yet the significant data management and analysis challenges presented by these new complex datasets remain largely unsolved. As in the well-developed field of genome bioinformatics, central repositories are and will be key resources, but there is a critical need for informatics tools in individual laboratories to help manage, share, visualize, and analyze image data. In this article we present the recent efforts by the bioimage informatics community to tackle these challenges and discuss our own vision for future development of bioimage informatics solutions. PMID:19416072
Evolving approaches to the ethical management of genomic data.
McEwen, Jean E; Boyer, Joy T; Sun, Kathie Y
2013-06-01
The ethical landscape in the field of genomics is rapidly shifting. Plummeting sequencing costs, along with ongoing advances in bioinformatics, now make it possible to generate an enormous volume of genomic data about vast numbers of people. The informational richness, complexity, and frequently uncertain meaning of these data, coupled with evolving norms surrounding the sharing of data and samples and persistent privacy concerns, have generated a range of approaches to the ethical management of genomic information. As calls increase for the expanded use of broad or even open consent, and as controversy grows about how best to handle incidental genomic findings, these approaches, informed by normative analysis and empirical data, will continue to evolve alongside the science. Published by Elsevier Ltd.
Evolving Approaches to the Ethical Management of Genomic Data
Boyer, Joy T.; Sun, Kathie Y.
2013-01-01
The ethical landscape in the field of genomics is rapidly shifting. Plummeting sequencing costs, along with ongoing advances in bioinformatics, now make it possible to generate an enormous volume of genomic data about vast numbers of people. The informational richness, complexity, and frequently uncertain meaning of these data, coupled with evolving norms surrounding the sharing of data and samples and persistent privacy concerns, have generated a range of approaches to the ethical management of genomic information. As calls increase for the expanded use of broad or even open consent, and as controversy grows about how best to handle incidental genomic findings, these approaches, informed by normative analysis and empirical data, will continue to evolve alongside the science. PMID:23453621
Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Svedas, Vytas
2014-07-01
The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure-function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao
2009-01-01
Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have the advanced methodologies for the analysis of DNA, RNA, protein, low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlighted various sources of experimental variations in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provided guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine fetal retardation). PMID:20233650
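As an illustration of the sample size calculation and power analysis mentioned above, the following sketch uses the standard normal-approximation formula for comparing two group means, n = 2((z_{1-alpha/2} + z_{power}) * sigma / delta)^2 per group. The formula is a textbook approximation and is not claimed to be the one used in the article; the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample comparison.

    delta: smallest difference in means worth detecting
    sigma: common standard deviation (assumed equal in both groups)
    alpha: type I error rate; 1 - power is the type II error rate
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Halving the detectable difference roughly quadruples the required sample size.
print(n_per_group(delta=1.0, sigma=1.0), n_per_group(delta=0.5, sigma=1.0))
```

Because the normal approximation ignores the extra uncertainty of estimating sigma, exact t-based calculations give slightly larger values for small groups.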
Poswar, Fabiano de Oliveira; Farias, Lucyana Conceição; Fraga, Carlos Alberto de Carvalho; Bambirra, Wilson; Brito-Júnior, Manoel; Sousa-Neto, Manoel Damião; Santos, Sérgio Henrique Souza; de Paula, Alfredo Maurício Batista; D'Angelo, Marcos Flávio Silveira Vasconcelos; Guimarães, André Luiz Sena
2015-06-01
Bioinformatics has emerged as an important tool to analyze the large amount of data generated by research in different diseases. In this study, gene expression for radicular cysts (RCs) and periapical granulomas (PGs) was characterized based on a leader gene approach. A validated bioinformatics algorithm was applied to identify leader genes for RCs and PGs. Genes related to RCs and PGs were first identified in PubMed, GenBank, GeneAtlas, and GeneCards databases. The Web-available STRING software (The European Molecular Biology Laboratory [EMBL], Heidelberg, Baden-Württemberg, Germany) was used in order to build the interaction map among the identified genes by a significance score named weighted number of links. Based on the weighted number of links, genes were clustered using k-means. The genes in the highest cluster were considered leader genes. Multilayer perceptron neural network analysis was used as a complementary supplement for gene classification. For RCs, the suggested leader genes were TP53 and EP300, whereas PGs were associated with IL2RG, CCL2, CCL4, CCL5, CCR1, CCR3, and CCR5 genes. Our data revealed different gene expression for RCs and PGs, suggesting that not only the inflammatory nature but also other biological processes might differentiate RCs and PGs. Copyright © 2015 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
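The clustering step of the leader gene approach (k-means on the weighted number of links, with the top cluster taken as leader genes) can be sketched as below. This is a minimal one-dimensional illustration and not the validated algorithm used in the study; the gene names, the scores, the choice of k = 3, and the quantile-based initialisation are all assumptions.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars with deterministic quantile initialisation (k >= 2)."""
    ordered = sorted(values)
    centers = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical weighted-number-of-links scores (not data from the study).
wnl = {"GENE_A": 12.0, "GENE_B": 11.0, "GENE_C": 4.0, "GENE_D": 3.5,
       "GENE_E": 1.0, "GENE_F": 0.5, "GENE_G": 0.0}
labels, centers = kmeans_1d(list(wnl.values()), k=3)
top = max(range(len(centers)), key=lambda c: centers[c])
leaders = [gene for gene, lab in zip(wnl, labels) if lab == top]
print(leaders)
```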
Agonist Binding to Chemosensory Receptors: A Systematic Bioinformatics Analysis
Fierro, Fabrizio; Suku, Eda; Alfonso-Prieto, Mercedes; Giorgetti, Alejandro; Cichon, Sven; Carloni, Paolo
2017-01-01
Human G-protein coupled receptors (hGPCRs) constitute a large and highly pharmaceutically relevant membrane receptor superfamily. About half of the hGPCRs' family members are chemosensory receptors, involved in bitter taste and olfaction, along with a variety of other physiological processes. Hence these receptors constitute promising targets for pharmaceutical intervention. Molecular modeling has been so far the most important tool to get insights on agonist binding and receptor activation. Here we investigate both aspects by bioinformatics-based predictions across all bitter taste and odorant receptors for which site-directed mutagenesis data are available. First, we observe that state-of-the-art homology modeling combined with previously used docking procedures turned out to reproduce only a limited fraction of ligand/receptor interactions inferred by experiments. This is most probably caused by the low sequence identity with available structural templates, which limits the accuracy of the protein model and in particular of the side-chains' orientations. Methods which transcend the limited sampling of the conformational space of docking may improve the predictions. As an example corroborating this, we review here multi-scale simulations from our lab and show that, for the three complexes studied so far, they significantly enhance the predictive power of the computational approach. Second, our bioinformatics analysis provides support to previous claims that several residues, including those at positions 1.50, 2.50, and 7.52, are involved in receptor activation. PMID:28932739
Decision tree and ensemble learning algorithms with their applications in bioinformatics.
Che, Dongsheng; Liu, Qi; Rasheed, Khaled; Tao, Xiuping
2011-01-01
Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.
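To give a flavour of how a decision tree chooses a split, the sketch below scores candidate thresholds on a single numeric feature by weighted Gini impurity, one common splitting criterion. It is a generic illustration and not code from the chapter; the feature values and class labels are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best rule x <= threshold."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between neighbours
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical feature values and class labels.
xs = [1, 2, 3, 7, 8, 9]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))
```

A full tree applies this search recursively to each resulting subset, and ensemble methods such as bagging train many such trees on resampled data and combine their votes.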
Bioinformatics Projects Supporting Life-Sciences Learning in High Schools
Marques, Isabel; Almeida, Paulo; Alves, Renato; Dias, Maria João; Godinho, Ana; Pereira-Leal, José B.
2014-01-01
The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called “Bioinformatics@school.” It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools. PMID:24465192
BioSmalltalk: a pure object system and library for bioinformatics.
Morales, Hernán F; Giovambattista, Guillermo
2013-09-15
We have developed BioSmalltalk, a new environment system for pure object-oriented bioinformatics programming. Adaptive end-user programming systems tend to become more important for discovering biological knowledge, as is demonstrated by the emergence of open-source programming toolkits for bioinformatics in the past years. Our software is intended to bridge the gap between bioscientists and rapid software prototyping while preserving the possibility of scaling to whole-system biology applications. BioSmalltalk performs better in terms of execution time and memory usage than Biopython and BioPerl for some classical situations. BioSmalltalk is cross-platform and freely available (MIT license) through the Google Project Hosting at http://code.google.com/p/biosmalltalk. Contact: hernan.morales@gmail.com. Supplementary data are available at Bioinformatics online.
2016 update on APBioNet's annual international conference on bioinformatics (InCoB).
Schönbach, Christian; Verma, Chandra; Wee, Lawrence Jin Kiat; Bond, Peter John; Ranganathan, Shoba
2016-12-22
Since its inception in 2002, InCoB has become one of the largest annual bioinformatics conferences in the Asia-Pacific region, with attendance ranging between 150 and 250 delegates depending on the venue location. InCoB 2016 in Singapore was attended by almost 220 delegates. This year, sessions on structural bioinformatics, sequence and sequencing, and next-generation sequencing fielded the highest number of oral presentations. Forty-four out of 96 oral presentations were associated with an accepted manuscript in supplemental issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics or BMC Systems Biology. Articles with a genomics focus are reviewed in this editorial. Next year's InCoB will be held in Shenzhen, China from September 20 to 22, 2017.
Samuel A. Cushman
2014-01-01
This is a time of explosive growth in the fields of evolutionary and population genetics, with whole genome sequencing and bioinformatics driving a transformative paradigm shift (Morozova and Marra, 2008). At the same time, advances in epigenetics are thoroughly transforming our understanding of evolutionary processes and their implications for populations, species and...
miRToolsGallery: a tag-based and rankable microRNA bioinformatics resources database portal
Chen, Liang; Heikkinen, Liisa; Wang, ChangLiang; Yang, Yang; Knott, K Emily
2018-01-01
Hundreds of bioinformatics tools have been developed for microRNA (miRNA) investigations, including those used for identification, target prediction, structure and expression profile analysis. However, finding the correct tool for a specific application requires the tedious and laborious process of locating, downloading, testing and validating the appropriate tool from a group of nearly a thousand. In order to facilitate this process, we developed a novel database portal named miRToolsGallery. We constructed the portal by manually curating > 950 miRNA analysis tools and resources. In the portal, a query to locate the appropriate tool is expedited by being searchable, filterable and rankable. The ranking feature is vital to quickly identify and prioritize the more useful from the obscure tools. Tools are ranked via different criteria including the PageRank algorithm, date of publication, number of citations, average of votes and number of publications. miRToolsGallery provides links and data for the comprehensive collection of currently available miRNA tools with a ranking function which can be adjusted using different criteria according to specific requirements. Database URL: http://www.mirtoolsgallery.org PMID:29688355
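Among the ranking criteria listed, PageRank is the least self-explanatory, so a minimal power-iteration sketch is given here. The abstract does not say which link graph the portal feeds to PageRank, so the graph below (tools pointing at other tools) is purely hypothetical, as are the tool names and the damping factor of 0.85.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        rank = new
    return rank


# Hypothetical link graph between three tools.
links = {"tool_a": ["tool_c"], "tool_b": ["tool_c"], "tool_c": ["tool_a"]}
rank = pagerank(links)
for tool in sorted(rank, key=rank.get, reverse=True):
    print(tool, round(rank[tool], 3))
```

The other criteria named in the abstract (publication date, citations, votes, number of publications) are simple sort keys, which is what makes the ranking adjustable to a user's requirements.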
SCALEUS: Semantic Web Services Integration for Biomedical Applications.
Sernadela, Pedro; González-Castro, Lorena; Oliveira, José Luís
2017-04-01
In recent years, we have witnessed an explosion of biological data resulting largely from the demands of life science research. The vast majority of these data are freely available via diverse bioinformatics platforms, including relational databases and conventional keyword search applications. This type of approach has achieved great results in the last few years, but has proved unfeasible when information needs to be combined or shared among different and scattered sources. During recent years, many of these data distribution challenges have been solved with the adoption of the semantic web. Despite the evident benefits of this technology, its adoption has introduced new challenges related to the migration process from existing systems to the semantic level. To facilitate this transition, we have developed Scaleus, a semantic web migration tool that can be deployed on top of traditional systems in order to bring knowledge, inference rules, and query federation to the existing data. Targeted at the biomedical domain, this web-based platform offers, in a single package, straightforward data integration and semantic web services that help developers and researchers in the creation of new semantically enhanced information systems. SCALEUS is available as open source at http://bioinformatics-ua.github.io/scaleus/.