WIWS: a protein structure bioinformatics Web service collection.
Hekkelman, M L; Te Beek, T A H; Pettifer, S R; Thorne, D; Attwood, T K; Vriend, G
2010-07-01
The WHAT IF molecular-modelling and drug design program is widely distributed in the world of protein structure bioinformatics. Although originally designed as an interactive application, its highly modular design and inbuilt control language have recently enabled its deployment as a collection of programmatically accessible web services. We report here a collection of WHAT IF-based protein structure bioinformatics web services: these relate to structure quality, the use of symmetry in crystal structures, structure correction and optimization, adding hydrogens and optimizing hydrogen bonds and a series of geometric calculations. The freely accessible web services are based on the industry standard WS-I profile and the EMBRACE technical guidelines, and are available via both REST and SOAP paradigms. The web services run on a dedicated computational cluster; their function and availability is monitored daily.
PREFMD: a web server for protein structure refinement via molecular dynamics simulations.
Heo, Lim; Feig, Michael
2018-03-15
Refinement of protein structure models is a long-standing problem in structural bioinformatics. Molecular dynamics-based methods have emerged as an avenue to achieve consistent refinement. The PREFMD web server implements an optimized protocol based on the method successfully tested in CASP11. Validation with recent CASP refinement targets shows consistent and more significant improvement in global structure accuracy over other state-of-the-art servers. PREFMD is freely available as a web server at http://feiglab.org/prefmd. Scripts for running PREFMD as a stand-alone package are available at https://github.com/feiglab/prefmd.git. feig@msu.edu. Supplementary data are available at Bioinformatics online.
Bioinformatics data distribution and integration via Web Services and XML.
Li, Xiao; Zhang, Yizheng
2003-11-01
It is widely recognized that exchange, distribution, and integration of biological data are the keys to improve bioinformatics and genome biology in post-genomic era. However, the problem of exchanging and integrating biology data is not solved satisfactorily. The eXtensible Markup Language (XML) is rapidly spreading as an emerging standard for structuring documents to exchange and integrate data on the World Wide Web (WWW). Web service is the next generation of WWW and is founded upon the open standards of W3C (World Wide Web Consortium) and IETF (Internet Engineering Task Force). This paper presents XML and Web Services technologies and their use for an appropriate solution to the problem of bioinformatics data exchange and integration.
G2S: a web-service for annotating genomic variants on 3D protein structures.
Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong
2018-06-01
Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
ballaxy: web services for structural bioinformatics.
Hildebrandt, Anna Katharina; Stöckel, Daniel; Fischer, Nina M; de la Garza, Luis; Krüger, Jens; Nickels, Stefan; Röttig, Marc; Schärfe, Charlotta; Schumann, Marcel; Thiel, Philipp; Lenhof, Hans-Peter; Kohlbacher, Oliver; Hildebrandt, Andreas
2015-01-01
Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large number of molecular data structures and algorithms implemented in BALL allows for elegant and sophisticated development of new approaches in the field. We hence connected the versatile BALL library and its visualization and editing front end BALLView with the Galaxy workflow framework. The result, which we call ballaxy, enables the user to simply and intuitively create sophisticated pipelines for applications in structure-based computational biology, integrated into a standard tool for molecular modelling. ballaxy consists of three parts: some minor modifications to the Galaxy system, a collection of tools and an integration into the BALL framework and the BALLView application for molecular modelling. Modifications to Galaxy will be submitted to the Galaxy project, and the BALL and BALLView integrations will be integrated in the next major BALL release. After acceptance of the modifications into the Galaxy project, we will publish all ballaxy tools via the Galaxy toolshed. In the meantime, all three components are available from http://www.ball-project.org/ballaxy. Also, docker images for ballaxy are available at https://registry.hub.docker.com/u/anhi/ballaxy/dockerfile/. ballaxy is licensed under the terms of the GPL. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
KBWS: an EMBOSS associated package for accessing bioinformatics web services.
Oshita, Kazuki; Arakawa, Kazuharu; Tomita, Masaru
2011-04-29
The availability of bioinformatics web-based services is rapidly proliferating, for their interoperability and ease of use. The next challenge is in the integration of these services in the form of workflows, and several projects are already underway, standardizing the syntax, semantics, and user interfaces. In order to deploy the advantages of web services with locally installed tools, here we describe a collection of proxy client tools for 42 major bioinformatics web services in the form of European Molecular Biology Open Software Suite (EMBOSS) UNIX command-line tools. EMBOSS provides sophisticated means for discoverability and interoperability for hundreds of tools, and our package, named the Keio Bioinformatics Web Service (KBWS), adds functionalities of local and multiple alignment of sequences, phylogenetic analyses, and prediction of cellular localization of proteins and RNA secondary structures. This software implemented in C is available under GPL from http://www.g-language.org/kbws/ and GitHub repository http://github.com/cory-ko/KBWS. Users can utilize the SOAP services implemented in Perl directly via WSDL file at http://soap.g-language.org/kbws.wsdl (RPC Encoded) and http://soap.g-language.org/kbws_dl.wsdl (Document/literal).
Biotool2Web: creating simple Web interfaces for bioinformatics applications.
Shahid, Mohammad; Alam, Intikhab; Fuellen, Georg
2006-01-01
Currently there are many bioinformatics applications being developed, but there is no easy way to publish them on the World Wide Web. We have developed a Perl script, called Biotool2Web, which makes the task of creating web interfaces for simple ('home-made') bioinformatics applications quick and easy. Biotool2Web uses an XML document containing the parameters to run the tool on the Web, and generates the corresponding HTML and common gateway interface (CGI) files ready to be published on a web server. This tool is available for download at URL http://www.uni-muenster.de/Bioinformatics/services/biotool2web/ Georg Fuellen (fuellen@alum.mit.edu).
Zebra: a web server for bioinformatic analysis of diverse protein families.
Suplatov, Dmitry; Kirilin, Evgeny; Takhaveev, Vakil; Svedas, Vytas
2014-01-01
During evolution of proteins from a common ancestor, one functional property can be preserved while others can vary leading to functional diversity. A systematic study of the corresponding adaptive mutations provides a key to one of the most challenging problems of modern structural biology - understanding the impact of amino acid substitutions on protein function. The subfamily-specific positions (SSPs) are conserved within functional subfamilies but are different between them and, therefore, seem to be responsible for functional diversity in protein superfamilies. Consequently, a corresponding method to perform the bioinformatic analysis of sequence and structural data has to be implemented in the common laboratory practice to study the structure-function relationship in proteins and develop novel protein engineering strategies. This paper describes Zebra web server - a powerful remote platform that implements a novel bioinformatic analysis algorithm to study diverse protein families. It is the first application that provides specificity determinants at different levels of functional classification, therefore addressing complex functional diversity of large superfamilies. Statistical analysis is implemented to automatically select a set of highly significant SSPs to be used as hotspots for directed evolution or rational design experiments and analyzed studying the structure-function relationship. Zebra results are provided in two ways - (1) as a single all-in-one parsable text file and (2) as PyMol sessions with structural representation of SSPs. Zebra web server is available at http://biokinet.belozersky.msu.ru/zebra .
d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna
2009-06-01
We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and the analysis to each mutant was made with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.
A decade of Web Server updates at the Bioinformatics Links Directory: 2003-2012.
Brazas, Michelle D; Yim, David; Yeung, Winston; Ouellette, B F Francis
2012-07-01
The 2012 Bioinformatics Links Directory update marks the 10th special Web Server issue from Nucleic Acids Research. Beginning with content from their 2003 publication, the Bioinformatics Links Directory in collaboration with Nucleic Acids Research has compiled and published a comprehensive list of freely accessible, online tools, databases and resource materials for the bioinformatics and life science research communities. The past decade has exhibited significant growth and change in the types of tools, databases and resources being put forth, reflecting both technology changes and the nature of research over that time. With the addition of 90 web server tools and 12 updates from the July 2012 Web Server issue of Nucleic Acids Research, the Bioinformatics Links Directory at http://bioinformatics.ca/links_directory/ now contains an impressive 134 resources, 455 databases and 1205 web server tools, mirroring the continued activity and efforts of our field.
Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Švedas, Vytas
2014-01-01
The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure–function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. PMID:24852248
Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Svedas, Vytas
2014-07-01
The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure-function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis
Rampp, Markus; Soddemann, Thomas; Lederer, Hermann
2006-01-01
We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980
Zhou, Zhiwei; Xiong, Xin; Zhu, Zheng-Jiang
2017-07-15
In metabolomics, rigorous structural identification of metabolites presents a challenge for bioinformatics. The use of collision cross-section (CCS) values of metabolites derived from ion mobility-mass spectrometry effectively increases the confidence of metabolite identification, but this technique suffers from the limit number of available CCS values. Currently, there is no software available for rapidly generating the metabolites' CCS values. Here, we developed the first web server, namely, MetCCS Predictor, for predicting CCS values. It can predict the CCS values of metabolites using molecular descriptors within a few seconds. Common users with limited background on bioinformatics can benefit from this software and effectively improve the metabolite identification in metabolomics. The web server is freely available at: http://www.metabolomics-shanghai.org/MetCCS/ . jiangzhu@sioc.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Report on the EMBER Project--A European Multimedia Bioinformatics Educational Resource
ERIC Educational Resources Information Center
Attwood, Terri K.; Selimas, Ioannis; Buis, Rob; Altenburg, Ruud; Herzog, Robert; Ledent, Valerie; Ghita, Viorica; Fernandes, Pedro; Marques, Isabel; Brugman, Marc
2005-01-01
EMBER was a European project aiming to develop bioinformatics teaching materials on the Web and CD-ROM to help address the recognised skills shortage in bioinformatics. The project grew out of pilot work on the development of an interactive web-based bioinformatics tutorial and the desire to repackage that resource with the help of a professional…
ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis
Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas
2016-01-01
Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475
MAPI: towards the integrated exploitation of bioinformatics Web Services.
Ramirez, Sergio; Karlsson, Johan; Trelles, Oswaldo
2011-10-27
Bioinformatics is commonly featured as a well assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and heterogeneity complicate the integrated exploitation of such data processing capacity. To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for uniform representation of Web Services metadata descriptors including their management and invocation protocols of the services which they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed for the client have to be installed, and that the module functionality can be extended without the need for re-writing the software client. The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation with advanced features such as workflows composition and asynchronous services calls to multiple types of Web Services including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).
BOWS (bioinformatics open web services) to centralize bioinformatics tools in web services.
Velloso, Henrique; Vialle, Ricardo A; Ortega, J Miguel
2015-06-02
Bioinformaticians face a range of difficulties to get locally-installed tools running and producing results; they would greatly benefit from a system that could centralize most of the tools, using an easy interface for input and output. Web services, due to their universal nature and widely known interface, constitute a very good option to achieve this goal. Bioinformatics open web services (BOWS) is a system based on generic web services produced to allow programmatic access to applications running on high-performance computing (HPC) clusters. BOWS intermediates the access to registered tools by providing front-end and back-end web services. Programmers can install applications in HPC clusters in any programming language and use the back-end service to check for new jobs and their parameters, and then to send the results to BOWS. Programs running in simple computers consume the BOWS front-end service to submit new processes and read results. BOWS compiles Java clients, which encapsulate the front-end web service requisitions, and automatically creates a web page that disposes the registered applications and clients. Bioinformatics open web services registered applications can be accessed from virtually any programming language through web services, or using standard java clients. The back-end can run in HPC clusters, allowing bioinformaticians to remotely run high-processing demand applications directly from their machines.
Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics
Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.
2012-01-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
SeWeR: a customizable and integrated dynamic HTML interface to bioinformatics services.
Basu, M K
2001-06-01
Sequence analysis using Web Resources (SeWeR) is an integrated, Dynamic HTML (DHTML) interface to commonly used bioinformatics services available on the World Wide Web. It is highly customizable, extendable, platform neutral, completely server-independent and can be hosted as a web page as well as being used as stand-alone software running within a web browser.
mORCA: sailing bioinformatics world with mobile devices.
Díaz-Del-Pino, Sergio; Falgueras, Juan; Perez-Wohlfeil, Esteban; Trelles, Oswaldo
2018-03-01
Nearly 10 years have passed since the first mobile apps appeared. Given the fact that bioinformatics is a web-based world and that mobile devices are endowed with web-browsers, it seemed natural that bioinformatics would transit from personal computers to mobile devices but nothing could be further from the truth. The transition demands new paradigms, designs and novel implementations. Throughout an in-depth analysis of requirements of existing bioinformatics applications we designed and deployed an easy-to-use web-based lightweight mobile client. Such client is able to browse, select, compose automatically interface parameters, invoke services and monitor the execution of Web Services using the service's metadata stored in catalogs or repositories. mORCA is available at http://bitlab-es.com/morca/app as a web-app. It is also available in the App store by Apple and Play Store by Google. The software will be available for at least 2 years. ortrelles@uma.es. Source code, final web-app, training material and documentation is available at http://bitlab-es.com/morca. © The Author(s) 2017. Published by Oxford University Press.
Relax with CouchDB--into the non-relational DBMS era of bioinformatics.
Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R
2012-07-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.
AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis
Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud
2003-01-01
Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565
de la Calle, Guillermo; García-Remesal, Miguel; Chiesa, Stefano; de la Iglesia, Diana; Maojo, Victor
2009-10-07
The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities. We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted. These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at http://edelman.dia.fi.upm.es/biri/.
Chen, Yi-Bu; Chattopadhyay, Ansuman; Bergen, Phillip; Gadd, Cynthia; Tannery, Nancy
2007-01-01
To bridge the gap between the rising information needs of biological and medical researchers and the rapidly growing number of online bioinformatics resources, we have created the Online Bioinformatics Resources Collection (OBRC) at the Health Sciences Library System (HSLS) at the University of Pittsburgh. The OBRC, containing 1542 major online bioinformatics databases and software tools, was constructed using the HSLS content management system built on the Zope Web application server. To enhance the output of search results, we further implemented the Vivísimo Clustering Engine, which automatically organizes the search results into categories created dynamically based on the textual information of the retrieved records. As the largest online collection of its kind and the only one with advanced search results clustering, OBRC is aimed at becoming a one-stop guided information gateway to the major bioinformatics databases and software tools on the Web. OBRC is available at the University of Pittsburgh's HSLS Web site (http://www.hsls.pitt.edu/guides/genetics/obrc).
Using EMBL-EBI services via Web interface and programmatically via Web Services
Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish
2015-01-01
The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. PMID:25501941
Deineko, Viktor
2006-01-01
Human multisynthetase complex auxiliary component, protein p43 is an endothelial monocyte-activating polypeptide II precursor. In this study, comprehensive sequence analysis of N-terminus has been performed to identify structural domains, motifs, sites of post-translation modification and other functionally important parameters. The spatial structure model of full-chain protein p43 is obtained.
Bare, J Christopher; Shannon, Paul T; Schmid, Amy K; Baliga, Nitin S
2007-01-01
Background Information resources on the World Wide Web play an indispensable role in modern biology. But integrating data from multiple sources is often encumbered by the need to reformat data files, convert between naming systems, or perform ongoing maintenance of local copies of public databases. Opportunities for new ways of combining and re-using data are arising as a result of the increasing use of web protocols to transmit structured data. Results The Firegoose, an extension to the Mozilla Firefox web browser, enables data transfer between web sites and desktop tools. As a component of the Gaggle integration framework, Firegoose can also exchange data with Cytoscape, the R statistical package, Multiexperiment Viewer (MeV), and several other popular desktop software tools. Firegoose adds the capability to easily use local data to query KEGG, EMBL STRING, DAVID, and other widely-used bioinformatics web sites. Query results from these web sites can be transferred to desktop tools for further analysis with a few clicks. Firegoose acquires data from the web by screen scraping, microformats, embedded XML, or web services. We define a microformat, which allows structured information compatible with the Gaggle to be embedded in HTML documents. We demonstrate the capabilities of this software by performing an analysis of the genes activated in the microbe Halobacterium salinarum NRC-1 in response to anaerobic environments. Starting with microarray data, we explore functions of differentially expressed genes by combining data from several public web resources and construct an integrated view of the cellular processes involved. Conclusion The Firegoose incorporates Mozilla Firefox into the Gaggle environment and enables interactive sharing of data between diverse web resources and desktop software tools without maintaining local copies. Additional web sites can be incorporated easily into the framework using the scripting platform of the Firefox browser. Performing data integration in the browser allows the excellent search and navigation capabilities of the browser to be used in combination with powerful desktop tools. PMID:18021453
Bare, J Christopher; Shannon, Paul T; Schmid, Amy K; Baliga, Nitin S
2007-11-19
Information resources on the World Wide Web play an indispensable role in modern biology. But integrating data from multiple sources is often encumbered by the need to reformat data files, convert between naming systems, or perform ongoing maintenance of local copies of public databases. Opportunities for new ways of combining and re-using data are arising as a result of the increasing use of web protocols to transmit structured data. The Firegoose, an extension to the Mozilla Firefox web browser, enables data transfer between web sites and desktop tools. As a component of the Gaggle integration framework, Firegoose can also exchange data with Cytoscape, the R statistical package, Multiexperiment Viewer (MeV), and several other popular desktop software tools. Firegoose adds the capability to easily use local data to query KEGG, EMBL STRING, DAVID, and other widely-used bioinformatics web sites. Query results from these web sites can be transferred to desktop tools for further analysis with a few clicks. Firegoose acquires data from the web by screen scraping, microformats, embedded XML, or web services. We define a microformat, which allows structured information compatible with the Gaggle to be embedded in HTML documents. We demonstrate the capabilities of this software by performing an analysis of the genes activated in the microbe Halobacterium salinarum NRC-1 in response to anaerobic environments. Starting with microarray data, we explore functions of differentially expressed genes by combining data from several public web resources and construct an integrated view of the cellular processes involved. The Firegoose incorporates Mozilla Firefox into the Gaggle environment and enables interactive sharing of data between diverse web resources and desktop software tools without maintaining local copies. Additional web sites can be incorporated easily into the framework using the scripting platform of the Firefox browser. Performing data integration in the browser allows the excellent search and navigation capabilities of the browser to be used in combination with powerful desktop tools.
Minie, Mark; Bowers, Stuart; Tarczy-Hornoch, Peter; Roberts, Edward; James, Rose A.; Rambo, Neil; Fuller, Sherrilynne
2006-01-01
Setting: The University of Washington Health Sciences Libraries and Information Center BioCommons serves the bioinformatics needs of researchers at the university and in the vibrant for-profit and not-for-profit biomedical research sector in the Washington area and region. Program Components: The BioCommons comprises services addressing internal University of Washington, not-for-profit, for-profit, and regional and global clientele. The BioCommons is maintained and administered by the BioResearcher Liaison Team. The BioCommons architecture provides a highly flexible structure for adapting to rapidly changing resources and needs. Evaluation Mechanisms: BioCommons uses Web-based pre- and post-course evaluations and periodic user surveys to assess service effectiveness. Recent surveys indicate substantial usage of BioCommons services and a high level of effectiveness and user satisfaction. Next Steps/Future Directions: BioCommons is developing novel collaborative Web resources to distribute bioinformatics tools and is experimenting with Web-based competency training in bioinformation resource use. PMID:16888667
SeqHound: biological sequence and structure database as a platform for bioinformatics research
2002-01-01
Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134
What is bioinformatics? A proposed definition and overview of the field.
Luscombe, N M; Greenbaum, D; Gerstein, M
2001-01-01
The recent flood of data from genome sequences and functional genomics has given rise to new field, bioinformatics, which combines elements of biology and computer science. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.
Troshin, Peter V; Procter, James B; Sherstnev, Alexander; Barton, Daniel L; Madeira, Fábio; Barton, Geoffrey J
2018-06-01
JABAWS 2.2 is a computational framework that simplifies the deployment of web services for Bioinformatics. In addition to the five multiple sequence alignment (MSA) algorithms in JABAWS 1.0, JABAWS 2.2 includes three additional MSA programs (Clustal Omega, MSAprobs, GLprobs), four protein disorder prediction methods (DisEMBL, IUPred, Ronn, GlobPlot), 18 measures of protein conservation as implemented in AACon, and RNA secondary structure prediction by the RNAalifold program. JABAWS 2.2 can be deployed on a variety of in-house or hosted systems. JABAWS 2.2 web services may be accessed from the Jalview multiple sequence analysis workbench (Version 2.8 and later), as well as directly via the JABAWS command line interface (CLI) client. JABAWS 2.2 can be deployed on a local virtual server as a Virtual Appliance (VA) or simply as a Web Application Archive (WAR) for private use. Improvements in JABAWS 2.2 also include simplified installation and a range of utility tools for usage statistics collection, and web services querying and monitoring. The JABAWS CLI client has been updated to support all the new services and allow integration of JABAWS 2.2 services into conventional scripts. A public JABAWS 2 server has been in production since December 2011 and served over 800 000 analyses for users worldwide. JABAWS 2.2 is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws. g.j.barton@dundee.ac.uk.
Bringing Web 2.0 to bioinformatics.
Zhang, Zhang; Cheung, Kei-Hoi; Townsend, Jeffrey P
2009-01-01
Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.
Web tools for predictive toxicology model building.
Jeliazkova, Nina
2012-07-01
The development and use of web tools in chemistry has accumulated more than 15 years of history already. Powered by the advances in the Internet technologies, the current generation of web systems are starting to expand into areas, traditional for desktop applications. The web platforms integrate data storage, cheminformatics and data analysis tools. The ease of use and the collaborative potential of the web is compelling, despite the challenges. The topic of this review is a set of recently published web tools that facilitate predictive toxicology model building. The focus is on software platforms, offering web access to chemical structure-based methods, although some of the frameworks could also provide bioinformatics or hybrid data analysis functionalities. A number of historical and current developments are cited. In order to provide comparable assessment, the following characteristics are considered: support for workflows, descriptor calculations, visualization, modeling algorithms, data management and data sharing capabilities, availability of GUI or programmatic access and implementation details. The success of the Web is largely due to its highly decentralized, yet sufficiently interoperable model for information access. The expected future convergence between cheminformatics and bioinformatics databases provides new challenges toward management and analysis of large data sets. The web tools in predictive toxicology will likely continue to evolve toward the right mix of flexibility, performance, scalability, interoperability, sets of unique features offered, friendly user interfaces, programmatic access for advanced users, platform independence, results reproducibility, curation and crowdsourcing utilities, collaborative sharing and secure access.
Weisman, David
2010-01-01
Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a face-to-face lecture course with a web-based virtual laboratory presents new opportunities for collaborative learning of the conceptual material, and for fostering peer support of technical bioinformatics questions. To explore this combination, an in-person lecture-only undergraduate bioinformatics course was augmented with a remote web-based laboratory, and tested with a large class. This study hypothesized that the collaborative virtual lab would foster active learning and peer support, and tested this hypothesis by conducting a student survey near the end of the semester. Respondents broadly reported strong benefits from the online laboratory, and strong benefits from peer-provided technical support. In comparison with traditional in-person teaching labs, students preferred the virtual lab by a factor of two. Key aspects of the course architecture and design are described to encourage further experimentation in teaching collaborative online bioinformatics laboratories. Copyright © 2010 International Union of Biochemistry and Molecular Biology, Inc.
Using EMBL-EBI Services via Web Interface and Programmatically via Web Services.
Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish
2014-12-12
The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. Copyright © 2014 John Wiley & Sons, Inc.
Reactome diagram viewer: data structures and strategies to boost performance.
Fabregat, Antonio; Sidiropoulos, Konstantinos; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning
2018-04-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. For web-based pathway visualization, Reactome uses a custom pathway diagram viewer that has been evolved over the past years. Here, we present comprehensive enhancements in usability and performance based on extensive usability testing sessions and technology developments, aiming to optimize the viewer towards the needs of the community. The pathway diagram viewer version 3 achieves consistently better performance, loading and rendering of 97% of the diagrams in Reactome in less than 1 s. Combining the multi-layer html5 canvas strategy with a space partitioning data structure minimizes CPU workload, enabling the introduction of new features that further enhance user experience. Through the use of highly optimized data structures and algorithms, Reactome has boosted the performance and usability of the new pathway diagram viewer, providing a robust, scalable and easy-to-integrate solution to pathway visualization. As graph-based visualization of complex data is a frequent challenge in bioinformatics, many of the individual strategies presented here are applicable to a wide range of web-based bioinformatics resources. Reactome is available online at: https://reactome.org. The diagram viewer is part of the Reactome pathway browser (https://reactome.org/PathwayBrowser/) and also available as a stand-alone widget at: https://reactome.org/dev/diagram/. The source code is freely available at: https://github.com/reactome-pwp/diagram. fabregat@ebi.ac.uk or hhe@ebi.ac.uk. Supplementary data are available at Bioinformatics online.
Web-based services for drug design and discovery.
Frey, Jeremy G; Bird, Colin L
2011-09-01
Reviews of the development of drug discovery through the 20(th) century recognised the importance of chemistry and increasingly bioinformatics, but had relatively little to say about the importance of computing and networked computing in particular. However, the design and discovery of new drugs is arguably the most significant single application of bioinformatics and cheminformatics to have benefitted from the increases in the range and power of the computational techniques since the emergence of the World Wide Web, commonly now referred to as simply 'the Web'. Web services have enabled researchers to access shared resources and to deploy standardized calculations in their search for new drugs. This article first considers the fundamental principles of Web services and workflows, and then explores the facilities and resources that have evolved to meet the specific needs of chem- and bio-informatics. This strategy leads to a more detailed examination of the basic components that characterise molecules and the essential predictive techniques, followed by a discussion of the emerging networked services that transcend the basic provisions, and the growing trend towards embracing modern techniques, in particular the Semantic Web. In the opinion of the authors, the issues that require community action are: increasing the amount of chemical data available for open access; validating the data as provided; and developing more efficient links between the worlds of cheminformatics and bioinformatics. The goal is to create ever better drug design services.
Bietz, Stefan; Inhester, Therese; Lauck, Florian; Sommer, Kai; von Behren, Mathias M; Fährrolfes, Rainer; Flachsenberg, Florian; Meyder, Agnes; Nittinger, Eva; Otto, Thomas; Hilbig, Matthias; Schomburg, Karen T; Volkamer, Andrea; Rarey, Matthias
2017-11-10
Nowadays, computational approaches are an integral part of life science research. Problems related to interpretation of experimental results, data analysis, or visualization tasks highly benefit from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we will give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
PDBe: Protein Data Bank in Europe
Velankar, Sameer; Alhroub, Younes; Alili, Anaëlle; Best, Christoph; Boutselakis, Harry C.; Caboche, Ségolène; Conroy, Matthew J.; Dana, Jose M.; van Ginkel, Glen; Golovin, Adel; Gore, Swanand P.; Gutmanas, Aleksandras; Haslam, Pauline; Hirshberg, Miriam; John, Melford; Lagerstedt, Ingvar; Mir, Saqib; Newman, Laurence E.; Oldfield, Tom J.; Penkett, Chris J.; Pineda-Castillo, Jorge; Rinaldi, Luana; Sahni, Gaurav; Sawka, Grégoire; Sen, Sanchayita; Slowley, Robert; Sousa da Silva, Alan Wilter; Suarez-Uruena, Antonio; Swaminathan, G. Jawahar; Symmons, Martyn F.; Vranken, Wim F.; Wainwright, Michael; Kleywegt, Gerard J.
2011-01-01
The Protein Data Bank in Europe (PDBe; pdbe.org) is actively involved in managing the international archive of biomacromolecular structure data as one of the partners in the Worldwide Protein Data Bank (wwPDB; wwpdb.org). PDBe also develops new tools to make structural data more widely and more easily available to the biomedical community. PDBe has developed a browser to access and analyze the structural archive using classification systems that are familiar to chemists and biologists. The PDBe web pages that describe individual PDB entries have been enhanced through the introduction of plain-English summary pages and iconic representations of the contents of an entry (PDBprints). In addition, the information available for structures determined by means of NMR spectroscopy has been expanded. Finally, the entire web site has been redesigned to make it substantially easier to use for expert and novice users alike. PDBe works closely with other teams at the European Bioinformatics Institute (EBI) and in the international scientific community to develop new resources with value-added information. The SIFTS initiative is an example of such a collaboration—it provides extensive mapping data between proteins whose structures are available from the PDB and a host of other biomedical databases. SIFTS is widely used by major bioinformatics resources. PMID:21045060
Jiménez-García, Brian; Pons, Carles; Fernández-Recio, Juan
2013-07-01
pyDockWEB is a web server for the rigid-body docking prediction of protein-protein complex structures using a new version of the pyDock scoring algorithm. We use here a new custom parallel FTDock implementation, with adjusted grid size for optimal FFT calculations, and a new version of pyDock, which dramatically speeds up calculations while keeping the same predictive accuracy. Given the 3D coordinates of two interacting proteins, pyDockWEB returns the best docking orientations as scored mainly by electrostatics and desolvation energy. The server does not require registration by the user and is freely accessible for academics at http://life.bsc.es/servlet/pydock. Supplementary data are available at Bioinformatics online.
jORCA: easily integrating bioinformatics Web Services.
Martín-Requena, Victoria; Ríos, Javier; García, Maximiliano; Ramírez, Sergio; Trelles, Oswaldo
2010-02-15
Web services technology is becoming the option of choice to deploy bioinformatics tools that are universally available. One of the major strengths of this approach is that it supports machine-to-machine interoperability over a network. However, a weakness of this approach is that various Web Services differ in their definition and invocation protocols, as well as their communication and data formats-and this presents a barrier to service interoperability. jORCA is a desktop client aimed at facilitating seamless integration of Web Services. It does so by making a uniform representation of the different web resources, supporting scalable service discovery, and automatic composition of workflows. Usability is at the top of the jORCA agenda; thus it is a highly customizable and extensible application that accommodates a broad range of user skills featuring double-click invocation of services in conjunction with advanced execution-control, on the fly data standardization, extensibility of viewer plug-ins, drag-and-drop editing capabilities, plus a file-based browsing style and organization of favourite tools. The integration of bioinformatics Web Services is made easier to support a wider range of users. .
Teaching bioinformatics and neuroinformatics by using free web-based tools.
Grisham, William; Schottler, Natalie A; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson
2010-01-01
This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with anatomy (Mouse Brain Library), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and gene expression analyses (University of California, Santa Cruz Genome Browser, National Center for Biotechnology Information's Entrez Gene, and the Allen Brain Atlas), and information resources (PubMed). Instructors can use these various websites in concert to teach genetics from the phenotypic level to the molecular level, aspects of neuroanatomy and histology, statistics, quantitative trait locus analysis, and molecular biology (including in situ hybridization and microarray analysis), and to introduce bioinformatic resources. Students use these resources to discover 1) the region(s) of chromosome(s) influencing the phenotypic trait, 2) a list of candidate genes-narrowed by expression data, 3) the in situ pattern of a given gene in the region of interest, 4) the nucleotide sequence of the candidate gene, and 5) articles describing the gene. Teaching materials such as a detailed student/instructor's manual, PowerPoints, sample exams, and links to free Web resources can be found at http://mdcune.psych.ucla.edu/modules/bioinformatics.
Scalable web services for the PSIPRED Protein Analysis Workbench.
Buchan, Daniel W A; Minneci, Federico; Nugent, Tim C O; Bryson, Kevin; Jones, David T
2013-07-01
Here, we present the new UCL Bioinformatics Group's PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features to allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new HADOOP packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.
Development of a Web-Enabled Informatics Platform for Manipulation of Gene Expression Data
2004-12-01
genomic platforms such as metabolomics and proteomics , and to federated databases for knowledge management. A successful SBIR Phase I completed...measurements that require sophisticated bioinformatic platforms for data archival, management, integration, and analysis if researchers are to derive...web-enabled bioinformatic platform consisting of a Laboratory Information Management System (LIMS), an Analysis Information Management System (AIMS
BioXSD: the common data-exchange format for everyday bioinformatics web services.
Kalas, Matús; Puntervoll, Pål; Joseph, Alexandre; Bartaseviciūte, Edita; Töpfer, Armin; Venkataraman, Prabakar; Pettifer, Steve; Bryne, Jan Christian; Ison, Jon; Blanchet, Christophe; Rapacki, Kristoffer; Jonassen, Inge
2010-09-15
The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of the resources offer programmatic web-service interface. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. BioXSD has been developed as a candidate for standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a test-case workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source codes in common programming languages, an updated list of compatible web services and tools and a repository of feature requests from the community.
Incorporating a Collaborative Web-Based Virtual Laboratory in an Undergraduate Bioinformatics Course
ERIC Educational Resources Information Center
Weisman, David
2010-01-01
Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a…
Teaching Bioinformatics and Neuroinformatics by Using Free Web-Based Tools
ERIC Educational Resources Information Center
Grisham, William; Schottler, Natalie A.; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson
2010-01-01
This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, C
2009-11-12
In FY09 they will (1) complete the implementation, verification, calibration, and sensitivity and scalability analysis of the in-cell virus replication model; (2) complete the design of the cell culture (cell-to-cell infection) model; (3) continue the research, design, and development of their bioinformatics tools: the Web-based structure-alignment-based sequence variability tool and the functional annotation of the genome database; (4) collaborate with the University of California at San Francisco on areas of common interest; and (5) submit journal articles that describe the in-cell model with simulations and the bioinformatics approaches to evaluation of genome variability and fitness.
Olechnovič, Kliment; Venclovas, Ceslovas
2014-07-01
The Contact Area Difference score (CAD-score) web server provides a universal framework to compute and analyze discrepancies between different 3D structures of the same biological macromolecule or complex. The server accepts both single-subunit and multi-subunit structures and can handle all the major types of macromolecules (proteins, RNA, DNA and their complexes). It can perform numerical comparison of both structures and interfaces. In addition to entire structures and interfaces, the server can assess user-defined subsets. The CAD-score server performs both global and local numerical evaluations of structural differences between structures or interfaces. The results can be explored interactively using sortable tables of global scores, profiles of local errors, superimposed contact maps and 3D structure visualization. The web server could be used for tasks such as comparison of models with the native (reference) structure, comparison of X-ray structures of the same macromolecule obtained in different states (e.g. with and without a bound ligand), analysis of nuclear magnetic resonance (NMR) structural ensemble or structures obtained in the course of molecular dynamics simulation. The web server is freely accessible at: http://www.ibt.lt/bioinformatics/cad-score. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
DSSR-enhanced visualization of nucleic acid structures in Jmol
Hanson, Robert M.
2017-01-01
Abstract Sophisticated and interactive visualizations are essential for making sense of the intricate 3D structures of macromolecules. For proteins, secondary structural components are routinely featured in molecular graphics visualizations. However, the field of RNA structural bioinformatics is still lagging behind; for example, current molecular graphics tools lack built-in support even for base pairs, double helices, or hairpin loops. DSSR (Dissecting the Spatial Structure of RNA) is an integrated and automated command-line tool for the analysis and annotation of RNA tertiary structures. It calculates a comprehensive and unique set of features for characterizing RNA, as well as DNA structures. Jmol is a widely used, open-source Java viewer for 3D structures, with a powerful scripting language. JSmol, its reincarnation based on native JavaScript, has a predominant position in the post Java-applet era for web-based visualization of molecular structures. The DSSR-Jmol integration presented here makes salient features of DSSR readily accessible, either via the Java-based Jmol application itself, or its HTML5-based equivalent, JSmol. The DSSR web service accepts 3D coordinate files (in mmCIF or PDB format) initiated from a Jmol or JSmol session and returns DSSR-derived structural features in JSON format. This seamless combination of DSSR and Jmol/JSmol brings the molecular graphics of 3D RNA structures to a similar level as that for proteins, and enables a much deeper analysis of structural characteristics. It fills a gap in RNA structural bioinformatics, and is freely accessible (via the Jmol application or the JSmol-based website http://jmol.x3dna.org). PMID:28472503
Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows.
Sztromwasser, Pawel; Puntervoll, Pål; Petersen, Kjell
2011-07-26
Biological databases and computational biology tools are provided by research groups around the world, and made accessible on the Web. Combining these resources is a common practice in bioinformatics, but integration of heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has been commonly addressed in a pragmatic way, by tedious and error-prone scripting. Recently however a more reliable technique has been identified and proposed as the platform that would tie together bioinformatics resources, namely Web Services. In the last decade the Web Services have spread wide in bioinformatics, and earned the title of recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services, that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data-partitioning lowers resource demands of services and increases their throughput, which in consequence allows to execute in-silico experiments on genome-scale, using standard SOAP Web Services and workflows. As a proof-of-principle we annotated an RNA-seq dataset using a plain BPEL workflow engine.
COEUS: “semantic web in a box” for biomedical applications
2012-01-01
Background As the “omics” revolution unfolds, the growth in data quantity and diversity is bringing about the need for pioneering bioinformatics software, capable of significantly improving the research workflow. To cope with these computer science demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain. The latter’s complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited. Results COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a “semantic web in a box” approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer. Conclusions The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/. PMID:23244467
COEUS: "semantic web in a box" for biomedical applications.
Lopes, Pedro; Oliveira, José Luís
2012-12-17
As the "omics" revolution unfolds, the growth in data quantity and diversity is bringing about the need for pioneering bioinformatics software, capable of significantly improving the research workflow. To cope with these computer science demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain. The latter's complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited. COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a "semantic web in a box" approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer. The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/.
ERIC Educational Resources Information Center
Vincent, Antony T.; Bourbonnais, Yves; Brouard, Jean-Simon; Deveau, Hélène; Droit, Arnaud; Gagné, Stéphane M.; Guertin, Michel; Lemieux, Claude; Rathier, Louis; Charette, Steve J.; Lagüe, Patrick
2018-01-01
A recent scientific discipline, bioinformatics, defined as using informatics for the study of biological problems, is now a requirement for the study of biological sciences. Bioinformatics has become such a powerful and popular discipline that several academic institutions have created programs in this field, allowing students to become…
BioXSD: the common data-exchange format for everyday bioinformatics web services
Kalaš, Matúš; Puntervoll, Pæl; Joseph, Alexandre; Bartaševičiūtė, Edita; Töpfer, Armin; Venkataraman, Prabakar; Pettifer, Steve; Bryne, Jan Christian; Ison, Jon; Blanchet, Christophe; Rapacki, Kristoffer; Jonassen, Inge
2010-01-01
Motivation: The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of the resources offer programmatic web-service interface. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. Results: BioXSD has been developed as a candidate for standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a test-case workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. Availability: The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source codes in common programming languages, an updated list of compatible web services and tools and a repository of feature requests from the community. Contact: matus.kalas@bccs.uib.no; developers@bioxsd.org; support@bioxsd.org PMID:20823319
The RCSB Protein Data Bank: new resources for research and education
Rose, Peter W.; Bi, Chunxiao; Bluhm, Wolfgang F.; Christie, Cole H.; Dimitropoulos, Dimitris; Dutta, Shuchismita; Green, Rachel K.; Goodsell, David S.; Prlić, Andreas; Quesada, Martha; Quinn, Gregory B.; Ramos, Alexander G.; Westbrook, John D.; Young, Jasmine; Zardecki, Christine; Berman, Helen M.; Bourne, Philip E.
2013-01-01
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) develops tools and resources that provide a structural view of biology for research and education. The RCSB PDB web site (http://www.rcsb.org) uses the curated 3D macromolecular data contained in the PDB archive to offer unique methods to access, report and visualize data. Recent activities have focused on improving methods for simple and complex searches of PDB data, creating specialized access to chemical component data and providing domain-based structural alignments. New educational resources are offered at the PDB-101 educational view of the main web site such as Author Profiles that display a researcher’s PDB entries in a timeline. To promote different kinds of access to the RCSB PDB, Web Services have been expanded, and an RCSB PDB Mobile application for the iPhone/iPad has been released. These improvements enable new opportunities for analyzing and understanding structure data. PMID:23193259
Navigating the changing learning landscape: perspective from bioinformatics.ca
Ouellette, B. F. Francis
2013-01-01
With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468
The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update
Huynh, Tien; Rigoutsos, Isidore
2004-01-01
In this report, we provide an update on the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server, which is operational around the clock, provides access to a large number of methods that have been developed and published by the group's members. There is an increasing number of problems that these tools can help tackle; these problems range from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences, the identification—directly from sequence—of structural deviations from α-helicity and the annotation of amino acid sequences for antimicrobial activity. Additionally, annotations for more than 130 archaeal, bacterial, eukaryotic and viral genomes are now available on-line and can be searched interactively. The tools and code bundles continue to be accessible from http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/. PMID:15215340
The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update.
Huynh, Tien; Rigoutsos, Isidore
2004-07-01
In this report, we provide an update on the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server, which is operational around the clock, provides access to a large number of methods that have been developed and published by the group's members. There is an increasing number of problems that these tools can help tackle; these problems range from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences, the identification--directly from sequence--of structural deviations from alpha-helicity and the annotation of amino acid sequences for antimicrobial activity. Additionally, annotations for more than 130 archaeal, bacterial, eukaryotic and viral genomes are now available on-line and can be searched interactively. The tools and code bundles continue to be accessible from http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.
Bioinformatics Approaches to Classifying Allergens and Predicting Cross-Reactivity
Schein, Catherine H.; Ivanciuc, Ovidiu; Braun, Werner
2007-01-01
The major advances in understanding why patients respond to several seemingly different stimuli have been through the isolation, sequencing and structural analysis of proteins that induce an IgE response. The most significant finding is that allergenic proteins from very different sources can have nearly identical sequences and structures, and that this similarity can account for clinically observed cross-reactivity. The increasing amount of information on the sequence, structure and IgE epitopes of allergens is now available in several databases and powerful bioinformatics search tools allow user access to relevant information. Here, we provide an overview of these databases and describe state-of-the art bioinformatics tools to identify the common proteins that may be at the root of multiple allergy syndromes. Progress has also been made in quantitatively defining characteristics that discriminate allergens from non-allergens. Search and software tools for this purpose have been developed and implemented in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/). SDAP contains information for over 800 allergens and extensive bibliographic references in a relational database with links to other publicly available databases. SDAP is freely available on the Web to clinicians and patients, and can be used to find structural and functional relations among known allergens and to identify potentially cross-reacting antigens. Here we illustrate how these bioinformatics tools can be used to group allergens, and to detect areas that may account for common patterns of IgE binding and cross-reactivity. Such results can be used to guide treatment regimens for allergy sufferers. PMID:17276876
SIDECACHE: Information access, management and dissemination framework for web services.
Doderer, Mark S; Burkhardt, Cory; Robbins, Kay A
2011-06-14
Many bioinformatics algorithms and data sets are deployed using web services so that the results can be explored via the Internet and easily integrated into other tools and services. These services often include data from other sites that is accessed either dynamically or through file downloads. Developers of these services face several problems because of the dynamic nature of the information from the upstream services. Many publicly available repositories of bioinformatics data frequently update their information. When such an update occurs, the developers of the downstream service may also need to update. For file downloads, this process is typically performed manually followed by web service restart. Requests for information obtained by dynamic access of upstream sources is sometimes subject to rate restrictions. SideCache provides a framework for deploying web services that integrate information extracted from other databases and from web sources that are periodically updated. This situation occurs frequently in biotechnology where new information is being continuously generated and the latest information is important. SideCache provides several types of services including proxy access and rate control, local caching, and automatic web service updating. We have used the SideCache framework to automate the deployment and updating of a number of bioinformatics web services and tools that extract information from remote primary sources such as NCBI, NCIBI, and Ensembl. The SideCache framework also has been used to share research results through the use of a SideCache derived web service.
Seahawk: moving beyond HTML in Web-based bioinformatics analysis.
Gordon, Paul M K; Sensen, Christoph W
2007-06-18
Traditional HTML interfaces for input to and output from Bioinformatics analysis on the Web are highly variable in style, content and data formats. Combining multiple analyses can therefore be an onerous task for biologists. Semantic Web Services allow automated discovery of conceptual links between remote data analysis servers. A shared data ontology and service discovery/execution framework is particularly attractive in Bioinformatics, where data and services are often both disparate and distributed. Instead of biologists copying, pasting and reformatting data between various Web sites, Semantic Web Service protocols such as MOBY-S hold out the promise of seamlessly integrating multi-step analysis. We have developed a program (Seahawk) that allows biologists to intuitively and seamlessly chain together Web Services using a data-centric, rather than the customary service-centric approach. The approach is illustrated with a ferredoxin mutation analysis. Seahawk concentrates on lowering entry barriers for biologists: no prior knowledge of the data ontology, or relevant services is required. In stark contrast to other MOBY-S clients, in Seahawk users simply load Web pages and text files they already work with. Underlying the familiar Web-browser interaction is an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements which import existing user data into the MOBY-S format. As an easily accessible applet, Seahawk moves beyond standard Web browser interaction, providing mechanisms for the biologist to concentrate on the analytical task rather than on the technical details of data formats and Web forms. As the MOBY-S protocol nears a 1.0 specification, we expect more biologists to adopt these new semantic-oriented ways of doing Web-based analysis, which empower them to do more complicated, ad hoc analysis workflow creation without the assistance of a programmer.
Seahawk: moving beyond HTML in Web-based bioinformatics analysis
Gordon, Paul MK; Sensen, Christoph W
2007-01-01
Background Traditional HTML interfaces for input to and output from Bioinformatics analysis on the Web are highly variable in style, content and data formats. Combining multiple analyses can therfore be an onerous task for biologists. Semantic Web Services allow automated discovery of conceptual links between remote data analysis servers. A shared data ontology and service discovery/execution framework is particularly attractive in Bioinformatics, where data and services are often both disparate and distributed. Instead of biologists copying, pasting and reformatting data between various Web sites, Semantic Web Service protocols such as MOBY-S hold out the promise of seamlessly integrating multi-step analysis. Results We have developed a program (Seahawk) that allows biologists to intuitively and seamlessly chain together Web Services using a data-centric, rather than the customary service-centric approach. The approach is illustrated with a ferredoxin mutation analysis. Seahawk concentrates on lowering entry barriers for biologists: no prior knowledge of the data ontology, or relevant services is required. In stark contrast to other MOBY-S clients, in Seahawk users simply load Web pages and text files they already work with. Underlying the familiar Web-browser interaction is an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements which import existing user data into the MOBY-S format. Conclusion As an easily accessible applet, Seahawk moves beyond standard Web browser interaction, providing mechanisms for the biologist to concentrate on the analytical task rather than on the technical details of data formats and Web forms. As the MOBY-S protocol nears a 1.0 specification, we expect more biologists to adopt these new semantic-oriented ways of doing Web-based analysis, which empower them to do more complicated, ad hoc analysis workflow creation without the assistance of a programmer. PMID:17577405
KDE Bioscience: platform for bioinformatics analysis workflows.
Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue
2006-08-01
Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards combined with unfriendly interfaces make it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or 3rd-party viewers are built-in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.
DSSR-enhanced visualization of nucleic acid structures in Jmol.
Hanson, Robert M; Lu, Xiang-Jun
2017-07-03
Sophisticated and interactive visualizations are essential for making sense of the intricate 3D structures of macromolecules. For proteins, secondary structural components are routinely featured in molecular graphics visualizations. However, the field of RNA structural bioinformatics is still lagging behind; for example, current molecular graphics tools lack built-in support even for base pairs, double helices, or hairpin loops. DSSR (Dissecting the Spatial Structure of RNA) is an integrated and automated command-line tool for the analysis and annotation of RNA tertiary structures. It calculates a comprehensive and unique set of features for characterizing RNA, as well as DNA structures. Jmol is a widely used, open-source Java viewer for 3D structures, with a powerful scripting language. JSmol, its reincarnation based on native JavaScript, has a predominant position in the post Java-applet era for web-based visualization of molecular structures. The DSSR-Jmol integration presented here makes salient features of DSSR readily accessible, either via the Java-based Jmol application itself, or its HTML5-based equivalent, JSmol. The DSSR web service accepts 3D coordinate files (in mmCIF or PDB format) initiated from a Jmol or JSmol session and returns DSSR-derived structural features in JSON format. This seamless combination of DSSR and Jmol/JSmol brings the molecular graphics of 3D RNA structures to a similar level as that for proteins, and enables a much deeper analysis of structural characteristics. It fills a gap in RNA structural bioinformatics, and is freely accessible (via the Jmol application or the JSmol-based website http://jmol.x3dna.org). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Moby and Moby 2: creatures of the deep (web).
Vandervalk, Ben P; McCarthy, E Luke; Wilkinson, Mark D
2009-03-01
Facile and meaningful integration of data from disparate resources is the 'holy grail' of bioinformatics. Some resources have begun to address this problem by providing their data using Semantic Web standards, specifically the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Unfortunately, adoption of Semantic Web standards has been slow overall, and even in cases where the standards are being utilized, interconnectivity between resources is rare. In response, we have seen the emergence of centralized 'semantic warehouses' that collect public data from third parties, integrate it, translate it into OWL/RDF and provide it to the community as a unified and queryable resource. One limitation of the warehouse approach is that queries are confined to the resources that have been selected for inclusion. A related problem, perhaps of greater concern, is that the majority of bioinformatics data exists in the 'Deep Web'-that is, the data does not exist until an application or analytical tool is invoked, and therefore does not have a predictable Web address. The inability to utilize Uniform Resource Identifiers (URIs) to address this data is a barrier to its accessibility via URI-centric Semantic Web technologies. Here we examine 'The State of the Union' for the adoption of Semantic Web standards in the health care and life sciences domain by key bioinformatics resources, explore the nature and connectivity of several community-driven semantic warehousing projects, and report on our own progress with the CardioSHARE/Moby-2 project, which aims to make the resources of the Deep Web transparently accessible through SPARQL queries.
Stocker, Gernot; Rieder, Dietmar; Trajanoski, Zlatko
2004-03-22
ClusterControl is a web interface to simplify distributing and monitoring bioinformatics applications on Linux cluster systems. We have developed a modular concept that enables integration of command line oriented program into the application framework of ClusterControl. The systems facilitate integration of different applications accessed through one interface and executed on a distributed cluster system. The package is based on freely available technologies like Apache as web server, PHP as server-side scripting language and OpenPBS as queuing system and is available free of charge for academic and non-profit institutions. http://genome.tugraz.at/Software/ClusterControl
TAPIR, a web server for the prediction of plant microRNA targets, including target mimics.
Bonnet, Eric; He, Ying; Billiau, Kenny; Van de Peer, Yves
2010-06-15
We present a new web server called TAPIR, designed for the prediction of plant microRNA targets. The server offers the possibility to search for plant miRNA targets using a fast and a precise algorithm. The precise option is much slower but guarantees to find less perfectly paired miRNA-target duplexes. Furthermore, the precise option allows the prediction of target mimics, which are characterized by a miRNA-target duplex having a large loop, making them undetectable by traditional tools. The TAPIR web server can be accessed at: http://bioinformatics.psb.ugent.be/webtools/tapir. Supplementary data are available at Bioinformatics online.
RNAiFold: a web server for RNA inverse folding and molecular design.
Garcia-Martin, Juan Antonio; Clote, Peter; Dotu, Ivan
2013-07-01
Synthetic biology and nanotechnology are poised to make revolutionary contributions to the 21st century. In this article, we describe a new web server to support in silico RNA molecular design. Given an input target RNA secondary structure, together with optional constraints, such as requiring GC-content to lie within a certain range, requiring the number of strong (GC), weak (AU) and wobble (GU) base pairs to lie in a certain range, the RNAiFold web server determines one or more RNA sequences, whose minimum free-energy secondary structure is the target structure. RNAiFold provides access to two servers: RNA-CPdesign, which applies constraint programming, and RNA-LNSdesign, which applies the large neighborhood search heuristic; hence, it is suitable for larger input structures. Both servers can also solve the RNA inverse hybridization problem, i.e. given a representation of the desired hybridization structure, RNAiFold returns two sequences, whose minimum free-energy hybridization is the input target structure. The web server is publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold, which provides access to two specialized servers: RNA-CPdesign and RNA-LNSdesign. Source code for the underlying algorithms, implemented in COMET and supported on linux, can be downloaded at the server website.
Mayer, Gerhard; Quast, Christian; Felden, Janine; Lange, Matthias; Prinz, Manuel; Pühler, Alfred; Lawerenz, Chris; Scholz, Uwe; Glöckner, Frank Oliver; Müller, Wolfgang; Marcus, Katrin; Eisenacher, Martin
2017-10-30
Sustainable noncommercial bioinformatics infrastructures are a prerequisite to use and take advantage of the potential of big data analysis for research and economy. Consequently, funders, universities and institutes as well as users ask for a transparent value model for the tools and services offered. In this article, a generally applicable lightweight method is described by which bioinformatics infrastructure projects can estimate the value of tools and services offered without determining exactly the total costs of ownership. Five representative scenarios for value estimation from a rough estimation to a detailed breakdown of costs are presented. To account for the diversity in bioinformatics applications and services, the notion of service-specific 'service provision units' is introduced together with the factors influencing them and the main underlying assumptions for these 'value influencing factors'. Special attention is given on how to handle personnel costs and indirect costs such as electricity. Four examples are presented for the calculation of the value of tools and services provided by the German Network for Bioinformatics Infrastructure (de.NBI): one for tool usage, one for (Web-based) database analyses, one for consulting services and one for bioinformatics training events. Finally, from the discussed values, the costs of direct funding and the costs of payment of services by funded projects are calculated and compared. © The Author 2017. Published by Oxford University Press.
Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.
Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A
2005-04-07
Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz
2009-08-25
Bioinformatics often leverages on recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms.
Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz
2009-01-01
Background Bioinformatics often leverages on recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. Results We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. Conclusion dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms. PMID:19706156
RNAmutants: a web server to explore the mutational landscape of RNA secondary structures
Waldispühl, Jerome; Devadas, Srinivas; Berger, Bonnie; Clote, Peter
2009-01-01
The history and mechanism of molecular evolution in DNA have been greatly elucidated by contributions from genetics, probability theory and bioinformatics—indeed, mathematical developments such as Kimura's neutral theory, Kingman's coalescent theory and efficient software such as BLAST, ClustalW, Phylip, etc., provide the foundation for modern population genetics. In contrast to DNA, the function of most noncoding RNA depends on tertiary structure, experimentally known to be largely determined by secondary structure, for which dynamic programming can efficiently compute the minimum free energy secondary structure. For this reason, understanding the effect of pointwise mutations in RNA secondary structure could reveal fundamental properties of structural RNA molecules and improve our understanding of molecular evolution of RNA. The web server RNAmutants provides several efficient tools to compute the ensemble of low-energy secondary structures for all k-mutants of a given RNA sequence, where k is bounded by a user-specified upper bound. As we have previously shown, these tools can be used to predict putative deleterious mutations and to analyze regulatory sequences from the hepatitis C and human immunodeficiency genomes. Web server is available at http://bioinformatics.bc.edu/clotelab/RNAmutants/, and downloadable binaries at http://rnamutants.csail.mit.edu/. PMID:19531740
ExPASy: SIB bioinformatics resource portal.
Artimo, Panu; Jonnalagedda, Manohar; Arnold, Konstantin; Baratin, Delphine; Csardi, Gabor; de Castro, Edouard; Duvaud, Séverine; Flegel, Volker; Fortier, Arnaud; Gasteiger, Elisabeth; Grosdidier, Aurélien; Hernandez, Céline; Ioannidis, Vassilios; Kuznetsov, Dmitry; Liechti, Robin; Moretti, Sébastien; Mostaguir, Khaled; Redaschi, Nicole; Rossier, Grégoire; Xenarios, Ioannis; Stockinger, Heinz
2012-07-01
ExPASy (http://www.expasy.org) has worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can henceforth access seamlessly a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed for both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.
SeMPI: a genome-based secondary metabolite prediction and identification web server.
Zierep, Paul F; Padilla, Natàlia; Yonchev, Dimitar G; Telukunta, Kiran K; Klementz, Dennis; Günther, Stefan
2017-07-03
The secondary metabolism of bacteria, fungi and plants yields a vast number of bioactive substances. The constantly increasing amount of published genomic data provides the opportunity for an efficient identification of gene clusters by genome mining. Conversely, for many natural products with resolved structures, the encoding gene clusters have not been identified yet. Even though genome mining tools have become significantly more efficient in the identification of biosynthetic gene clusters, structural elucidation of the actual secondary metabolite is still challenging, especially due to as yet unpredictable post-modifications. Here, we introduce SeMPI, a web server providing a prediction and identification pipeline for natural products synthesized by polyketide synthases of type I modular. In order to limit the possible structures of PKS products and to include putative tailoring reactions, a structural comparison with annotated natural products was introduced. Furthermore, a benchmark was designed based on 40 gene clusters with annotated PKS products. The web server of the pipeline (SeMPI) is freely available at: http://www.pharmaceutical-bioinformatics.de/sempi. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
S3DB core: a framework for RDF generation and management in bioinformatics infrastructures
2010-01-01
Background Biomedical research is set to greatly benefit from the use of semantic web technologies in the design of computational infrastructure. However, beyond well defined research initiatives, substantial issues of data heterogeneity, source distribution, and privacy currently stand in the way towards the personalization of Medicine. Results A computational framework for bioinformatic infrastructure was designed to deal with the heterogeneous data sources and the sensitive mixture of public and private data that characterizes the biomedical domain. This framework consists of a logical model build with semantic web tools, coupled with a Markov process that propagates user operator states. An accompanying open source prototype was developed to meet a series of applications that range from collaborative multi-institution data acquisition efforts to data analysis applications that need to quickly traverse complex data structures. This report describes the two abstractions underlying the S3DB-based infrastructure, logical and numerical, and discusses its generality beyond the immediate confines of existing implementations. Conclusions The emergence of the "web as a computer" requires a formal model for the different functionalities involved in reading and writing to it. The S3DB core model proposed was found to address the design criteria of biomedical computational infrastructure, such as those supporting large scale multi-investigator research, clinical trials, and molecular epidemiology. PMID:20646315
SFESA: a web server for pairwise alignment refinement by secondary structure shifts.
Tong, Jing; Pei, Jimin; Grishin, Nick V
2015-09-03
Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
Automatically exposing OpenLifeData via SADI semantic Web Services.
González, Alejandro Rodríguez; Callahan, Alison; Cruz-Toledo, José; Garcia, Adrian; Egaña Aranguren, Mikel; Dumontier, Michel; Wilkinson, Mark D
2014-01-01
Two distinct trends are emerging with respect to how data is shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by moving towards the harmonization of data syntax, descriptive vocabularies, and identifiers, as well as providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions. Constructing queries that traverse semantically-rich Linked Data requires substantial expertise, yet traditional RESTful or SOAP Web Services cannot adequately describe the content of a SPARQL endpoint. We propose that content-driven Semantic Web Services can enable facile discovery of Linked Data, independent of their location. We use a well-curated Linked Dataset - OpenLifeData - and utilize its descriptive metadata to automatically configure a series of more than 22,000 Semantic Web Services that expose all of its content via the SADI set of design principles. The OpenLifeData SADI services are discoverable via queries to the SHARE registry and easy to integrate into new or existing bioinformatics workflows and analytical pipelines. We demonstrate the utility of this system through comparison of Web Service-mediated data access with traditional SPARQL, and note that this approach not only simplifies data retrieval, but simultaneously provides protection against resource-intensive queries. We show, through a variety of different clients and examples of varying complexity, that data from the myriad OpenLifeData can be recovered without any need for prior-knowledge of the content or structure of the SPARQL endpoints. We also demonstrate that, via clients such as SHARE, the complexity of federated SPARQL queries is dramatically reduced.
Karim, Md Rezaul; Michel, Audrey; Zappa, Achille; Baranov, Pavel; Sahay, Ratnesh; Rebholz-Schuhmann, Dietrich
2017-04-16
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community. © The Author 2017. Published by Oxford University Press.
Tao, Yuan; Liu, Juan
2005-01-01
The Internet has already deflated our world of working and living into a very small scope, thus bringing out the concept of Earth Village, in which people could communicate and co-work though thousands' miles far away from each other. This paper describes a prototype, which is just like an Earth Lab for bioinformatics, based on Web services framework to build up a network architecture for bioinformatics research and for world wide biologists to easily implement enormous, complex processes, and effectively share and access computing resources and data, regardless of how heterogeneous the format of the data is and how decentralized and distributed these resources are around the world. A diminutive and simplified example scenario is given out to realize the prototype after that.
The RCSB protein data bank: integrative view of protein, gene and 3D structural information
Rose, Peter W.; Prlić, Andreas; Altunkaya, Ali; Bi, Chunxiao; Bradley, Anthony R.; Christie, Cole H.; Costanzo, Luigi Di; Duarte, Jose M.; Dutta, Shuchismita; Feng, Zukang; Green, Rachel Kramer; Goodsell, David S.; Hudson, Brian; Kalro, Tara; Lowe, Robert; Peisach, Ezra; Randle, Christopher; Rose, Alexander S.; Shao, Chenghua; Tao, Yi-Ping; Valasatava, Yana; Voigt, Maria; Westbrook, John D.; Woo, Jesse; Yang, Huangwang; Young, Jasmine Y.; Zardecki, Christine; Berman, Helen M.; Burley, Stephen K.
2017-01-01
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a ‘Structural View of Biology.’ Recent developments have improved the User experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive. PMID:27794042
MEGANTE: A Web-Based System for Integrated Plant Genome Annotation
Numa, Hisataka; Itoh, Takeshi
2014-01-01
The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demands for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, are also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/. PMID:24253915
2012-01-01
Background Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941
El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter
2012-01-01
Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.
PDBe: towards reusable data delivery infrastructure at protein data bank in Europe
Alhroub, Younes; Anyango, Stephen; Armstrong, David R; Berrisford, John M; Clark, Alice R; Conroy, Matthew J; Dana, Jose M; Gupta, Deepti; Gutmanas, Aleksandras; Haslam, Pauline; Mak, Lora; Mukhopadhyay, Abhik; Nadzirin, Nurul; Paysan-Lafosse, Typhaine; Sehnal, David; Sen, Sanchayita; Smart, Oliver S; Varadi, Mihaly; Kleywegt, Gerard J
2018-01-01
Abstract The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API. PMID:29126160
Biowep: a workflow enactment portal for bioinformatics applications.
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-03-08
The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.
Biowep: a workflow enactment portal for bioinformatics applications
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-01-01
Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. Results We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563
USDA-ARS?s Scientific Manuscript database
Remarkable advances in next-generation sequencing (NGS) technologies, bioinformatics algorithms, and computational technologies have significantly accelerated genomic research. However, complicated NGS data analysis still remains as a major bottleneck. RNA-seq, as one of the major area in the NGS fi...
The new protein topology graph library web server.
Schäfer, Tim; Scheck, Andreas; Bruneß, Daniel; May, Patrick; Koch, Ina
2016-02-01
We present a new, extended version of the Protein Topology Graph Library web server. The Protein Topology Graph Library describes the protein topology on the super-secondary structure level. It allows to compute and visualize protein ligand graphs and search for protein structural motifs. The new server features additional information on ligand binding to secondary structure elements, increased usability and an application programming interface (API) to retrieve data, allowing for an automated analysis of protein topology. The Protein Topology Graph Library server is freely available on the web at http://ptgl.uni-frankfurt.de. The website is implemented in PHP, JavaScript, PostgreSQL and Apache. It is supported by all major browsers. The VPLG software that was used to compute the protein ligand graphs and all other data in the database is available under the GNU public license 2.0 from http://vplg.sourceforge.net. tim.schaefer@bioinformatik.uni-frankfurt.de; ina.koch@bioinformatik.uni-frankfurt.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chain, Patrick; Lo, Chien-Chi; Li, Po-E
EDGE bioinformatics was developed to help biologists process Next Generation Sequencing data (in the form of raw FASTQ files), even if they have little to no bioinformatics expertise. EDGE is a highly integrated and interactive web-based platform that is capable of running many of the standard analyses that biologists require for viral, bacterial/archaeal, and metagenomic samples. EDGE provides the following analytical workflows: quality trimming and host removal, assembly and annotation, comparisons against known references, taxonomy classification of reads and contigs, whole genome SNP-based phylogenetic analysis, and PCR analysis. EDGE provides an intuitive web-based interface for user input, allows users tomore » visualize and interact with selected results (e.g. JBrowse genome browser), and generates a final detailed PDF report. Results in the form of tables, text files, graphic files, and PDFs can be downloaded. A user management system allows tracking of an individual’s EDGE runs, along with the ability to share, post publicly, delete, or archive their results.« less
BioCatalogue: a universal catalogue of web services for the life sciences
Bhagat, Jiten; Tanoh, Franck; Nzuobontane, Eric; Laurent, Thomas; Orlowski, Jerzy; Roos, Marco; Wolstencroft, Katy; Aleksejevs, Sergejs; Stevens, Robert; Pettifer, Steve; Lopez, Rodrigo; Goble, Carole A.
2010-01-01
The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides a common interface for registering, browsing and annotating Web Services to the Life Science community. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable ‘Web 2.0’-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community. PMID:20484378
BioCatalogue: a universal catalogue of web services for the life sciences.
Bhagat, Jiten; Tanoh, Franck; Nzuobontane, Eric; Laurent, Thomas; Orlowski, Jerzy; Roos, Marco; Wolstencroft, Katy; Aleksejevs, Sergejs; Stevens, Robert; Pettifer, Steve; Lopez, Rodrigo; Goble, Carole A
2010-07-01
The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides a common interface for registering, browsing and annotating Web Services to the Life Science community. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable 'Web 2.0'-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community.
BEAM web server: a tool for structural RNA motif discovery.
Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2018-03-15
RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable to handle tens of thousands of RNA in input with a motif discovery procedure that is only limited by the current secondary structure prediction accuracies. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated to a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding that transforms each RNA secondary structure in a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice for the preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and it is implemented in NodeJS and Python with all major browsers supported. marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.
USDA-ARS?s Scientific Manuscript database
Scientific data integration and computational service discovery are challenges for the bioinformatic community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of scientific data between information resources difficu...
Learning Genetics through an Authentic Research Simulation in Bioinformatics
ERIC Educational Resources Information Center
Gelbart, Hadas; Yarden, Anat
2006-01-01
Following the rationale that learning is an active process of knowledge construction as well as enculturation into a community of experts, we developed a novel web-based learning environment in bioinformatics for high-school biology majors in Israel. The learning environment enables the learners to actively participate in a guided inquiry process…
Glossary of bioinformatics terms.
2007-06-01
This collection of terms and definitions commonly encountered in the bioinformatics literature will be updated periodically as Current Protocols in Bioinformatics grows. In addition, an extensive glossary of genetic terms can be found on the Web site of the National Human Genome Research Institute (http://www.genome.gov/glossary.cfm). The entries in that online glossary provide a brief written definition of the term; the user can also listen to an informative explanation of the term using RealAudio or the Windows Media Player.
POOL server: machine learning application for functional site prediction in proteins.
Somarowthu, Srinivas; Ondrechen, Mary Jo
2012-08-01
We present an automated web server for partial order optimum likelihood (POOL), a machine learning application that combines computed electrostatic and geometric information for high-performance prediction of catalytic residues from 3D structures. Input features consist of THEMATICS electrostatics data and pocket information from ConCavity. THEMATICS measures deviation from typical, sigmoidal titration behavior to identify functionally important residues and ConCavity identifies binding pockets by analyzing the surface geometry of protein structures. Both THEMATICS and ConCavity (structure only) do not require the query protein to have any sequence or structure similarity to other proteins. Hence, POOL is applicable to proteins with novel folds and engineered proteins. As an additional option for cases where sequence homologues are available, users can include evolutionary information from INTREPID for enhanced accuracy in site prediction. The web site is free and open to all users with no login requirements at http://www.pool.neu.edu. m.ondrechen@neu.edu Supplementary data are available at Bioinformatics online.
KISS for STRAP: user extensions for a protein alignment editor.
Gille, Christoph; Lorenzen, Stephan; Michalsky, Elke; Frömmel, Cornelius
2003-12-12
The Structural Alignment Program STRAP is a comfortable comprehensive editor and analyzing tool for protein alignments. A wide range of functions related to protein sequences and protein structures are accessible with an intuitive graphical interface. Recent features include mapping of mutations and polymorphisms onto structures and production of high quality figures for publication. Here we address the general problem of multi-purpose program packages to keep up with the rapid development of bioinformatical methods and the demand for specific program functions. STRAP was remade implementing a novel design which aims at Keeping Interfaces in STRAP Simple (KISS). KISS renders STRAP extendable to bio-scientists as well as to bio-informaticians. Scientists with basic computer skills are capable of implementing statistical methods or embedding existing bioinformatical tools in STRAP themselves. For bio-informaticians STRAP may serve as an environment for rapid prototyping and testing of complex algorithms such as automatic alignment algorithms or phylogenetic methods. Further, STRAP can be applied as an interactive web applet to present data related to a particular protein family and as a teaching tool. JAVA-1.4 or higher. http://www.charite.de/bioinf/strap/
Katayama, Toshiaki; Arakawa, Kazuharu; Nakao, Mitsuteru; Ono, Keiichiro; Aoki-Kinoshita, Kiyoko F; Yamamoto, Yasunori; Yamaguchi, Atsuko; Kawashima, Shuichi; Chun, Hong-Woo; Aerts, Jan; Aranda, Bruno; Barboza, Lord Hendrix; Bonnal, Raoul Jp; Bruskiewich, Richard; Bryne, Jan C; Fernández, José M; Funahashi, Akira; Gordon, Paul Mk; Goto, Naohisa; Groscurth, Andreas; Gutteridge, Alex; Holland, Richard; Kano, Yoshinobu; Kawas, Edward A; Kerhornou, Arnaud; Kibukawa, Eri; Kinjo, Akira R; Kuhn, Michael; Lapp, Hilmar; Lehvaslaiho, Heikki; Nakamura, Hiroyuki; Nakamura, Yasukazu; Nishizawa, Tatsuya; Nobata, Chikashi; Noguchi, Tamotsu; Oinn, Thomas M; Okamoto, Shinobu; Owen, Stuart; Pafilis, Evangelos; Pocock, Matthew; Prins, Pjotr; Ranzinger, René; Reisinger, Florian; Salwinski, Lukasz; Schreiber, Mark; Senger, Martin; Shigemoto, Yasumasa; Standley, Daron M; Sugawara, Hideaki; Tashiro, Toshiyuki; Trelles, Oswaldo; Vos, Rutger A; Wilkinson, Mark D; York, William; Zmasek, Christian M; Asai, Kiyoshi; Takagi, Toshihisa
2010-08-21
Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands for efficient systems without the need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues arisen from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields, however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.
2010-01-01
Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands for efficient systems without the need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues arisen from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields, however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies. PMID:20727200
Senter, Evan; Sheikh, Saad; Dotu, Ivan; Ponty, Yann; Clote, Peter
2012-01-01
Using complex roots of unity and the Fast Fourier Transform, we design a new thermodynamics-based algorithm, FFTbor, that computes the Boltzmann probability that secondary structures differ by base pairs from an arbitrary initial structure of a given RNA sequence. The algorithm, which runs in quartic time and quadratic space , is used to determine the correlation between kinetic folding speed and the ruggedness of the energy landscape, and to predict the location of riboswitch expression platform candidates. A web server is available at http://bioinformatics.bc.edu/clotelab/FFTbor/. PMID:23284639
CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.
Hallin, Peter F; Ussery, David W
2004-12-12
Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations like curvature and stacking energy, DNA compositions like base skews, oligo skews and repeats at the local and global level are just a few of the analysis that are presented on the CBS Genome Atlas Web page. Complex analysis, changing methods and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results counts to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4, to present a dynamic web content for users outside the center. This solution is tightly fitted to existing server infrastructure and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues. A web based user interface which is dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/. This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.
The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications
2011-01-01
Background The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Results Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Conclusions Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework. PMID:21806842
The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications.
Katayama, Toshiaki; Wilkinson, Mark D; Vos, Rutger; Kawashima, Takeshi; Kawashima, Shuichi; Nakao, Mitsuteru; Yamamoto, Yasunori; Chun, Hong-Woo; Yamaguchi, Atsuko; Kawano, Shin; Aerts, Jan; Aoki-Kinoshita, Kiyoko F; Arakawa, Kazuharu; Aranda, Bruno; Bonnal, Raoul Jp; Fernández, José M; Fujisawa, Takatomo; Gordon, Paul Mk; Goto, Naohisa; Haider, Syed; Harris, Todd; Hatakeyama, Takashi; Ho, Isaac; Itoh, Masumi; Kasprzyk, Arek; Kido, Nobuhiro; Kim, Young-Joo; Kinjo, Akira R; Konishi, Fumikazu; Kovarskaya, Yulia; von Kuster, Greg; Labarga, Alberto; Limviphuvadh, Vachiranee; McCarthy, Luke; Nakamura, Yasukazu; Nam, Yunsun; Nishida, Kozo; Nishimura, Kunihiro; Nishizawa, Tatsuya; Ogishima, Soichi; Oinn, Tom; Okamoto, Shinobu; Okuda, Shujiro; Ono, Keiichiro; Oshita, Kazuki; Park, Keun-Joon; Putnam, Nicholas; Senger, Martin; Severin, Jessica; Shigemoto, Yasumasa; Sugawara, Hideaki; Taylor, James; Trelles, Oswaldo; Yamasaki, Chisato; Yamashita, Riu; Satoh, Noriyuki; Takagi, Toshihisa
2011-08-02
The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.
ERIC Educational Resources Information Center
Holtzclaw, J. David; Eisen, Arri; Whitney, Erika M.; Penumetcha, Meera; Hoey, J. Joseph; Kimbro, K. Sean
2006-01-01
Many students at minority-serving institutions are underexposed to Internet resources such as the human genome project, PubMed, NCBI databases, and other Web-based technologies because of a lack of financial resources. To change this, we designed and implemented a new bioinformatics component to supplement the undergraduate Genetics course at…
Web scraping technologies in an API world.
Glez-Peña, Daniel; Lourenço, Anália; López-Fernández, Hugo; Reboiro-Jato, Miguel; Fdez-Riverola, Florentino
2014-09-01
Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover for all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is set on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, and answer a number of practical needs. For exemplification purposes, we introduce a biomedical data extraction scenario where the desired data sources, well-known in clinical microbiology and similar domains, do not offer programmatic interfaces yet. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as means to cope with gene set enrichment analysis. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
PLI: a web-based tool for the comparison of protein-ligand interactions observed on PDB structures.
Gallina, Anna Maria; Bisignano, Paola; Bergamino, Maurizio; Bordo, Domenico
2013-02-01
A large fraction of the entries contained in the Protein Data Bank describe proteins in complex with low molecular weight molecules such as physiological compounds or synthetic drugs. In many cases, the same molecule is found in distinct protein-ligand complexes. There is an increasing interest in Medicinal Chemistry in comparing protein binding sites to get insight on interactions that modulate the binding specificity, as this structural information can be correlated with other experimental data of biochemical or physiological nature and may help in rational drug design. The web service protein-ligand interaction presented here provides a tool to analyse and compare the binding pockets of homologous proteins in complex with a selected ligand. The information is deduced from protein-ligand complexes present in the Protein Data Bank and stored in the underlying database. Freely accessible at http://bioinformatics.istge.it/pli/.
PhyreStorm: A Web Server for Fast Structural Searches Against the PDB.
Mezulis, Stefans; Sternberg, Michael J E; Kelley, Lawrence A
2016-02-22
The identification of structurally similar proteins can provide a range of biological insights, and accordingly, the alignment of a query protein to a database of experimentally determined protein structures is a technique commonly used in the fields of structural and evolutionary biology. The PhyreStorm Web server has been designed to provide comprehensive, up-to-date and rapid structural comparisons against the Protein Data Bank (PDB) combined with a rich and intuitive user interface. It is intended that this facility will enable biologists inexpert in bioinformatics access to a powerful tool for exploring protein structure relationships beyond what can be achieved by sequence analysis alone. By partitioning the PDB into similar structures, PhyreStorm is able to quickly discard the majority of structures that cannot possibly align well to a query protein, reducing the number of alignments required by an order of magnitude. PhyreStorm is capable of finding 93±2% of all highly similar (TM-score>0.7) structures in the PDB for each query structure, usually in less than 60s. PhyreStorm is available at http://www.sbg.bio.ic.ac.uk/phyrestorm/. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Chen, Junjie; Guo, Mingyue; Li, Shumin; Liu, Bin
2017-11-01
As one of the most important tasks in protein sequence analysis, protein remote homology detection is critical for both basic research and practical applications. Here, we present an effective web server for protein remote homology detection called ProtDec-LTR2.0 by combining ProtDec-Learning to Rank (LTR) and pseudo protein representation. Experimental results showed that the detection performance is obviously improved. The web server provides a user-friendly interface to explore the sequence and structure information of candidate proteins and find their conserved domains by launching a multiple sequence alignment tool. The web server is free and open to all users with no login requirement at http://bioinformatics.hitsz.edu.cn/ProtDec-LTR2.0/. bliu@hit.edu.cn. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
3D-SURFER: software for high-throughput protein surface comparison and analysis
La, David; Esquivel-Rodríguez, Juan; Venkatraman, Vishwesh; Li, Bin; Sael, Lee; Ueng, Stephen; Ahrendt, Steven; Kihara, Daisuke
2009-01-01
Summary: We present 3D-SURFER, a web-based tool designed to facilitate high-throughput comparison and characterization of proteins based on their surface shape. As each protein is effectively represented by a vector of 3D Zernike descriptors, comparison times for a query protein against the entire PDB take, on an average, only a couple of seconds. The web interface has been designed to be as interactive as possible with displays showing animated protein rotations, CATH codes and structural alignments using the CE program. In addition, geometrically interesting local features of the protein surface, such as pockets that often correspond to ligand binding sites as well as protrusions and flat regions can also be identified and visualized. Availability: 3D-SURFER is a web application that can be freely accessed from: http://dragon.bio.purdue.edu/3d-surfer Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19759195
3D-SURFER: software for high-throughput protein surface comparison and analysis.
La, David; Esquivel-Rodríguez, Juan; Venkatraman, Vishwesh; Li, Bin; Sael, Lee; Ueng, Stephen; Ahrendt, Steven; Kihara, Daisuke
2009-11-01
We present 3D-SURFER, a web-based tool designed to facilitate high-throughput comparison and characterization of proteins based on their surface shape. As each protein is effectively represented by a vector of 3D Zernike descriptors, comparison times for a query protein against the entire PDB take, on an average, only a couple of seconds. The web interface has been designed to be as interactive as possible with displays showing animated protein rotations, CATH codes and structural alignments using the CE program. In addition, geometrically interesting local features of the protein surface, such as pockets that often correspond to ligand binding sites as well as protrusions and flat regions can also be identified and visualized. 3D-SURFER is a web application that can be freely accessed from: http://dragon.bio.purdue.edu/3d-surfer dkihara@purdue.edu Supplementary data are available at Bioinformatics online.
A Web-based assessment of bioinformatics end-user support services at US universities.
Messersmith, Donna J; Benson, Dennis A; Geer, Renata C
2006-07-01
This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group.
GFFview: A Web Server for Parsing and Visualizing Annotation Information of Eukaryotic Genome.
Deng, Feilong; Chen, Shi-Yi; Wu, Zhou-Lin; Hu, Yongsong; Jia, Xianbo; Lai, Song-Jia
2017-10-01
Owing to wide application of RNA sequencing (RNA-seq) technology, more and more eukaryotic genomes have been extensively annotated, such as the gene structure, alternative splicing, and noncoding loci. Annotation information of genome is prevalently stored as plain text in General Feature Format (GFF), which could be hundreds or thousands Mb in size. Therefore, it is a challenge for manipulating GFF file for biologists who have no bioinformatic skill. In this study, we provide a web server (GFFview) for parsing the annotation information of eukaryotic genome and then generating statistical description of six indices for visualization. GFFview is very useful for investigating quality and difference of the de novo assembled transcriptome in RNA-seq studies.
Schäuble, Sascha; Stavrum, Anne-Kristin; Bockwoldt, Mathias; Puntervoll, Pål; Heiland, Ines
2017-06-24
Systems Biology Markup Language (SBML) is the standard model representation and description language in systems biology. Enriching and analysing systems biology models by integrating the multitude of available data, increases the predictive power of these models. This may be a daunting task, which commonly requires bioinformatic competence and scripting. We present SBMLmod, a Python-based web application and service, that automates integration of high throughput data into SBML models. Subsequent steady state analysis is readily accessible via the web service COPASIWS. We illustrate the utility of SBMLmod by integrating gene expression data from different healthy tissues as well as from a cancer dataset into a previously published model of mammalian tryptophan metabolism. SBMLmod is a user-friendly platform for model modification and simulation. The web application is available at http://sbmlmod.uit.no , whereas the WSDL definition file for the web service is accessible via http://sbmlmod.uit.no/SBMLmod.wsdl . Furthermore, the entire package can be downloaded from https://github.com/MolecularBioinformatics/sbml-mod-ws . We envision that SBMLmod will make automated model modification and simulation available to a broader research community.
Cheng, Liang; Hu, Yang; Sun, Jie; Zhou, Meng; Jiang, Qinghua
2018-06-01
DincRNA aims to provide a comprehensive web-based bioinformatics toolkit to elucidate the entangled relationships among diseases and non-coding RNAs (ncRNAs) from the perspective of disease similarity. The quantitative way to illustrate relationships of pair-wise diseases always depends on their molecular mechanisms, and structures of the directed acyclic graph of Disease Ontology (DO). Corresponding methods for calculating similarity of pair-wise diseases involve Resnik's, Lin's, Wang's, PSB and SemFunSim methods. Recently, disease similarity was validated suitable for calculating functional similarities of ncRNAs and prioritizing ncRNA-disease pairs, and it has been widely applied for predicting the ncRNA function due to the limited biological knowledge from wet lab experiments of these RNAs. For this purpose, a large number of algorithms and priori knowledge need to be integrated. e.g. 'pair-wise best, pairs-average' (PBPA) and 'pair-wise all, pairs-maximum' (PAPM) methods for calculating functional similarities of ncRNAs, and random walk with restart (RWR) method for prioritizing ncRNA-disease pairs. To facilitate the exploration of disease associations and ncRNA function, DincRNA implemented all of the above eight algorithms based on DO and disease-related genes. Currently, it provides the function to query disease similarity scores, miRNA and lncRNA functional similarity scores, and the prioritization scores of lncRNA-disease and miRNA-disease pairs. http://bio-annotation.cn:18080/DincRNAClient/. biofomeng@hotmail.com or qhjiang@hit.edu.cn. Supplementary data are available at Bioinformatics online.
Bioinformatics projects supporting life-sciences learning in high schools.
Marques, Isabel; Almeida, Paulo; Alves, Renato; Dias, Maria João; Godinho, Ana; Pereira-Leal, José B
2014-01-01
The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called "Bioinformatics@school." It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools.
PDBe: towards reusable data delivery infrastructure at protein data bank in Europe.
Mir, Saqib; Alhroub, Younes; Anyango, Stephen; Armstrong, David R; Berrisford, John M; Clark, Alice R; Conroy, Matthew J; Dana, Jose M; Deshpande, Mandar; Gupta, Deepti; Gutmanas, Aleksandras; Haslam, Pauline; Mak, Lora; Mukhopadhyay, Abhik; Nadzirin, Nurul; Paysan-Lafosse, Typhaine; Sehnal, David; Sen, Sanchayita; Smart, Oliver S; Varadi, Mihaly; Kleywegt, Gerard J; Velankar, Sameer
2018-01-04
The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Better bioinformatics through usability analysis.
Bolchini, Davide; Finkelstein, Anthony; Perrone, Vito; Nagl, Sylvia
2009-02-01
Improving the usability of bioinformatics resources enables researchers to find, interact with, share, compare and manipulate important information more effectively and efficiently. It thus enables researchers to gain improved insights into biological processes with the potential, ultimately, of yielding new scientific results. Usability 'barriers' can pose significant obstacles to a satisfactory user experience and force researchers to spend unnecessary time and effort to complete their tasks. The number of online biological databases available is growing and there is an expanding community of diverse users. In this context there is an increasing need to ensure the highest standards of usability. Using 'state-of-the-art' usability evaluation methods, we have identified and characterized a sample of usability issues potentially relevant to web bioinformatics resources, in general. These specifically concern the design of the navigation and search mechanisms available to the user. The usability issues we have discovered in our substantial case studies are undermining the ability of users to find the information they need in their daily research activities. In addition to characterizing these issues, specific recommendations for improvements are proposed leveraging proven practices from web and usability engineering. The methods and approach we exemplify can be readily adopted by the developers of bioinformatics resources.
Machiela, Mitchell J; Chanock, Stephen J
2015-11-01
Assessing linkage disequilibrium (LD) across ancestral populations is a powerful approach for investigating population-specific genetic structure as well as functionally mapping regions of disease susceptibility. Here, we present LDlink, a web-based collection of bioinformatic modules that query single nucleotide polymorphisms (SNPs) in population groups of interest to generate haplotype tables and interactive plots. Modules are designed with an emphasis on ease of use, query flexibility, and interactive visualization of results. Phase 3 haplotype data from the 1000 Genomes Project are referenced for calculating pairwise metrics of LD, searching for proxies in high LD, and enumerating all observed haplotypes. LDlink is tailored for investigators interested in mapping common and uncommon disease susceptibility loci by focusing on output linking correlated alleles and highlighting putative functional variants. LDlink is a free and publically available web tool which can be accessed at http://analysistools.nci.nih.gov/LDlink/. mitchell.machiela@nih.gov. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets.
Rideout, Jai Ram; Chase, John H; Bolyen, Evan; Ackermann, Gail; González, Antonio; Knight, Rob; Caporaso, J Gregory
2016-06-13
Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.
LDAP: a web server for lncRNA-disease association prediction.
Lan, Wei; Li, Min; Zhao, Kaijie; Liu, Jin; Wu, Fang-Xiang; Pan, Yi; Wang, Jianxin
2017-02-01
Increasing evidences have demonstrated that long noncoding RNAs (lncRNAs) play important roles in many human diseases. Therefore, predicting novel lncRNA-disease associations would contribute to dissect the complex mechanisms of disease pathogenesis. Some computational methods have been developed to infer lncRNA-disease associations. However, most of these methods infer lncRNA-disease associations only based on single data resource. In this paper, we propose a new computational method to predict lncRNA-disease associations by integrating multiple biological data resources. Then, we implement this method as a web server for lncRNA-disease association prediction (LDAP). The input of the LDAP server is the lncRNA sequence. The LDAP predicts potential lncRNA-disease associations by using a bagging SVM classifier based on lncRNA similarity and disease similarity. The web server is available at http://bioinformatics.csu.edu.cn/ldap jxwang@mail.csu.edu.cn. Supplementary data are available at Bioinformatics online.
EVALLER: a web server for in silico assessment of potential protein allergenicity
Barrio, Alvaro Martinez; Soeria-Atmadja, Daniel; Nistér, Anders; Gustafsson, Mats G.; Hammerling, Ulf; Bongcam-Rudloff, Erik
2007-01-01
Bioinformatics testing approaches for protein allergenicity, involving amino acid sequence comparisons, have evolved appreciably over the last several years to increased sophistication and performance. EVALLER, the web server presented in this article is based on our recently published ‘Detection based on Filtered Length-adjusted Allergen Peptides’ (DFLAP) algorithm, which affords in silico determination of potential protein allergenicity of high sensitivity and excellent specificity. To strengthen bioinformatics risk assessment in allergology EVALLER provides a comprehensive outline of its judgment on a query protein's potential allergenicity. Each such textual output incorporates a scoring figure, a confidence numeral of the assignment and information on high- or low-scoring matches to identified allergen-related motifs, including their respective location in accordingly derived allergens. The interface, built on a modified Perl Open Source package, enables dynamic and color-coded graphic representation of key parts of the output. Moreover, pertinent details can be examined in great detail through zoomed views. The server can be accessed at http://bioinformatics.bmc.uu.se/evaller.html. PMID:17537818
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chain, Patrick
Genomics — the genetic mapping and DNA sequencing of sets of genes or the complete genomes of organisms, along with related genome analysis and database work — is emerging as one of the transformative sciences of the 21st century. But current bioinformatics tools are not accessible to most biological researchers. Now, a new computational and web-based tool called EDGE Bioinformatics is working to fulfill the promise of democratizing genomics.
Bioinformatics Projects Supporting Life-Sciences Learning in High Schools
Marques, Isabel; Almeida, Paulo; Alves, Renato; Dias, Maria João; Godinho, Ana; Pereira-Leal, José B.
2014-01-01
The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called “Bioinformatics@school.” It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools. PMID:24465192
MOWServ: a web client for integration of bioinformatic resources
Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J.; Claros, M. Gonzalo; Trelles, Oswaldo
2010-01-01
The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user’s tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/. PMID:20525794
MOWServ: a web client for integration of bioinformatic resources.
Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J; Claros, M Gonzalo; Trelles, Oswaldo
2010-07-01
The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user's tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/.
Suplatov, Dmitry; Sharapova, Yana; Timonina, Daria; Kopylov, Kirill; Švedas, Vytas
2018-04-01
The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins are used to predict correlated substitutions by the Mutual information-based CMAT approach, classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations, annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of the visualCMAT are organized for a convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, implemented at selecting hotspots and compensatory mutations for rational design and directed evolution experiments to produce novel enzymes with improved properties, and employed at studying the mechanism of selective ligand's binding and allosteric communication between topologically independent sites in protein structures. The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
mORCA: ubiquitous access to life science web services.
Diaz-Del-Pino, Sergio; Trelles, Oswaldo; Falgueras, Juan
2018-01-16
Technical advances in mobile devices such as smartphones and tablets have produced an extraordinary increase in their use around the world and have become part of our daily lives. The possibility of carrying these devices in a pocket, particularly mobile phones, has enabled ubiquitous access to Internet resources. Furthermore, in the life sciences world there has been a vast proliferation of data types and services that finish as Web Services. This suggests the need for research into mobile clients to deal with life sciences applications for effective usage and exploitation. Analysing the current features in existing bioinformatics applications managing Web Services, we have devised, implemented, and deployed an easy-to-use web-based lightweight mobile client. This client is able to browse, select, compose parameters, invoke, and monitor the execution of Web Services stored in catalogues or central repositories. The client is also able to deal with huge amounts of data between external storage mounts. In addition, we also present a validation use case, which illustrates the usage of the application while executing, monitoring, and exploring the results of a registered workflow. The software its available in the Apple Store and Android Market and the source code is publicly available in Github. Mobile devices are becoming increasingly important in the scientific world due to their strong potential impact on scientific applications. Bioinformatics should not fall behind this trend. We present an original software client that deals with the intrinsic limitations of such devices and propose different guidelines to provide location-independent access to computational resources in bioinformatics and biomedicine. Its modular design makes it easily expandable with the inclusion of new repositories, tools, types of visualization, etc.
A Web-based assessment of bioinformatics end-user support services at US universities
Messersmith, Donna J.; Benson, Dennis A.; Geer, Renata C.
2006-01-01
Objectives: This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Methods: Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Results: Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. Conclusions: This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group. PMID:16888663
SCALEUS: Semantic Web Services Integration for Biomedical Applications.
Sernadela, Pedro; González-Castro, Lorena; Oliveira, José Luís
2017-04-01
In recent years, we have witnessed an explosion of biological data resulting largely from the demands of life science research. The vast majority of these data are freely available via diverse bioinformatics platforms, including relational databases and conventional keyword search applications. This type of approach has achieved great results in the last few years, but proved to be unfeasible when information needs to be combined or shared among different and scattered sources. During recent years, many of these data distribution challenges have been solved with the adoption of semantic web. Despite the evident benefits of this technology, its adoption introduced new challenges related with the migration process, from existent systems to the semantic level. To facilitate this transition, we have developed Scaleus, a semantic web migration tool that can be deployed on top of traditional systems in order to bring knowledge, inference rules, and query federation to the existent data. Targeted at the biomedical domain, this web-based platform offers, in a single package, straightforward data integration and semantic web services that help developers and researchers in the creation process of new semantically enhanced information systems. SCALEUS is available as open source at http://bioinformatics-ua.github.io/scaleus/ .
Microsoft Biology Initiative: .NET Bioinformatics Platform and Tools
Diaz Acosta, B.
2011-01-01
The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative is comprised of two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework—initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, documentation and training materials are freely downloadable from http://research.microsoft.com/bio. MBT is a collection of tools that enable biology and bioinformatics researchers to be more productive in making scientific discoveries.
ESAP plus: a web-based server for EST-SSR marker development.
Ponyared, Piyarat; Ponsawat, Jiradej; Tongsima, Sissades; Seresangtakul, Pusadee; Akkasaeng, Chutipong; Tantisuwichwong, Nathpapat
2016-12-22
Simple sequence repeats (SSRs) have become widely used as molecular markers in plant genetic studies due to their abundance, high allelic variation at each locus and simplicity to analyze using conventional PCR amplification. To study plants with unknown genome sequence, SSR markers from Expressed Sequence Tags (ESTs), which can be obtained from the plant mRNA (converted to cDNA), must be utilized. With the advent of high-throughput sequencing technology, huge EST sequence data have been generated and are now accessible from many public databases. However, SSR marker identification from a large in-house or public EST collection requires a computational pipeline that makes use of several standard bioinformatic tools to design high quality EST-SSR primers. Some of these computational tools are not users friendly and must be tightly integrated with reference genomic databases. A web-based bioinformatic pipeline, called EST Analysis Pipeline Plus (ESAP Plus), was constructed for assisting researchers to develop SSR markers from a large EST collection. ESAP Plus incorporates several bioinformatic scripts and some useful standard software tools necessary for the four main procedures of EST-SSR marker development, namely 1) pre-processing, 2) clustering and assembly, 3) SSR mining and 4) SSR primer design. The proposed pipeline also provides two alternative steps for reducing EST redundancy and identifying SSR loci. Using public sugarcane ESTs, ESAP Plus automatically executed the aforementioned computational pipeline via a simple web user interface, which was implemented using standard PHP, HTML, CSS and Java scripts. With ESAP Plus, users can upload raw EST data and choose various filtering options and parameters to analyze each of the four main procedures through this web interface. All input EST data and their predicted SSR results will be stored in the ESAP Plus MySQL database. Users will be notified via e-mail when the automatic process is completed and they can download all the results through the web interface. ESAP Plus is a comprehensive and convenient web-based bioinformatic tool for SSR marker development. ESAP Plus offers all necessary EST-SSR development processes with various adjustable options that users can easily use to identify SSR markers from a large EST collection. With familiar web interface, users can upload the raw EST using the data submission page and visualize/download the corresponding EST-SSR information from within ESAP Plus. ESAP Plus can handle considerably large EST datasets. This EST-SSR discovery tool can be accessed directly from: http://gbp.kku.ac.th/esap_plus/ .
CellMiner Companion: an interactive web application to explore CellMiner NCI-60 data.
Wang, Sufang; Gribskov, Michael; Hazbun, Tony R; Pascuzzi, Pete E
2016-08-01
The NCI-60 human tumor cell line panel is an invaluable resource for cancer researchers, providing drug sensitivity, molecular and phenotypic data for a range of cancer types. CellMiner is a web resource that provides tools for the acquisition and analysis of quality-controlled NCI-60 data. CellMiner supports queries of up to 150 drugs or genes, but the output is an Excel file for each drug or gene. This output format makes it difficult for researchers to explore the data from large queries. CellMiner Companion is a web application that facilitates the exploration and visualization of output from CellMiner, further increasing the accessibility of NCI-60 data. The web application is freely accessible at https://pul-bioinformatics.shinyapps.io/CellMinerCompanion The R source code can be downloaded at https://github.com/pepascuzzi/CellMinerCompanion.git ppascuzz@purdue.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Java bioinformatics analysis web services for multiple sequence alignment--JABAWS:MSA.
Troshin, Peter V; Procter, James B; Barton, Geoffrey J
2011-07-15
JABAWS is a web services framework that simplifies the deployment of web services for bioinformatics. JABAWS:MSA provides services for five multiple sequence alignment (MSA) methods (Probcons, T-coffee, Muscle, Mafft and ClustalW), and is the system employed by the Jalview multiple sequence analysis workbench since version 2.6. A fully functional, easy to set up server is provided as a Virtual Appliance (VA), which can be run on most operating systems that support a virtualization environment such as VMware or Oracle VirtualBox. JABAWS is also distributed as a Web Application aRchive (WAR) and can be configured to run on a single computer and/or a cluster managed by Grid Engine, LSF or other queuing systems that support DRMAA. JABAWS:MSA provides clients full access to each application's parameters, allows administrators to specify named parameter preset combinations and execution limits for each application through simple configuration files. The JABAWS command-line client allows integration of JABAWS services into conventional scripts. JABAWS is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws.
Chain, Patrick
2018-05-31
Genomics â the genetic mapping and DNA sequencing of sets of genes or the complete genomes of organisms, along with related genome analysis and database work â is emerging as one of the transformative sciences of the 21st century. But current bioinformatics tools are not accessible to most biological researchers. Now, a new computational and web-based tool called EDGE Bioinformatics is working to fulfill the promise of democratizing genomics.
The Web as an educational tool for/in learning/teaching bioinformatics statistics.
Oliver, J; Pisano, M E; Alonso, T; Roca, P
2005-12-01
Statistics provides essential tool in Bioinformatics to interpret the results of a database search or for the management of enormous amounts of information provided from genomics, proteomics and metabolomics. The goal of this project was the development of a software tool that would be as simple as possible to demonstrate the use of the Bioinformatics statistics. Computer Simulation Methods (CSMs) developed using Microsoft Excel were chosen for their broad range of applications, immediate and easy formula calculation, immediate testing and easy graphics representation, and of general use and acceptance by the scientific community. The result of these endeavours is a set of utilities which can be accessed from the following URL: http://gmein.uib.es/bioinformatica/statistics. When tested on students with previous coursework with traditional statistical teaching methods, the general opinion/overall consensus was that Web-based instruction had numerous advantages, but traditional methods with manual calculations were also needed for their theory and practice. Once having mastered the basic statistical formulas, Excel spreadsheets and graphics were shown to be very useful for trying many parameters in a rapid fashion without having to perform tedious calculations. CSMs will be of great importance for the formation of the students and professionals in the field of bioinformatics, and for upcoming applications of self-learning and continuous formation.
Omics Metadata Management Software v. 1 (OMMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks, to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand alone-package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible, extensible and easily installedmore » and run by operators with general system administration and scripting language literacy.« less
SSEP: secondary structural elements of proteins
Shanthi, V.; Selvarani, P.; Kiran Kumar, Ch.; Mohire, C. S.; Sekar, K.
2003-01-01
SSEP is a comprehensive resource for accessing information related to the secondary structural elements present in the 25 and 90% non-redundant protein chains. The database contains 1771 protein chains from 1670 protein structures and 6182 protein chains from 5425 protein structures in 25 and 90% non-redundant protein chains, respectively. The current version provides information about the α-helical segments and β-strand fragments of varying lengths. In addition, it also contains the information about 310-helix, β- and ν-turns and hairpin loops. The free graphics program RASMOL has been interfaced with the search engine to visualize the three-dimensional structures of the user queried secondary structural fragment. The database is updated regularly and is available through Bioinformatics web server at http://cluster.physics.iisc.ernet.in/ssep/ or http://144.16.71.148/ssep/. PMID:12824336
ERIC Educational Resources Information Center
Gelbart, Hadas; Brill, Gilat; Yarden, Anat
2009-01-01
Providing learners with opportunities to engage in activities similar to those carried out by scientists was addressed in a web-based research simulation in genetics developed for high school biology students. The research simulation enables learners to apply their genetics knowledge while giving them an opportunity to participate in an authentic…
Hanson, Jack; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi
2017-03-01
Capturing long-range interactions between structural but not sequence neighbors of proteins is a long-standing challenging problem in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification problems by remembering useful past information in long sequential events. Here, we have implemented deep bidirectional LSTM recurrent neural networks in the problem of protein intrinsic disorder prediction. The new method, named SPOT-Disorder, has steadily improved over a similar method using a traditional, window-based neural network (SPINE-D) in all datasets tested without separate training on short and long disordered regions. Independent tests on four other datasets including the datasets from critical assessment of structure prediction (CASP) techniques and >10 000 annotated proteins from MobiDB, confirmed SPOT-Disorder as one of the best methods in disorder prediction. Moreover, initial studies indicate that the method is more accurate in predicting functional sites in disordered regions. These results highlight the usefulness combining LSTM with deep bidirectional recurrent neural networks in capturing non-local, long-range interactions for bioinformatics applications. SPOT-disorder is available as a web server and as a standalone program at: http://sparks-lab.org/server/SPOT-disorder/index.php . j.hanson@griffith.edu.au or yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au. Supplementary data is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Po-E; Lo, Chien -Chi; Anderson, Joseph J.
Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the easemore » of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.« less
Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.; Davenport, Karen W.; Bishop-Lilly, Kimberly A.; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P.; Chain, Patrick S.G.
2017-01-01
Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. PMID:27899609
Li, Po-E; Lo, Chien -Chi; Anderson, Joseph J.; ...
2016-11-24
Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the easemore » of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.« less
Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H
2012-01-01
Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.
Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.
2012-01-01
Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832
BIRCH: a user-oriented, locally-customizable, bioinformatics system.
Fristensky, Brian
2007-02-09
Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere.
BIRCH: A user-oriented, locally-customizable, bioinformatics system
Fristensky, Brian
2007-01-01
Background Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. Results BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. Conclusion BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere. PMID:17291351
D'Antonio, Matteo; Masseroli, Marco
2009-01-01
Background Alternative splicing has been demonstrated to affect most of human genes; different isoforms from the same gene encode for proteins which differ for a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results A database has been developed to store the considered primary data and the results from their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a database has been created to manage user accesses to the PASS Web application and store user's data and searches. Conclusion PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available well-known bioinformatics tools in order to generate structural information of isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075
PipelineDog: a simple and flexible graphic pipeline construction and maintenance tool.
Zhou, Anbo; Zhang, Yeting; Sun, Yazhou; Xing, Jinchuan
2018-05-01
Analysis pipelines are an essential part of bioinformatics research, and ad hoc pipelines are frequently created by researchers for prototyping and proof-of-concept purposes. However, most existing pipeline management system or workflow engines are too complex for rapid prototyping or learning the pipeline concept. A lightweight, user-friendly and flexible solution is thus desirable. In this study, we developed a new pipeline construction and maintenance tool, PipelineDog. This is a web-based integrated development environment with a modern web graphical user interface. It offers cross-platform compatibility, project management capabilities, code formatting and error checking functions and an online repository. It uses an easy-to-read/write script system that encourages code reuse. With the online repository, it also encourages sharing of pipelines, which enhances analysis reproducibility and accountability. For most users, PipelineDog requires no software installation. Overall, this web application provides a way to rapidly create and easily manage pipelines. PipelineDog web app is freely available at http://web.pipeline.dog. The command line version is available at http://www.npmjs.com/package/pipelinedog and online repository at http://repo.pipeline.dog. ysun@kean.edu or xing@biology.rutgers.edu or ysun@diagnoa.com. Supplementary data are available at Bioinformatics online.
PLIP: fully automated protein-ligand interaction profiler.
Salentin, Sebastian; Schreiber, Sven; Haupt, V Joachim; Adasme, Melissa F; Schroeder, Michael
2015-07-01
The characterization of interactions in protein-ligand complexes is essential for research in structural bioinformatics, drug discovery and biology. However, comprehensive tools are not freely available to the research community. Here, we present the protein-ligand interaction profiler (PLIP), a novel web service for fully automated detection and visualization of relevant non-covalent protein-ligand contacts in 3D structures, freely available at projects.biotec.tu-dresden.de/plip-web. The input is either a Protein Data Bank structure, a protein or ligand name, or a custom protein-ligand complex (e.g. from docking). In contrast to other tools, the rule-based PLIP algorithm does not require any structure preparation. It returns a list of detected interactions on single atom level, covering seven interaction types (hydrogen bonds, hydrophobic contacts, pi-stacking, pi-cation interactions, salt bridges, water bridges and halogen bonds). PLIP stands out by offering publication-ready images, PyMOL session files to generate custom images and parsable result files to facilitate successive data processing. The full python source code is available for download on the website. PLIP's command-line mode allows for high-throughput interaction profiling. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
MOLEonline: a web-based tool for analyzing channels, tunnels and pores (2018 update).
Pravda, Lukáš; Sehnal, David; Toušek, Dominik; Navrátilová, Veronika; Bazgier, Václav; Berka, Karel; Svobodová Vareková, Radka; Koca, Jaroslav; Otyepka, Michal
2018-04-30
MOLEonline is an interactive, web-based application for the detection and characterization of channels (pores and tunnels) within biomacromolecular structures. The updated version of MOLEonline overcomes limitations of the previous version by incorporating the recently developed LiteMol Viewer visualization engine and providing a simple, fully interactive user experience. The application enables two modes of calculation: one is dedicated to the analysis of channels while the other was specifically designed for transmembrane pores. As the application can use both PDB and mmCIF formats, it can be leveraged to analyze a wide spectrum of biomacromolecular structures, e.g. stemming from NMR, X-ray and cryo-EM techniques. The tool is interconnected with other bioinformatics tools (e.g., PDBe, CSA, ChannelsDB, OPM, UniProt) to help both setup and the analysis of acquired results. MOLEonline provides unprecedented analytics for the detection and structural characterization of channels, as well as information about their numerous physicochemical features. Here we present the application of MOLEonline for structural analyses of α-hemolysin and transient receptor potential mucolipin 1 (TRMP1) pores. The MOLEonline application is freely available via the Internet at https://mole.upol.cz.
Dscam1 web server: online prediction of Dscam1 self- and hetero-affinity.
Marini, Simone; Nazzicari, Nelson; Biscarini, Filippo; Wang, Guang-Zhong
2017-06-15
Formation of homodimers by identical Dscam1 protein isomers on cell surface is the key factor for the self-avoidance of growing neurites. Dscam1 immense diversity has a critical role in the formation of arthropod neuronal circuit, showing unique evolutionary properties when compared to other cell surface proteins. Experimental measures are available for 89 self-binding and 1722 hetero-binding protein samples, out of more than 19 thousands (self-binding) and 350 millions (hetero-binding) possible isomer combinations. We developed Dscam1 Web Server to quickly predict Dscam1 self- and hetero- binding affinity for batches of Dscam1 isomers. The server can help the study of Dscam1 affinity and help researchers navigate through the tens of millions of possible isomer combinations to isolate the strong-binding ones. Dscam1 Web Server is freely available at: http://bioinformatics.tecnoparco.org/Dscam1-webserver . Web server code is available at https://gitlab.com/ne1s0n/Dscam1-binding . simone.marini@unipv.it or guangzhong.wang@picb.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
COMAN: a web server for comprehensive metatranscriptomics analysis.
Ni, Yueqiong; Li, Jun; Panagiotou, Gianni
2016-08-11
Microbiota-oriented studies based on metagenomic or metatranscriptomic sequencing have revolutionised our understanding on microbial ecology and the roles of both clinical and environmental microbes. The analysis of massive metatranscriptomic data requires extensive computational resources, a collection of bioinformatics tools and expertise in programming. We developed COMAN (Comprehensive Metatranscriptomics Analysis), a web-based tool dedicated to automatically and comprehensively analysing metatranscriptomic data. COMAN pipeline includes quality control of raw reads, removal of reads derived from non-coding RNA, followed by functional annotation, comparative statistical analysis, pathway enrichment analysis, co-expression network analysis and high-quality visualisation. The essential data generated by COMAN are also provided in tabular format for additional analysis and integration with other software. The web server has an easy-to-use interface and detailed instructions, and is freely available at http://sbb.hku.hk/COMAN/ CONCLUSIONS: COMAN is an integrated web server dedicated to comprehensive functional analysis of metatranscriptomic data, translating massive amount of reads to data tables and high-standard figures. It is expected to facilitate the researchers with less expertise in bioinformatics in answering microbiota-related biological questions and to increase the accessibility and interpretation of microbiota RNA-Seq data.
Bonnal, Raoul J P; Aerts, Jan; Githinji, George; Goto, Naohisa; MacLean, Dan; Miller, Chase A; Mishima, Hiroyuki; Pagani, Massimiliano; Ramirez-Gonzalez, Ricardo; Smant, Geert; Strozzi, Francesco; Syme, Rob; Vos, Rutger; Wennblom, Trevor J; Woodcroft, Ben J; Katayama, Toshiaki; Prins, Pjotr
2012-04-01
Biogem provides a software development environment for the Ruby programming language, which encourages community-based software development for bioinformatics while lowering the barrier to entry and encouraging best practices. Biogem, with its targeted modular and decentralized approach, software generator, tools and tight web integration, is an improved general model for scaling up collaborative open source software development in bioinformatics. Biogem and modules are free and are OSS. Biogem runs on all systems that support recent versions of Ruby, including Linux, Mac OS X and Windows. Further information at http://www.biogems.info. A tutorial is available at http://www.biogems.info/howto.html bonnal@ingm.org.
Secure Encapsulation and Publication of Biological Services in the Cloud Computing Environment
Zhang, Weizhe; Wang, Xuehui; Lu, Bo; Kim, Tai-hoon
2013-01-01
Secure encapsulation and publication for bioinformatics software products based on web service are presented, and the basic function of biological information is realized in the cloud computing environment. In the encapsulation phase, the workflow and function of bioinformatics software are conducted, the encapsulation interfaces are designed, and the runtime interaction between users and computers is simulated. In the publication phase, the execution and management mechanisms and principles of the GRAM components are analyzed. The functions such as remote user job submission and job status query are implemented by using the GRAM components. The services of bioinformatics software are published to remote users. Finally the basic prototype system of the biological cloud is achieved. PMID:24078906
Secure encapsulation and publication of biological services in the cloud computing environment.
Zhang, Weizhe; Wang, Xuehui; Lu, Bo; Kim, Tai-hoon
2013-01-01
Secure encapsulation and publication for bioinformatics software products based on web service are presented, and the basic function of biological information is realized in the cloud computing environment. In the encapsulation phase, the workflow and function of bioinformatics software are conducted, the encapsulation interfaces are designed, and the runtime interaction between users and computers is simulated. In the publication phase, the execution and management mechanisms and principles of the GRAM components are analyzed. The functions such as remote user job submission and job status query are implemented by using the GRAM components. The services of bioinformatics software are published to remote users. Finally the basic prototype system of the biological cloud is achieved.
BioServices: a common Python package to access biological Web Services programmatically.
Cokelaer, Thomas; Pultz, Dennis; Harder, Lea M; Serra-Musach, Jordi; Saez-Rodriguez, Julio
2013-12-15
Web interfaces provide access to numerous biological databases. Many can be accessed to in a programmatic way thanks to Web Services. Building applications that combine several of them would benefit from a single framework. BioServices is a comprehensive Python framework that provides programmatic access to major bioinformatics Web Services (e.g. KEGG, UniProt, BioModels, ChEMBLdb). Wrapping additional Web Services based either on Representational State Transfer or Simple Object Access Protocol/Web Services Description Language technologies is eased by the usage of object-oriented programming. BioServices releases and documentation are available at http://pypi.python.org/pypi/bioservices under a GPL-v3 license.
Almeida, Jonas S.; Iriabho, Egiebade E.; Gorrepati, Vijaya L.; Wilkinson, Sean R.; Grüneberg, Alexander; Robbins, David E.; Hackney, James R.
2012-01-01
Background: Image bioinformatics infrastructure typically relies on a combination of server-side high-performance computing and client desktop applications tailored for graphic rendering. On the server side, matrix manipulation environments are often used as the back-end where deployment of specialized analytical workflows takes place. However, neither the server-side nor the client-side desktop solution, by themselves or combined, is conducive to the emergence of open, collaborative, computational ecosystems for image analysis that are both self-sustained and user driven. Materials and Methods: ImageJS was developed as a browser-based webApp, untethered from a server-side backend, by making use of recent advances in the modern web browser such as a very efficient compiler, high-end graphical rendering capabilities, and I/O tailored for code migration. Results: Multiple versioned code hosting services were used to develop distinct ImageJS modules to illustrate its amenability to collaborative deployment without compromise of reproducibility or provenance. The illustrative examples include modules for image segmentation, feature extraction, and filtering. The deployment of image analysis by code migration is in sharp contrast with the more conventional, heavier, and less safe reliance on data transfer. Accordingly, code and data are loaded into the browser by exactly the same script tag loading mechanism, which offers a number of interesting applications that would be hard to attain with more conventional platforms, such as NIH's popular ImageJ application. Conclusions: The modern web browser was found to be advantageous for image bioinformatics in both the research and clinical environments. This conclusion reflects advantages in deployment scalability and analysis reproducibility, as well as the critical ability to deliver advanced computational statistical procedures machines where access to sensitive data is controlled, that is, without local “download and installation”. PMID:22934238
Wagener, Johannes; Spjuth, Ola; Willighagen, Egon L; Wikberg, Jarl ES
2009-01-01
Background Life sciences make heavily use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. Results We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. That XMPP cloud services are capable of asynchronous communication implies that clients do not have to poll repetitively for status, but the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. Conclusion XMPP with its extensions is a powerful protocol for cloud services that demonstrate several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allows for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics. PMID:19732427
Almeida, Jonas S; Iriabho, Egiebade E; Gorrepati, Vijaya L; Wilkinson, Sean R; Grüneberg, Alexander; Robbins, David E; Hackney, James R
2012-01-01
Image bioinformatics infrastructure typically relies on a combination of server-side high-performance computing and client desktop applications tailored for graphic rendering. On the server side, matrix manipulation environments are often used as the back-end where deployment of specialized analytical workflows takes place. However, neither the server-side nor the client-side desktop solution, by themselves or combined, is conducive to the emergence of open, collaborative, computational ecosystems for image analysis that are both self-sustained and user driven. ImageJS was developed as a browser-based webApp, untethered from a server-side backend, by making use of recent advances in the modern web browser such as a very efficient compiler, high-end graphical rendering capabilities, and I/O tailored for code migration. Multiple versioned code hosting services were used to develop distinct ImageJS modules to illustrate its amenability to collaborative deployment without compromise of reproducibility or provenance. The illustrative examples include modules for image segmentation, feature extraction, and filtering. The deployment of image analysis by code migration is in sharp contrast with the more conventional, heavier, and less safe reliance on data transfer. Accordingly, code and data are loaded into the browser by exactly the same script tag loading mechanism, which offers a number of interesting applications that would be hard to attain with more conventional platforms, such as NIH's popular ImageJ application. The modern web browser was found to be advantageous for image bioinformatics in both the research and clinical environments. This conclusion reflects advantages in deployment scalability and analysis reproducibility, as well as the critical ability to deliver advanced computational statistical procedures machines where access to sensitive data is controlled, that is, without local "download and installation".
Wagener, Johannes; Spjuth, Ola; Willighagen, Egon L; Wikberg, Jarl E S
2009-09-04
Life sciences make heavily use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. That XMPP cloud services are capable of asynchronous communication implies that clients do not have to poll repetitively for status, but the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. XMPP with its extensions is a powerful protocol for cloud services that demonstrate several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allows for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.
Knowledge-driven enhancements for task composition in bioinformatics.
Sutherland, Karen; McLeod, Kenneth; Ferguson, Gus; Burger, Albert
2009-10-01
A key application area of semantic technologies is the fast-developing field of bioinformatics. Sealife was a project within this field with the aim of creating semantics-based web browsing capabilities for the Life Sciences. This includes meaningfully linking significant terms from the text of a web page to executable web services. It also involves the semantic mark-up of biological terms, linking them to biomedical ontologies, then discovering and executing services based on terms that interest the user. A system was produced which allows a user to identify terms of interest on a web page and subsequently connects these to a choice of web services which can make use of these inputs. Elements of Artificial Intelligence Planning build on this to present a choice of higher level goals, which can then be broken down to construct a workflow. An Argumentation System was implemented to evaluate the results produced by three different gene expression databases. An evaluation of these modules was carried out on users from a variety of backgrounds. Users with little knowledge of web services were able to achieve tasks that used several services in much less time than they would have taken to do this manually. The Argumentation System was also considered a useful resource and feedback was collected on the best way to present results. Overall the system represents a move forward in helping users to both construct workflows and analyse results by incorporating specific domain knowledge into the software. It also provides a mechanism by which web pages can be linked to web services. However, this work covers a specific domain and much co-ordinated effort is needed to make all web services available for use in such a way, i.e. the integration of underlying knowledge is a difficult but essential task.
Nagamani, S; Gaur, A S; Tanneeru, K; Muneeswaran, G; Madugula, S S; Consortium, Mpds; Druzhilovskiy, D; Poroikov, V V; Sastry, G N
2017-11-01
Molecular property diagnostic suite (MPDS) is a Galaxy-based open source drug discovery and development platform. MPDS web portals are designed for several diseases, such as tuberculosis, diabetes mellitus, and other metabolic disorders, specifically aimed to evaluate and estimate the drug-likeness of a given molecule. MPDS consists of three modules, namely data libraries, data processing, and data analysis tools which are configured and interconnected to assist drug discovery for specific diseases. The data library module encompasses vast information on chemical space, wherein the MPDS compound library comprises 110.31 million unique molecules generated from public domain databases. Every molecule is assigned with a unique ID and card, which provides complete information for the molecule. Some of the modules in the MPDS are specific to the diseases, while others are non-specific. Importantly, a suitably altered protocol can be effectively generated for another disease-specific MPDS web portal by modifying some of the modules. Thus, the MPDS suite of web portals shows great promise to emerge as disease-specific portals of great value, integrating chemoinformatics, bioinformatics, molecular modelling, and structure- and analogue-based drug discovery approaches.
The web server of IBM's Bioinformatics and Pattern Discovery group.
Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo
2003-07-01
We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.
The web server of IBM's Bioinformatics and Pattern Discovery group
Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo
2003-01-01
We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/. PMID:12824385
Computer Programming and Biomolecular Structure Studies: A Step beyond Internet Bioinformatics
ERIC Educational Resources Information Center
Likic, Vladimir A.
2006-01-01
This article describes the experience of teaching structural bioinformatics to third year undergraduate students in a subject titled "Biomolecular Structure and Bioinformatics." Students were introduced to computer programming and used this knowledge in a practical application as an alternative to the well established Internet bioinformatics…
Park, Hae-Min; Park, Ju-Hyeong; Kim, Yoon-Woo; Kim, Kyoung-Jin; Jeong, Hee-Jin; Jang, Kyoung-Soon; Kim, Byung-Gee; Kim, Yun-Gon
2013-11-15
In recent years, the improvement of mass spectrometry-based glycomics techniques (i.e. highly sensitive, quantitative and high-throughput analytical tools) has enabled us to obtain a large dataset of glycans. Here we present a database named Xeno-glycomics database (XDB) that contains cell- or tissue-specific pig glycomes analyzed with mass spectrometry-based techniques, including a comprehensive pig glycan information on chemical structures, mass values, types and relative quantities. It was designed as a user-friendly web-based interface that allows users to query the database according to pig tissue/cell types or glycan masses. This database will contribute in providing qualitative and quantitative information on glycomes characterized from various pig cells/organs in xenotransplantation and might eventually provide new targets in the α1,3-galactosyltransferase gene-knock out pigs era. The database can be accessed on the web at http://bioinformatics.snu.ac.kr/xdb.
A systematic approach to infer biological relevance and biases of gene network structures.
Antonov, Alexey V; Tetko, Igor V; Mewes, Hans W
2006-01-10
The development of high-throughput technologies has generated the need for bioinformatics approaches to assess the biological relevance of gene networks. Although several tools have been proposed for analysing the enrichment of functional categories in a set of genes, none of them is suitable for evaluating the biological relevance of the gene network. We propose a procedure and develop a web-based resource (BIOREL) to estimate the functional bias (biological relevance) of any given genetic network by integrating different sources of biological information. The weights of the edges in the network may be either binary or continuous. These essential features make our web tool unique among many similar services. BIOREL provides standardized estimations of the network biases extracted from independent data. By the analyses of real data we demonstrate that the potential application of BIOREL ranges from various benchmarking purposes to systematic analysis of the network biology.
Bradley, Anthony R; Rose, Alexander S; Pavelka, Antonín; Valasatava, Yana; Duarte, Jose M; Prlić, Andreas; Rose, Peter W
2017-06-01
Recent advances in experimental techniques have led to a rapid growth in complexity, size, and number of macromolecular structures that are made available through the Protein Data Bank. This creates a challenge for macromolecular visualization and analysis. Macromolecular structure files, such as PDB or PDBx/mmCIF files can be slow to transfer, parse, and hard to incorporate into third-party software tools. Here, we present a new binary and compressed data representation, the MacroMolecular Transmission Format, MMTF, as well as software implementations in several languages that have been developed around it, which address these issues. We describe the new format and its APIs and demonstrate that it is several times faster to parse, and about a quarter of the file size of the current standard format, PDBx/mmCIF. As a consequence of the new data representation, it is now possible to visualize structures with millions of atoms in a web browser, keep the whole PDB archive in memory or parse it within few minutes on average computers, which opens up a new way of thinking how to design and implement efficient algorithms in structural bioinformatics. The PDB archive is available in MMTF file format through web services and data that are updated on a weekly basis.
Pavelka, Antonín; Valasatava, Yana; Prlić, Andreas
2017-01-01
Recent advances in experimental techniques have led to a rapid growth in complexity, size, and number of macromolecular structures that are made available through the Protein Data Bank. This creates a challenge for macromolecular visualization and analysis. Macromolecular structure files, such as PDB or PDBx/mmCIF files can be slow to transfer, parse, and hard to incorporate into third-party software tools. Here, we present a new binary and compressed data representation, the MacroMolecular Transmission Format, MMTF, as well as software implementations in several languages that have been developed around it, which address these issues. We describe the new format and its APIs and demonstrate that it is several times faster to parse, and about a quarter of the file size of the current standard format, PDBx/mmCIF. As a consequence of the new data representation, it is now possible to visualize structures with millions of atoms in a web browser, keep the whole PDB archive in memory or parse it within few minutes on average computers, which opens up a new way of thinking how to design and implement efficient algorithms in structural bioinformatics. The PDB archive is available in MMTF file format through web services and data that are updated on a weekly basis. PMID:28574982
Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J; Davenport, Karen W; Bishop-Lilly, Kimberly A; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P; Chain, Patrick S G
2017-01-09
Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ergatis: a web interface and scalable software system for bioinformatics workflows
Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.
2010-01-01
Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634
Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning
2018-03-08
Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.
Figeys, Daniel; Fai, Stephen; Bennett, Steffany A. L.
2013-01-01
Motivation: Establishing phospholipid identities in large lipidomic datasets is a labour-intensive process. Where genomics and proteomics capitalize on sequence-based signatures, glycerophospholipids lack easily definable molecular fingerprints. Carbon chain length, degree of unsaturation, linkage, and polar head group identity must be calculated from mass to charge (m/z) ratios under defined mass spectrometry (MS) conditions. Given increasing MS sensitivity, many m/z values are not represented in existing prediction engines. To address this need, Visualization and Phospholipid Identification is a web-based application that returns all theoretically possible phospholipids for any m/z value and MS condition. Visualization algorithms produce multiple chemical structure files for each species. Curated lipids detected by the Canadian Institutes of Health Research Training Program in Neurodegenerative Lipidomics are provided as high-resolution structures. Availability: VaLID is available through the Canadian Institutes of Health Research Training Program in Neurodegenerative Lipidomics resources web site at https://www.med.uottawa.ca/lipidomics/resources.html. Contacts: lipawrd@uottawa.ca Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:23162086
An advanced web query interface for biological databases
Latendresse, Mario; Karp, Peter D.
2010-01-01
Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
Grasso, Ernesto J.; Sottile, Adolfo E.; Coronel, Carlos E.
2016-01-01
It is known that caltrin (calcium transport inhibitor) protein binds to sperm cells during ejaculation and inhibits extracellular Ca2+ uptake. Although the sequence and some biological features of mouse caltrin I and bovine caltrin are known, their physicochemical properties and tertiary structure are mainly unknown. We predicted the 3D structures of mouse caltrin I and bovine caltrin by molecular homology modeling and threading. Surface electrostatic potentials and electric fields were calculated using the Poisson–Boltzmann equation. Several different bioinformatics tools and available web servers were used to thoroughly analyze the physicochemical characteristics of both proteins, such as their Kyte and Doolittle hydropathy scores and helical wheel projections. The results presented in this work significantly aid further understanding of the molecular mechanisms of caltrin proteins modulating physiological processes associated with fertilization. PMID:27812283
The BioExtract Server: a web-based bioinformatic workflow platform
Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.
2011-01-01
The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552
AMPA: an automated web server for prediction of protein antimicrobial regions.
Torrent, Marc; Di Tommaso, Paolo; Pulido, David; Nogués, M Victòria; Notredame, Cedric; Boix, Ester; Andreu, David
2012-01-01
AMPA is a web application for assessing the antimicrobial domains of proteins, with a focus on the design on new antimicrobial drugs. The application provides fast discovery of antimicrobial patterns in proteins that can be used to develop new peptide-based drugs against pathogens. Results are shown in a user-friendly graphical interface and can be downloaded as raw data for later examination. AMPA is freely available on the web at http://tcoffee.crg.cat/apps/ampa. The source code is also available in the web. marc.torrent@upf.edu; david.andreu@upf.edu Supplementary data are available at Bioinformatics online.
Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-01-01
Cloud computing has revolutionized availability and access to computing and storage resources; making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to setup the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313
IonGAP: integrative bacterial genome analysis for Ion Torrent sequence data.
Baez-Ortega, Adrian; Lorenzo-Diaz, Fabian; Hernandez, Mariano; Gonzalez-Vila, Carlos Ignacio; Roda-Garcia, Jose Luis; Colebrook, Marcos; Flores, Carlos
2015-09-01
We introduce IonGAP, a publicly available Web platform designed for the analysis of whole bacterial genomes using Ion Torrent sequence data. Besides assembly, it integrates a variety of comparative genomics, annotation and bacterial classification routines, based on the widely used FASTQ, BAM and SRA file formats. Benchmarking with different datasets evidenced that IonGAP is a fast, powerful and simple-to-use bioinformatics tool. By releasing this platform, we aim to translate low-cost bacterial genome analysis for microbiological prevention and control in healthcare, agroalimentary and pharmaceutical industry applications. IonGAP is hosted by the ITER's Teide-HPC supercomputer and is freely available on the Web for non-commercial use at http://iongap.hpc.iter.es. mcolesan@ull.edu.es or cflores@ull.edu.es Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-06-01
Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.
BioRuby: bioinformatics software for the Ruby programming language.
Goto, Naohisa; Prins, Pjotr; Nakao, Mitsuteru; Bonnal, Raoul; Aerts, Jan; Katayama, Toshiaki
2010-10-15
The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser. BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. katayama@bioruby.org
The Phyre2 web portal for protein modelling, prediction and analysis
Kelley, Lawrence A; Mezulis, Stefans; Yates, Christopher M; Wass, Mark N; Sternberg, Michael JE
2017-01-01
Summary Phyre2 is a suite of tools available on the web to predict and analyse protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a protocol. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites, and analyse the effect of amino-acid variants (e.g. nsSNPs) for a user’s protein sequence. Users are guided through results by a simple interface at a level of detail determined by them. This protocol will guide a user from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30mins and 2 hours after submission. PMID:25950237
webMGR: an online tool for the multiple genome rearrangement problem.
Lin, Chi Ho; Zhao, Hao; Lowcay, Sean Harry; Shahab, Atif; Bourque, Guillaume
2010-02-01
The algorithm MGR enables the reconstruction of rearrangement phylogenies based on gene or synteny block order in multiple genomes. Although MGR has been successfully applied to study the evolution of different sets of species, its utilization has been hampered by the prohibitive running time for some applications. In the current work, we have designed new heuristics that significantly speed up the tool without compromising its accuracy. Moreover, we have developed a web server (webMGR) that includes elaborate web output to facilitate navigation through the results. webMGR can be accessed via http://www.gis.a-star.edu.sg/~bourque. The source code of the improved standalone version of MGR is also freely available from the web site. Supplementary data are available at Bioinformatics online.
Distribution and Features of the Six Classes of Peroxiredoxins
Poole, Leslie B.; Nelson, Kimberly J.
2016-01-01
Peroxiredoxins are cysteine-dependent peroxide reductases that group into 6 different, structurally discernable classes. In 2011, our research team reported the application of a bioinformatic approach called active site profiling to extract active site-proximal sequence segments from the 29 distinct, structurally-characterized peroxiredoxins available at the time. These extracted sequences were then used to create unique profiles for the six groups which were subsequently used to search GenBank(nr), allowing identification of ∼3500 peroxiredoxin sequences and their respective subgroups. Summarized in this minireview are the features and phylogenetic distributions of each of these peroxiredoxin subgroups; an example is also provided illustrating the use of the web accessible, searchable database known as PREX to identify subfamily-specific peroxiredoxin sequences for the organism Vitis vinifera (grape). PMID:26810075
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quinn, Gregory B.; Bi, Chunxiao; Christie, Cole H.
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) resource provides tools for query, analysis and visualization of the 3D structures in the PDB archive. As the mobile Web is starting to surpass desktop and laptop usage, scientists and educators are beginning to integrate mobile devices into their research and teaching. In response, we have developed the RCSB PDB Mobile app for the iOS and Android mobile platforms to enable fast and convenient access to RCSB PDB data and services. Lastly, using the app, users from the general public to expert researchers can quickly search and visualize biomolecules,more » and add personal annotations via the RCSB PDB's integrated MyPDB service.« less
Quinn, Gregory B.; Bi, Chunxiao; Christie, Cole H.; ...
2014-09-02
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) resource provides tools for query, analysis and visualization of the 3D structures in the PDB archive. As the mobile Web is starting to surpass desktop and laptop usage, scientists and educators are beginning to integrate mobile devices into their research and teaching. In response, we have developed the RCSB PDB Mobile app for the iOS and Android mobile platforms to enable fast and convenient access to RCSB PDB data and services. Lastly, using the app, users from the general public to expert researchers can quickly search and visualize biomolecules,more » and add personal annotations via the RCSB PDB's integrated MyPDB service.« less
A Microarray Tool Provides Pathway and GO Term Analysis.
Koch, Martin; Royer, Hans-Dieter; Wiese, Michael
2011-12-01
Analysis of gene expression profiles is no longer exclusively a task for bioinformatic experts. However, gaining statistically significant results is challenging and requires both biological knowledge and computational know-how. Here we present a novel, user-friendly microarray reporting tool called maRt. The software provides access to bioinformatic resources, like gene ontology terms and biological pathways by use of the DAVID and the BioMart web-service. Results are summarized in structured HTML reports, each presenting a different layer of information. In these report, contents of diverse sources are integrated and interlinked. To speed up processing, maRt takes advantage of the multi-core technology of modern desktop computers by using parallel processing. Since the software is built upon a RCP infrastructure it might be an outset for developers aiming to integrate novel R based applications. Installer, documentation and various kinds of tutorials are available under LGPL license at the website of our institute http://www.pharma.uni-bonn.de/www/mart. This software is free for academic use. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Lipidomics informatics for life-science.
Schwudke, D; Shevchenko, A; Hoffmann, N; Ahrends, R
2017-11-10
Lipidomics encompasses analytical approaches that aim to identify and quantify the complete set of lipids, defined as lipidome in a given cell, tissue or organism as well as their interactions with other molecules. The majority of lipidomics workflows is based on mass spectrometry and has been proven as a powerful tool in system biology in concert with other Omics disciplines. Unfortunately, bioinformatics infrastructures for this relatively young discipline are limited only to some specialists. Search engines, quantification algorithms, visualization tools and databases developed by the 'Lipidomics Informatics for Life-Science' (LIFS) partners will be restructured and standardized to provide broad access to these specialized bioinformatics pipelines. There are many medical challenges related to lipid metabolic alterations that will be fostered by capacity building suggested by LIFS. LIFS as member of the 'German Network for Bioinformatics' (de.NBI) node for 'Bioinformatics for Proteomics' (BioInfra.Prot) and will provide access to the described software as well as to tutorials and consulting services via a unified web-portal. Copyright © 2017 Elsevier B.V. All rights reserved.
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-11-28
At the present, coding sequence (CDS) has been discovered and larger CDS is being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for the phylogenetic tree inferring analysis using public access web services at European Bioinformatics Institute (EMBL-EBI) and Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 numbers in bootstrapping replication. The workflow performs the tree inferring such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed into two types using the Soaplab2 and Apache Axis2 deployment. There are SOAP and Java Web Service (JWS) providing WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 replicates of the bootstrapping numbers. This paper proposes a new integrated automatic workflow which will be beneficial to the bioinformaticians with an intermediate level of knowledge and experiences. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-03-01
At the present, coding sequence (CDS) has been discovered and larger CDS is being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for the phylogenetic tree inferring analysis using public access web services at European Bioinformatics Institute (EMBL-EBI) and Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 numbers in bootstrapping replication. The workflow performs the tree inferring such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed into two types using the Soaplab2 and Apache Axis2 deployment. There are SOAP and Java Web Service (JWS) providing WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 replicates of the bootstrapping numbers. This paper proposes a new integrated automatic workflow which will be beneficial to the bioinformaticians with an intermediate level of knowledge and experiences. The all local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
2011-01-01
Background The complexity and inter-related nature of biological data poses a difficult challenge for data and tool integration. There has been a proliferation of interoperability standards and projects over the past decade, none of which has been widely adopted by the bioinformatics community. Recent attempts have focused on the use of semantics to assist integration, and Semantic Web technologies are being welcomed by this community. Description SADI - Semantic Automated Discovery and Integration - is a lightweight set of fully standards-compliant Semantic Web service design patterns that simplify the publication of services of the type commonly found in bioinformatics and other scientific domains. Using Semantic Web technologies at every level of the Web services "stack", SADI services consume and produce instances of OWL Classes following a small number of very straightforward best-practices. In addition, we provide codebases that support these best-practices, and plug-in tools to popular developer and client software that dramatically simplify deployment of services by providers, and the discovery and utilization of those services by their consumers. Conclusions SADI Services are fully compliant with, and utilize only foundational Web standards; are simple to create and maintain for service providers; and can be discovered and utilized in a very intuitive way by biologist end-users. In addition, the SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows. We show that, when resources are exposed through SADI, data compliant with a given ontological model can be automatically gathered, or generated, from these distributed, non-coordinating resources - a behaviour we have not observed in any other Semantic system. Finally, we show that, using SADI, data dynamically generated from Web services can be explored in a manner very similar to data housed in static triple-stores, thus facilitating the intersection of Web services and Semantic Web technologies. PMID:22024447
Wilkinson, Mark D; Vandervalk, Benjamin; McCarthy, Luke
2011-10-24
The complexity and inter-related nature of biological data poses a difficult challenge for data and tool integration. There has been a proliferation of interoperability standards and projects over the past decade, none of which has been widely adopted by the bioinformatics community. Recent attempts have focused on the use of semantics to assist integration, and Semantic Web technologies are being welcomed by this community. SADI - Semantic Automated Discovery and Integration - is a lightweight set of fully standards-compliant Semantic Web service design patterns that simplify the publication of services of the type commonly found in bioinformatics and other scientific domains. Using Semantic Web technologies at every level of the Web services "stack", SADI services consume and produce instances of OWL Classes following a small number of very straightforward best-practices. In addition, we provide codebases that support these best-practices, and plug-in tools to popular developer and client software that dramatically simplify deployment of services by providers, and the discovery and utilization of those services by their consumers. SADI Services are fully compliant with, and utilize only foundational Web standards; are simple to create and maintain for service providers; and can be discovered and utilized in a very intuitive way by biologist end-users. In addition, the SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows. We show that, when resources are exposed through SADI, data compliant with a given ontological model can be automatically gathered, or generated, from these distributed, non-coordinating resources - a behaviour we have not observed in any other Semantic system. Finally, we show that, using SADI, data dynamically generated from Web services can be explored in a manner very similar to data housed in static triple-stores, thus facilitating the intersection of Web services and Semantic Web technologies.
Integration of cardiac proteome biology and medicine by a specialized knowledgebase.
Zong, Nobel C; Li, Haomin; Li, Hua; Lam, Maggie P Y; Jimenez, Rafael C; Kim, Christina S; Deng, Ning; Kim, Allen K; Choi, Jeong Ho; Zelaya, Ivette; Liem, David; Meyer, David; Odeberg, Jacob; Fang, Caiyun; Lu, Hao-Jie; Xu, Tao; Weiss, James; Duan, Huilong; Uhlen, Mathias; Yates, John R; Apweiler, Rolf; Ge, Junbo; Hermjakob, Henning; Ping, Peipei
2013-10-12
Omics sciences enable a systems-level perspective in characterizing cardiovascular biology. Integration of diverse proteomics data via a computational strategy will catalyze the assembly of contextualized knowledge, foster discoveries through multidisciplinary investigations, and minimize unnecessary redundancy in research efforts. The goal of this project is to develop a consolidated cardiac proteome knowledgebase with novel bioinformatics pipeline and Web portals, thereby serving as a new resource to advance cardiovascular biology and medicine. We created Cardiac Organellar Protein Atlas Knowledgebase (COPaKB; www.HeartProteome.org), a centralized platform of high-quality cardiac proteomic data, bioinformatics tools, and relevant cardiovascular phenotypes. Currently, COPaKB features 8 organellar modules, comprising 4203 LC-MS/MS experiments from human, mouse, drosophila, and Caenorhabditis elegans, as well as expression images of 10,924 proteins in human myocardium. In addition, the Java-coded bioinformatics tools provided by COPaKB enable cardiovascular investigators in all disciplines to retrieve and analyze pertinent organellar protein properties of interest. COPaKB provides an innovative and interactive resource that connects research interests with the new biological discoveries in protein sciences. With an array of intuitive tools in this unified Web server, nonproteomics investigators can conveniently collaborate with proteomics specialists to dissect the molecular signatures of cardiovascular phenotypes.
Sayers, Samantha; Ulysse, Guerlain; Xiang, Zuoshuang; He, Yongqun
2012-01-01
Vaccine adjuvants are compounds that enhance host immune responses to co-administered antigens in vaccines. Vaxjo is a web-based central database and analysis system that curates, stores, and analyzes vaccine adjuvants and their usages in vaccine development. Basic information of a vaccine adjuvant stored in Vaxjo includes adjuvant name, components, structure, appearance, storage, preparation, function, safety, and vaccines that use this adjuvant. Reliable references are curated and cited. Bioinformatics scripts are developed and used to link vaccine adjuvants to different adjuvanted vaccines stored in the general VIOLIN vaccine database. Presently, 103 vaccine adjuvants have been curated in Vaxjo. Among these adjuvants, 98 have been used in 384 vaccines stored in VIOLIN against over 81 pathogens, cancers, or allergies. All these vaccine adjuvants are categorized and analyzed based on adjuvant types, pathogens used, and vaccine types. As a use case study of vaccine adjuvants in infectious disease vaccines, the adjuvants used in Brucella vaccines are specifically analyzed. A user-friendly web query and visualization interface is developed for interactive vaccine adjuvant search. To support data exchange, the information of vaccine adjuvants is stored in the Vaccine Ontology (VO) in the Web Ontology Language (OWL) format.
InteGO2: A web tool for measuring and visualizing gene semantic similarities using Gene Ontology
Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang; ...
2016-08-31
Here, the Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities, and to provide pure text-based outputs. Without a graphical visualization interface, it is difficult for result interpretation. As a result, we present InteGO2, a web toolmore » that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. In conclusion, InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface.« less
InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology.
Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang; Juan, Liran; Jiang, Qinghua; Wang, Yadong; Chen, Jin
2016-08-31
The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities, and to provide pure text-based outputs. Without a graphical visualization interface, it is difficult for result interpretation. We present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via http://mlg.hit.edu.cn:8089/ .
Sayers, Samantha; Ulysse, Guerlain; Xiang, Zuoshuang; He, Yongqun
2012-01-01
Vaccine adjuvants are compounds that enhance host immune responses to co-administered antigens in vaccines. Vaxjo is a web-based central database and analysis system that curates, stores, and analyzes vaccine adjuvants and their usages in vaccine development. Basic information of a vaccine adjuvant stored in Vaxjo includes adjuvant name, components, structure, appearance, storage, preparation, function, safety, and vaccines that use this adjuvant. Reliable references are curated and cited. Bioinformatics scripts are developed and used to link vaccine adjuvants to different adjuvanted vaccines stored in the general VIOLIN vaccine database. Presently, 103 vaccine adjuvants have been curated in Vaxjo. Among these adjuvants, 98 have been used in 384 vaccines stored in VIOLIN against over 81 pathogens, cancers, or allergies. All these vaccine adjuvants are categorized and analyzed based on adjuvant types, pathogens used, and vaccine types. As a use case study of vaccine adjuvants in infectious disease vaccines, the adjuvants used in Brucella vaccines are specifically analyzed. A user-friendly web query and visualization interface is developed for interactive vaccine adjuvant search. To support data exchange, the information of vaccine adjuvants is stored in the Vaccine Ontology (VO) in the Web Ontology Language (OWL) format. PMID:22505817
InteGO2: A web tool for measuring and visualizing gene semantic similarities using Gene Ontology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang
Here, the Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities, and to provide pure text-based outputs. Without a graphical visualization interface, it is difficult for result interpretation. As a result, we present InteGO2, a web toolmore » that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. In conclusion, InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface.« less
caCORE: a common infrastructure for cancer informatics.
Covitz, Peter A; Hartel, Frank; Schaefer, Carl; De Coronado, Sherri; Fragoso, Gilberto; Sahni, Himanso; Gustafson, Scott; Buetow, Kenneth H
2003-12-12
Sites with substantive bioinformatics operations are challenged to build data processing and delivery infrastructure that provides reliable access and enables data integration. Locally generated data must be processed and stored such that relationships to external data sources can be presented. Consistency and comparability across data sets requires annotation with controlled vocabularies and, further, metadata standards for data representation. Programmatic access to the processed data should be supported to ensure the maximum possible value is extracted. Confronted with these challenges at the National Cancer Institute Center for Bioinformatics, we decided to develop a robust infrastructure for data management and integration that supports advanced biomedical applications. We have developed an interconnected set of software and services called caCORE. Enterprise Vocabulary Services (EVS) provide controlled vocabulary, dictionary and thesaurus services. The Cancer Data Standards Repository (caDSR) provides a metadata registry for common data elements. Cancer Bioinformatics Infrastructure Objects (caBIO) implements an object-oriented model of the biomedical domain and provides Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. caCORE has been used to develop scientific applications that bring together data from distinct genomic and clinical science sources. caCORE downloads and web interfaces can be accessed from links on the caCORE web site (http://ncicb.nci.nih.gov/core). caBIO software is distributed under an open source license that permits unrestricted academic and commercial use. Vocabulary and metadata content in the EVS and caDSR, respectively, is similarly unrestricted, and is available through web applications and FTP downloads. http://ncicb.nci.nih.gov/core/publications contains links to the caBIO 1.0 class diagram and the caCORE 1.0 Technical Guide, which provide detailed information on the present caCORE architecture, data sources and APIs. Updated information appears on a regular basis on the caCORE web site (http://ncicb.nci.nih.gov/core).
Studying the co-evolution of protein families with the Mirrortree web server.
Ochoa, David; Pazos, Florencio
2010-05-15
The Mirrortree server allows to graphically and interactively study the co-evolution of two protein families, and investigate their possible interactions and functional relationships in a taxonomic context. The server includes the possibility of starting from single sequences and hence it can be used by non-expert users. The web server is freely available at http://csbg.cnb.csic.es/mtserver. It was tested in the main web browsers. Adobe Flash Player is required at the client side to perform the interactive assessment of co-evolution. pazos@cnb.csic.es Supplementary data are available at Bioinformatics online.
Queralt-Rosinach, Núria; Piñero, Janet; Bravo, Àlex; Sanz, Ferran; Furlong, Laura I
2016-07-15
DisGeNET-RDF makes available knowledge on the genetic basis of human diseases in the Semantic Web. Gene-disease associations (GDAs) and their provenance metadata are published as human-readable and machine-processable web resources. The information on GDAs included in DisGeNET-RDF is interlinked to other biomedical databases to support the development of bioinformatics approaches for translational research through evidence-based exploitation of a rich and fully interconnected linked open data. http://rdf.disgenet.org/ support@disgenet.org. © The Author 2016. Published by Oxford University Press.
Web-based network analysis and visualization using CellMaps
Salavert, Francisco; García-Alonso, Luz; Sánchez, Rubén; Alonso, Roberto; Bleda, Marta; Medina, Ignacio; Dopazo, Joaquín
2016-01-01
Summary: CellMaps is an HTML5 open-source web tool that allows displaying, editing, exploring and analyzing biological networks as well as integrating metadata into them. Computations and analyses are remotely executed in high-end servers, and all the functionalities are available through RESTful web services. CellMaps can easily be integrated in any web page by using an available JavaScript API. Availability and Implementation: The application is available at: http://cellmaps.babelomics.org/ and the code can be found in: https://github.com/opencb/cell-maps. The client is implemented in JavaScript and the server in C and Java. Contact: jdopazo@cipf.es Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27296979
Web-based network analysis and visualization using CellMaps.
Salavert, Francisco; García-Alonso, Luz; Sánchez, Rubén; Alonso, Roberto; Bleda, Marta; Medina, Ignacio; Dopazo, Joaquín
2016-10-01
: CellMaps is an HTML5 open-source web tool that allows displaying, editing, exploring and analyzing biological networks as well as integrating metadata into them. Computations and analyses are remotely executed in high-end servers, and all the functionalities are available through RESTful web services. CellMaps can easily be integrated in any web page by using an available JavaScript API. The application is available at: http://cellmaps.babelomics.org/ and the code can be found in: https://github.com/opencb/cell-maps The client is implemented in JavaScript and the server in C and Java. jdopazo@cipf.es Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi
2018-01-01
Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625
Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses.
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V; Ma'ayan, Avi
2018-02-27
Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.
González-Nilo, Fernando; Pérez-Acle, Tomás; Guínez-Molinos, Sergio; Geraldo, Daniela A; Sandoval, Claudia; Yévenes, Alejandro; Santos, Leonardo S; Laurie, V Felipe; Mendoza, Hegaly; Cachau, Raúl E
2011-01-01
After the progress made during the genomics era, bioinformatics was tasked with supporting the flow of information generated by nanobiotechnology efforts. This challenge requires adapting classical bioinformatic and computational chemistry tools to store, standardize, analyze, and visualize nanobiotechnological information. Thus, old and new bioinformatic and computational chemistry tools have been merged into a new sub-discipline: nanoinformatics. This review takes a second look at the development of this new and exciting area as seen from the perspective of the evolution of nanobiotechnology applied to the life sciences. The knowledge obtained at the nano-scale level implies answers to new questions and the development of new concepts in different fields. The rapid convergence of technologies around nanobiotechnologies has spun off collaborative networks and web platforms created for sharing and discussing the knowledge generated in nanobiotechnology. The implementation of new database schemes suitable for storage, processing and integrating physical, chemical, and biological properties of nanoparticles will be a key element in achieving the promises in this convergent field. In this work, we will review some applications of nanobiotechnology to life sciences in generating new requirements for diverse scientific fields, such as bioinformatics and computational chemistry.
Bioinformatics workflows and web services in systems biology made easy for experimentalists.
Jimenez, Rafael C; Corpas, Manuel
2013-01-01
Workflows are useful to perform data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge in programming and web services. However the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.
Wong, Kim; Navarro, José Fernández; Bergenstråhle, Ludvig; Ståhl, Patrik L; Lundeberg, Joakim
2018-06-01
Spatial Transcriptomics (ST) is a method which combines high resolution tissue imaging with high troughput transcriptome sequencing data. This data must be aligned with the images for correct visualization, a process that involves several manual steps. Here we present ST Spot Detector, a web tool that automates and facilitates this alignment through a user friendly interface. jose.fernandez.navarro@scilifelab.se. Supplementary data are available at Bioinformatics online.
miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.
Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M
2009-07-01
Next-generation sequencing allows now the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand of bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and its copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
The GMOD Drupal bioinformatic server framework.
Papanicolaou, Alexie; Heckel, David G
2010-12-15
Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com.
The Topology Prediction of Membrane Proteins: A Web-Based Tutorial.
Kandemir-Cavas, Cagin; Cavas, Levent; Alyuruk, Hakan
2018-06-01
There is a great need for development of educational materials on the transfer of current bioinformatics knowledge to undergraduate students in bioscience departments. In this study, it is aimed to prepare an example in silico laboratory tutorial on the topology prediction of membrane proteins by bioinformatics tools. This laboratory tutorial is prepared for biochemistry lessons at bioscience departments (biology, chemistry, biochemistry, molecular biology and genetics, and faculty of medicine). The tutorial is intended for students who have not taken a bioinformatics course yet or already have taken a course as an introduction to bioinformatics. The tutorial is based on step-by-step explanations with illustrations. It can be applied under supervision of an instructor in the lessons, or it can be used as a self-study guide by students. In the tutorial, membrane-spanning regions and α-helices of membrane proteins were predicted by internet-based bioinformatics tools. According to the results achieved from internet-based bioinformatics tools, the algorithms and parameters used were effective on the accuracy of prediction. The importance of this laboratory tutorial lies on the facts that it provides an introduction to the bioinformatics and that it also demonstrates an in silico laboratory application to the students at natural sciences. The presented example education material is applicable easily at all departments that have internet connection. This study presents an alternative education material to the students in biochemistry laboratories in addition to classical laboratory experiments.
The EMBRACE web service collection
Pettifer, Steve; Ison, Jon; Kalaš, Matúš; Thorne, Dave; McDermott, Philip; Jonassen, Inge; Liaquat, Ali; Fernández, José M.; Rodriguez, Jose M.; Partners, INB-; Pisano, David G.; Blanchet, Christophe; Uludag, Mahmut; Rice, Peter; Bartaseviciute, Edita; Rapacki, Kristoffer; Hekkelman, Maarten; Sand, Olivier; Stockinger, Heinz; Clegg, Andrew B.; Bongcam-Rudloff, Erik; Salzemann, Jean; Breton, Vincent; Attwood, Teresa K.; Cameron, Graham; Vriend, Gert
2010-01-01
The EMBRACE (European Model for Bioinformatics Research and Community Education) web service collection is the culmination of a 5-year project that set out to investigate issues involved in developing and deploying web services for use in the life sciences. The project concluded that in order for web services to achieve widespread adoption, standards must be defined for the choice of web service technology, for semantically annotating both service function and the data exchanged, and a mechanism for discovering services must be provided. Building on this, the project developed: EDAM, an ontology for describing life science web services; BioXSD, a schema for exchanging data between services; and a centralized registry (http://www.embraceregistry.net) that collects together around 1000 services developed by the consortium partners. This article presents the current status of the collection and its associated recommendations and standards definitions. PMID:20462862
Karaca, Ezgi; Melquiond, Adrien S J; de Vries, Sjoerd J; Kastritis, Panagiotis L; Bonvin, Alexandre M J J
2010-08-01
Over the last years, large scale proteomics studies have generated a wealth of information of biomolecular complexes. Adding the structural dimension to the resulting interactomes represents a major challenge that classical structural experimental methods alone will have difficulties to confront. To meet this challenge, complementary modeling techniques such as docking are thus needed. Among the current docking methods, HADDOCK (High Ambiguity-Driven DOCKing) distinguishes itself from others by the use of experimental and/or bioinformatics data to drive the modeling process and has shown a strong performance in the critical assessment of prediction of interactions (CAPRI), a blind experiment for the prediction of interactions. Although most docking programs are limited to binary complexes, HADDOCK can deal with multiple molecules (up to six), a capability that will be required to build large macromolecular assemblies. We present here a novel web interface of HADDOCK that allows the user to dock up to six biomolecules simultaneously. This interface allows the inclusion of a large variety of both experimental and/or bioinformatics data and supports several types of cyclic and dihedral symmetries in the docking of multibody assemblies. The server was tested on a benchmark of six cases, containing five symmetric homo-oligomeric protein complexes and one symmetric protein-DNA complex. Our results reveal that, in the presence of either bioinformatics and/or experimental data, HADDOCK shows an excellent performance: in all cases, HADDOCK was able to generate good to high quality solutions and ranked them at the top, demonstrating its ability to model symmetric multicomponent assemblies. Docking methods can thus play an important role in adding the structural dimension to interactomes. However, although the current docking methodologies were successful for a vast range of cases, considering the variety and complexity of macromolecular assemblies, inclusion of some kind of experimental information (e.g. from mass spectrometry, nuclear magnetic resonance, cryoelectron microscopy, etc.) will remain highly desirable to obtain reliable results.
NEIBank: Genomics and bioinformatics resources for vision research
Peterson, Katherine; Gao, James; Buchoff, Patee; Jaworski, Cynthia; Bowes-Rickman, Catherine; Ebright, Jessica N.; Hauser, Michael A.; Hoover, David
2008-01-01
NEIBank is an integrated resource for genomics and bioinformatics in vision research. It includes expressed sequence tag (EST) data and sequence-verified cDNA clones for multiple eye tissues of several species, web-based access to human eye-specific SAGE data through EyeSAGE, and comprehensive, annotated databases of known human eye disease genes and candidate disease gene loci. All expression- and disease-related data are integrated in EyeBrowse, an eye-centric genome browser. NEIBank provides a comprehensive overview of current knowledge of the transcriptional repertoires of eye tissues and their relation to pathology. PMID:18648525
Hodor, Paul; Chawla, Amandeep; Clark, Andrew; Neal, Lauren
2016-01-15
: One of the solutions proposed for addressing the challenge of the overwhelming abundance of genomic sequence and other biological data is the use of the Hadoop computing framework. Appropriate tools are needed to set up computational environments that facilitate research of novel bioinformatics methodology using Hadoop. Here, we present cl-dash, a complete starter kit for setting up such an environment. Configuring and deploying new Hadoop clusters can be done in minutes. Use of Amazon Web Services ensures no initial investment and minimal operation costs. Two sample bioinformatics applications help the researcher understand and learn the principles of implementing an algorithm using the MapReduce programming pattern. Source code is available at https://bitbucket.org/booz-allen-sci-comp-team/cl-dash.git. hodor_paul@bah.com. © The Author 2015. Published by Oxford University Press.
Hodor, Paul; Chawla, Amandeep; Clark, Andrew; Neal, Lauren
2016-01-01
Summary: One of the solutions proposed for addressing the challenge of the overwhelming abundance of genomic sequence and other biological data is the use of the Hadoop computing framework. Appropriate tools are needed to set up computational environments that facilitate research of novel bioinformatics methodology using Hadoop. Here, we present cl-dash, a complete starter kit for setting up such an environment. Configuring and deploying new Hadoop clusters can be done in minutes. Use of Amazon Web Services ensures no initial investment and minimal operation costs. Two sample bioinformatics applications help the researcher understand and learn the principles of implementing an algorithm using the MapReduce programming pattern. Availability and implementation: Source code is available at https://bitbucket.org/booz-allen-sci-comp-team/cl-dash.git. Contact: hodor_paul@bah.com PMID:26428290
BioTextQuest: a web-based biomedical text mining suite for concept discovery.
Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J
2011-12-01
BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. A tag-cloud-based illustration of terms labeling each document cluster are semantically annotated according to the biological entity, and a list of document titles enable users to simultaneously compare terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms, exclusion of documents/significant terms, to better match the biological question addressed. http://biotextquest.biol.ucy.ac.cy vprobon@ucy.ac.cy; iliopj@med.uoc.gr Supplementary data are available at Bioinformatics online.
RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.
D'Antonio, Mattia; D'Onorio De Meo, Paolo; Pallocca, Matteo; Picardi, Ernesto; D'Erchia, Anna Maria; Calogero, Raffaele A; Castrignanò, Tiziana; Pesole, Graziano
2015-01-01
The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulative pathways make RNA-Seq one of most complex field of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer to the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call for statistically significant differences in genes and transcripts expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Through a user friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy allows to access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user needs.
Unraveling the Web of Viroinformatics: Computational Tools and Databases in Virus Research
Priyadarshini, Pragya; Vrati, Sudhanshu
2014-01-01
The beginning of the second century of research in the field of virology (the first virus was discovered in 1898) was marked by its amalgamation with bioinformatics, resulting in the birth of a new domain—viroinformatics. The availability of more than 100 Web servers and databases embracing all or specific viruses (for example, dengue virus, influenza virus, hepatitis virus, human immunodeficiency virus [HIV], hemorrhagic fever virus [HFV], human papillomavirus [HPV], West Nile virus, etc.) as well as distinct applications (comparative/diversity analysis, viral recombination, small interfering RNA [siRNA]/short hairpin RNA [shRNA]/microRNA [miRNA] studies, RNA folding, protein-protein interaction, structural analysis, and phylotyping and genotyping) will definitely aid the development of effective drugs and vaccines. However, information about their access and utility is not available at any single source or on any single platform. Therefore, a compendium of various computational tools and resources dedicated specifically to virology is presented in this article. PMID:25428870
Unraveling the web of viroinformatics: computational tools and databases in virus research.
Sharma, Deepak; Priyadarshini, Pragya; Vrati, Sudhanshu
2015-02-01
The beginning of the second century of research in the field of virology (the first virus was discovered in 1898) was marked by its amalgamation with bioinformatics, resulting in the birth of a new domain--viroinformatics. The availability of more than 100 Web servers and databases embracing all or specific viruses (for example, dengue virus, influenza virus, hepatitis virus, human immunodeficiency virus [HIV], hemorrhagic fever virus [HFV], human papillomavirus [HPV], West Nile virus, etc.) as well as distinct applications (comparative/diversity analysis, viral recombination, small interfering RNA [siRNA]/short hairpin RNA [shRNA]/microRNA [miRNA] studies, RNA folding, protein-protein interaction, structural analysis, and phylotyping and genotyping) will definitely aid the development of effective drugs and vaccines. However, information about their access and utility is not available at any single source or on any single platform. Therefore, a compendium of various computational tools and resources dedicated specifically to virology is presented in this article. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
The JCSG high-throughput structural biology pipeline.
Elsliger, Marc André; Deacon, Ashley M; Godzik, Adam; Lesley, Scott A; Wooley, John; Wüthrich, Kurt; Wilson, Ian A
2010-10-01
The Joint Center for Structural Genomics high-throughput structural biology pipeline has delivered more than 1000 structures to the community over the past ten years. The JCSG has made a significant contribution to the overall goal of the NIH Protein Structure Initiative (PSI) of expanding structural coverage of the protein universe, as well as making substantial inroads into structural coverage of an entire organism. Targets are processed through an extensive combination of bioinformatics and biophysical analyses to efficiently characterize and optimize each target prior to selection for structure determination. The pipeline uses parallel processing methods at almost every step in the process and can adapt to a wide range of protein targets from bacterial to human. The construction, expansion and optimization of the JCSG gene-to-structure pipeline over the years have resulted in many technological and methodological advances and developments. The vast number of targets and the enormous amounts of associated data processed through the multiple stages of the experimental pipeline required the development of variety of valuable resources that, wherever feasible, have been converted to free-access web-based tools and applications.
repRNA: a web server for generating various feature vectors of RNA sequences.
Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen
2016-02-01
With the rapid growth of RNA sequences generated in the postgenomic age, it is highly desired to develop a flexible method that can generate various kinds of vectors to represent these sequences by focusing on their different features. This is because nearly all the existing machine-learning methods, such as SVM (support vector machine) and KNN (k-nearest neighbor), can only handle vectors but not sequences. To meet the increasing demands and speed up the genome analyses, we have developed a new web server, called "representations of RNA sequences" (repRNA). Compared with the existing methods, repRNA is much more comprehensive, flexible and powerful, as reflected by the following facts: (1) it can generate 11 different modes of feature vectors for users to choose according to their investigation purposes; (2) it allows users to select the features from 22 built-in physicochemical properties and even those defined by users' own; (3) the resultant feature vectors and the secondary structures of the corresponding RNA sequences can be visualized. The repRNA web server is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repRNA/ .
Intuitive web-based experimental design for high-throughput biomedical data.
Friedrich, Andreas; Kenar, Erhan; Kohlbacher, Oliver; Nahnsen, Sven
2015-01-01
Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data is accompanied by accurate metadata annotation. Particularly in high-throughput experiments intelligent approaches are needed to keep track of the experimental design, including the conditions that are studied as well as information that might be interesting for failure analysis or further experiments in the future. In addition to the management of this information, means for an integrated design and interfaces for structured data annotation are urgently needed by researchers. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface allowing the collection of arbitrary metadata. To exchange and edit information we provide a spreadsheet-based, humanly readable format. Subsequently, sample sheets with identifiers and metainformation for data generation facilities can be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model.
Semantic Web meets Integrative Biology: a survey.
Chen, Huajun; Yu, Tong; Chen, Jake Y
2013-01-01
Integrative Biology (IB) uses experimental or computational quantitative technologies to characterize biological systems at the molecular, cellular, tissue and population levels. IB typically involves the integration of the data, knowledge and capabilities across disciplinary boundaries in order to solve complex problems. We identify a series of bioinformatics problems posed by interdisciplinary integration: (i) data integration that interconnects structured data across related biomedical domains; (ii) ontology integration that brings jargons, terminologies and taxonomies from various disciplines into a unified network of ontologies; (iii) knowledge integration that integrates disparate knowledge elements from multiple sources; (iv) service integration that build applications out of services provided by different vendors. We argue that IB can benefit significantly from the integration solutions enabled by Semantic Web (SW) technologies. The SW enables scientists to share content beyond the boundaries of applications and websites, resulting into a web of data that is meaningful and understandable to any computers. In this review, we provide insight into how SW technologies can be used to build open, standardized and interoperable solutions for interdisciplinary integration on a global basis. We present a rich set of case studies in system biology, integrative neuroscience, bio-pharmaceutics and translational medicine, to highlight the technical features and benefits of SW applications in IB.
Illuminate Knowledge Elements in Geoscience Literature
NASA Astrophysics Data System (ADS)
Ma, X.; Zheng, J. G.; Wang, H.; Fox, P. A.
2015-12-01
There are numerous dark data hidden in geoscience literature. Efficient retrieval and reuse of those data will greatly benefit geoscience researches of nowadays. Among the works of data rescue, a topic of interest is illuminating the knowledge framework, i.e. entities and relationships, embedded in documents. Entity recognition and linking have received extensive attention in news and social media analysis, as well as in bioinformatics. In the domain of geoscience, however, such works are limited. We will present our work on how to use knowledge bases on the Web, such as ontologies and vocabularies, to facilitate entity recognition and linking in geoscience literature. The work deploys an un-supervised collective inference approach [1] to link entity mentions in unstructured texts to a knowledge base, which leverages the meaningful information and structures in ontologies and vocabularies for similarity computation and entity ranking. Our work is still in the initial stage towards the detection of knowledge frameworks in literature, and we have been collecting geoscience ontologies and vocabularies in order to build a comprehensive geoscience knowledge base [2]. We hope the work will initiate new ideas and collaborations on dark data rescue, as well as on the synthesis of data and knowledge from geoscience literature. References: 1. Zheng, J., Howsmon, D., Zhang, B., Hahn, J., McGuinness, D.L., Hendler, J., and Ji, H. 2014. Entity linking for biomedical literature. In Proceedings of ACM 8th International Workshop on Data and Text Mining in Bioinformatics, Shanghai, China. 2. Ma, X. Zheng, J., 2015. Linking geoscience entity mentions to the Web of Data. ESIP 2015 Summer Meeting, Pacific Grove, CA.
Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan
2015-01-01
Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.
Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan
2015-01-01
Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450
AncestrySNPminer: A bioinformatics tool to retrieve and develop ancestry informative SNP panels
Amirisetty, Sushil; Khurana Hershey, Gurjit K.; Baye, Tesfaye M.
2012-01-01
A wealth of genomic information is available in public and private databases. However, this information is underutilized for uncovering population specific and functionally relevant markers underlying complex human traits. Given the huge amount of SNP data available from the annotation of human genetic variation, data mining is a faster and cost effective approach for investigating the number of SNPs that are informative for ancestry. In this study, we present AncestrySNPminer, the first web-based bioinformatics tool specifically designed to retrieve Ancestry Informative Markers (AIMs) from genomic data sets and link these informative markers to genes and ontological annotation classes. The tool includes an automated and simple “scripting at the click of a button” functionality that enables researchers to perform various population genomics statistical analyses methods with user friendly querying and filtering of data sets across various populations through a single web interface. AncestrySNPminer can be freely accessed at https://research.cchmc.org/mershalab/AncestrySNPminer/login.php. PMID:22584067
Ramu, Chenna
2003-07-01
SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.
VisBOL: Web-Based Tools for Synthetic Biology Design Visualization.
McLaughlin, James Alastair; Pocock, Matthew; Mısırlı, Göksel; Madsen, Curtis; Wipat, Anil
2016-08-19
VisBOL is a Web-based application that allows the rendering of genetic circuit designs, enabling synthetic biologists to visually convey designs in SBOL visual format. VisBOL designs can be exported to formats including PNG and SVG images to be embedded in Web pages, presentations and publications. The VisBOL tool enables the automated generation of visualizations from designs specified using the Synthetic Biology Open Language (SBOL) version 2.0, as well as a range of well-known bioinformatics formats including GenBank and Pigeoncad notation. VisBOL is provided both as a user accessible Web site and as an open-source (BSD) JavaScript library that can be used to embed diagrams within other content and software.
The value of the Semantic Web in the laboratory.
Frey, Jeremy G
2009-06-01
The Semantic Web is beginning to impact on the wider chemical and physical sciences, beyond the earlier adopted bio-informatics. While useful in large-scale data driven science with automated processing, these technologies can also help integrate the work of smaller scale laboratories producing diverse data. The semantics aid the discovery, reliable re-use of data, provide improved provenance and facilitate automated processing by increased resilience to changes in presentation and reduced ambiguity. The Semantic Web, its tools and collections are not yet competitive with well-established solutions to current problems. It is in the reduced cost of instituting solutions to new problems that the versatility of Semantic Web-enabled data and resources will make their mark once the more general-purpose tools are more available.
ERIC Educational Resources Information Center
Inlow, Jennifer K.; Miller, Paige; Pittman, Bethany
2007-01-01
We describe two bioinformatics exercises intended for use in a computer laboratory setting in an upper-level undergraduate biochemistry course. To introduce students to bioinformatics, the exercises incorporate several commonly used bioinformatics tools, including BLAST, that are freely available online. The exercises build upon the students'…
Bellman's GAP--a language and compiler for dynamic programming in sequence analysis.
Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert
2013-03-01
Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman's GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. In Bellman's GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive to carefully hand-crafted implementations. This article introduces the Bellman's GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman's GAP as an implementation platform of 'real-world' bioinformatics tools. Bellman's GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics.
EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats
Ison, Jon; Kalaš, Matúš; Jonassen, Inge; Bolser, Dan; Uludag, Mahmut; McWilliam, Hamish; Malone, James; Lopez, Rodrigo; Pettifer, Steve; Rice, Peter
2013-01-01
Motivation: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. Results: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. Availability: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl. Contact: jison@ebi.ac.uk PMID:23479348
Biological Web Service Repositories Review
Urdidiales‐Nieto, David; Navas‐Delgado, Ismael
2016-01-01
Abstract Web services play a key role in bioinformatics enabling the integration of database access and analysis of algorithms. However, Web service repositories do not usually publish information on the changes made to their registered Web services. Dynamism is directly related to the changes in the repositories (services registered or unregistered) and at service level (annotation changes). Thus, users, software clients or workflow based approaches lack enough relevant information to decide when they should review or re‐execute a Web service or workflow to get updated or improved results. The dynamism of the repository could be a measure for workflow developers to re‐check service availability and annotation changes in the services of interest to them. This paper presents a review on the most well‐known Web service repositories in the life sciences including an analysis of their dynamism. Freshness is introduced in this paper, and has been used as the measure for the dynamism of these repositories. PMID:27783459
PredictProtein—an open resource for online prediction of protein structural and functional features
Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard
2014-01-01
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431
Lourenço, Anália; Ferreira, Andreia; Veiga, Nuno; Machado, Idalina; Pereira, Maria Olivia; Azevedo, Nuno F
2012-01-01
Consortia of microorganisms, commonly known as biofilms, are attracting much attention from the scientific community due to their impact in human activity. As biofilm research grows to be a data-intensive discipline, the need for suitable bioinformatics approaches becomes compelling to manage and validate individual experiments, and also execute inter-laboratory large-scale comparisons. However, biofilm data is widespread across ad hoc, non-standardized individual files and, thus, data interchange among researchers, or any attempt of cross-laboratory experimentation or analysis, is hardly possible or even attempted. This paper presents BiofOmics, the first publicly accessible Web platform specialized in the management and analysis of data derived from biofilm high-throughput studies. The aim is to promote data interchange across laboratories, implementing collaborative experiments, and enable the development of bioinformatics tools in support of the processing and analysis of the increasing volumes of experimental biofilm data that are being generated. BiofOmics' data deposition facility enforces data structuring and standardization, supported by controlled vocabulary. Researchers are responsible for the description of the experiments, their results and conclusions. BiofOmics' curators interact with submitters only to enforce data structuring and the use of controlled vocabulary. Then, BiofOmics' search facility makes publicly available the profile and data associated with a submitted study so that any researcher can profit from these standardization efforts to compare similar studies, generate new hypotheses to be tested or even extend the conditions experimented in the study. BiofOmics' novelty lies in its support to standardized data deposition, the availability of computerizable data files and the free-of-charge dissemination of biofilm studies across the community. Hopefully, this will open promising research possibilities, namely the comparison of results between different laboratories, the reproducibility of methods within and between laboratories, and the development of guidelines and standardized protocols for biofilm formation operating procedures and analytical methods.
The GMOD Drupal Bioinformatic Server Framework
Papanicolaou, Alexie; Heckel, David G.
2010-01-01
Motivation: Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). Results: We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Conclusion: Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Availability and implementation: Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com Contact: alexie@butterflybase.org PMID:20971988
Bellman’s GAP—a language and compiler for dynamic programming in sequence analysis
Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert
2013-01-01
Motivation: Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman’s GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. Results: In Bellman’s GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive to carefully hand-crafted implementations. This article introduces the Bellman’s GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman’s GAP as an implementation platform of ‘real-world’ bioinformatics tools. Availability: Bellman’s GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics. Contact: robert@techfak.uni-bielefeld.de Supplementary information: Supplementary data are available at Bioinformatics online PMID:23355290
PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.
Djokic-Petrovic, Marija; Cvjetkovic, Vladimir; Yang, Jeremy; Zivanovic, Marko; Wild, David J
2017-09-20
There are a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources. PIBAS FedSPARQL is a web-based query builder and result set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm based on the processing of named entities to be used in Vector Space Model and Cosine Similarity Measures. According to our knowledge, PIBAS FedSPARQL was unique among the systems that we found in that it allows detecting of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF, EMBL-EBI, and one local initiative called CPCTAS, as well as additional user-specified data source. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users have the ability to choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of data to enhance query results. The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics. Also, the novel "similar data items detection" algorithm can be particularly useful for suggesting new data sources and cost optimization for new experiments. PIBAS FedSPARQL can be expanded with new topics, subtopics and templates on demand, rendering information retrieval more robust.
BioSWR – Semantic Web Services Registry for Bioinformatics
Repchevsky, Dmitry; Gelpi, Josep Ll.
2014-01-01
Despite of the variety of available Web services registries specially aimed at Life Sciences, their scope is usually restricted to a limited set of well-defined types of services. While dedicated registries are generally tied to a particular format, general-purpose ones are more adherent to standards and usually rely on Web Service Definition Language (WSDL). Although WSDL is quite flexible to support common Web services types, its lack of semantic expressiveness led to various initiatives to describe Web services via ontology languages. Nevertheless, WSDL 2.0 descriptions gained a standard representation based on Web Ontology Language (OWL). BioSWR is a novel Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional WSDL based ones. The registry provides Web-based interface for Web services registration, querying and annotation, and is also accessible programmatically via Representational State Transfer (REST) API or using a SPARQL Protocol and RDF Query Language. BioSWR server is located at http://inb.bsc.es/BioSWR/and its code is available at https://sourceforge.net/projects/bioswr/under the LGPL license. PMID:25233118
BioSWR--semantic web services registry for bioinformatics.
Repchevsky, Dmitry; Gelpi, Josep Ll
2014-01-01
Despite of the variety of available Web services registries specially aimed at Life Sciences, their scope is usually restricted to a limited set of well-defined types of services. While dedicated registries are generally tied to a particular format, general-purpose ones are more adherent to standards and usually rely on Web Service Definition Language (WSDL). Although WSDL is quite flexible to support common Web services types, its lack of semantic expressiveness led to various initiatives to describe Web services via ontology languages. Nevertheless, WSDL 2.0 descriptions gained a standard representation based on Web Ontology Language (OWL). BioSWR is a novel Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional WSDL based ones. The registry provides Web-based interface for Web services registration, querying and annotation, and is also accessible programmatically via Representational State Transfer (REST) API or using a SPARQL Protocol and RDF Query Language. BioSWR server is located at http://inb.bsc.es/BioSWR/and its code is available at https://sourceforge.net/projects/bioswr/under the LGPL license.
Díaz-Gay, Marcos; Vila-Casadesús, Maria; Franch-Expósito, Sebastià; Hernández-Illán, Eva; Lozano, Juan José; Castellví-Bel, Sergi
2018-06-14
Mutational signatures have been proved as a valuable pattern in somatic genomics, mainly regarding cancer, with a potential application as a biomarker in clinical practice. Up to now, several bioinformatic packages to address this topic have been developed in different languages/platforms. MutationalPatterns has arisen as the most efficient tool for the comparison with the signatures currently reported in the Catalogue of Somatic Mutations in Cancer (COSMIC) database. However, the analysis of mutational signatures is nowadays restricted to a small community of bioinformatic experts. In this work we present Mutational Signatures in Cancer (MuSiCa), a new web tool based on MutationalPatterns and built using the Shiny framework in R language. By means of a simple interface suited to non-specialized researchers, it provides a comprehensive analysis of the somatic mutational status of the supplied cancer samples. It permits characterizing the profile and burden of mutations, as well as quantifying COSMIC-reported mutational signatures. It also allows classifying samples according to the above signature contributions. MuSiCa is a helpful web application to characterize mutational signatures in cancer samples. It is accessible online at http://bioinfo.ciberehd.org/GPtoCRC/en/tools.html and source code is freely available at https://github.com/marcos-diazg/musica .
Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha
2016-11-01
Better protocols and decreasing costs have made high-throughput sequencing experiments now accessible even to small experimental laboratories. However, comparing one or few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited due to lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome scale comparison of high throughput experiments chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user, to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats. Web application: http://www.heatstarseq.roslin.ed.ac.uk/ Source code: https://github.com/gdevailly CONTACT: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Li, Jun; Roebuck, Paul; Grünewald, Stefan; Liang, Han
2012-07-01
An important task in biomedical research is identifying biomarkers that correlate with patient clinical data, and these biomarkers then provide a critical foundation for the diagnosis and treatment of disease. Conventionally, such an analysis is based on individual genes, but the results are often noisy and difficult to interpret. Using a biological network as the searching platform, network-based biomarkers are expected to be more robust and provide deep insights into the molecular mechanisms of disease. We have developed a novel bioinformatics web server for identifying network-based biomarkers that most correlate with patient survival data, SurvNet. The web server takes three input files: one biological network file, representing a gene regulatory or protein interaction network; one molecular profiling file, containing any type of gene- or protein-centred high-throughput biological data (e.g. microarray expression data or DNA methylation data); and one patient survival data file (e.g. patients' progression-free survival data). Given user-defined parameters, SurvNet will automatically search for subnetworks that most correlate with the observed patient survival data. As the output, SurvNet will generate a list of network biomarkers and display them through a user-friendly interface. SurvNet can be accessed at http://bioinformatics.mdanderson.org/main/SurvNet.
WeBIAS: a web server for publishing bioinformatics applications.
Daniluk, Paweł; Wilczyński, Bartek; Lesyng, Bogdan
2015-11-02
One of the requirements for a successful scientific tool is its availability. Developing a functional web service, however, is usually considered a mundane and ungratifying task, and quite often neglected. When publishing bioinformatic applications, such attitude puts additional burden on the reviewers who have to cope with poorly designed interfaces in order to assess quality of presented methods, as well as impairs actual usefulness to the scientific community at large. In this note we present WeBIAS-a simple, self-contained solution to make command-line programs accessible through web forms. It comprises a web portal capable of serving several applications and backend schedulers which carry out computations. The server handles user registration and authentication, stores queries and results, and provides a convenient administrator interface. WeBIAS is implemented in Python and available under GNU Affero General Public License. It has been developed and tested on GNU/Linux compatible platforms covering a vast majority of operational WWW servers. Since it is written in pure Python, it should be easy to deploy also on all other platforms supporting Python (e.g. Windows, Mac OS X). Documentation and source code, as well as a demonstration site are available at http://bioinfo.imdik.pan.pl/webias . WeBIAS has been designed specifically with ease of installation and deployment of services in mind. Setting up a simple application requires minimal effort, yet it is possible to create visually appealing, feature-rich interfaces for query submission and presentation of results.
EST-PAC a web package for EST annotation and protein sequence prediction
Strahm, Yvan; Powell, David; Lefèvre, Christophe
2006-01-01
With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequences tags (EST) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC a web oriented multi-platform software package for expressed sequences tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequence from EST data and, 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and, management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics. PMID:17147782
PRGdb: a bioinformatics platform for plant resistance gene analysis
Sanseverino, Walter; Roma, Guglielmo; De Simone, Marco; Faino, Luigi; Melito, Sara; Stupka, Elia; Frusciante, Luigi; Ercolano, Maria Raffaella
2010-01-01
PRGdb is a web accessible open-source (http://www.prgdb.org) database that represents the first bioinformatic resource providing a comprehensive overview of resistance genes (R-genes) in plants. PRGdb holds more than 16 000 known and putative R-genes belonging to 192 plant species challenged by 115 different pathogens and linked with useful biological information. The complete database includes a set of 73 manually curated reference R-genes, 6308 putative R-genes collected from NCBI and 10463 computationally predicted putative R-genes. Thanks to a user-friendly interface, data can be examined using different query tools. A home-made prediction pipeline called Disease Resistance Analysis and Gene Orthology (DRAGO), based on reference R-gene sequence data, was developed to search for plant resistance genes in public datasets such as Unigene and Genbank. New putative R-gene classes containing unknown domain combinations were discovered and characterized. The development of the PRG platform represents an important starting point to conduct various experimental tasks. The inferred cross-link between genomic and phenotypic information allows access to a large body of information to find answers to several biological questions. The database structure also permits easy integration with other data types and opens up prospects for future implementations. PMID:19906694
String Mining in Bioinformatics
NASA Astrophysics Data System (ADS)
Abouelhoda, Mohamed; Ghanem, Moustafa
Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, this point of view, however, has not been explicitly embraced neither in the data mining nor in the sequence analysis text books, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data-mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
String Mining in Bioinformatics
NASA Astrophysics Data System (ADS)
Abouelhoda, Mohamed; Ghanem, Moustafa
Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, this point of view, however, has not been explicitly embraced neither in the data mining nor in the sequence analysis text books, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word “data-mining” is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
Gaia: automated quality assessment of protein structure models.
Kota, Pradeep; Ding, Feng; Ramachandran, Srinivas; Dokholyan, Nikolay V
2011-08-15
Increasing use of structural modeling for understanding structure-function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures. In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement. We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures. dokh@unc.edu Supplementary data are available at Bioinformatics online.
A Web-Hosted R Workflow to Simplify and Automate the Analysis of 16S NGS Data
Next-Generation Sequencing (NGS) produces large data sets that include tens-of-thousands of sequence reads per sample. For analysis of bacterial diversity, 16S NGS sequences are typically analyzed in a workflow that containing best-of-breed bioinformatics packages that may levera...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curtis, Darren S.; Peterson, Elena S.; Oehmen, Chris S.
2008-05-04
This work presents the ScalaBLAST Web Application (SWA), a web based application implemented using the PHP script language, MySQL DBMS, and Apache web server under a GNU/Linux platform. SWA is an application built as part of the Data Intensive Computer for Complex Biological Systems (DICCBS) project at the Pacific Northwest National Laboratory (PNNL). SWA delivers accelerated throughput of bioinformatics analysis via high-performance computing through a convenient, easy-to-use web interface. This approach greatly enhances emerging fields of study in biology such as ontology-based homology, and multiple whole genome comparisons which, in the absence of a tool like SWA, require a heroicmore » effort to overcome the computational bottleneck associated with genome analysis. The current version of SWA includes a user account management system, a web based user interface, and a backend process that generates the files necessary for the Internet scientific community to submit a ScalaBLAST parallel processing job on a dedicated cluster.« less
Visualization of protein sequence features using JavaScript and SVG with pViz.js.
Mukhyala, Kiran; Masselot, Alexandre
2014-12-01
pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on github at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Unipept web services for metaproteomics analysis.
Mesuere, Bart; Willems, Toon; Van der Jeugt, Felix; Devreese, Bart; Vandamme, Peter; Dawyndt, Peter
2016-06-01
Unipept is an open source web application that is designed for metaproteomics analysis with a focus on interactive datavisualization. It is underpinned by a fast index built from UniProtKB and the NCBI taxonomy that enables quick retrieval of all UniProt entries in which a given tryptic peptide occurs. Unipept version 2.4 introduced web services that provide programmatic access to the metaproteomics analysis features. This enables integration of Unipept functionality in custom applications and data processing pipelines. The web services are freely available at http://api.unipept.ugent.be and are open sourced under the MIT license. Unipept@ugent.be Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
The Rényi divergence enables accurate and precise cluster analysis for localisation microscopy.
Staszowska, Adela D; Fox-Roberts, Patrick; Hirvonen, Liisa M; Peddie, Christopher J; Collinson, Lucy M; Jones, Gareth E; Cox, Susan
2018-06-01
Clustering analysis is a key technique for quantitatively characterising structures in localisation microscopy images. To build up accurate information about biological structures, it is critical that the quantification is both accurate (close to the ground truth) and precise (has small scatter and is reproducible). Here we describe how the Rényi divergence can be used for cluster radius measurements in localisation microscopy data. We demonstrate that the Rényi divergence can operate with high levels of background and provides results which are more accurate than Ripley's functions, Voronoi tesselation or DBSCAN. Data supporting this research will be made accessible via a web link. Software codes developed for this work can be accessed via http://coxphysics.com/Renyi_divergence_software.zip. Implemented in C ++. Correspondence and requests for materials can be also addressed to the corresponding author. adela.staszowska@gmail.com or susan.cox@kcl.ac.uk. Supplementary data are available at Bioinformatics online.
Quinn, Gregory B; Bi, Chunxiao; Christie, Cole H; Pang, Kyle; Prlić, Andreas; Nakane, Takanori; Zardecki, Christine; Voigt, Maria; Berman, Helen M; Bourne, Philip E; Rose, Peter W
2015-01-01
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) resource provides tools for query, analysis and visualization of the 3D structures in the PDB archive. As the mobile Web is starting to surpass desktop and laptop usage, scientists and educators are beginning to integrate mobile devices into their research and teaching. In response, we have developed the RCSB PDB Mobile app for the iOS and Android mobile platforms to enable fast and convenient access to RCSB PDB data and services. Using the app, users from the general public to expert researchers can quickly search and visualize biomolecules, and add personal annotations via the RCSB PDB's integrated MyPDB service. RCSB PDB Mobile is freely available from the Apple App Store and Google Play (http://www.rcsb.org). © The Author 2014. Published by Oxford University Press.
Quinn, Gregory B.; Bi, Chunxiao; Christie, Cole H.; Pang, Kyle; Prlić, Andreas; Nakane, Takanori; Zardecki, Christine; Voigt, Maria; Berman, Helen M.; Rose, Peter W.
2015-01-01
Summary: The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) resource provides tools for query, analysis and visualization of the 3D structures in the PDB archive. As the mobile Web is starting to surpass desktop and laptop usage, scientists and educators are beginning to integrate mobile devices into their research and teaching. In response, we have developed the RCSB PDB Mobile app for the iOS and Android mobile platforms to enable fast and convenient access to RCSB PDB data and services. Using the app, users from the general public to expert researchers can quickly search and visualize biomolecules, and add personal annotations via the RCSB PDB’s integrated MyPDB service. Availability and implementation: RCSB PDB Mobile is freely available from the Apple App Store and Google Play (http://www.rcsb.org). Contact: pwrose@ucsd.edu PMID:25183487
p3d--Python module for structural bioinformatics.
Fufezan, Christian; Specht, Michael
2009-08-21
High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this the Python scripting language is the optimal choice since its philosophy is to write an understandable source code. p3d is an object oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three dimensional protein structure files (PDB files). p3d's strength arises from the combination of a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, b) set theory and c) functions that allow to combine a and b and that use human readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.
Sumbalova, Lenka; Stourac, Jan; Martinek, Tomas; Bednar, David; Damborsky, Jiri
2018-05-23
HotSpot Wizard is a web server used for the automated identification of hotspots in semi-rational protein design to give improved protein stability, catalytic activity, substrate specificity and enantioselectivity. Since there are three orders of magnitude fewer protein structures than sequences in bioinformatic databases, the major limitation to the usability of previous versions was the requirement for the protein structure to be a compulsory input for the calculation. HotSpot Wizard 3.0 now accepts the protein sequence as input data. The protein structure for the query sequence is obtained either from eight repositories of homology models or is modeled using Modeller and I-Tasser. The quality of the models is then evaluated using three quality assessment tools-WHAT_CHECK, PROCHECK and MolProbity. During follow-up analyses, the system automatically warns the users whenever they attempt to redesign poorly predicted parts of their homology models. The second main limitation of HotSpot Wizard's predictions is that it identifies suitable positions for mutagenesis, but does not provide any reliable advice on particular substitutions. A new module for the estimation of thermodynamic stabilities using the Rosetta and FoldX suites has been introduced which prevents destabilizing mutations among pre-selected variants entering experimental testing. HotSpot Wizard is freely available at http://loschmidt.chemi.muni.cz/hotspotwizard.
BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS.
Fosso, Bruno; Santamaria, Monica; Marzano, Marinella; Alonso-Alemany, Daniel; Valiente, Gabriel; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano
2015-07-01
Substantial advances in microbiology, molecular evolution and biodiversity have been carried out in recent years thanks to Metagenomics, which allows to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the microbiome taxonomic structure, a target-based metagenomic approach, here also referred as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. The accessibility to proper computational systems for the large-scale bioinformatic analysis of HTS data represents, currently, one of the major challenges in advanced Meta-barcoding projects. BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities by a completely automated workflow, comprehensive of all the fundamental steps, from raw sequence data upload and cleaning to final taxonomic identification, that are absolutely required in an appropriately designed Meta-barcoding HTS-based experiment. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data from either Roche 454 or Illumina HTS platforms, following two alternative paths, respectively. BioMaS is implemented into a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). BioMaS is a friendly pipeline for Meta-barcoding HTS data analysis specifically designed for users without particular computing skills. A comparative benchmark, carried out by using a simulated dataset suitably designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.
ERIC Educational Resources Information Center
Maier, Caroline Alexandra
2001-01-01
Presents an activity in which students seek answers to questions about evolutionary relationships by using genetic databases and bioinformatics software. Students build genetic distance matrices and phylogenetic trees based on molecular sequence data using web-based resources. Provides a flowchart of steps involved in accessing, retrieving, and…
SoMART, a web server for miRNA, tasiRNA and target gene analysis in Solanaceae plants
USDA-ARS?s Scientific Manuscript database
Plant micro(mi)RNAs and trans-acting small interfering (tasi)RNAs mediate posttranscriptional silencing of genes and play important roles in a variety of biological processes. Although bioinformatics prediction and small (s)RNA cloning are the key approaches used for identification of miRNAs, tasiRN...
Fang, Xiang; Li, Ning-qiu; Fu, Xiao-zhe; Li, Kai-bin; Lin, Qiang; Liu, Li-hui; Shi, Cun-bin; Wu, Shu-qin
2015-07-01
As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules, including genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on out membrane protein A of Aeromonas hydrophila, and the changes of system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogen has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platform for other subjects.
Biological Web Service Repositories Review.
Urdidiales-Nieto, David; Navas-Delgado, Ismael; Aldana-Montes, José F
2017-05-01
Web services play a key role in bioinformatics enabling the integration of database access and analysis of algorithms. However, Web service repositories do not usually publish information on the changes made to their registered Web services. Dynamism is directly related to the changes in the repositories (services registered or unregistered) and at service level (annotation changes). Thus, users, software clients or workflow based approaches lack enough relevant information to decide when they should review or re-execute a Web service or workflow to get updated or improved results. The dynamism of the repository could be a measure for workflow developers to re-check service availability and annotation changes in the services of interest to them. This paper presents a review on the most well-known Web service repositories in the life sciences including an analysis of their dynamism. Freshness is introduced in this paper, and has been used as the measure for the dynamism of these repositories. © 2017 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
RNAiFold 2.0: a web server and software to design custom and Rfam-based RNA molecules.
Garcia-Martin, Juan Antonio; Dotu, Ivan; Clote, Peter
2015-07-01
Several algorithms for RNA inverse folding have been used to design synthetic riboswitches, ribozymes and thermoswitches, whose activity has been experimentally validated. The RNAiFold software is unique among approaches for inverse folding in that (exhaustive) constraint programming is used instead of heuristic methods. For that reason, RNAiFold can generate all sequences that fold into the target structure or determine that there is no solution. RNAiFold 2.0 is a complete overhaul of RNAiFold 1.0, rewritten from the now defunct COMET language to C++. The new code properly extends the capabilities of its predecessor by providing a user-friendly pipeline to design synthetic constructs having the functionality of given Rfam families. In addition, the new software supports amino acid constraints, even for proteins translated in different reading frames from overlapping coding sequences; moreover, structure compatibility/incompatibility constraints have been expanded. With these features, RNAiFold 2.0 allows the user to design single RNA molecules as well as hybridization complexes of two RNA molecules. the web server, source code and linux binaries are publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold2.0. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Bioinformatics: indispensable, yet hidden in plain sight?
Bartlett, Andrew; Penders, Bart; Lewis, Jamie
2017-06-21
Bioinformatics has multitudinous identities, organisational alignments and disciplinary links. This variety allows bioinformaticians and bioinformatic work to contribute to much (if not most) of life science research in profound ways. The multitude of bioinformatic work also translates into a multitude of credit-distribution arrangements, apparently dismissing that work. We report on the epistemic and social arrangements that characterise the relationship between bioinformatics and life science. We describe, in sociological terms, the character, power and future of bioinformatic work. The character of bioinformatic work is such that its cultural, institutional and technical structures allow for it to be black-boxed easily. The result is that bioinformatic expertise and contributions travel easily and quickly, yet remain largely uncredited. The power of bioinformatic work is shaped by its dependency on life science work, which combined with the black-boxed character of bioinformatic expertise further contributes to situating bioinformatics on the periphery of the life sciences. Finally, the imagined futures of bioinformatic work suggest that bioinformatics will become ever more indispensable without necessarily becoming more visible, forcing bioinformaticians into difficult professional and career choices. Bioinformatic expertise and labour is epistemically central but often institutionally peripheral. In part, this is a result of the ways in which the character, power distribution and potential futures of bioinformatics are constituted. However, alternative paths can be imagined.
PMAnalyzer: a new web interface for bacterial growth curve analysis.
Cuevas, Daniel A; Edwards, Robert A
2017-06-15
Bacterial growth curves are essential representations for characterizing bacteria metabolism within a variety of media compositions. Using high-throughput, spectrophotometers capable of processing tens of 96-well plates, quantitative phenotypic information can be easily integrated into the current data structures that describe a bacterial organism. The PMAnalyzer pipeline performs a growth curve analysis to parameterize the unique features occurring within microtiter wells containing specific growth media sources. We have expanded the pipeline capabilities and provide a user-friendly, online implementation of this automated pipeline. PMAnalyzer version 2.0 provides fast automatic growth curve parameter analysis, growth identification and high resolution figures of sample-replicate growth curves and several statistical analyses. PMAnalyzer v2.0 can be found at https://edwards.sdsu.edu/pmanalyzer/ . Source code for the pipeline can be found on GitHub at https://github.com/dacuevas/PMAnalyzer . Source code for the online implementation can be found on GitHub at https://github.com/dacuevas/PMAnalyzerWeb . dcuevas08@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Simple proteomics data analysis in the object-oriented PowerShell.
Mohammed, Yassene; Palmblad, Magnus
2013-01-01
Scripting languages such as Perl and Python are appreciated for solving simple, everyday tasks in bioinformatics. A more recent, object-oriented command shell and scripting language, Windows PowerShell, has many attractive features: an object-oriented interactive command line, fluent navigation and manipulation of XML files, ability to consume Web services from the command line, consistent syntax and grammar, rich regular expressions, and advanced output formatting. The key difference between classical command shells and scripting languages, such as bash, and object-oriented ones, such as PowerShell, is that in the latter the result of a command is a structured object with inherited properties and methods rather than a simple stream of characters. Conveniently, PowerShell is included in all new releases of Microsoft Windows and therefore already installed on most computers in classrooms and teaching labs. In this chapter we demonstrate how PowerShell in particular allows easy interaction with mass spectrometry data in XML formats, connection to Web services for tools such as BLAST, and presentation of results as formatted text or graphics. These features make PowerShell much more than "yet another scripting language."
UBioLab: a web-LABoratory for Ubiquitous in-silico experiments.
Bartocci, E; Di Berardini, M R; Merelli, E; Vito, L
2012-03-01
The huge and dynamic amount of bioinformatic resources (e.g., data and tools) available nowadays in Internet represents a big challenge for biologists -for what concerns their management and visualization- and for bioinformaticians -for what concerns the possibility of rapidly creating and executing in-silico experiments involving resources and activities spread over the WWW hyperspace. Any framework aiming at integrating such resources as in a physical laboratory has imperatively to tackle -and possibly to handle in a transparent and uniform way- aspects concerning physical distribution, semantic heterogeneity, co-existence of different computational paradigms and, as a consequence, of different invocation interfaces (i.e., OGSA for Grid nodes, SOAP for Web Services, Java RMI for Java objects, etc.). The framework UBioLab has been just designed and developed as a prototype following the above objective. Several architectural features -as those ones of being fully Web-based and of combining domain ontologies, Semantic Web and workflow techniques- give evidence of an effort in such a direction. The integration of a semantic knowledge management system for distributed (bioinformatic) resources, a semantic-driven graphic environment for defining and monitoring ubiquitous workflows and an intelligent agent-based technology for their distributed execution allows UBioLab to be a semantic guide for bioinformaticians and biologists providing (i) a flexible environment for visualizing, organizing and inferring any (semantics and computational) "type" of domain knowledge (e.g., resources and activities, expressed in a declarative form), (ii) a powerful engine for defining and storing semantic-driven ubiquitous in-silico experiments on the domain hyperspace, as well as (iii) a transparent, automatic and distributed environment for correct experiment executions.
RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application
2015-01-01
Background The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulative pathways make RNA-Seq one of most complex field of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges to efficiently analyze and store sequence data and results. Methods In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer to the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call for statistically significant differences in genes and transcripts expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Results Through a user friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy allows to access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user needs. PMID:26046471
MetaboAnalystR: an R package for flexible and reproducible analysis of metabolomics data.
Chong, Jasmine; Xia, Jianguo
2018-06-28
The MetaboAnalyst web application has been widely used for metabolomics data analysis and interpretation. Despite its user-friendliness, the web interface has presented its inherent limitations (especially for advanced users) with regard to flexibility in creating customized workflow, support for reproducible analysis, and capacity in dealing with large data. To address these limitations, we have developed a companion R package (MetaboAnalystR) based on the R code base of the web server. The package has been thoroughly tested to ensure that the same R commands will produce identical results from both interfaces. MetaboAnalystR complements the MetaboAnalyst web server to facilitate transparent, flexible and reproducible analysis of metabolomics data. MetaboAnalystR is freely available from https://github.com/xia-lab/MetaboAnalystR. Supplementary data are available at Bioinformatics online.
Mavridis, Lazaros; Janes, Robert W
2017-01-01
Circular dichroism (CD) spectroscopy is extensively utilized for determining the percentages of secondary structure content present in proteins. However, although a large contributor, secondary structure is not the only factor that influences the shape and magnitude of the CD spectrum produced. Other structural features can make contributions so an entire protein structural conformation can give rise to a CD spectrum. There is a need for an application capable of generating protein CD spectra from atomic coordinates. However, no empirically derived method to do this currently exists. PDB2CD has been created as an empirical-based approach to the generation of protein CD spectra from atomic coordinates. The method utilizes a combination of structural features within the conformation of a protein; not only its percentage secondary structure content, but also the juxtaposition of these structural components relative to one another, and the overall structure similarity of the query protein to proteins in our dataset, the SP175 dataset, the 'gold standard' set obtained from the Protein Circular Dichroism Data Bank (PCDDB). A significant number of the CD spectra associated with the 71 proteins in this dataset have been produced with excellent accuracy using a leave-one-out cross-validation process. The method also creates spectra in good agreement with those of a test set of 14 proteins from the PCDDB. The PDB2CD package provides a web-based, user friendly approach to enable researchers to produce CD spectra from protein atomic coordinates. http://pdb2cd.cryst.bbk.ac.uk CONTACT: r.w.janes@qmul.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Huang, Yi-Fei; Golding, G Brian
2015-02-15
A number of statistical phylogenetic methods have been developed to infer conserved functional sites or regions in proteins. Many methods, e.g. Rate4Site, apply the standard phylogenetic models to infer site-specific substitution rates and totally ignore the spatial correlation of substitution rates in protein tertiary structures, which may reduce their power to identify conserved functional patches in protein tertiary structures when the sequences used in the analysis are highly similar. The 3D sliding window method has been proposed to infer conserved functional patches in protein tertiary structures, but the window size, which reflects the strength of the spatial correlation, must be predefined and is not inferred from data. We recently developed GP4Rate to solve these problems under the Bayesian framework. Unfortunately, GP4Rate is computationally slow. Here, we present an intuitive web server, FuncPatch, to perform a fast approximate Bayesian inference of conserved functional patches in protein tertiary structures. Both simulations and four case studies based on empirical data suggest that FuncPatch is a good approximation to GP4Rate. However, FuncPatch is orders of magnitudes faster than GP4Rate. In addition, simulations suggest that FuncPatch is potentially a useful tool complementary to Rate4Site, but the 3D sliding window method is less powerful than FuncPatch and Rate4Site. The functional patches predicted by FuncPatch in the four case studies are supported by experimental evidence, which corroborates the usefulness of FuncPatch. The software FuncPatch is freely available at the web site, http://info.mcmaster.ca/yifei/FuncPatch golding@mcmaster.ca Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
MEMOSys: Bioinformatics platform for genome-scale metabolic models
2011-01-01
Background Recent advances in genomic sequencing have enabled the use of genome sequencing in standard biological and biotechnological research projects. The challenge is how to integrate the large amount of data in order to gain novel biological insights. One way to leverage sequence data is to use genome-scale metabolic models. We have therefore designed and implemented a bioinformatics platform which supports the development of such metabolic models. Results MEMOSys (MEtabolic MOdel research and development System) is a versatile platform for the management, storage, and development of genome-scale metabolic models. It supports the development of new models by providing a built-in version control system which offers access to the complete developmental history. Moreover, the integrated web board, the authorization system, and the definition of user roles allow collaborations across departments and institutions. Research on existing models is facilitated by a search system, references to external databases, and a feature-rich comparison mechanism. MEMOSys provides customizable data exchange mechanisms using the SBML format to enable analysis in external tools. The web application is based on the Java EE framework and offers an intuitive user interface. It currently contains six annotated microbial metabolic models. Conclusions We have developed a web-based system designed to provide researchers a novel application facilitating the management and development of metabolic models. The system is freely available at http://www.icbi.at/MEMOSys. PMID:21276275
Nebula--a web-server for advanced ChIP-seq data analysis.
Boeva, Valentina; Lermine, Alban; Barette, Camille; Guillouf, Christel; Barillot, Emmanuel
2012-10-01
ChIP-seq consists of chromatin immunoprecipitation and deep sequencing of the extracted DNA fragments. It is the technique of choice for accurate characterization of the binding sites of transcription factors and other DNA-associated proteins. We present a web service, Nebula, which allows inexperienced users to perform a complete bioinformatics analysis of ChIP-seq data. Nebula was designed for both bioinformaticians and biologists. It is based on the Galaxy open source framework. Galaxy already includes a large number of functionalities for mapping reads and peak calling. We added the following to Galaxy: (i) peak calling with FindPeaks and a module for immunoprecipitation quality control, (ii) de novo motif discovery with ChIPMunk, (iii) calculation of the density and the cumulative distribution of peak locations relative to gene transcription start sites, (iv) annotation of peaks with genomic features and (v) annotation of genes with peak information. Nebula generates the graphs and the enrichment statistics at each step of the process. During Steps 3-5, Nebula optionally repeats the analysis on a control dataset and compares these results with those from the main dataset. Nebula can also incorporate gene expression (or gene modulation) data during these steps. In summary, Nebula is an innovative web service that provides an advanced ChIP-seq analysis pipeline providing ready-to-publish results. Nebula is available at http://nebula.curie.fr/ Supplementary data are available at Bioinformatics online.
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu
2016-06-01
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. © 2016 WILEY PERIODICALS, INC.
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T.; van Oven, Mannis; Wallace, Douglas C.; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F.; Attimonelli, Marcella; Zuchner, Stephan
2016-01-01
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and disease. MSeqDR-LSDB is a locus specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar-compliant variant annotations. PhenoTips is used for phenotypic data submission on de-identified patients using human phenotype ontology terminology. Development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. PMID:26919060
Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas
2014-01-01
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881
TnpPred: A Web Service for the Robust Prediction of Prokaryotic Transposases
Riadi, Gonzalo; Medina-Moenne, Cristobal; Holmes, David S.
2012-01-01
Transposases (Tnps) are enzymes that participate in the movement of insertion sequences (ISs) within and between genomes. Genes that encode Tnps are amongst the most abundant and widely distributed genes in nature. However, they are difficult to predict bioinformatically and given the increasing availability of prokaryotic genomes and metagenomes, it is incumbent to develop rapid, high quality automatic annotation of ISs. This need prompted us to develop a web service, termed TnpPred for Tnp discovery. It provides better sensitivity and specificity for Tnp predictions than given by currently available programs as determined by ROC analysis. TnpPred should be useful for improving genome annotation. The TnpPred web service is freely available for noncommercial use. PMID:23251097
Predicting protein crystallization propensity from protein sequence
2011-01-01
The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from a primary sequence correlate with the protein’s propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties were computed for over 1,300 proteins that expressed well but were insoluble, and for ~720 unique proteins that resulted in X-ray structures. The correlation of the protein’s iso-electric point and grand average hydropathy (GRAVY) with crystallizability was analyzed for full length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses we have identified a set of the attributes correlating with a protein’s propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on these. We have created applications to analyze and provide optimal boundary information for query sequences and to visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor. PMID:20177794
ERIC Educational Resources Information Center
Zhang, Xiaorong
2011-01-01
We incorporated a bioinformatics component into the freshman biology course that allows students to explore cystic fibrosis (CF), a common genetic disorder, using bioinformatics tools and skills. Students learn about CF through searching genetic databases, analyzing genetic sequences, and observing the three-dimensional structures of proteins…
ERIC Educational Resources Information Center
Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari
2014-01-01
Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…
XML schemas for common bioinformatic data types and their application in workflow systems
Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert
2006-01-01
Background Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios. PMID:17087823
OfftargetFinder: a web tool for species-specific RNAi design.
Good, R T; Varghese, T; Golz, J F; Russell, D A; Papanicolaou, A; Edwards, O; Robin, C
2016-04-15
RNA interference (RNAi) technology is being developed as a weapon for pest insect control. To maximize the specificity that such an approach affords we have developed a bioinformatic web tool that searches the ever-growing arthropod transcriptome databases so that pest-specific RNAi sequences can be identified. This will help technology developers finesse the design of RNAi sequences and suggests which non-target species should be assessed in the risk assessment process. http://rnai.specifly.org crobin@unimelb.edu.au. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix.
Ambrosini, Giovanna; Groux, Romain; Bucher, Philipp
2018-03-05
Transcription factors (TFs) regulate gene expression by binding to specific short DNA sequences of 5 to 20-bp to regulate the rate of transcription of genetic information from DNA to messenger RNA. We present PWMScan, a fast web-based tool to scan server-resident genomes for matches to a user-supplied PWM or TF binding site model from a public database. The web server and source code are available at http://ccg.vital-it.ch/pwmscan and https://sourceforge.net/projects/pwmscan, respectively. giovanna.ambrosini@epfl.ch. SUPPLEMENTARY DATA ARE AVAILABLE AT BIOINFORMATICS ONLINE.
McDowell, Graeme S. V.; Taylor, Graeme P.; Fai, Stephen; Bennett, Steffany A. L.
2014-01-01
The capacity to predict and visualize all theoretically possible glycerophospholipid molecular identities present in lipidomic datasets is currently limited. To address this issue, we expanded the search-engine and compositional databases of the online Visualization and Phospholipid Identification (VaLID) bioinformatic tool to include the glycerophosphoinositol superfamily. VaLID v1.0.0 originally allowed exact and average mass libraries of 736,584 individual species from eight phospholipid classes: glycerophosphates, glyceropyrophosphates, glycerophosphocholines, glycerophosphoethanolamines, glycerophosphoglycerols, glycerophosphoglycerophosphates, glycerophosphoserines, and cytidine 5′-diphosphate 1,2-diacyl-sn-glycerols to be searched for any mass to charge value (with adjustable tolerance levels) under a variety of mass spectrometry conditions. Here, we describe an update that now includes all possible glycerophosphoinositols, glycerophosphoinositol monophosphates, glycerophosphoinositol bisphosphates, and glycerophosphoinositol trisphosphates. This update expands the total number of lipid species represented in the VaLID v2.0.0 database to 1,473,168 phospholipids. Each phospholipid can be generated in skeletal representation. A subset of species curated by the Canadian Institutes of Health Research Training Program in Neurodegenerative Lipidomics (CTPNL) team is provided as an array of high-resolution structures. VaLID is freely available and responds to all users through the CTPNL resources web site. PMID:24701584
McDowell, Graeme S V; Blanchard, Alexandre P; Taylor, Graeme P; Figeys, Daniel; Fai, Stephen; Bennett, Steffany A L
2014-01-01
The capacity to predict and visualize all theoretically possible glycerophospholipid molecular identities present in lipidomic datasets is currently limited. To address this issue, we expanded the search-engine and compositional databases of the online Visualization and Phospholipid Identification (VaLID) bioinformatic tool to include the glycerophosphoinositol superfamily. VaLID v1.0.0 originally allowed exact and average mass libraries of 736,584 individual species from eight phospholipid classes: glycerophosphates, glyceropyrophosphates, glycerophosphocholines, glycerophosphoethanolamines, glycerophosphoglycerols, glycerophosphoglycerophosphates, glycerophosphoserines, and cytidine 5'-diphosphate 1,2-diacyl-sn-glycerols to be searched for any mass to charge value (with adjustable tolerance levels) under a variety of mass spectrometry conditions. Here, we describe an update that now includes all possible glycerophosphoinositols, glycerophosphoinositol monophosphates, glycerophosphoinositol bisphosphates, and glycerophosphoinositol trisphosphates. This update expands the total number of lipid species represented in the VaLID v2.0.0 database to 1,473,168 phospholipids. Each phospholipid can be generated in skeletal representation. A subset of species curated by the Canadian Institutes of Health Research Training Program in Neurodegenerative Lipidomics (CTPNL) team is provided as an array of high-resolution structures. VaLID is freely available and responds to all users through the CTPNL resources web site.
Romano, Paolo; Manniello, Assunta; Aresu, Ottavia; Armento, Massimiliano; Cesaro, Michela; Parodi, Barbara
2009-01-01
The Cell Line Data Base (CLDB) is a well-known reference information source on human and animal cell lines including information on more than 6000 cell lines. Main biological features are coded according to controlled vocabularies derived from international lists and taxonomies. HyperCLDB (http://bioinformatics.istge.it/hypercldb/) is a hypertext version of CLDB that improves data accessibility by also allowing information retrieval through web spiders. Access to HyperCLDB is provided through indexes of biological characteristics and navigation in the hypertext is granted by many internal links. HyperCLDB also includes links to external resources. Recently, an interest was raised for a reference nomenclature for cell lines and CLDB was seen as an authoritative system. Furthermore, to overcome the cell line misidentification problem, molecular authentication methods, such as fingerprinting, single-locus short tandem repeat (STR) profile and single nucleotide polymorphisms validation, were proposed. Since this data is distributed, a reference portal on authentication of human cell lines is needed. We present here the architecture and contents of CLDB, its recent enhancements and perspectives. We also present a new related database, the Cell Line Integrated Molecular Authentication (CLIMA) database (http://bioinformatics.istge.it/clima/), that allows to link authentication data to actual cell lines. PMID:18927105
Romano, Paolo; Manniello, Assunta; Aresu, Ottavia; Armento, Massimiliano; Cesaro, Michela; Parodi, Barbara
2009-01-01
The Cell Line Data Base (CLDB) is a well-known reference information source on human and animal cell lines including information on more than 6000 cell lines. Main biological features are coded according to controlled vocabularies derived from international lists and taxonomies. HyperCLDB (http://bioinformatics.istge.it/hypercldb/) is a hypertext version of CLDB that improves data accessibility by also allowing information retrieval through web spiders. Access to HyperCLDB is provided through indexes of biological characteristics and navigation in the hypertext is granted by many internal links. HyperCLDB also includes links to external resources. Recently, an interest was raised for a reference nomenclature for cell lines and CLDB was seen as an authoritative system. Furthermore, to overcome the cell line misidentification problem, molecular authentication methods, such as fingerprinting, single-locus short tandem repeat (STR) profile and single nucleotide polymorphisms validation, were proposed. Since this data is distributed, a reference portal on authentication of human cell lines is needed. We present here the architecture and contents of CLDB, its recent enhancements and perspectives. We also present a new related database, the Cell Line Integrated Molecular Authentication (CLIMA) database (http://bioinformatics.istge.it/clima/), that allows to link authentication data to actual cell lines.
Introduction to bioinformatics.
Can, Tolga
2014-01-01
Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
Enhanced reproducibility of SADI web service workflows with Galaxy and Docker.
Aranguren, Mikel Egaña; Wilkinson, Mark D
2015-01-01
Semantic Web technologies have been widely applied in the life sciences, for example by data providers such as OpenLifeData and through web services frameworks such as SADI. The recently reported OpenLifeData2SADI project offers access to the vast OpenLifeData data store through SADI services. This article describes how to merge data retrieved from OpenLifeData2SADI with other SADI services using the Galaxy bioinformatics analysis platform, thus making this semantic data more amenable to complex analyses. This is demonstrated using a working example, which is made distributable and reproducible through a Docker image that includes SADI tools, along with the data and workflows that constitute the demonstration. The combination of Galaxy and Docker offers a solution for faithfully reproducing and sharing complex data retrieval and analysis workflows based on the SADI Semantic web service design patterns.
Beveridge, Allan
2006-01-01
The Internet consists of a vast inhomogeneous reservoir of data. Developing software that can integrate a wide variety of different data sources is a major challenge that must be addressed for the realisation of the full potential of the Internet as a scientific research tool. This article presents a semi-automated object-oriented programming system for integrating web-based resources. We demonstrate that the current Internet standards (HTML, CGI [common gateway interface], Java, etc.) can be exploited to develop a data retrieval system that scans existing web interfaces and then uses a set of rules to generate new Java code that can automatically retrieve data from the Web. The validity of the software has been demonstrated by testing it on several biological databases. We also examine the current limitations of the Internet and discuss the need for the development of universal standards for web-based data.
Cytoscape tools for the web age: D3.js and Cytoscape.js exporters
Ono, Keiichiro; Demchak, Barry; Ideker, Trey
2014-01-01
In this paper we present new data export modules for Cytoscape 3 that can generate network files for Cytoscape.js and D3.js. Cytoscape.js exporter is implemented as a core feature of Cytoscape 3, and D3.js exporter is available as a Cytoscape 3 app. These modules enable users to seamlessly export network and table data sets generated in Cytoscape to popular JavaScript library readable formats. In addition, we implemented template web applications for browser-based interactive network visualization that can be used as basis for complex data visualization applications for bioinformatics research. Example web applications created with these tools demonstrate how Cytoscape works in modern data visualization workflows built with traditional desktop tools and emerging web-based technologies. This interactivity enables researchers more flexibility than with static images, thereby greatly improving the quality of insights researchers can gain from them. PMID:25520778
Cytoscape tools for the web age: D3.js and Cytoscape.js exporters.
Ono, Keiichiro; Demchak, Barry; Ideker, Trey
2014-01-01
In this paper we present new data export modules for Cytoscape 3 that can generate network files for Cytoscape.js and D3.js. Cytoscape.js exporter is implemented as a core feature of Cytoscape 3, and D3.js exporter is available as a Cytoscape 3 app. These modules enable users to seamlessly export network and table data sets generated in Cytoscape to popular JavaScript library readable formats. In addition, we implemented template web applications for browser-based interactive network visualization that can be used as basis for complex data visualization applications for bioinformatics research. Example web applications created with these tools demonstrate how Cytoscape works in modern data visualization workflows built with traditional desktop tools and emerging web-based technologies. This interactivity enables researchers more flexibility than with static images, thereby greatly improving the quality of insights researchers can gain from them.
DelPhiForce web server: electrostatic forces and energy calculations and visualization.
Li, Lin; Jia, Zhe; Peng, Yunhui; Chakravorty, Arghya; Sun, Lexuan; Alexov, Emil
2017-11-15
Electrostatic force is an essential component of the total force acting between atoms and macromolecules. Therefore, accurate calculations of electrostatic forces are crucial for revealing the mechanisms of many biological processes. We developed a DelPhiForce web server to calculate and visualize the electrostatic forces at molecular level. DelPhiForce web server enables modeling of electrostatic forces on individual atoms, residues, domains and molecules, and generates an output that can be visualized by VMD software. Here we demonstrate the usage of the server for various biological problems including protein-cofactor, domain-domain, protein-protein, protein-DNA and protein-RNA interactions. The DelPhiForce web server is available at: http://compbio.clemson.edu/delphi-force. delphi@clemson.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
2010-01-01
Background Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitionally operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided. RNA FRABASE 2.0 is freely available at http://rnafrabase.cs.put.poznan.pl. Conclusions RNA FRABASE 2.0 provides a novel database and powerful search engine which is equipped with new data and functionalities that are unavailable elsewhere. Our intention is that this advanced version of the RNA FRABASE will be of interest to all researchers working in the RNA field. PMID:20459631
Popenda, Mariusz; Szachniuk, Marta; Blazewicz, Marek; Wasik, Szymon; Burke, Edmund K; Blazewicz, Jacek; Adamiak, Ryszard W
2010-05-06
Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitionally operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided. RNA FRABASE 2.0 is freely available at http://rnafrabase.cs.put.poznan.pl. RNA FRABASE 2.0 provides a novel database and powerful search engine which is equipped with new data and functionalities that are unavailable elsewhere. Our intention is that this advanced version of the RNA FRABASE will be of interest to all researchers working in the RNA field.
GeneFisher-P: variations of GeneFisher as processes in Bio-jETI
Lamprecht, Anna-Lena; Margaria, Tiziana; Steffen, Bernhard; Sczyrba, Alexander; Hartmeier, Sven; Giegerich, Robert
2008-01-01
Background PCR primer design is an everyday, but not trivial task requiring state-of-the-art software. We describe the popular tool GeneFisher and explain its recent restructuring using workflow techniques. We apply a service-oriented approach to model and implement GeneFisher-P, a process-based version of the GeneFisher web application, as a part of the Bio-jETI platform for service modeling and execution. We show how to introduce a flexible process layer to meet the growing demand for improved user-friendliness and flexibility. Results Within Bio-jETI, we model the process using the jABC framework, a mature model-driven, service-oriented process definition platform. We encapsulate remote legacy tools and integrate web services using jETI, an extension of the jABC for seamless integration of remote resources as basic services, ready to be used in the process. Some of the basic services used by GeneFisher are in fact already provided as individual web services at BiBiServ and can be directly accessed. Others are legacy programs, and are made available to Bio-jETI via the jETI technology. The full power of service-based process orientation is required when more bioinformatics tools, available as web services or via jETI, lead to easy extensions or variations of the basic process. This concerns for instance variations of data retrieval or alignment tools as provided by the European Bioinformatics Institute (EBI). Conclusions The resulting service- and process-oriented GeneFisher-P demonstrates how basic services from heterogeneous sources can be easily orchestrated in the Bio-jETI platform and lead to a flexible family of specialized processes tailored to specific tasks. PMID:18460174
UBioLab: a web-laboratory for ubiquitous in-silico experiments.
Bartocci, Ezio; Cacciagrano, Diletta; Di Berardini, Maria Rita; Merelli, Emanuela; Vito, Leonardo
2012-07-09
The huge and dynamic amount of bioinformatic resources (e.g., data and tools) available nowadays in Internet represents a big challenge for biologists –for what concerns their management and visualization– and for bioinformaticians –for what concerns the possibility of rapidly creating and executing in-silico experiments involving resources and activities spread over the WWW hyperspace. Any framework aiming at integrating such resources as in a physical laboratory has imperatively to tackle –and possibly to handle in a transparent and uniform way– aspects concerning physical distribution, semantic heterogeneity, co-existence of different computational paradigms and, as a consequence, of different invocation interfaces (i.e., OGSA for Grid nodes, SOAP for Web Services, Java RMI for Java objects, etc.). The framework UBioLab has been just designed and developed as a prototype following the above objective. Several architectural features –as those ones of being fully Web-based and of combining domain ontologies, Semantic Web and workflow techniques– give evidence of an effort in such a direction. The integration of a semantic knowledge management system for distributed (bioinformatic) resources, a semantic-driven graphic environment for defining and monitoring ubiquitous workflows and an intelligent agent-based technology for their distributed execution allows UBioLab to be a semantic guide for bioinformaticians and biologists providing (i) a flexible environment for visualizing, organizing and inferring any (semantics and computational) "type" of domain knowledge (e.g., resources and activities, expressed in a declarative form), (ii) a powerful engine for defining and storing semantic-driven ubiquitous in-silico experiments on the domain hyperspace, as well as (iii) a transparent, automatic and distributed environment for correct experiment executions.
StructRNAfinder: an automated pipeline and web server for RNA families prediction.
Arias-Carrasco, Raúl; Vásquez-Morán, Yessenia; Nakaya, Helder I; Maracaja-Coutinho, Vinicius
2018-02-17
The function of many noncoding RNAs (ncRNAs) depend upon their secondary structures. Over the last decades, several methodologies have been developed to predict such structures or to use them to functionally annotate RNAs into RNA families. However, to fully perform this analysis, researchers should utilize multiple tools, which require the constant parsing and processing of several intermediate files. This makes the large-scale prediction and annotation of RNAs a daunting task even to researchers with good computational or bioinformatics skills. We present an automated pipeline named StructRNAfinder that predicts and annotates RNA families in transcript or genome sequences. This single tool not only displays the sequence/structural consensus alignments for each RNA family, according to Rfam database but also provides a taxonomic overview for each assigned functional RNA. Moreover, we implemented a user-friendly web service that allows researchers to upload their own nucleotide sequences in order to perform the whole analysis. Finally, we provided a stand-alone version of StructRNAfinder to be used in large-scale projects. The tool was developed under GNU General Public License (GPLv3) and is freely available at http://structrnafinder.integrativebioinformatics.me . The main advantage of StructRNAfinder relies on the large-scale processing and integrating the data obtained by each tool and database employed along the workflow, of which several files are generated and displayed in user-friendly reports, useful for downstream analyses and data exploration.
Reactome Pengine: A web-logic API to the homo sapiens reactome.
Neaves, Samuel R; Tsoka, Sophia; Millard, Louise A C
2018-03-30
Existing ways of accessing data from the Reactome database are limited. Either a researcher is restricted to particular queries defined by a web application programming interface (API), or they have to download the whole database. Reactome Pengine is a web service providing a logic programming based API to the human reactome. This gives researchers greater flexibility in data access than existing APIs, as users can send their own small programs (alongside queries) to Reactome Pengine. The server and an example notebook can be found at https://apps.nms.kcl.ac.uk/reactome-pengine. Source code is available at https://github.com/samwalrus/reactome-pengine and a Docker image is available at https://hub.docker.com/r/samneaves/rp4/ . samuel.neaves@kcl.ac.uk. Supplementary data are available at Bioinformatics online.
High-throughput biological techniques, like microarrays and drug screens, generate an enormous amount of data that may be critically important for cancer researchers and clinicians. Being able to manipulate the data to extract those pieces of interest, however, can require computational or bioinformatics skills beyond those of the average scientist.
Lazzarato, F; Franceschinis, G; Botta, M; Cordero, F; Calogero, R A
2004-11-01
RRE allows the extraction of non-coding regions surrounding a coding sequence [i.e. gene upstream region, 5'-untranslated region (5'-UTR), introns, 3'-UTR, downstream region] from annotated genomic datasets available at NCBI. RRE parser and web-based interface are accessible at http://www.bioinformatica.unito.it/bioinformatics/rre/rre.html
NMRPro: an integrated web component for interactive processing and visualization of NMR spectra.
Mohamed, Ahmed; Nguyen, Canh Hao; Mamitsuka, Hiroshi
2016-07-01
The popularity of using NMR spectroscopy in metabolomics and natural products has driven the development of an array of NMR spectral analysis tools and databases. Particularly, web applications are well used recently because they are platform-independent and easy to extend through reusable web components. Currently available web applications provide the analysis of NMR spectra. However, they still lack the necessary processing and interactive visualization functionalities. To overcome these limitations, we present NMRPro, a web component that can be easily incorporated into current web applications, enabling easy-to-use online interactive processing and visualization. NMRPro integrates server-side processing with client-side interactive visualization through three parts: a python package to efficiently process large NMR datasets on the server-side, a Django App managing server-client interaction, and SpecdrawJS for client-side interactive visualization. Demo and installation instructions are available at http://mamitsukalab.org/tools/nmrpro/ mohamed@kuicr.kyoto-u.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Huang, Ying; Li, Cao; Liu, Linhai; Jia, Xianbo; Lai, Song-Jia
2016-01-01
Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, there is no efficient and easy-to-use toolkit available yet for exclusively focusing on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which could be categorized into three classes, including (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to the existing computer tools, PopSc was designed to directly accept the intermediate metadata, such as allele frequencies, rather than the raw DNA sequences or genotyping results. PopSc is first implemented as the web-based calculator with user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes the convenient and straightforward calculation of statistics in research. Additionally, we also provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages of population genetics analysis. PMID:27792763
AlgoRun: a Docker-based packaging system for platform-agnostic implemented algorithms.
Hosny, Abdelrahman; Vera-Licona, Paola; Laubenbacher, Reinhard; Favre, Thibauld
2016-08-01
There is a growing need in bioinformatics for easy-to-use software implementations of algorithms that are usable across platforms. At the same time, reproducibility of computational results is critical and often a challenge due to source code changes over time and dependencies. The approach introduced in this paper addresses both of these needs with AlgoRun, a dedicated packaging system for implemented algorithms, using Docker technology. Implemented algorithms, packaged with AlgoRun, can be executed through a user-friendly interface directly from a web browser or via a standardized RESTful web API to allow easy integration into more complex workflows. The packaged algorithm includes the entire software execution environment, thereby eliminating the common problem of software dependencies and the irreproducibility of computations over time. AlgoRun-packaged algorithms can be published on http://algorun.org, a centralized searchable directory to find existing AlgoRun-packaged algorithms. AlgoRun is available at http://algorun.org and the source code under GPL license is available at https://github.com/algorun laubenbacher@uchc.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Chen, Shi-Yi; Deng, Feilong; Huang, Ying; Li, Cao; Liu, Linhai; Jia, Xianbo; Lai, Song-Jia
2016-01-01
Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, there is no efficient and easy-to-use toolkit available yet for exclusively focusing on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which could be categorized into three classes, including (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to the existing computer tools, PopSc was designed to directly accept the intermediate metadata, such as allele frequencies, rather than the raw DNA sequences or genotyping results. PopSc is first implemented as the web-based calculator with user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes the convenient and straightforward calculation of statistics in research. Additionally, we also provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages of population genetics analysis.
myExperiment: a repository and social network for the sharing of bioinformatics workflows
Goble, Carole A.; Bhagat, Jiten; Aleksejevs, Sergejs; Cruickshank, Don; Michaelides, Danius; Newman, David; Borkum, Mark; Bechhofer, Sean; Roos, Marco; Li, Peter; De Roure, David
2010-01-01
myExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks using web services, which may be performed on data from its retrieval, integration and analysis, to the visualization of the results. As a public repository of workflows, myExperiment allows anybody to discover those that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment currently has over 3500 registered users and contains more than 1000 workflows. The social aspect to the sharing of these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment including its REST web service is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to bugs@myexperiment.org. PMID:20501605
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud[OPEN
Merchant, Nirav
2016-01-01
Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957
Razban, Rostam M; Gilson, Amy I; Durfee, Niamh; Strobelt, Hendrik; Dinkla, Kasper; Choi, Jeong-Mo; Pfister, Hanspeter; Shakhnovich, Eugene I
2018-05-08
Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the S. cerevisiae and E. coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level. We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution (Dokholyan et al., 2002). Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (p-value<10-10) and -0.46 (p-value<10-10) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant (Zhang and Yang, 2015). ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu. Supplementary data are available at Bioinformatics. shakhnovich@chemistry.harvard.edu.
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.
Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P
2016-04-01
Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.
Franklin, Erik C; Stat, Michael; Pochon, Xavier; Putnam, Hollie M; Gates, Ruth D
2012-03-01
The genus Symbiodinium encompasses a group of unicellular, photosynthetic dinoflagellates that are found free living or in hospite with a wide range of marine invertebrate hosts including scleractinian corals. We present GeoSymbio, a hybrid web application that provides an online, easy to use and freely accessible interface for users to discover, explore and utilize global geospatial bioinformatic and ecoinformatic data on Symbiodinium-host symbioses. The novelty of this application lies in the combination of a variety of query and visualization tools, including dynamic searchable maps, data tables with filter and grouping functions, and interactive charts that summarize the data. Importantly, this application is hosted remotely or 'in the cloud' using Google Apps, and therefore does not require any specialty GIS, web programming or data programming expertise from the user. The current version of the application utilizes Symbiodinium data based on the ITS2 genetic marker from PCR-based techniques, including denaturing gradient gel electrophoresis, sequencing and cloning of specimens collected during 1982-2010. All data elements of the application are also downloadable as spatial files, tables and nucleic acid sequence files in common formats for desktop analysis. The application provides a unique tool set to facilitate research on the basic biology of Symbiodinium and expedite new insights into their ecology, biogeography and evolution in the face of a changing global climate. GeoSymbio can be accessed at https://sites.google.com/site/geosymbio/. © 2011 Blackwell Publishing Ltd.
XML schemas for common bioinformatic data types and their application in workflow systems.
Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert
2006-11-06
Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data--therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net, the BioDOM library can be obtained at http://biodom.sourceforge.net. The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
Revealing biological information using data structuring and automated learning.
Mohorianu, Irina; Moulton, Vincent
2010-11-01
The intermediary steps between a biological hypothesis, concretized in the input data, and meaningful results, validated using biological experiments, commonly employ bioinformatics tools. Starting with storage of the data and ending with a statistical analysis of the significance of the results, every step in a bioinformatics analysis has been intensively studied and the resulting methods and models patented. This review summarizes the bioinformatics patents that have been developed mainly for the study of genes, and points out the universal applicability of bioinformatics methods to other related studies such as RNA interference. More specifically, we overview the steps undertaken in the majority of bioinformatics analyses, highlighting, for each, various approaches that have been developed to reveal details from different perspectives. First we consider data warehousing, the first task that has to be performed efficiently, optimizing the structure of the database, in order to facilitate both the subsequent steps and the retrieval of information. Next, we review data mining, which occupies the central part of most bioinformatics analyses, presenting patents concerning differential expression, unsupervised and supervised learning. Last, we discuss how networks of interactions of genes or other players in the cell may be created, which help draw biological conclusions and have been described in several patents.
Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari
2014-01-01
Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484
The structural bioinformatics library: modeling in biomolecular science and beyond.
Cazals, Frédéric; Dreyfus, Tom
2017-04-01
Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library ( SBL , http://sbl.inria.fr ), a generic C ++/python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performances. The SBL involves four software components (1-4 thereafter). For end-users, the SBL provides ready to use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, to model macro-molecular assemblies. These applications can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C ++ toolbox with modular design, involving core (2) algorithms , (3) biophysical models and (4) modules , the latter being especially suited to develop novel applications. The SBL comes with a thorough documentation consisting of user and reference manuals, and a bugzilla platform to handle community feedback. The SBL is available from http://sbl.inria.fr. Frederic.Cazals@inria.fr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Analysis of galactosemia-linked mutations of GALT enzyme using a computational biology approach.
Facchiano, A; Marabotti, A
2010-02-01
We describe the prediction of the structural and functional effects of mutations on the enzyme galactose-1-phosphate uridyltransferase related to the genetic disease galactosemia, using a fully computational approach. One hundred and seven single-point mutants were simulated starting from the structural model of the enzyme obtained by homology modeling methods. Several bioinformatics programs were then applied to each resulting mutant protein to analyze the effect of the mutations. The mutations have a direct effect on the active site, or on the dimer assembly and stability, or on the monomer stability. We describe how mutations may exert their effect at a molecular level by altering H-bonds, salt bridges, secondary structure or surface features. The alteration of protein stability, at level of monomer and/or dimer, is the main effect observed. We found an agreement between our results and the functional experimental data available in literature for some mutants. The data and analyses for all the mutants are fully available in the web-accessible database hosted at http://bioinformatica.isa.cnr.it/GALT.
Googling DNA sequences on the World Wide Web.
Hajibabaei, Mehrdad; Singer, Gregory A C
2009-11-10
New web-based technologies provide an excellent opportunity for sharing and accessing information and using web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment independent character based algorithm based on dividing a sequence library (DNA barcodes) and query sequence to words. The actual search is conducted by conventional search tools such as freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
Neuro-symbolic representation learning on biological knowledge graphs.
Alshahrani, Mona; Khan, Mohammad Asif; Maddouri, Omar; Kinjo, Akira R; Queralt-Rosinach, Núria; Hoehndorf, Robert
2017-09-01
Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data are becoming available, but have not yet widely been applied and evaluated on structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing amount of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. https://github.com/bio-ontology-research-group/walking-rdf-and-owl. robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Nelson, Justin; Simpkins, Scott W; Safizadeh, Hamid; Li, Sheena C; Piotrowski, Jeff S; Hirano, Hiroyuki; Yashiroda, Yoko; Osada, Hiroyuki; Yoshida, Minoru; Boone, Charles; Myers, Chad L
2018-04-01
Chemical-genomic approaches that map interactions between small molecules and genetic perturbations offer a promising strategy for functional annotation of uncharacterized bioactive compounds. We recently developed a new high-throughput platform for mapping chemical-genetic (CG) interactions in yeast that can be scaled to screen large compound collections, and we applied this system to generate CG interaction profiles for more than 13 000 compounds. When integrated with the existing global yeast genetic interaction network, CG interaction profiles can enable mode-of-action prediction for previously uncharacterized compounds as well as discover unexpected secondary effects for known drugs. To facilitate future analysis of these valuable data, we developed a public database and web interface named MOSAIC. The website provides a convenient interface for querying compounds, bioprocesses (Gene Ontology terms) and genes for CG information including direct CG interactions, bioprocesses and gene-level target predictions. MOSAIC also provides access to chemical structure information of screened molecules, chemical-genomic profiles and the ability to search for compounds sharing structural and functional similarity. This resource will be of interest to chemical biologists for discovering new small molecule probes with specific modes-of-action as well as computational biologists interested in analysing CG interaction networks. MOSAIC is available at http://mosaic.cs.umn.edu. hisyo@riken.jp, yoshidam@riken.jp, charlie.boone@utoronto.ca or chadm@umn.edu. Supplementary data are available at Bioinformatics online.
iMetaLab 1.0: A web platform for metaproteomics data analysis.
Liao, Bo; Ning, Zhibin; Cheng, Kai; Zhang, Xu; Li, Leyuan; Mayne, Janice; Figeys, Daniel
2018-06-15
The human gut microbiota, a complex, dynamic and biodiverse community, has been increasingly shown to influence many aspects of health and disease. Metaproteomic analysis has proven to be a powerful approach to study the functionality of the microbiota. However, the processing and analyses of metaproteomic mass spectrometry (MS) data remains a daunting task in metaproteomics data analysis. We developed iMetaLab, a web based platform to provide a user-friendly and comprehensive data analysis pipeline with a focus on lowering the technical barrier for metaproteomics data analysis. iMetaLab is freely available through at http://imetalab.ca. Supplementary data are available at Bioinformatics online.
LocalMove: computing on-lattice fits for biopolymers
Ponty, Y.; Istrate, R.; Porcelli, E.; Clote, P.
2008-01-01
Given an input Protein Data Bank file (PDB) for a protein or RNA molecule, LocalMove is a web server that determines an on-lattice representation for the input biomolecule. The web server implements a Markov Chain Monte-Carlo algorithm with simulated annealing to compute an approximate fit for either the coarse-grain model or backbone model on either the cubic or face-centered cubic lattice. LocalMove returns a PDB file as output, as well as dynamic movie of 3D images of intermediate conformations during the computation. The LocalMove server is publicly available at http://bioinformatics.bc.edu/clotelab/localmove/. PMID:18556754
Cui, Ya; Chen, Xiaowei; Luo, Huaxia; Fan, Zhen; Luo, Jianjun; He, Shunmin; Yue, Haiyan; Zhang, Peng; Chen, Runsheng
2016-06-01
We here present BioCircos.js, an interactive and lightweight JavaScript library especially for biological data interactive visualization. BioCircos.js facilitates the development of web-based applications for circular visualization of various biological data, such as genomic features, genetic variations, gene expression and biomolecular interactions. BioCircos.js and its manual are freely available online at http://bioinfo.ibp.ac.cn/biocircos/ rschen@ibp.ac.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Statistical inference of protein structural alignments using information and compression.
Collier, James H; Allison, Lloyd; Lesk, Arthur M; Stuckey, Peter J; Garcia de la Banda, Maria; Konagurthu, Arun S
2017-04-01
Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power-the amount of lossless compression achieved to explain the protein coordinates using that alignment. We have implemented this approach in MMLigner , the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner 's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner . arun.konagurthu@monash.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
The OAuth 2.0 Web Authorization Protocol for the Internet Addiction Bioinformatics (IABio) Database.
Choi, Jeongseok; Kim, Jaekwon; Lee, Dong Kyun; Jang, Kwang Soo; Kim, Dai-Jin; Choi, In Young
2016-03-01
Internet addiction (IA) has become a widespread and problematic phenomenon as smart devices pervade society. Moreover, internet gaming disorder leads to increases in social expenditures for both individuals and nations alike. Although the prevention and treatment of IA are getting more important, the diagnosis of IA remains problematic. Understanding the neurobiological mechanism of behavioral addictions is essential for the development of specific and effective treatments. Although there are many databases related to other addictions, a database for IA has not been developed yet. In addition, bioinformatics databases, especially genetic databases, require a high level of security and should be designed based on medical information standards. In this respect, our study proposes the OAuth standard protocol for database access authorization. The proposed IA Bioinformatics (IABio) database system is based on internet user authentication, which is a guideline for medical information standards, and uses OAuth 2.0 for access control technology. This study designed and developed the system requirements and configuration. The OAuth 2.0 protocol is expected to establish the security of personal medical information and be applied to genomic research on IA.
Survey of Natural Language Processing Techniques in Bioinformatics.
Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling
2015-01-01
Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
Prospects and limitations of full-text index structures in genome analysis
Vyverman, Michaël; De Baets, Bernard; Fack, Veerle; Dawyndt, Peter
2012-01-01
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared. PMID:22584621
Data mining in newt-omics, the repository for omics data from the newt.
Looso, Mario; Braun, Thomas
2015-01-01
Salamanders are an excellent model organism to study regenerative processes due to their unique ability to regenerate lost appendages or organs. Straightforward bioinformatics tools to analyze and take advantage of the growing number of "omics" studies performed in salamanders were lacking so far. To overcome this limitation, we have generated a comprehensive data repository for the red-spotted newt Notophthalmus viridescens, named newt-omics, merging omics style datasets on the transcriptome and proteome level including expression values and annotations. The resource is freely available via a user-friendly Web-based graphical user interface ( http://newt-omics.mpi-bn.mpg.de) that allows access and queries to the database without prior bioinformatical expertise. The repository is updated regularly, incorporating new published datasets from omics technologies.
SoS Notebook: An Interactive Multi-Language Data Analysis Environment.
Peng, Bo; Wang, Gao; Ma, Jun; Leong, Man Chong; Wakefield, Chris; Melott, James; Chiu, Yulun; Du, Di; Weinstein, John N
2018-05-22
Complex bioinformatic data analysis workflows involving multiple scripts in different languages can be difficult to consolidate, share, and reproduce. An environment that streamlines the entire processes of data collection, analysis, visualization and reporting of such multi-language analyses is currently lacking. We developed Script of Scripts (SoS) Notebook, a web-based notebook environment that allows the use of multiple scripting language in a single notebook, with data flowing freely within and across languages. SoS Notebook enables researchers to perform sophisticated bioinformatic analysis using the most suitable tools for different parts of the workflow, without the limitations of a particular language or complications of cross-language communications. SoS Notebook is hosted at http://vatlab.github.io/SoS/ and is distributed under a BSD license. bpeng@mdanderson.org.
Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay
2004-01-01
Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175
Schönbach, Christian; Li, Jinyan; Ma, Lan; Horton, Paul; Sjaugi, Muhammad Farhan; Ranganathan, Shoba
2018-01-19
The 16th International Conference on Bioinformatics (InCoB) was held at Tsinghua University, Shenzhen from September 20 to 22, 2017. The annual conference of the Asia-Pacific Bioinformatics Network featured six keynotes, two invited talks, a panel discussion on big data driven bioinformatics and precision medicine, and 66 oral presentations of accepted research articles or posters. Fifty-seven articles comprising a topic assortment of algorithms, biomolecular networks, cancer and disease informatics, drug-target interactions and drug efficacy, gene regulation and expression, imaging, immunoinformatics, metagenomics, next generation sequencing for genomics and transcriptomics, ontologies, post-translational modification, and structural bioinformatics are the subject of this editorial for the InCoB2017 supplement issues in BMC Genomics, BMC Bioinformatics, BMC Systems Biology and BMC Medical Genomics. New Delhi will be the location of InCoB2018, scheduled for September 26-28, 2018.
Ghandikota, Sudhir; Hershey, Gurjit K Khurana; Mersha, Tesfaye B
2018-03-24
Advances in high-throughput sequencing technologies have made it possible to generate multiple omics data at an unprecedented rate and scale. The accumulation of these omics data far outpaces the rate at which biologists can mine and generate new hypothesis to test experimentally. There is an urgent need to develop a myriad of powerful tools to efficiently and effectively search and filter these resources to address specific post-GWAS functional genomics questions. However, to date, these resources are scattered across several databases and often lack a unified portal for data annotation and analytics. In addition, existing tools to analyze and visualize these databases are highly fragmented, resulting researchers to access multiple applications and manual interventions for each gene or variant in an ad hoc fashion until all the questions are answered. In this study, we present GENEASE, a web-based one-stop bioinformatics tool designed to not only query and explore multi-omics and phenotype databases (e.g., GTEx, ClinVar, dbGaP, GWAS Catalog, ENCODE, Roadmap Epigenomics, KEGG, Reactome, Gene and Phenotype Ontology) in a single web interface but also to perform seamless post genome-wide association downstream functional and overlap analysis for non-coding regulatory variants. GENEASE accesses over 50 different databases in public domain including model organism-specific databases to facilitate gene/variant and disease exploration, enrichment and overlap analysis in real time. It is a user-friendly tool with point-and-click interface containing links for support information including user manual and examples. GENEASE can be accessed freely at http://research.cchmc.org/mershalab/genease_new/login.html. Tesfaye.Mersha@cchmc.org, Sudhir.Ghandikota@cchmc.org. Supplementary data are available at Bioinformatics online.
ERIC Educational Resources Information Center
Almeida, Craig A.; Tardiff, Daniel F.; De Luca, Jane P.
2004-01-01
We have developed an introductory bioinformatics exercise for sophomore biology and biochemistry students that reinforces the understanding of the structure of a gene and the principles and events involved in its expression. In addition, the activity illustrates the severe effect mutations in a gene sequence can have on the protein product.…
Cheminformatics and the Semantic Web: adding value with linked data and enhanced provenance
Frey, Jeremy G; Bird, Colin L
2013-01-01
Cheminformatics is evolving from being a field of study associated primarily with drug discovery into a discipline that embraces the distribution, management, access, and sharing of chemical data. The relationship with the related subject of bioinformatics is becoming stronger and better defined, owing to the influence of Semantic Web technologies, which enable researchers to integrate heterogeneous sources of chemical, biochemical, biological, and medical information. These developments depend on a range of factors: the principles of chemical identifiers and their role in relationships between chemical and biological entities; the importance of preserving provenance and properly curated metadata; and an understanding of the contribution that the Semantic Web can make at all stages of the research lifecycle. The movements toward open access, open source, and open collaboration all contribute to progress toward the goals of integration. PMID:24432050
Wolff, Joachim; Bhardwaj, Vivek; Nothjunge, Stephan; Richard, Gautier; Renschler, Gina; Gilsbach, Ralf; Manke, Thomas; Backofen, Rolf; Ramírez, Fidel; Grüning, Björn A
2018-06-13
Galaxy HiCExplorer is a web server that facilitates the study of the 3D conformation of chromatin by allowing Hi-C data processing, analysis and visualization. With the Galaxy HiCExplorer web server, users with little bioinformatic background can perform every step of the analysis in one workflow: mapping of the raw sequence data, creation of Hi-C contact matrices, quality assessment, correction of contact matrices and identification of topological associated domains (TADs) and A/B compartments. Users can create publication ready plots of the contact matrix, A/B compartments, and TADs on a selected genomic locus, along with additional information like gene tracks or ChIP-seq signals. Galaxy HiCExplorer is freely usable at: https://hicexplorer.usegalaxy.eu and is available as a Docker container: https://github.com/deeptools/docker-galaxy-hicexplorer.
Analysis Tool Web Services from the EMBL-EBI.
McWilliam, Hamish; Li, Weizhong; Uludag, Mahmut; Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Cowley, Andrew Peter; Lopez, Rodrigo
2013-07-01
Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods.
Analysis Tool Web Services from the EMBL-EBI
McWilliam, Hamish; Li, Weizhong; Uludag, Mahmut; Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Cowley, Andrew Peter; Lopez, Rodrigo
2013-01-01
Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods. PMID:23671338
Graphite Web: web tool for gene set analysis exploiting pathway topology
Sales, Gabriele; Calura, Enrica; Martini, Paolo; Romualdi, Chiara
2013-01-01
Graphite web is a novel web tool for pathway analyses and network visualization for gene expression data of both microarray and RNA-seq experiments. Several pathway analyses have been proposed either in the univariate or in the global and multivariate context to tackle the complexity and the interpretation of expression results. These methods can be further divided into ‘topological’ and ‘non-topological’ methods according to their ability to gain power from pathway topology. Biological pathways are, in fact, not only gene lists but can be represented through a network where genes and connections are, respectively, nodes and edges. To this day, the most used approaches are non-topological and univariate although they miss the relationship among genes. On the contrary, topological and multivariate approaches are more powerful, but difficult to be used by researchers without bioinformatic skills. Here we present Graphite web, the first public web server for pathway analysis on gene expression data that combines topological and multivariate pathway analyses with an efficient system of interactive network visualizations for easy results interpretation. Specifically, Graphite web implements five different gene set analyses on three model organisms and two pathway databases. Graphite Web is freely available at http://graphiteweb.bio.unipd.it/. PMID:23666626
Robust enzyme design: bioinformatic tools for improved protein stability.
Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas
2015-03-01
The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Workflows for microarray data processing in the Kepler environment.
Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark
2012-05-17
Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
LXtoo: an integrated live Linux distribution for the bioinformatics community
2012-01-01
Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356
LXtoo: an integrated live Linux distribution for the bioinformatics community.
Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu
2012-07-19
Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.
EVpedia: a community web portal for extracellular vesicles research.
Kim, Dae-Kyum; Lee, Jaewook; Kim, Sae Rom; Choi, Dong-Sic; Yoon, Yae Jin; Kim, Ji Hyun; Go, Gyeongyun; Nhung, Dinh; Hong, Kahye; Jang, Su Chul; Kim, Si-Hyun; Park, Kyong-Su; Kim, Oh Youn; Park, Hyun Taek; Seo, Ji Hye; Aikawa, Elena; Baj-Krzyworzeka, Monika; van Balkom, Bas W M; Belting, Mattias; Blanc, Lionel; Bond, Vincent; Bongiovanni, Antonella; Borràs, Francesc E; Buée, Luc; Buzás, Edit I; Cheng, Lesley; Clayton, Aled; Cocucci, Emanuele; Dela Cruz, Charles S; Desiderio, Dominic M; Di Vizio, Dolores; Ekström, Karin; Falcon-Perez, Juan M; Gardiner, Chris; Giebel, Bernd; Greening, David W; Gross, Julia Christina; Gupta, Dwijendra; Hendrix, An; Hill, Andrew F; Hill, Michelle M; Nolte-'t Hoen, Esther; Hwang, Do Won; Inal, Jameel; Jagannadham, Medicharla V; Jayachandran, Muthuvel; Jee, Young-Koo; Jørgensen, Malene; Kim, Kwang Pyo; Kim, Yoon-Keun; Kislinger, Thomas; Lässer, Cecilia; Lee, Dong Soo; Lee, Hakmo; van Leeuwen, Johannes; Lener, Thomas; Liu, Ming-Lin; Lötvall, Jan; Marcilla, Antonio; Mathivanan, Suresh; Möller, Andreas; Morhayim, Jess; Mullier, François; Nazarenko, Irina; Nieuwland, Rienk; Nunes, Diana N; Pang, Ken; Park, Jaesung; Patel, Tushar; Pocsfalvi, Gabriella; Del Portillo, Hernando; Putz, Ulrich; Ramirez, Marcel I; Rodrigues, Marcio L; Roh, Tae-Young; Royo, Felix; Sahoo, Susmita; Schiffelers, Raymond; Sharma, Shivani; Siljander, Pia; Simpson, Richard J; Soekmadji, Carolina; Stahl, Philip; Stensballe, Allan; Stępień, Ewa; Tahara, Hidetoshi; Trummer, Arne; Valadi, Hadi; Vella, Laura J; Wai, Sun Nyunt; Witwer, Kenneth; Yáñez-Mó, María; Youn, Hyewon; Zeidler, Reinhard; Gho, Yong Song
2015-03-15
Extracellular vesicles (EVs) are spherical bilayered proteolipids, harboring various bioactive molecules. Due to the complexity of the vesicular nomenclatures and components, online searches for EV-related publications and vesicular components are currently challenging. We present an improved version of EVpedia, a public database for EVs research. This community web portal contains a database of publications and vesicular components, identification of orthologous vesicular components, bioinformatic tools and a personalized function. EVpedia includes 6879 publications, 172 080 vesicular components from 263 high-throughput datasets, and has been accessed more than 65 000 times from more than 750 cities. In addition, about 350 members from 73 international research groups have participated in developing EVpedia. This free web-based database might serve as a useful resource to stimulate the emerging field of EV research. The web site was implemented in PHP, Java, MySQL and Apache, and is freely available at http://evpedia.info. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
GSCALite: A Web Server for Gene Set Cancer Analysis.
Liu, Chun-Jie; Hu, Fei-Fei; Xia, Mengxuan; Han, Leng; Zhang, Qiong; Guo, An-Yuan
2018-05-22
The availability of cancer genomic data makes it possible to analyze genes related to cancer. Cancer is usually the result of a set of genes and the signal of a single gene could be covered by background noise. Here, we present a web server named Gene Set Cancer Analysis (GSCALite) to analyze a set of genes in cancers with the following functional modules. (i) Differential expression in tumor vs normal, and the survival analysis; (ii) Genomic variations and their survival analysis; (iii) Gene expression associated cancer pathway activity; (iv) miRNA regulatory network for genes; (v) Drug sensitivity for genes; (vi) Normal tissue expression and eQTL for genes. GSCALite is a user-friendly web server for dynamic analysis and visualization of gene set in cancer and drug sensitivity correlation, which will be of broad utilities to cancer researchers. GSCALite is available on http://bioinfo.life.hust.edu.cn/web/GSCALite/. guoay@hust.edu.cn or zhangqiong@hust.edu.cn. Supplementary data are available at Bioinformatics online.
Computational approaches to define a human milk metaglycome
Agravat, Sanjay B.; Song, Xuezheng; Rojsajjakul, Teerapat; Cummings, Richard D.; Smith, David F.
2016-01-01
Motivation: The goal of deciphering the human glycome has been hindered by the lack of high-throughput sequencing methods for glycans. Although mass spectrometry (MS) is a key technology in glycan sequencing, MS alone provides limited information about the identification of monosaccharide constituents, their anomericity and their linkages. These features of individual, purified glycans can be partly identified using well-defined glycan-binding proteins, such as lectins and antibodies that recognize specific determinants within glycan structures. Results: We present a novel computational approach to automate the sequencing of glycans using metadata-assisted glycan sequencing, which combines MS analyses with glycan structural information from glycan microarray technology. Success in this approach was aided by the generation of a ‘virtual glycome’ to represent all potential glycan structures that might exist within a metaglycomes based on a set of biosynthetic assumptions using known structural information. We exploited this approach to deduce the structures of soluble glycans within the human milk glycome by matching predicted structures based on experimental data against the virtual glycome. This represents the first meta-glycome to be defined using this method and we provide a publically available web-based application to aid in sequencing milk glycans. Availability and implementation: http://glycomeseq.emory.edu Contact: sagravat@bidmc.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26803164
DAVID-WS: a stateful web service to facilitate gene/protein list analysis
Jiao, Xiaoli; Sherman, Brad T.; Huang, Da Wei; Stephens, Robert; Baseler, Michael W.; Lane, H. Clifford; Lempicki, Richard A.
2012-01-01
Summary: The database for annotation, visualization and integrated discovery (DAVID), which can be freely accessed at http://david.abcc.ncifcrf.gov/, is a web-based online bioinformatics resource that aims to provide tools for the functional interpretation of large lists of genes/proteins. It has been used by researchers from more than 5000 institutes worldwide, with a daily submission rate of ∼1200 gene lists from ∼400 unique researchers, and has been cited by more than 6000 scientific publications. However, the current web interface does not support programmatic access to DAVID, and the uniform resource locator (URL)-based application programming interface (API) has a limit on URL size and is stateless in nature as it uses URL request and response messages to communicate with the server, without keeping any state-related details. DAVID-WS (web service) has been developed to automate user tasks by providing stateful web services to access DAVID programmatically without the need for human interactions. Availability: The web service and sample clients (written in Java, Perl, Python and Matlab) are made freely available under the DAVID License at http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html. Contact: xiaoli.jiao@nih.gov; rlempicki@nih.gov PMID:22543366
TreeVector: scalable, interactive, phylogenetic trees for the web.
Pethica, Ralph; Barker, Gary; Kovacs, Tim; Gough, Julian
2010-01-28
Phylogenetic trees are complex data forms that need to be graphically displayed to be human-readable. Traditional techniques of plotting phylogenetic trees focus on rendering a single static image, but increases in the production of biological data and large-scale analyses demand scalable, browsable, and interactive trees. We introduce TreeVector, a Scalable Vector Graphics-and Java-based method that allows trees to be integrated and viewed seamlessly in standard web browsers with no extra software required, and can be modified and linked using standard web technologies. There are now many bioinformatics servers and databases with a range of dynamic processes and updates to cope with the increasing volume of data. TreeVector is designed as a framework to integrate with these processes and produce user-customized phylogenies automatically. We also address the strengths of phylogenetic trees as part of a linked-in browsing process rather than an end graphic for print. TreeVector is fast and easy to use and is available to download precompiled, but is also open source. It can also be run from the web server listed below or the user's own web server. It has already been deployed on two recognized and widely used database Web sites.
DAVID-WS: a stateful web service to facilitate gene/protein list analysis.
Jiao, Xiaoli; Sherman, Brad T; Huang, Da Wei; Stephens, Robert; Baseler, Michael W; Lane, H Clifford; Lempicki, Richard A
2012-07-01
The database for annotation, visualization and integrated discovery (DAVID), which can be freely accessed at http://david.abcc.ncifcrf.gov/, is a web-based online bioinformatics resource that aims to provide tools for the functional interpretation of large lists of genes/proteins. It has been used by researchers from more than 5000 institutes worldwide, with a daily submission rate of ∼1200 gene lists from ∼400 unique researchers, and has been cited by more than 6000 scientific publications. However, the current web interface does not support programmatic access to DAVID, and the uniform resource locator (URL)-based application programming interface (API) has a limit on URL size and is stateless in nature as it uses URL request and response messages to communicate with the server, without keeping any state-related details. DAVID-WS (web service) has been developed to automate user tasks by providing stateful web services to access DAVID programmatically without the need for human interactions. The web service and sample clients (written in Java, Perl, Python and Matlab) are made freely available under the DAVID License at http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html.
PIQMIe: a web server for semi-quantitative proteomics data management and analysis
Kuzniar, Arnold; Kanaar, Roland
2014-01-01
We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. PMID:24861615
PIQMIe: a web server for semi-quantitative proteomics data management and analysis.
Kuzniar, Arnold; Kanaar, Roland
2014-07-01
We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
eMolTox: prediction of molecular toxicity with confidence.
Ji, Changge; Svensson, Fredrik; Zoufir, Azedine; Bender, Andreas
2018-03-07
In this work we present eMolTox, a web server for the prediction of potential toxicity associated with a given molecule. 174 toxicology-related in vitro/vivo experimental datasets were used for model construction and Mondrian conformal prediction was used to estimate the confidence of the resulting predictions. Toxic substructure analysis is also implemented in eMolTox. eMolTox predicts and displays a wealth of information of potential molecular toxicities for safety analysis in drug development. The eMolTox Server is freely available for use on the web at http://xundrug.cn/moltox. chicago.ji@gmail.com or ab454@cam.ac.uk. Supplementary data are available at Bioinformatics online.
A new online computational biology curriculum.
Searls, David B
2014-06-01
A recent proliferation of Massive Open Online Courses (MOOCs) and other web-based educational resources has greatly increased the potential for effective self-study in many fields. This article introduces a catalog of several hundred free video courses of potential interest to those wishing to expand their knowledge of bioinformatics and computational biology. The courses are organized into eleven subject areas modeled on university departments and are accompanied by commentary and career advice.
Eckart, J Dana; Sobral, Bruno W S
2003-01-01
The emergent needs of the bioinformatics community challenge current information systems. The pace of biological data generation far outstrips Moore's Law. Therefore, a gap continues to widen between the capabilities to produce biological (molecular and cell) data sets and the capability to manage and analyze these data sets. As a result, Federal investments in large data set generation produces diminishing returns in terms of the community's capabilities of understanding biology and leveraging that understanding to make scientific and technological advances that improve society. We are building an open framework to address various data management issues including data and tool interoperability, nomenclature and data communication standardization, and database integration. PathPort, short for Pathogen Portal, employs a generic, web-services based framework to deal with some of the problems identified by the bioinformatics community. The motivating research goal of a scalable system to provide data management and analysis for key pathosystems, especially relating to molecular data, has resulted in a generic framework using two major components. On the server-side, we employ web-services. On the client-side, a Java application called ToolBus acts as a client-side "bus" for contacting data and tools and viewing results through a single, consistent user interface.
Dhanyalakshmi, K H; Naika, Mahantesha B N; Sajeevan, R S; Mathew, Oommen K; Shafi, K Mohamed; Sowdhamini, Ramanathan; N Nataraja, Karaba
2016-01-01
The modern sequencing technologies are generating large volumes of information at the transcriptome and genome level. Translation of this information into a biological meaning is far behind the race due to which a significant portion of proteins discovered remain as proteins of unknown function (PUFs). Attempts to uncover the functional significance of PUFs are limited due to lack of easy and high throughput functional annotation tools. Here, we report an approach to assign putative functions to PUFs, identified in the transcriptome of mulberry, a perennial tree commonly cultivated as host of silkworm. We utilized the mulberry PUFs generated from leaf tissues exposed to drought stress at whole plant level. A sequence and structure based computational analysis predicted the probable function of the PUFs. For rapid and easy annotation of PUFs, we developed an automated pipeline by integrating diverse bioinformatics tools, designated as PUFs Annotation Server (PUFAS), which also provides a web service API (Application Programming Interface) for a large-scale analysis up to a genome. The expression analysis of three selected PUFs annotated by the pipeline revealed abiotic stress responsiveness of the genes, and hence their potential role in stress acclimation pathways. The automated pipeline developed here could be extended to assign functions to PUFs from any organism in general. PUFAS web server is available at http://caps.ncbs.res.in/pufas/ and the web service is accessible at http://capservices.ncbs.res.in/help/pufas.
Foster, Joseph M; Moreno, Pablo; Fabregat, Antonio; Hermjakob, Henning; Steinbeck, Christoph; Apweiler, Rolf; Wakelam, Michael J O; Vizcaíno, Juan Antonio
2013-01-01
Protein sequence databases are the pillar upon which modern proteomics is supported, representing a stable reference space of predicted and validated proteins. One example of such resources is UniProt, enriched with both expertly curated and automatic annotations. Taken largely for granted, similar mature resources such as UniProt are not available yet in some other "omics" fields, lipidomics being one of them. While having a seasoned community of wet lab scientists, lipidomics lies significantly behind proteomics in the adoption of data standards and other core bioinformatics concepts. This work aims to reduce the gap by developing an equivalent resource to UniProt called 'LipidHome', providing theoretically generated lipid molecules and useful metadata. Using the 'FASTLipid' Java library, a database was populated with theoretical lipids, generated from a set of community agreed upon chemical bounds. In parallel, a web application was developed to present the information and provide computational access via a web service. Designed specifically to accommodate high throughput mass spectrometry based approaches, lipids are organised into a hierarchy that reflects the variety in the structural resolution of lipid identifications. Additionally, cross-references to other lipid related resources and papers that cite specific lipids were used to annotate lipid records. The web application encompasses a browser for viewing lipid records and a 'tools' section where an MS1 search engine is currently implemented. LipidHome can be accessed at http://www.ebi.ac.uk/apweiler-srv/lipidhome.
Quantum-chemical insights from deep tensor neural networks
Schütt, Kristof T.; Arbabzadah, Farhad; Chmiela, Stefan; Müller, Klaus R.; Tkatchenko, Alexandre
2017-01-01
Learning from data has led to paradigm shifts in a multitude of disciplines, including web, text and image search, speech recognition, as well as bioinformatics. Can machine learning enable similar breakthroughs in understanding quantum many-body systems? Here we develop an efficient deep learning approach that enables spatially and chemically resolved insights into quantum-mechanical observables of molecular systems. We unify concepts from many-body Hamiltonians with purpose-designed deep tensor neural networks, which leads to size-extensive and uniformly accurate (1 kcal mol−1) predictions in compositional and configurational chemical space for molecules of intermediate size. As an example of chemical relevance, the model reveals a classification of aromatic rings with respect to their stability. Further applications of our model for predicting atomic energies and local chemical potentials in molecules, reliable isomer energies, and molecules with peculiar electronic structure demonstrate the potential of machine learning for revealing insights into complex quantum-chemical systems. PMID:28067221
Ensembl 2002: accommodating comparative genomics.
Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E
2003-01-01
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focusing on developing automatic comparative genome analysis and visualisation.
Quantum-chemical insights from deep tensor neural networks.
Schütt, Kristof T; Arbabzadah, Farhad; Chmiela, Stefan; Müller, Klaus R; Tkatchenko, Alexandre
2017-01-09
Learning from data has led to paradigm shifts in a multitude of disciplines, including web, text and image search, speech recognition, as well as bioinformatics. Can machine learning enable similar breakthroughs in understanding quantum many-body systems? Here we develop an efficient deep learning approach that enables spatially and chemically resolved insights into quantum-mechanical observables of molecular systems. We unify concepts from many-body Hamiltonians with purpose-designed deep tensor neural networks, which leads to size-extensive and uniformly accurate (1 kcal mol -1 ) predictions in compositional and configurational chemical space for molecules of intermediate size. As an example of chemical relevance, the model reveals a classification of aromatic rings with respect to their stability. Further applications of our model for predicting atomic energies and local chemical potentials in molecules, reliable isomer energies, and molecules with peculiar electronic structure demonstrate the potential of machine learning for revealing insights into complex quantum-chemical systems.
SAIL—a software system for sample and phenotype availability across biobanks and cohorts
Gostev, Mikhail; Fernandez-Banet, Julio; Rung, Johan; Dietrich, Joern; Prokopenko, Inga; Ripatti, Samuli; McCarthy, Mark I.; Brazma, Alvis; Krestyaninova, Maria
2011-01-01
Summary: The Sample avAILability system—SAIL—is a web based application for searching, browsing and annotating biological sample collections or biobank entries. By providing individual-level information on the availability of specific data types (phenotypes, genetic or genomic data) and samples within a collection, rather than the actual measurement data, resource integration can be facilitated. A flexible data structure enables the collection owners to provide descriptive information on their samples using existing or custom vocabularies. Users can query for the available samples by various parameters combining them via logical expressions. The system can be scaled to hold data from millions of samples with thousands of variables. Availability: SAIL is available under Aferro-GPL open source license: https://github.com/sail. Contact: gostev@ebi.ac.uk, support@simbioms.org Supplementary information: Supplementary data are available at Bioinformatics online and from http://www.simbioms.org. PMID:21169373
FRODOCK 2.0: fast protein-protein docking server.
Ramírez-Aportela, Erney; López-Blanco, José Ramón; Chacón, Pablo
2016-08-01
The prediction of protein-protein complexes from the structures of unbound components is a challenging and powerful strategy to decipher the mechanism of many essential biological processes. We present a user-friendly protein-protein docking server based on an improved version of FRODOCK that includes a complementary knowledge-based potential. The web interface provides a very effective tool to explore and select protein-protein models and interactively screen them against experimental distance constraints. The competitive success rates and efficiency achieved allow the retrieval of reliable potential protein-protein binding conformations that can be further refined with more computationally demanding strategies. The server is free and open to all users with no login requirement at http://frodock.chaconlab.org pablo@chaconlab.org Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Quantum-chemical insights from deep tensor neural networks
NASA Astrophysics Data System (ADS)
Schütt, Kristof T.; Arbabzadah, Farhad; Chmiela, Stefan; Müller, Klaus R.; Tkatchenko, Alexandre
2017-01-01
Learning from data has led to paradigm shifts in a multitude of disciplines, including web, text and image search, speech recognition, as well as bioinformatics. Can machine learning enable similar breakthroughs in understanding quantum many-body systems? Here we develop an efficient deep learning approach that enables spatially and chemically resolved insights into quantum-mechanical observables of molecular systems. We unify concepts from many-body Hamiltonians with purpose-designed deep tensor neural networks, which leads to size-extensive and uniformly accurate (1 kcal mol-1) predictions in compositional and configurational chemical space for molecules of intermediate size. As an example of chemical relevance, the model reveals a classification of aromatic rings with respect to their stability. Further applications of our model for predicting atomic energies and local chemical potentials in molecules, reliable isomer energies, and molecules with peculiar electronic structure demonstrate the potential of machine learning for revealing insights into complex quantum-chemical systems.
modPDZpep: a web resource for structure based analysis of human PDZ-mediated interaction networks.
Sain, Neetu; Mohanty, Debasisa
2016-09-21
PDZ domains recognize short sequence stretches usually present in C-terminal of their interaction partners. Because of the involvement of PDZ domains in many important biological processes, several attempts have been made for developing bioinformatics tools for genome-wide identification of PDZ interaction networks. Currently available tools for prediction of interaction partners of PDZ domains utilize machine learning approach. Since, they have been trained using experimental substrate specificity data for specific PDZ families, their applicability is limited to PDZ families closely related to the training set. These tools also do not allow analysis of PDZ-peptide interaction interfaces. We have used a structure based approach to develop modPDZpep, a program to predict the interaction partners of human PDZ domains and analyze structural details of PDZ interaction interfaces. modPDZpep predicts interaction partners by using structural models of PDZ-peptide complexes and evaluating binding energy scores using residue based statistical pair potentials. Since, it does not require training using experimental data on peptide binding affinity, it can predict substrates for diverse PDZ families. Because of the use of simple scoring function for binding energy, it is also fast enough for genome scale structure based analysis of PDZ interaction networks. Benchmarking using artificial as well as real negative datasets indicates good predictive power with ROC-AUC values in the range of 0.7 to 0.9 for a large number of human PDZ domains. Another novel feature of modPDZpep is its ability to map novel PDZ mediated interactions in human protein-protein interaction networks, either by utilizing available experimental phage display data or by structure based predictions. In summary, we have developed modPDZpep, a web-server for structure based analysis of human PDZ domains. It is freely available at http://www.nii.ac.in/modPDZpep.html or http://202.54.226.235/modPDZpep.html . This article was reviewed by Michael Gromiha and Zoltán Gáspári.
Chapter 16: text mining for translational bioinformatics.
Cohen, K Bretonnel; Hunter, Lawrence E
2013-04-01
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Semantic Web Applications and Tools for the Life Sciences: SWAT4LS 2010
2012-01-01
As Semantic Web technologies mature and new releases of key elements, such as SPARQL 1.1 and OWL 2.0, become available, the Life Sciences continue to push the boundaries of these technologies with ever more sophisticated tools and applications. Unsurprisingly, therefore, interest in the SWAT4LS (Semantic Web Applications and Tools for the Life Sciences) activities have remained high, as was evident during the third international SWAT4LS workshop held in Berlin in December 2010. Contributors to this workshop were invited to submit extended versions of their papers, the best of which are now made available in the special supplement of BMC Bioinformatics. The papers reflect the wide range of work in this area, covering the storage and querying of Life Sciences data in RDF triple stores, tools for the development of biomedical ontologies and the semantics-based integration of Life Sciences as well as clinicial data. PMID:22373274
Semantic Web applications and tools for the life sciences: SWAT4LS 2010.
Burger, Albert; Paschke, Adrian; Romano, Paolo; Marshall, M Scott; Splendiani, Andrea
2012-01-25
As Semantic Web technologies mature and new releases of key elements, such as SPARQL 1.1 and OWL 2.0, become available, the Life Sciences continue to push the boundaries of these technologies with ever more sophisticated tools and applications. Unsurprisingly, therefore, interest in the SWAT4LS (Semantic Web Applications and Tools for the Life Sciences) activities have remained high, as was evident during the third international SWAT4LS workshop held in Berlin in December 2010. Contributors to this workshop were invited to submit extended versions of their papers, the best of which are now made available in the special supplement of BMC Bioinformatics. The papers reflect the wide range of work in this area, covering the storage and querying of Life Sciences data in RDF triple stores, tools for the development of biomedical ontologies and the semantics-based integration of Life Sciences as well as clinicial data.
Simple re-instantiation of small databases using cloud computing.
Tan, Tin Wee; Xie, Chao; De Silva, Mark; Lim, Kuan Siong; Patro, C Pawan K; Lim, Shen Jean; Govindarajan, Kunde Ramamoorthy; Tong, Joo Chuan; Choo, Khar Heng; Ranganathan, Shoba; Khan, Asif M
2013-01-01
Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress. We describe a Web-accessible system, available online at http://biodb100.apbionet.org, for archival and future on demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (http://www.bioslax.com), preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases. Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear.
Simple re-instantiation of small databases using cloud computing
2013-01-01
Background Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress. Results We describe a Web-accessible system, available online at http://biodb100.apbionet.org, for archival and future on demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (http://www.bioslax.com), preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases. Conclusions Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear. PMID:24564380
Honts, Jerry E.
2003-01-01
Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489
Rapid and reliable protein structure determination via chemical shift threading.
Hafsa, Noor E; Berjanskii, Mark V; Arndt, David; Wishart, David S
2018-01-01
Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often < 10 min/structure) and to significantly outperform other shift-based or threading-based structure determination methods (in terms of top template model accuracy)-with an average TM-score performance of 0.68 (vs. 0.50-0.62 for other methods). Coupled with recent developments in chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca .
Bioinformatic pipelines in Python with Leaf
2013-01-01
Background An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sundstrom, J.; Tash, B; Murakami, T
2009-01-01
The molecular function of occludin, an integral membrane component of tight junctions, remains unclear. VEGF-induced phosphorylation sites were mapped on occludin by combining MS data analysis with bioinformatics. In vivo phosphorylation of Ser490 was validated and protein interaction studies combined with crystal structure analysis suggest that Ser490 phosphorylation attenuates the interaction between occludin and ZO-1. This study demonstrates that combining MS data and bioinformatics can successfully identify novel phosphorylation sites from limiting samples.
Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.
Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis; Krampis, Konstantinos
2017-08-01
Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep, bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and from a user perspective, running the pipelines as simply as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enables developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. © The Authors 2017. Published by Oxford University Press.
Bio-Docklets: virtualization containers for single-step execution of NGS pipelines
Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis
2017-01-01
Abstract Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep, bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and from a user perspective, running the pipelines as simply as running a single bioinformatics tool. This is achieved using a “meta-script” that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enables developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. PMID:28854616
Reverse-phase protein arrays (RPPA) represent a powerful functional proteomic approach to elucidate cancer-related molecular mechanisms and to develop novel cancer therapies. To facilitate community-based investigation of the large-scale protein expression data generated by this platform, we have developed a user-friendly, open-access bioinformatic resource, The Cancer Proteome Atlas (TCPA, http://tcpaportal.org), which contains two separate web applications.
Some of the most interesting CASP11 targets through the eyes of their authors.
Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K; Edwards, Robert A; Fass, Deborah; Hartmann, Marcus D; Korycinski, Mateusz; Lewis, Richard J; Lorimer, Donald; Lupas, Andrei N; Newman, Janet; Peat, Thomas S; Piepenbrink, Kurt H; Prahlad, Janani; van Raaij, Mark J; Rohwer, Forest; Segall, Anca M; Seguritan, Victor; Sundberg, Eric J; Singh, Abhimanyu K; Wilson, Mark A; Schwede, Torsten
2016-09-01
The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34-50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin
2015-04-01
This study was aimed to explore the features of clustered regularly interspaced short palindromic repeats (CRISPR) structures in Shigella by using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that the CRISPRs existed in the four groups of Shigella, and the flanking sequences of upstream CRISPRs could be classified into the same group with those of the downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences had the same group with corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain "stem" and "ring". Some spacers were found to homologize with part sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.
Brooks, Mark A; Gewartowski, Kamil; Mitsiki, Eirini; Létoquart, Juliette; Pache, Roland A; Billier, Ysaline; Bertero, Michela; Corréa, Margot; Czarnocki-Cieciura, Mariusz; Dadlez, Michal; Henriot, Véronique; Lazar, Noureddine; Delbos, Lila; Lebert, Dorothée; Piwowarski, Jan; Rochaix, Pascal; Böttcher, Bettina; Serrano, Luis; Séraphin, Bertrand; van Tilbeurgh, Herman; Aloy, Patrick; Perrakis, Anastassis; Dziembowski, Andrzej
2010-09-08
For high-throughput structural studies of protein complexes of composition inferred from proteomics data, it is crucial that candidate complexes are selected accurately. Herein, we exemplify a procedure that combines a bioinformatics tool for complex selection with in vivo validation, to deliver structural results in a medium-throughout manner. We have selected a set of 20 yeast complexes, which were predicted to be feasible by either an automated bioinformatics algorithm, by manual inspection of primary data, or by literature searches. These complexes were validated with two straightforward and efficient biochemical assays, and heterologous expression technologies of complex components were then used to produce the complexes to assess their feasibility experimentally. Approximately one-half of the selected complexes were useful for structural studies, and we detail one particular success story. Our results underscore the importance of accurate target selection and validation in avoiding transient, unstable, or simply nonexistent complexes from the outset. Copyright © 2010 Elsevier Ltd. All rights reserved.
BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes
Jespersen, Martin Closter; Peters, Bjoern
2017-01-01
Abstract Antibodies have become an indispensable tool for many biotechnological and clinical applications. They bind their molecular target (antigen) by recognizing a portion of its structure (epitope) in a highly specific manner. The ability to predict epitopes from antigen sequences alone is a complex task. Despite substantial effort, limited advancement has been achieved over the last decade in the accuracy of epitope prediction methods, especially for those that rely on the sequence of the antigen only. Here, we present BepiPred-2.0 (http://www.cbs.dtu.dk/services/BepiPred/), a web server for predicting B-cell epitopes from antigen sequences. BepiPred-2.0 is based on a random forest algorithm trained on epitopes annotated from antibody-antigen protein structures. This new method was found to outperform other available tools for sequence-based epitope prediction both on epitope data derived from solved 3D structures, and on a large collection of linear epitopes downloaded from the IEDB database. The method displays results in a user-friendly and informative way, both for computer-savvy and non-expert users. We believe that BepiPred-2.0 will be a valuable tool for the bioinformatics and immunology community. PMID:28472356
Miles, Alistair; Zhao, Jun; Klyne, Graham; White-Cooper, Helen; Shotton, David
2010-10-01
Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavy weight, and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure. We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with light-weight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema; and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics. The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.
mpMoRFsDB: a database of molecular recognition features in membrane proteins.
Gypas, Foivos; Tsaousis, Georgios N; Hamodrakas, Stavros J
2013-10-01
Molecular recognition features (MoRFs) are small, intrinsically disordered regions in proteins that undergo a disorder-to-order transition on binding to their partners. MoRFs are involved in protein-protein interactions and may function as the initial step in molecular recognition. The aim of this work was to collect, organize and store all membrane proteins that contain MoRFs. Membrane proteins constitute ∼30% of fully sequenced proteomes and are responsible for a wide variety of cellular functions. MoRFs were classified according to their secondary structure, after interacting with their partners. We identified MoRFs in transmembrane and peripheral membrane proteins. The position of transmembrane protein MoRFs was determined in relation to a protein's topology. All information was stored in a publicly available mySQL database with a user-friendly web interface. A Jmol applet is integrated for visualization of the structures. mpMoRFsDB provides valuable information related to disorder-based protein-protein interactions in membrane proteins. http://bioinformatics.biol.uoa.gr/mpMoRFsDB
Delta: a new web-based 3D genome visualization and analysis platform.
Tang, Bixia; Li, Feifei; Li, Jing; Zhao, Wenming; Zhang, Zhihua
2018-04-15
Delta is an integrative visualization and analysis platform to facilitate visually annotating and exploring the 3D physical architecture of genomes. Delta takes Hi-C or ChIA-PET contact matrix as input and predicts the topologically associating domains and chromatin loops in the genome. It then generates a physical 3D model which represents the plausible consensus 3D structure of the genome. Delta features a highly interactive visualization tool which enhances the integration of genome topology/physical structure with extensive genome annotation by juxtaposing the 3D model with diverse genomic assay outputs. Finally, by visually comparing the 3D model of the β-globin gene locus and its annotation, we speculated a plausible transitory interaction pattern in the locus. Experimental evidence was found to support this speculation by literature survey. This served as an example of intuitive hypothesis testing with the help of Delta. Delta is freely accessible from http://delta.big.ac.cn, and the source code is available at https://github.com/zhangzhwlab/delta. zhangzhihua@big.ac.cn. Supplementary data are available at Bioinformatics online.
Giraldo-Calderón, Gloria I.; Emrich, Scott J.; MacCallum, Robert M.; Maslen, Gareth; Dialynas, Emmanuel; Topalis, Pantelis; Ho, Nicholas; Gesing, Sandra; Madey, Gregory; Collins, Frank H.; Lawson, Daniel
2015-01-01
VectorBase is a National Institute of Allergy and Infectious Diseases supported Bioinformatics Resource Center (BRC) for invertebrate vectors of human pathogens. Now in its 11th year, VectorBase currently hosts the genomes of 35 organisms including a number of non-vectors for comparative analysis. Hosted data range from genome assemblies with annotated gene features, transcript and protein expression data to population genetics including variation and insecticide-resistance phenotypes. Here we describe improvements to our resource and the set of tools available for interrogating and accessing BRC data including the integration of Web Apollo to facilitate community annotation and providing Galaxy to support user-based workflows. VectorBase also actively supports our community through hands-on workshops and online tutorials. All information and data are freely available from our website at https://www.vectorbase.org/. PMID:25510499
Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z.; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark
2012-01-01
Summary: The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. Availability and Implementation: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org. Contact: lukas.habegger@yale.edu or mark.gerstein@yale.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22743228
LIBRA-WA: a web application for ligand binding site detection and protein function recognition.
Toti, Daniele; Viet Hung, Le; Tortosa, Valentina; Brandi, Valentina; Polticelli, Fabio
2018-03-01
Recently, LIBRA, a tool for active/ligand binding site prediction, was described. LIBRA's effectiveness was comparable to similar state-of-the-art tools; however, its scoring scheme, output presentation, dependence on local resources and overall convenience were amenable to improvements. To solve these issues, LIBRA-WA, a web application based on an improved LIBRA engine, has been developed, featuring a novel scoring scheme consistently improving LIBRA's performance, and a refined algorithm that can identify binding sites hosted at the interface between different subunits. LIBRA-WA also sports additional functionalities like ligand clustering and a completely redesigned interface for an easier analysis of the output. Extensive tests on 373 apoprotein structures indicate that LIBRA-WA is able to identify the biologically relevant ligand/ligand binding site in 357 cases (∼96%), with the correct prediction ranking first in 349 cases (∼98% of the latter, ∼94% of the total). The earlier stand-alone tool has also been updated and dubbed LIBRA+, by integrating LIBRA-WA's improved engine for cross-compatibility purposes. LIBRA-WA and LIBRA+ are available at: http://www.computationalbiology.it/software.html. polticel@uniroma3.it. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
BμG@Sbase—a microbial gene expression and comparative genomic database
Witney, Adam A.; Waldron, Denise E.; Brooks, Lucy A.; Tyler, Richard H.; Withers, Michael; Stoker, Neil G.; Wren, Brendan W.; Butcher, Philip D.; Hinds, Jason
2012-01-01
The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future. PMID:21948792
BμG@Sbase--a microbial gene expression and comparative genomic database.
Witney, Adam A; Waldron, Denise E; Brooks, Lucy A; Tyler, Richard H; Withers, Michael; Stoker, Neil G; Wren, Brendan W; Butcher, Philip D; Hinds, Jason
2012-01-01
The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future.
MONTRA: An agile architecture for data publishing and discovery.
Bastião Silva, Luís; Trifan, Alina; Luís Oliveira, José
2018-07-01
Data catalogues are a common form of capturing and presenting information about a specific kind of entity (e.g. products, services, professionals, datasets, etc.). However, the construction of a web-based catalogue for a particular scenario normally implies the development of a specific and dedicated solution. In this paper, we present MONTRA, a rapid-application development framework designed to facilitate the integration and discovery of heterogeneous objects, which may be characterized by distinct data structures. MONTRA was developed following a plugin-based architecture to allow dynamic composition of services over represented datasets. The core of MONTRA's functionalities resides in a flexible data skeleton used to characterize data entities, and from which a fully-fledged web data catalogue is automatically generated, ensuring access control and data privacy. MONTRA is being successfully used by several European projects to collect and manage biomedical databases. In this paper, we describe three of these applications scenarios. This work was motivated by the plethora of geographically scattered biomedical repositories, and by the role they can play altogether for the understanding of diseases and of the real-world effectiveness of treatments. Using metadata to expose datasets' characteristics, MONTRA greatly simplifies the task of building data catalogues. The source code is publicly available at https://github.com/bioinformatics-ua/montra. Copyright © 2018 Elsevier B.V. All rights reserved.
web cellHTS2: a web-application for the analysis of high-throughput screening data.
Pelz, Oliver; Gilsdorf, Moritz; Boutros, Michael
2010-04-12
The analysis of high-throughput screening data sets is an expanding field in bioinformatics. High-throughput screens by RNAi generate large primary data sets which need to be analyzed and annotated to identify relevant phenotypic hits. Large-scale RNAi screens are frequently used to identify novel factors that influence a broad range of cellular processes, including signaling pathway activity, cell proliferation, and host cell infection. Here, we present a web-based application utility for the end-to-end analysis of large cell-based screening experiments by cellHTS2. The software guides the user through the configuration steps that are required for the analysis of single or multi-channel experiments. The web-application provides options for various standardization and normalization methods, annotation of data sets and a comprehensive HTML report of the screening data analysis, including a ranked hit list. Sessions can be saved and restored for later re-analysis. The web frontend for the cellHTS2 R/Bioconductor package interacts with it through an R-server implementation that enables highly parallel analysis of screening data sets. web cellHTS2 further provides a file import and configuration module for common file formats. The implemented web-application facilitates the analysis of high-throughput data sets and provides a user-friendly interface. web cellHTS2 is accessible online at http://web-cellHTS2.dkfz.de. A standalone version as a virtual appliance and source code for platforms supporting Java 1.5.0 can be downloaded from the web cellHTS2 page. web cellHTS2 is freely distributed under GPL.
Branch: an interactive, web-based tool for testing hypotheses and developing predictive models.
Gangavarapu, Karthik; Babji, Vyshakh; Meißner, Tobias; Su, Andrew I; Good, Benjamin M
2016-07-01
Branch is a web application that provides users with the ability to interact directly with large biomedical datasets. The interaction is mediated through a collaborative graphical user interface for building and evaluating decision trees. These trees can be used to compose and test sophisticated hypotheses and to develop predictive models. Decision trees are built and evaluated based on a library of imported datasets and can be stored in a collective area for sharing and re-use. Branch is hosted at http://biobranch.org/ and the open source code is available at http://bitbucket.org/sulab/biobranch/ asu@scripps.edu or bgood@scripps.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
MSAViewer: interactive JavaScript visualization of multiple sequence alignments.
Yachdav, Guy; Wilzbach, Sebastian; Rauscher, Benedikt; Sheridan, Robert; Sillitoe, Ian; Procter, James; Lewis, Suzanna E; Rost, Burkhard; Goldberg, Tatyana
2016-11-15
The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is 'web ready': written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components. The MSAViewer is released as open source software under the Boost Software License 1.0. Documentation, source code and the viewer are available at http://msa.biojs.net/Supplementary information: Supplementary data are available at Bioinformatics online. msa@bio.sh. © The Author 2016. Published by Oxford University Press.
MSAViewer: interactive JavaScript visualization of multiple sequence alignments
Yachdav, Guy; Wilzbach, Sebastian; Rauscher, Benedikt; Sheridan, Robert; Sillitoe, Ian; Procter, James; Lewis, Suzanna E.; Rost, Burkhard; Goldberg, Tatyana
2016-01-01
Summary: The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is ‘web ready’: written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components. Availability and Implementation: The MSAViewer is released as open source software under the Boost Software License 1.0. Documentation, source code and the viewer are available at http://msa.biojs.net/. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: msa@bio.sh PMID:27412096
GENIUS: web server to predict local gene networks and key genes for biological functions.
Puelma, Tomas; Araus, Viviana; Canales, Javier; Vidal, Elena A; Cabello, Juan M; Soto, Alvaro; Gutiérrez, Rodrigo A
2017-03-01
GENIUS is a user-friendly web server that uses a novel machine learning algorithm to infer functional gene networks focused on specific genes and experimental conditions that are relevant to biological functions of interest. These functions may have different levels of complexity, from specific biological processes to complex traits that involve several interacting processes. GENIUS also enriches the network with new genes related to the biological function of interest, with accuracies comparable to highly discriminative Support Vector Machine methods. GENIUS currently supports eight model organisms and is freely available for public use at http://networks.bio.puc.cl/genius . genius.psbl@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Ambrosini, Giovanna; Dreos, René; Kumar, Sunil; Bucher, Philipp
2016-11-18
ChIP-seq and related high-throughput chromatin profilig assays generate ever increasing volumes of highly valuable biological data. To make sense out of it, biologists need versatile, efficient and user-friendly tools for access, visualization and itegrative analysis of such data. Here we present the ChIP-Seq command line tools and web server, implementing basic algorithms for ChIP-seq data analysis starting with a read alignment file. The tools are optimized for memory-efficiency and speed thus allowing for processing of large data volumes on inexpensive hardware. The web interface provides access to a large database of public data. The ChIP-Seq tools have a modular and interoperable design in that the output from one application can serve as input to another one. Complex and innovative tasks can thus be achieved by running several tools in a cascade. The various ChIP-Seq command line tools and web services either complement or compare favorably to related bioinformatics resources in terms of computational efficiency, ease of access to public data and interoperability with other web-based tools. The ChIP-Seq server is accessible at http://ccg.vital-it.ch/chipseq/ .
Meyer, Michael J; Geske, Philip; Yu, Haiyuan
2016-05-15
Biological sequence databases are integral to efforts to characterize and understand biological molecules and share biological data. However, when analyzing these data, scientists are often left holding disparate biological currency-molecular identifiers from different databases. For downstream applications that require converting the identifiers themselves, there are many resources available, but analyzing associated loci and variants can be cumbersome if data is not given in a form amenable to particular analyses. Here we present BISQUE, a web server and customizable command-line tool for converting molecular identifiers and their contained loci and variants between different database conventions. BISQUE uses a graph traversal algorithm to generalize the conversion process for residues in the human genome, genes, transcripts and proteins, allowing for conversion across classes of molecules and in all directions through an intuitive web interface and a URL-based web service. BISQUE is freely available via the web using any major web browser (http://bisque.yulab.org/). Source code is available in a public GitHub repository (https://github.com/hyulab/BISQUE). haiyuan.yu@cornell.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Integrated web visualizations for protein-protein interaction databases.
Jeanquartier, Fleur; Jean-Quartier, Claire; Holzinger, Andreas
2015-06-16
Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015. Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.
USDA-ARS?s Scientific Manuscript database
The Rhipicephalus microplus genome is large and complex in structure, making a genome sequence difficult to assemble and costly to resource the required bioinformatics. In light of this, a consortium of international collaborators was formed to pool resources to begin sequencing this genome. We have...
Exploring DNA Structure with Cn3D
ERIC Educational Resources Information Center
Porter, Sandra G.; Day, Joseph; McCarty, Richard E.; Shearn, Allen; Shingles, Richard; Fletcher, Linnea; Murphy, Stephanie; Pearlman, Rebecca
2007-01-01
Researchers in the field of bioinformatics have developed a number of analytical programs and databases that are increasingly important for advancing biological research. Because bioinformatics programs are used to analyze, visualize, and/or compare biological data, it is likely that the use of these programs will have a positive impact on biology…
Agyei, Dominic; Tsopmo, Apollinaire; Udenigwe, Chibuike C
2018-06-01
There are emerging advancements in the strategies used for the discovery and development of food-derived bioactive peptides because of their multiple food and health applications. Bioinformatics and peptidomics are two computational and analytical techniques that have the potential to speed up the development of bioactive peptides from bench to market. Structure-activity relationships observed in peptides form the basis for bioinformatics and in silico prediction of bioactive sequences encrypted in food proteins. Peptidomics, on the other hand, relies on "hyphenated" (liquid chromatography-mass spectrometry-based) techniques for the detection, profiling, and quantitation of peptides. Together, bioinformatics and peptidomics approaches provide a low-cost and effective means of predicting, profiling, and screening bioactive protein hydrolysates and peptides from food. This article discuses the basis, strengths, and limitations of bioinformatics and peptidomics approaches currently used for the discovery and analysis of food-derived bioactive peptides.
Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza
2015-11-01
Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the nairovirus, a genus in the Bunyaviridae family, which causes a life threatening disease in human. Currently, there is no vaccine against CCHFV and detailed structural analysis of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. The current study aimed to investigate structural bioinformatics of CCHFV Gn protein and design a construct to make a recombinant bacmid to express by baculovirus system. To express the Gn protein in insect cells that can be used as antigen in animal model vaccine studies. Bioinformatic analysis of CCHFV Gn protein was performed and designed a construct and cloned into pFastBacHTb vector and a recombinant Gn-bacmid was generated by Bac to Bac system. Primary, secondary, and 3D structure of CCHFV Gn were obtained and PCR reaction with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring Gn coding region under polyhedron promoter. Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV.
Field of genes: using Apache Kafka as a bioinformatic data repository.
Lawlor, Brendan; Lynch, Richard; Mac Aogáin, Micheál; Walsh, Paul
2018-04-01
Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is National Center for Biotechnology Information's (NCBI's) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI's RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data.
Perspective: Role of structure prediction in materials discovery and design
NASA Astrophysics Data System (ADS)
Needs, Richard J.; Pickard, Chris J.
2016-05-01
Materials informatics owes much to bioinformatics and the Materials Genome Initiative has been inspired by the Human Genome Project. But there is more to bioinformatics than genomes, and the same is true for materials informatics. Here we describe the rapidly expanding role of searching for structures of materials using first-principles electronic-structure methods. Structure searching has played an important part in unraveling structures of dense hydrogen and in identifying the record-high-temperature superconducting component in hydrogen sulfide at high pressures. We suggest that first-principles structure searching has already demonstrated its ability to determine structures of a wide range of materials and that it will play a central and increasing part in materials discovery and design.
ProGeRF: Proteome and Genome Repeat Finder Utilizing a Fast Parallel Hash Function
Moraes, Walas Jhony Lopes; Rodrigues, Thiago de Souza; Bartholomeu, Daniella Castanheira
2015-01-01
Repetitive element sequences are adjacent, repeating patterns, also called motifs, and can be of different lengths; repetitions can involve their exact or approximate copies. They have been widely used as molecular markers in population biology. Given the sizes of sequenced genomes, various bioinformatics tools have been developed for the extraction of repetitive elements from DNA sequences. However, currently available tools do not provide options for identifying repetitive elements in the genome or proteome, displaying a user-friendly web interface, and performing-exhaustive searches. ProGeRF is a web site for extracting repetitive regions from genome and proteome sequences. It was designed to be efficient, fast, and accurate and primarily user-friendly web tool allowing many ways to view and analyse the results. ProGeRF (Proteome and Genome Repeat Finder) is freely available as a stand-alone program, from which the users can download the source code, and as a web tool. It was developed using the hash table approach to extract perfect and imperfect repetitive regions in a (multi)FASTA file, while allowing a linear time complexity. PMID:25811026
TimeXNet Web: Identifying cellular response networks from diverse omics time-course data.
Tan, Phit Ling; López, Yosvany; Nakai, Kenta; Patil, Ashwini
2018-05-14
Condition-specific time-course omics profiles are frequently used to study cellular response to stimuli and identify associated signaling pathways. However, few online tools allow users to analyze multiple types of high-throughput time-course data. TimeXNet Web is a web server that extracts a time-dependent gene/protein response network from time-course transcriptomic, proteomic or phospho-proteomic data, and an input interaction network. It classifies the given genes/proteins into time-dependent groups based on the time of their highest activity and identifies the most probable paths connecting genes/proteins in consecutive groups. The response sub-network is enriched in activated genes/proteins and contains novel regulators that do not show any observable change in the input data. Users can view the resultant response network and analyze it for functional enrichment. TimeXNet Web supports the analysis of high-throughput data from multiple species by providing high quality, weighted protein-protein interaction networks for 12 model organisms. http://txnet.hgc.jp/. ashwini@hgc.jp. Supplementary data are available at Bioinformatics online.
miTRATA: a web-based tool for microRNA Truncation and Tailing Analysis.
Patel, Parth; Ramachandruni, S Deepthi; Kakrana, Atul; Nakano, Mayumi; Meyers, Blake C
2016-02-01
We describe miTRATA, the first web-based tool for microRNA Truncation and Tailing Analysis--the analysis of 3' modifications of microRNAs including the loss or gain of nucleotides relative to the canonical sequence. miTRATA is implemented in Python (version 3) and employs parallel processing modules to enhance its scalability when analyzing multiple small RNA (sRNA) sequencing datasets. It utilizes miRBase, currently version 21, as a source of known microRNAs for analysis. miTRATA notifies user(s) via email to download as well as visualize the results online. miTRATA's strengths lie in (i) its biologist-focused web interface, (ii) improved scalability via parallel processing and (iii) its uniqueness as a webtool to perform microRNA truncation and tailing analysis. miTRATA is developed in Python and PHP. It is available as a web-based application from https://wasabi.dbi.udel.edu/∼apps/ta/. meyers@dbi.udel.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Ms2lda.org: web-based topic modelling for substructure discovery in mass spectrometry.
Wandy, Joe; Zhu, Yunfeng; van der Hooft, Justin J J; Daly, Rónán; Barrett, Michael P; Rogers, Simon
2017-09-14
We recently published MS2LDA, a method for the decomposition of sets of molecular fragment data derived from large metabolomics experiments. To make the method more widely available to the community, here we present ms2lda.org, a web application that allows users to upload their data, run MS2LDA analyses and explore the results through interactive visualisations. Ms2lda.org takes tandem mass spectrometry data in many standard formats and allows the user to infer the sets of fragment and neutral loss features that co-occur together (Mass2Motifs). As an alternative workflow, the user can also decompose a dataset onto predefined Mass2Motifs. This is accomplished through the web interface or programmatically from our web service. The website can be found at http://ms2lda.org , while the source code is available at https://github.com/sdrogers/ms2ldaviz under the MIT license. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
2011-02-16
were checked for the presence of heterotrophic bacteria by streak- ing a sample on ASW-R2A agar plates. DNA extraction and analysis of phylogenetic ...Bellerophon v. 3 (greengenes.lbl.gov) and Pintail (www.bioinformatics -toolkit.org/Web-Pintail/). Phylogenetic trees were constructed for SSU rRNA gene...CLUSTALW (44), and phylogenetic analyses were conducted in MEGA4 (42). The evolutionary history was inferred using the neighbor-joining method (39), and
Borges, Vítor; Pinheiro, Miguel; Pechirra, Pedro; Guiomar, Raquel; Gomes, João Paulo
2018-06-29
A new era of flu surveillance has already started based on the genetic characterization and exploration of influenza virus evolution at whole-genome scale. Although this has been prioritized by national and international health authorities, the demanded technological transition to whole-genome sequencing (WGS)-based flu surveillance has been particularly delayed by the lack of bioinformatics infrastructures and/or expertise to deal with primary next-generation sequencing (NGS) data. We developed and implemented INSaFLU ("INSide the FLU"), which is the first influenza-oriented bioinformatics free web-based suite that deals with primary NGS data (reads) towards the automatic generation of the output data that are actually the core first-line "genetic requests" for effective and timely influenza laboratory surveillance (e.g., type and sub-type, gene and whole-genome consensus sequences, variants' annotation, alignments and phylogenetic trees). By handling NGS data collected from any amplicon-based schema, the implemented pipeline enables any laboratory to perform multi-step software intensive analyses in a user-friendly manner without previous advanced training in bioinformatics. INSaFLU gives access to user-restricted sample databases and projects management, being a transparent and flexible tool specifically designed to automatically update project outputs as more samples are uploaded. Data integration is thus cumulative and scalable, fitting the need for a continuous epidemiological surveillance during the flu epidemics. Multiple outputs are provided in nomenclature-stable and standardized formats that can be explored in situ or through multiple compatible downstream applications for fine-tuned data analysis. This platform additionally flags samples as "putative mixed infections" if the population admixture enrolls influenza viruses with clearly distinct genetic backgrounds, and enriches the traditional "consensus-based" influenza genetic characterization with relevant data on influenza sub-population diversification through a depth analysis of intra-patient minor variants. This dual approach is expected to strengthen our ability not only to detect the emergence of antigenic and drug resistance variants but also to decode alternative pathways of influenza evolution and to unveil intricate routes of transmission. In summary, INSaFLU supplies public health laboratories and influenza researchers with an open "one size fits all" framework, potentiating the operationalization of a harmonized multi-country WGS-based surveillance for influenza virus. INSaFLU can be accessed through https://insaflu.insa.pt .
Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten
2011-01-01
Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new “omics”-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can be used as prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign. PMID:22073191
Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten
2011-01-01
Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new "omics"-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can be used as prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign.
PinAPL-Py: A comprehensive web-application for the analysis of CRISPR/Cas9 screens.
Spahn, Philipp N; Bath, Tyler; Weiss, Ryan J; Kim, Jihoon; Esko, Jeffrey D; Lewis, Nathan E; Harismendy, Olivier
2017-11-20
Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. With its increased popularity, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered either due to limited functionality or the requirement of bioinformatic expertise. To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web-service. PinAPL-Py implements state-of-the-art tools and statistical models, assembled in a comprehensive workflow covering sequence quality control, automated sgRNA sequence extraction, alignment, sgRNA enrichment/depletion analysis and gene ranking. The workflow is set up to use a variety of popular sgRNA libraries as well as custom libraries that can be easily uploaded. Various analysis options are offered, suitable to analyze a large variety of CRISPR/Cas9 screening experiments. Analysis output includes ranked lists of sgRNAs and genes, and publication-ready plots. PinAPL-Py helps to advance genome-wide screening efforts by combining comprehensive functionality with user-friendly implementation. PinAPL-Py is freely accessible at http://pinapl-py.ucsd.edu with instructions and test datasets.
2016 update on APBioNet's annual international conference on bioinformatics (InCoB).
Schönbach, Christian; Verma, Chandra; Wee, Lawrence Jin Kiat; Bond, Peter John; Ranganathan, Shoba
2016-12-22
InCoB became since its inception in 2002 one of the largest annual bioinformatics conferences in the Asia-Pacific region with attendance ranging between 150 and 250 delegates depending on the venue location. InCoB 2016 in Singapore was attended by almost 220 delegates. This year, sessions on structural bioinformatics, sequence and sequencing, and next-generation sequencing fielded the highest number of oral presentation. Forty-four out 96 oral presentations were associated with an accepted manuscript in supplemental issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics or BMC Systems Biology. Articles with a genomics focus are reviewed in this editorial. Next year's InCoB will be held in Shenzen, China from September 20 to 22, 2017.
Educational websites--Bioinformatics Tools II.
Lomberk, Gwen
2009-01-01
In this issue, the highlighted websites are a continuation of a series of educational websites; this one in particular from a couple of years ago, Bioinformatics Tools [Pancreatology 2005;5:314-315]. These include sites that are valuable resources for many research needs in genomics and proteomics. Bioinformatics has become a laboratory tool to map sequences to databases, develop models of molecular interactions, evaluate structural compatibilities, describe differences between normal and disease-associated DNA, identify conserved motifs within proteins, and chart extensive signaling networks, all in silico. Copyright 2008 S. Karger AG, Basel and IAP.
The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures.
Kikuchi, Norihiro; Kameyama, Akihiko; Nakaya, Shuuichi; Ito, Hiromi; Sato, Takashi; Shikanai, Toshihide; Takahashi, Yoriko; Narimatsu, Hisashi
2005-04-15
Bioinformatics resources for glycomics are very poor as compared with those for genomics and proteomics. The complexity of carbohydrate sequences makes it difficult to define a common language to represent them, and the development of bioinformatics tools for glycomics has not progressed. In this study, we developed a carbohydrate sequence markup language (CabosML), an XML description of carbohydrate structures. The language definition (XML Schema) and an experimental database of carbohydrate structures using an XML database management system are available at http://www.phoenix.hydra.mki.co.jp/CabosDemo.html kikuchi@hydra.mki.co.jp.
[Application of bioinformatics in researches of industrial biocatalysis].
Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao
2004-05-01
Industrial biocatalysis is currently attracting much attention to rebuild or substitute traditional producing process of chemicals and drugs. One of key focuses in industrial biocatalysis is biocatalyst, which is usually one kind of microbial enzyme. In the recent, new technologies of bioinformatics have played and will continue to play more and more significant roles in researches of industrial biocatalysis in response to the waves of genomic revolution. One of the key applications of bioinformatics in biocatalysis is the discovery and identification of the new biocatalyst through advanced DNA and protein sequence search, comparison and analyses in Internet database using different algorithm and software. The unknown genes of microbial enzymes can also be simply harvested by primer design on the basis of bioinformatics analyses. The other key applications of bioinformatics in biocatalysis are the modification and improvement of existing industrial biocatalyst. In this aspect, bioinformatics is of great importance in both rational design and directed evolution of microbial enzymes. Based on the successful prediction of tertiary structures of enzymes using the tool of bioinformatics, the undermentioned experiments, i.e. site-directed mutagenesis, fusion protein construction, DNA family shuffling and saturation mutagenesis, etc, are usually of very high efficiency. On all accounts, bioinformatics will be an essential tool for either biologist or biological engineer in the future researches of industrial biocatalysis, due to its significant function in guiding and quickening the step of discovery and/or improvement of novel biocatalysts.
ERIC Educational Resources Information Center
Honts, Jerry E.
2003-01-01
Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in…
Sequence alignment visualization in HTML5 without Java.
Gille, Christoph; Birgit, Weyand; Gille, Andreas
2014-01-01
Java has been extensively used for the visualization of biological data in the web. However, the Java runtime environment is an additional layer of software with an own set of technical problems and security risks. HTML in its new version 5 provides features that for some tasks may render Java unnecessary. Alignment-To-HTML is the first HTML-based interactive visualization for annotated multiple sequence alignments. The server side script interpreter can perform all tasks like (i) sequence retrieval, (ii) alignment computation, (iii) rendering, (iv) identification of a homologous structural models and (v) communication with BioDAS-servers. The rendered alignment can be included in web pages and is displayed in all browsers on all platforms including touch screen tablets. The functionality of the user interface is similar to legacy Java applets and includes color schemes, highlighting of conserved and variable alignment positions, row reordering by drag and drop, interlinked 3D visualization and sequence groups. Novel features are (i) support for multiple overlapping residue annotations, such as chemical modifications, single nucleotide polymorphisms and mutations, (ii) mechanisms to quickly hide residue annotations, (iii) export to MS-Word and (iv) sequence icons. Alignment-To-HTML, the first interactive alignment visualization that runs in web browsers without additional software, confirms that to some extend HTML5 is already sufficient to display complex biological data. The low speed at which programs are executed in browsers is still the main obstacle. Nevertheless, we envision an increased use of HTML and JavaScript for interactive biological software. Under GPL at: http://www.bioinformatics.org/strap/toHTML/.
BioQueue: a novel pipeline framework to accelerate bioinformatics analysis.
Yao, Li; Wang, Heming; Song, Yuanyuan; Sui, Guangchao
2017-10-15
With the rapid development of Next-Generation Sequencing, a large amount of data is now available for bioinformatics research. Meanwhile, the presence of many pipeline frameworks makes it possible to analyse these data. However, these tools concentrate mainly on their syntax and design paradigms, and dispatch jobs based on users' experience about the resources needed by the execution of a certain step in a protocol. As a result, it is difficult for these tools to maximize the potential of computing resources, and avoid errors caused by overload, such as memory overflow. Here, we have developed BioQueue, a web-based framework that contains a checkpoint before each step to automatically estimate the system resources (CPU, memory and disk) needed by the step and then dispatch jobs accordingly. BioQueue possesses a shell command-like syntax instead of implementing a new script language, which means most biologists without computer programming background can access the efficient queue system with ease. BioQueue is freely available at https://github.com/liyao001/BioQueue. The extensive documentation can be found at http://bioqueue.readthedocs.io. li_yao@outlook.com or gcsui@nefu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Tools and data services registry: a community effort to document bioinformatics resources
Ison, Jon; Rapacki, Kristoffer; Ménager, Hervé; Kalaš, Matúš; Rydza, Emil; Chmura, Piotr; Anthon, Christian; Beard, Niall; Berka, Karel; Bolser, Dan; Booth, Tim; Bretaudeau, Anthony; Brezovsky, Jan; Casadio, Rita; Cesareni, Gianni; Coppens, Frederik; Cornell, Michael; Cuccuru, Gianmauro; Davidsen, Kristian; Vedova, Gianluca Della; Dogan, Tunca; Doppelt-Azeroual, Olivia; Emery, Laura; Gasteiger, Elisabeth; Gatter, Thomas; Goldberg, Tatyana; Grosjean, Marie; Grüning, Björn; Helmer-Citterich, Manuela; Ienasescu, Hans; Ioannidis, Vassilios; Jespersen, Martin Closter; Jimenez, Rafael; Juty, Nick; Juvan, Peter; Koch, Maximilian; Laibe, Camille; Li, Jing-Woei; Licata, Luana; Mareuil, Fabien; Mičetić, Ivan; Friborg, Rune Møllegaard; Moretti, Sebastien; Morris, Chris; Möller, Steffen; Nenadic, Aleksandra; Peterson, Hedi; Profiti, Giuseppe; Rice, Peter; Romano, Paolo; Roncaglia, Paola; Saidi, Rabie; Schafferhans, Andrea; Schwämmle, Veit; Smith, Callum; Sperotto, Maria Maddalena; Stockinger, Heinz; Vařeková, Radka Svobodová; Tosatto, Silvio C.E.; de la Torre, Victor; Uva, Paolo; Via, Allegra; Yachdav, Guy; Zambelli, Federico; Vriend, Gert; Rost, Burkhard; Parkinson, Helen; Løngreen, Peter; Brunak, Søren
2016-01-01
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR—the European infrastructure for biological information—that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools. PMID:26538599
ORBIT: an integrated environment for user-customized bioinformatics tools.
Bellgard, M I; Hiew, H L; Hunter, A; Wiebrands, M
1999-10-01
There are a large number of computational programs freely available to bioinformaticians via a client/server, web-based environment. However, the client interface to these tools (typically an html form page) cannot be customized from the client side as it is created by the service provider. The form page is usually generic enough to cater for a wide range of users. However, this implies that a user cannot set as 'default' advanced program parameters on the form or even customize the interface to his/her specific requirements or preferences. Currently, there is a lack of end-user interface environments that can be modified by the user when accessing computer programs available on a remote server running on an intranet or over the Internet. We have implemented a client/server system called ORBIT (Online Researcher's Bioinformatics Interface Tools) where individual clients can have interfaces created and customized to command-line-driven, server-side programs. Thus, Internet-based interfaces can be tailored to a user's specific bioinformatic needs. As interfaces are created on the client machine independent of the server, there can be different interfaces to the same server-side program to cater for different parameter settings. The interface customization is relatively quick (between 10 and 60 min) and all client interfaces are integrated into a single modular environment which will run on any computer platform supporting Java. The system has been developed to allow for a number of future enhancements and features. ORBIT represents an important advance in the way researchers gain access to bioinformatics tools on the Internet.
Analyzing the field of bioinformatics with the multi-faceted topic modeling technique.
Heo, Go Eun; Kang, Keun Young; Song, Min; Lee, Jeong-Hoon
2017-05-31
Bioinformatics is an interdisciplinary field at the intersection of molecular biology and computing technology. To characterize the field as convergent domain, researchers have used bibliometrics, augmented with text-mining techniques for content analysis. In previous studies, Latent Dirichlet Allocation (LDA) was the most representative topic modeling technique for identifying topic structure of subject areas. However, as opposed to revealing the topic structure in relation to metadata such as authors, publication date, and journals, LDA only displays the simple topic structure. In this paper, we adopt the Tang et al.'s Author-Conference-Topic (ACT) model to study the field of bioinformatics from the perspective of keyphrases, authors, and journals. The ACT model is capable of incorporating the paper, author, and conference into the topic distribution simultaneously. To obtain more meaningful results, we use journals and keyphrases instead of conferences and bag-of-words.. For analysis, we use PubMed to collected forty-six bioinformatics journals from the MEDLINE database. We conducted time series topic analysis over four periods from 1996 to 2015 to further examine the interdisciplinary nature of bioinformatics. We analyze the ACT Model results in each period. Additionally, for further integrated analysis, we conduct a time series analysis among the top-ranked keyphrases, journals, and authors according to their frequency. We also examine the patterns in the top journals by simultaneously identifying the topical probability in each period, as well as the top authors and keyphrases. The results indicate that in recent years diversified topics have become more prevalent and convergent topics have become more clearly represented. The results of our analysis implies that overtime the field of bioinformatics becomes more interdisciplinary where there is a steady increase in peripheral fields such as conceptual, mathematical, and system biology. These results are confirmed by integrated analysis of topic distribution as well as top ranked keyphrases, authors, and journals.
New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation.
Antczak, Maciej; Popenda, Mariusz; Zok, Tomasz; Zurkowski, Michal; Adamiak, Ryszard W; Szachniuk, Marta
2018-04-15
Understanding the formation, architecture and roles of pseudoknots in RNA structures are one of the most difficult challenges in RNA computational biology and structural bioinformatics. Methods predicting pseudoknots typically perform this with poor accuracy, often despite experimental data incorporation. Existing bioinformatic approaches differ in terms of pseudoknots' recognition and revealing their nature. A few ways of pseudoknot classification exist, most common ones refer to a genus or order. Following the latter one, we propose new algorithms that identify pseudoknots in RNA structure provided in BPSEQ format, determine their order and encode in dot-bracket-letter notation. The proposed encoding aims to illustrate the hierarchy of RNA folding. New algorithms are based on dynamic programming and hybrid (combining exhaustive search and random walk) approaches. They evolved from elementary algorithm implemented within the workflow of RNA FRABASE 1.0, our database of RNA structure fragments. They use different scoring functions to rank dissimilar dot-bracket representations of RNA structure. Computational experiments show an advantage of new methods over the others, especially for large RNA structures. Presented algorithms have been implemented as new functionality of RNApdbee webserver and are ready to use at http://rnapdbee.cs.put.poznan.pl. mszachniuk@cs.put.poznan.pl. Supplementary data are available at Bioinformatics online.
Kirschner, Andreas; Frishman, Dmitrij
2008-10-01
Prediction of beta-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics due to their frequent occurrence as well as their structural and functional significance. Because various structural features of proteins are intercorrelated, secondary structure information has been often employed as an additional input for machine learning algorithms while predicting beta-turns. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs and demonstrate its efficiency in recognizing three aspects of protein structure: beta-turns, beta-turn types, and secondary structure. The advantage of our method compared to other predictors is that it does not require any external input except for sequence profiles because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset our method exhibits the total prediction accuracy of 77.9% and the Mathew's Correlation Coefficient of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually depending target classes need to be predicted. http://webclu.bio.wzw.tum.de/predator-web/.
The HMMER Web Server for Protein Sequence Similarity Search.
Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D
2017-12-08
Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Venselaar, Hanka; Te Beek, Tim A H; Kuipers, Remko K P; Hekkelman, Maarten L; Vriend, Gert
2010-11-08
Many newly detected point mutations are located in protein-coding regions of the human genome. Knowledge of their effects on the protein's 3D structure provides insight into the protein's mechanism, can aid the design of further experiments, and eventually can lead to the development of new medicines and diagnostic tools. In this article we describe HOPE, a fully automatic program that analyzes the structural and functional effects of point mutations. HOPE collects information from a wide range of information sources including calculations on the 3D coordinates of the protein by using WHAT IF Web services, sequence annotations from the UniProt database, and predictions by DAS services. Homology models are built with YASARA. Data is stored in a database and used in a decision scheme to identify the effects of a mutation on the protein's 3D structure and function. HOPE builds a report with text, figures, and animations that is easy to use and understandable for (bio)medical researchers. We tested HOPE by comparing its output to the results of manually performed projects. In all straightforward cases HOPE performed similar to a trained bioinformatician. The use of 3D structures helps optimize the results in terms of reliability and details. HOPE's results are easy to understand and are presented in a way that is attractive for researchers without an extensive bioinformatics background.
Curatr: a web application for creating, curating and sharing a mass spectral library.
Palmer, Andrew; Phapale, Prasad; Fay, Dominik; Alexandrov, Theodore
2018-04-15
We have developed a web application curatr for the rapid generation of high quality mass spectral fragmentation libraries from liquid-chromatography mass spectrometry datasets. Curatr handles datasets from single or multiplexed standards and extracts chromatographic profiles and potential fragmentation spectra for multiple adducts. An intuitive interface helps users to select high quality spectra that are stored along with searchable molecular information, the providence of each standard and experimental metadata. Curatr supports exports to several standard formats for use with third party software or submission to repositories. We demonstrate the use of curatr to generate the EMBL Metabolomics Core Facility spectral library http://curatr.mcf.embl.de. Source code and example data are at http://github.com/alexandrovteam/curatr/. palmer@embl.de. Supplementary data are available at Bioinformatics online.
Field of genes: using Apache Kafka as a bioinformatic data repository
Lynch, Richard; Walsh, Paul
2018-01-01
Abstract Background Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is National Center for Biotechnology Information's (NCBI’s) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI’s RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. Results The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Conclusions Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data. PMID:29635394
Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza
2015-01-01
Background Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the nairovirus, a genus in the Bunyaviridae family, which causes a life threatening disease in human. Currently, there is no vaccine against CCHFV and detailed structural analysis of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. Objectives The current study aimed to investigate structural bioinformatics of CCHFV Gn protein and design a construct to make a recombinant bacmid to express by baculovirus system. Materials and Methods To express the Gn protein in insect cells that can be used as antigen in animal model vaccine studies. Bioinformatic analysis of CCHFV Gn protein was performed and designed a construct and cloned into pFastBacHTb vector and a recombinant Gn-bacmid was generated by Bac to Bac system. Results Primary, secondary, and 3D structure of CCHFV Gn were obtained and PCR reaction with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring Gn coding region under polyhedron promoter. Conclusions Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV. PMID:26862379
Welcome to health information science and systems.
Zhang, Yanchun
2013-01-01
Health Information Science and Systems is an exciting, new, multidisciplinary journal that aims to use technologies in computer science to assist in disease diagnoses, treatment, prediction and monitoring through the modeling, design, development, visualization, integration and management of health related information. These computer-science technologies include such as information systems, web technologies, data mining, image processing, user interaction and interface, sensors and wireless networking and are applicable to a wide range of health related information including medical data, biomedical data, bioinformatics data, public health data.
KMgene: a unified R package for gene-based association analysis for complex traits.
Yan, Qi; Fang, Zhou; Chen, Wei; Stegle, Oliver
2018-02-09
In this report, we introduce an R package KMgene for performing gene-based association tests for familial, multivariate or longitudinal traits using kernel machine (KM) regression under a generalized linear mixed model (GLMM) framework. Extensive simulations were performed to evaluate the validity of the approaches implemented in KMgene. http://cran.r-project.org/web/packages/KMgene. qi.yan@chp.edu or wei.chen@chp.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.
SPS' Digest: the Swiss Proteomics Society selection of proteomics articles.
Hoogland, Christine; Lion, Niels; Palagi, Patricia M; Sanchez, Jean-Charles; Tissot, Jean-Daniel
2005-08-01
Despite the consolidation of the specialized proteomics literature around a few established journals, such as Proteomics, Molecular and Cellular Proteomics, and the Journal of Proteome Research, a lot of information is still spread in many different publications from different fields, such as analytical sciences, MS, bioinformatics, etc. The purpose of SPS' Digest is to gather a selection of proteomics articles, to categorize them, and to make the list available on a periodic basis through a web page and email alerts.
A Linked Series of Laboratory Exercises in Molecular Biology Utilizing Bioinformatics and GFP
ERIC Educational Resources Information Center
Medin, Carey L.; Nolin, Katie L.
2011-01-01
Molecular biologists commonly use bioinformatics to map and analyze DNA and protein sequences and to align different DNA and protein sequences for comparison. Additionally, biologists can create and view 3D models of protein structures to further understand intramolecular interactions. The primary goal of this 10-week laboratory was to introduce…
ERIC Educational Resources Information Center
Grunwald, Sandra K.; Krueger, Katherine J.
2008-01-01
Laboratory exercises, which utilize alkaline phosphatase as a model enzyme, have been developed and used extensively in undergraduate biochemistry courses to illustrate enzyme steady-state kinetics. A bioinformatics laboratory exercise for the biochemistry laboratory, which complements the traditional alkaline phosphatase kinetics exercise, was…
MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.
Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G
2012-12-07
MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). It is the first metagenomic web service that is capable of processing SOLiD color-space reads, to authors' knowledge. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies allowing their comparative analysis together with user samples, namely datasets from Russian Metagenome project, MetaHIT and Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.
Pharmacogenomics and its potential impact on drug and formulation development.
Regnstrom, Karin; Burgess, Diane J
2005-01-01
Recent advances in genomic research have provided the basis for new insights into the importance of genetic and genomic markers during the different stages of drug development. A new field of research, pharmacogenomics, which studies the relationship between drug effects and the genome, has emerged. Structural pharmacogenomics maps the complete DNA sequences of whole genomes (genotypes) including individual variations, and functional pharmacogenomics assesses the expression levels of thousands of genes in one single experiment. Together, these two areas of pharmacogenomics have generated massive databases, which have become a challenge for the research field of informatics and have fostered a new branch of research, bioinformatics. If skillfully used, the databases generated by pharmacogenomics together with data mining on the Web promise to improve the drug development process in a variety of areas: identification of drug targets, evaluation of toxicity, classification of diseases, evaluation of formulations, assessment of drug response and treatment, post-marketing applications, and development of personalized medicines.
MINS2: revisiting the molecular code for transmembrane-helix recognition by the Sec61 translocon.
Park, Yungki; Helms, Volkhard
2008-08-15
To be fully functional, membrane proteins should not only fold, but also get inserted into the membrane, which is mediated by the Sec61 translocon. Recent experimental studies have attempted to elucidate how the Sec61 translocon accomplishes this delicate task by measuring the translocon-mediated membrane insertion free energies of 357 systematically designed peptides. On the basis of this data set, we have developed MINS2, a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. A benchmark analysis of MINS2 shows that MINS2 signi.cantly outperforms previously proposed methods. Importantly, the application of MINS2 to known membrane protein structures shows that a better prediction of membrane insertion free energies does not lead to a better prediction of transmembrane segments of polytopic membrane proteins. A web server for MINS2 is publicly available at http://service.bioinformatik.uni-saarland.de/mins. Supplementary data are available at Bioinformatics online.
MyDas, an Extensible Java DAS Server
Jimenez, Rafael C.; Quinn, Antony F.; Jenkinson, Andrew M.; Mulder, Nicola; Martin, Maria; Hunter, Sarah; Hermjakob, Henning
2012-01-01
A large number of diverse, complex, and distributed data resources are currently available in the Bioinformatics domain. The pace of discovery and the diversity of information means that centralised reference databases like UniProt and Ensembl cannot integrate all potentially relevant information sources. From a user perspective however, centralised access to all relevant information concerning a specific query is essential. The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic and protein sequences; this standardisation enables clients to retrieve data from a myriad of sources, thus offering centralised access to end-users. We introduce MyDas, a web server that facilitates the publishing of biological annotations according to the DAS specification. It deals with the common functionality requirements of making data available, while also providing an extension mechanism in order to implement the specifics of data store interaction. MyDas allows the user to define where the required information is located along with its structure, and is then responsible for the communication protocol details. PMID:23028496
Finding similar nucleotide sequences using network BLAST searches.
Ladunga, Istvan
2009-06-01
The Basic Local Alignment Search Tool (BLAST) is a keystone of bioinformatics due to its performance and user-friendliness. Beginner and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNA, expressed sequence tag, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Understanding results is assisted by taxonomy reports, genomic views, and multiple alignments. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits provide no evidence, but hints for further analyses. We find genes that may code for homologous proteins by translated BLAST. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez, PUBMED, structural, sequence, interaction, and expression databases. This facilitates integration with a wide spectrum of biological knowledge.
SATORI: a system for ontology-guided visual exploration of biomedical data repositories.
Lekschas, Fritz; Gehlenborg, Nils
2018-04-01
The ever-increasing number of biomedical datasets provides tremendous opportunities for re-use but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating datasets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find datasets of interest. We developed SATORI-an integrative search and visual exploration interface for the exploration of biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform. nils@hms.harvard.edu. Supplementary data are available at Bioinformatics online.
MyDas, an extensible Java DAS server.
Salazar, Gustavo A; García, Leyla J; Jones, Philip; Jimenez, Rafael C; Quinn, Antony F; Jenkinson, Andrew M; Mulder, Nicola; Martin, Maria; Hunter, Sarah; Hermjakob, Henning
2012-01-01
A large number of diverse, complex, and distributed data resources are currently available in the Bioinformatics domain. The pace of discovery and the diversity of information means that centralised reference databases like UniProt and Ensembl cannot integrate all potentially relevant information sources. From a user perspective however, centralised access to all relevant information concerning a specific query is essential. The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic and protein sequences; this standardisation enables clients to retrieve data from a myriad of sources, thus offering centralised access to end-users.We introduce MyDas, a web server that facilitates the publishing of biological annotations according to the DAS specification. It deals with the common functionality requirements of making data available, while also providing an extension mechanism in order to implement the specifics of data store interaction. MyDas allows the user to define where the required information is located along with its structure, and is then responsible for the communication protocol details.
Saunders, Brian; Lyon, Stephen; Day, Matthew; Riley, Brenda; Chenette, Emily; Subramaniam, Shankar
2008-01-01
The UCSD-Nature Signaling Gateway Molecule Pages (http://www.signaling-gateway.org/molecule) provides essential information on more than 3800 mammalian proteins involved in cellular signaling. The Molecule Pages contain expert-authored and peer-reviewed information based on the published literature, complemented by regularly updated information derived from public data source references and sequence analysis. The expert-authored data includes both a full-text review about the molecule, with citations, and highly structured data for bioinformatics interrogation, including information on protein interactions and states, transitions between states and protein function. The expert-authored pages are anonymously peer reviewed by the Nature Publishing Group. The Molecule Pages data is present in an object-relational database format and is freely accessible to the authors, the reviewers and the public from a web browser that serves as a presentation layer. The Molecule Pages are supported by several applications that along with the database and the interfaces form a multi-tier architecture. The Molecule Pages and the Signaling Gateway are routinely accessed by a very large research community. PMID:17965093
Saunders, Brian; Lyon, Stephen; Day, Matthew; Riley, Brenda; Chenette, Emily; Subramaniam, Shankar; Vadivelu, Ilango
2008-01-01
The UCSD-Nature Signaling Gateway Molecule Pages (http://www.signaling-gateway.org/molecule) provides essential information on more than 3800 mammalian proteins involved in cellular signaling. The Molecule Pages contain expert-authored and peer-reviewed information based on the published literature, complemented by regularly updated information derived from public data source references and sequence analysis. The expert-authored data includes both a full-text review about the molecule, with citations, and highly structured data for bioinformatics interrogation, including information on protein interactions and states, transitions between states and protein function. The expert-authored pages are anonymously peer reviewed by the Nature Publishing Group. The Molecule Pages data is present in an object-relational database format and is freely accessible to the authors, the reviewers and the public from a web browser that serves as a presentation layer. The Molecule Pages are supported by several applications that along with the database and the interfaces form a multi-tier architecture. The Molecule Pages and the Signaling Gateway are routinely accessed by a very large research community.
VoroMQA: Assessment of protein structure quality using interatomic contact areas.
Olechnovič, Kliment; Venclovas, Česlovas
2017-06-01
In the absence of experimentally determined protein structure many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge-based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue-residue or atom-atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation-based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single-model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa. Proteins 2017; 85:1131-1145. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes.
Jespersen, Martin Closter; Peters, Bjoern; Nielsen, Morten; Marcatili, Paolo
2017-07-03
Antibodies have become an indispensable tool for many biotechnological and clinical applications. They bind their molecular target (antigen) by recognizing a portion of its structure (epitope) in a highly specific manner. The ability to predict epitopes from antigen sequences alone is a complex task. Despite substantial effort, limited advancement has been achieved over the last decade in the accuracy of epitope prediction methods, especially for those that rely on the sequence of the antigen only. Here, we present BepiPred-2.0 (http://www.cbs.dtu.dk/services/BepiPred/), a web server for predicting B-cell epitopes from antigen sequences. BepiPred-2.0 is based on a random forest algorithm trained on epitopes annotated from antibody-antigen protein structures. This new method was found to outperform other available tools for sequence-based epitope prediction both on epitope data derived from solved 3D structures, and on a large collection of linear epitopes downloaded from the IEDB database. The method displays results in a user-friendly and informative way, both for computer-savvy and non-expert users. We believe that BepiPred-2.0 will be a valuable tool for the bioinformatics and immunology community. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
DIBS: a repository of disordered binding sites mediating interactions with ordered proteins.
Schad, Eva; Fichó, Erzsébet; Pancsa, Rita; Simon, István; Dosztányi, Zsuzsanna; Mészáros, Bálint
2018-02-01
Intrinsically Disordered Proteins (IDPs) mediate crucial protein-protein interactions, most notably in signaling and regulation. As their importance is increasingly recognized, the detailed analyses of specific IDP interactions opened up new opportunities for therapeutic targeting. Yet, large scale information about IDP-mediated interactions in structural and functional details are lacking, hindering the understanding of the mechanisms underlying this distinct binding mode. Here, we present DIBS, the first comprehensive, curated collection of complexes between IDPs and ordered proteins. DIBS not only describes by far the highest number of cases, it also provides the dissociation constants of their interactions, as well as the description of potential post-translational modifications modulating the binding strength and linear motifs involved in the binding. Together with the wide range of structural and functional annotations, DIBS will provide the cornerstone for structural and functional studies of IDP complexes. DIBS is freely accessible at http://dibs.enzim.ttk.mta.hu/. The DIBS application is hosted by Apache web server and was implemented in PHP. To enrich querying features and to enhance backend performance a MySQL database was also created. dosztanyi@caesar.elte.hu or bmeszaros@caesar.elte.hu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines
2011-01-01
Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.
Cieślik, Marcin; Mura, Cameron
2011-02-25
Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
Bioinformatics in the Netherlands: the value of a nationwide community.
van Gelder, Celia W G; Hooft, Rob W W; van Rijswijk, Merlijn N; van den Berg, Linda; Kok, Ruben G; Reinders, Marcel; Mons, Barend; Heringa, Jaap
2017-09-15
This review provides a historical overview of the inception and development of bioinformatics research in the Netherlands. Rooted in theoretical biology by foundational figures such as Paulien Hogeweg (at Utrecht University since the 1970s), the developments leading to organizational structures supporting a relatively large Dutch bioinformatics community will be reviewed. We will show that the most valuable resource that we have built over these years is the close-knit national expert community that is well engaged in basic and translational life science research programmes. The Dutch bioinformatics community is accustomed to facing the ever-changing landscape of data challenges and working towards solutions together. In addition, this community is the stable factor on the road towards sustainability, especially in times where existing funding models are challenged and change rapidly. © The Author 2017. Published by Oxford University Press.
Accessing the SEED genome databases via Web services API: tools for programmers.
Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A
2010-06-14
The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides Web services based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrate that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.
Executing SADI services in Galaxy.
Aranguren, Mikel Egaña; González, Alejandro Rodríguez; Wilkinson, Mark D
2014-01-01
In recent years Galaxy has become a popular workflow management system in bioinformatics, due to its ease of installation, use and extension. The availability of Semantic Web-oriented tools in Galaxy, however, is limited. This is also the case for Semantic Web Services such as those provided by the SADI project, i.e. services that consume and produce RDF. Here we present SADI-Galaxy, a tool generator that deploys selected SADI Services as typical Galaxy tools. SADI-Galaxy is a Galaxy tool generator: through SADI-Galaxy, any SADI-compliant service becomes a Galaxy tool that can participate in other out-standing features of Galaxy such as data storage, history, workflow creation, and publication. Galaxy can also be used to execute and combine SADI services as it does with other Galaxy tools. Finally, we have semi-automated the packing and unpacking of data into RDF such that other Galaxy tools can easily be combined with SADI services, plugging the rich SADI Semantic Web Service environment into the popular Galaxy ecosystem. SADI-Galaxy bridges the gap between Galaxy, an easy to use but "static" workflow system with a wide user-base, and SADI, a sophisticated, semantic, discovery-based framework for Web Services, thus benefiting both user communities.
Sputnik: ad hoc distributed computation.
Völkel, Gunnar; Lausser, Ludwig; Schmid, Florian; Kraus, Johann M; Kestler, Hans A
2015-04-15
In bioinformatic applications, computationally demanding algorithms are often parallelized to speed up computation. Nevertheless, setting up computational environments for distributed computation is often tedious. Aim of this project were the lightweight ad hoc set up and fault-tolerant computation requiring only a Java runtime, no administrator rights, while utilizing all CPU cores most effectively. The Sputnik framework provides ad hoc distributed computation on the Java Virtual Machine which uses all supplied CPU cores fully. It provides a graphical user interface for deployment setup and a web user interface displaying the current status of current computation jobs. Neither a permanent setup nor administrator privileges are required. We demonstrate the utility of our approach on feature selection of microarray data. The Sputnik framework is available on Github http://github.com/sysbio-bioinf/sputnik under the Eclipse Public License. hkestler@fli-leibniz.de or hans.kestler@uni-ulm.de Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Liu, Bin; Long, Ren; Chou, Kuo-Chen
2016-08-15
Regulatory DNA elements are associated with DNase I hypersensitive sites (DHSs). Accordingly, identification of DHSs will provide useful insights for in-depth investigation into the function of noncoding genomic regions. In this study, using the strategy of ensemble learning framework, we proposed a new predictor called iDHS-EL for identifying the location of DHS in human genome. It was formed by fusing three individual Random Forest (RF) classifiers into an ensemble predictor. The three RF operators were respectively based on the three special modes of the general pseudo nucleotide composition (PseKNC): (i) kmer, (ii) reverse complement kmer and (iii) pseudo dinucleotide composition. It has been demonstrated that the new predictor remarkably outperforms the relevant state-of-the-art methods in both accuracy and stability. For the convenience of most experimental scientists, a web server for iDHS-EL is established at http://bioinformatics.hitsz.edu.cn/iDHS-EL, which is the first web-server predictor ever established for identifying DHSs, and by which users can easily get their desired results without the need to go through the mathematical details. We anticipate that IDHS-EL: will become a very useful high throughput tool for genome analysis. bliu@gordonlifescience.org or bliu@insun.hit.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Structural analysis of linear and conformational epitopes of allergens
Ivanciuc, Ovidiu; Schein, Catherine H.; Garcia, Tzintzuni; Oezguen, Numan; Negi, Surendra S.; Braun, Werner
2009-01-01
In many countries regulatory agencies have adopted safety guidelines, based on bioinformatics rules from the WHO/FAO and EFSA recommendations, to prevent potentially allergenic novel foods or agricultural products from reaching consumers. We created the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to combine data that had previously been available only as flat files on Web pages or in the literature. SDAP was designed to be user friendly, to be of maximum use to regulatory agencies, clinicians, as well as to scientists interested in assessing the potential allergenic risk of a protein. We developed methods, unique to SDAP, to compare the physicochemical properties of discrete areas of allergenic proteins to known IgE epitopes. We developed a new similarity measure, the property distance (PD) value that can be used to detect related segments in allergens with clinical observed crossreactivity. We have now expanded this work to obtain experimental validation of the PD index as a quantitative predictor of IgE cross-reactivity, by designing peptide variants with predetermined PD scores relative to known IgE epitopes. In complementary work we show how sequence motifs characteristic of allergenic proteins in protein families can be used as fingerprints for allergenicity. PMID:19121639
Lim, Chun Shen; Brown, Chris M
2017-01-01
Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community.
Lim, Chun Shen; Brown, Chris M.
2018-01-01
Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community. PMID:29354101
BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Pafilis, Evangelos; Theodosiou, Theodosios; Schneider, Reinhard; Satagopam, Venkata P; Ouzounis, Christos A; Eliopoulos, Aristides G; Promponas, Vasilis J; Iliopoulos, Ioannis
2014-11-15
The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed(®) and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances to its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website
Good, Benjamin M; Kawas, Edward A; Kuo, Byron Yu-Lin; Wilkinson, Mark D
2006-01-01
Background User-scripts are programs stored in Web browsers that can manipulate the content of websites prior to display in the browser. They provide a novel mechanism by which users can conveniently gain increased control over the content and the display of the information presented to them on the Web. As the Web is the primary medium by which scientists retrieve biological information, any improvements in the mechanisms that govern the utility or accessibility of this information may have profound effects. GreaseMonkey is a Mozilla Firefox extension that facilitates the development and deployment of user-scripts for the Firefox web-browser. We utilize this to enhance the content and the presentation of the iHOP (information Hyperlinked Over Proteins) website. Results The iHOPerator is a GreaseMonkey user-script that augments the gene-centred pages on iHOP by providing a compact, configurable visualization of the defining information for each gene and by enabling additional data, such as biochemical pathway diagrams, to be collected automatically from third party resources and displayed in the same browsing context. Conclusion This open-source script provides an extension to the iHOP website, demonstrating how user-scripts can personalize and enhance the Web browsing experience in a relevant biological setting. The novel, user-driven controls over the content and the display of Web resources made possible by user-scripts, such as the iHOPerator, herald the beginning of a transition from a resource-centric to a user-centric Web experience. We believe that this transition is a necessary step in the development of Web technology that will eventually result in profound improvements in the way life scientists interact with information. PMID:17173692
Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.
Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz
2017-03-01
Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/ . The software is programmed in Java, PERL and R and the source code is available from Zenodo ( https://zenodo.org/record/50931 ). The software is freely available for non-commercial users. l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
EVpedia: A community web resource for prokaryotic and eukaryotic extracellular vesicles research.
Kim, Dae-Kyum; Lee, Jaewook; Simpson, Richard J; Lötvall, Jan; Gho, Yong Song
2015-04-01
For cell-to-cell communication, all living cells including archaea, bacteria, and eukaryotes secrete nano-sized membrane vesicles into the extracellular space. These extracellular vesicles harbor specific subsets of proteins, mRNAs, miRNAs, lipids, and metabolites that represent their cellular status. These vesicle-specific cargos are considered as novel diagnostic biomarkers as well as therapeutic targets. With the advancement in high-throughput technologies on multiomics studies and improvements in bioinformatics approaches, a huge number of vesicular proteins, mRNAs, miRNAs, lipids, and metabolites have been identified, and our understanding of these complex extracellular organelles has considerably increased during these past years. In this review, we highlight EVpedia (http://evpedia.info), a community web portal for systematic analyses of prokaryotic and eukaryotic extracellular vesicles research. Copyright © 2015 Elsevier Ltd. All rights reserved.
Taking Open Innovation to the Molecular Level - Strengths and Limitations.
Zdrazil, Barbara; Blomberg, Niklas; Ecker, Gerhard F
2012-08-01
The ever-growing availability of large-scale open data and its maturation is having a significant impact on industrial drug-discovery, as well as on academic and non-profit research. As industry is changing to an 'open innovation' business concept, precompetitive initiatives and strong public-private partnerships including academic research cooperation partners are gaining more and more importance. Now, the bioinformatics and cheminformatics communities are seeking for web tools which allow the integration of this large volume of life science datasets available in the public domain. Such a data exploitation tool would ideally be able to answer complex biological questions by formulating only one search query. In this short review/perspective, we outline the use of semantic web approaches for data and knowledge integration. Further, we discuss strengths and current limitations of public available data retrieval tools and integrated platforms.
WHIDE—a web tool for visual data mining colocation patterns in multivariate bioimages
Kölling, Jan; Langenkämper, Daniel; Abouna, Sylvie; Khan, Michael; Nattkemper, Tim W.
2012-01-01
Motivation: Bioimaging techniques rapidly develop toward higher resolution and dimension. The increase in dimension is achieved by different techniques such as multitag fluorescence imaging, Matrix Assisted Laser Desorption / Ionization (MALDI) imaging or Raman imaging, which record for each pixel an N-dimensional intensity array, representing local abundances of molecules, residues or interaction patterns. The analysis of such multivariate bioimages (MBIs) calls for new approaches to support users in the analysis of both feature domains: space (i.e. sample morphology) and molecular colocation or interaction. In this article, we present our approach WHIDE (Web-based Hyperbolic Image Data Explorer) that combines principles from computational learning, dimension reduction and visualization in a free web application. Results: We applied WHIDE to a set of MBI recorded using the multitag fluorescence imaging Toponome Imaging System. The MBI show field of view in tissue sections from a colon cancer study and we compare tissue from normal/healthy colon with tissue classified as tumor. Our results show, that WHIDE efficiently reduces the complexity of the data by mapping each of the pixels to a cluster, referred to as Molecular Co-Expression Phenotypes and provides a structural basis for a sophisticated multimodal visualization, which combines topology preserving pseudocoloring with information visualization. The wide range of WHIDE's applicability is demonstrated with examples from toponome imaging, high content screens and MALDI imaging (shown in the Supplementary Material). Availability and implementation: The WHIDE tool can be accessed via the BioIMAX website http://ani.cebitec.uni-bielefeld.de/BioIMAX/; Login: whidetestuser; Password: whidetest. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: tim.nattkemper@uni-bielefeld.de PMID:22390938
iEnhancer-EL: Identifying enhancers and their strength with ensemble learning approach.
Liu, Bin; Li, Kai; Huang, De-Shuang; Chou, Kuo-Chen
2018-06-07
Identification of enhancers and their strength is important because they play a critical role in controlling gene expression. Although some bioinformatics tools were developed, they are limited in discriminating enhancers from non-enhancers only. Recently, a two-layer predictor called "iEnhancer-2L" was developed that can be used to predict the enhancer's strength as well. However, its prediction quality needs further improvement to enhance the practical application value. A new predictor called "iEnhancer-EL" was proposed that contains two layer predictors: the first one (for identifying enhancers) is formed by fusing an array of six key individual classifiers, and the second one (for their strength) formed by fusing an array of ten key individual classifiers. All these key classifiers were selected from 171 elementary classifiers formed by SVM (Support Vector Machine) based on kmer, subsequence profile, and PseKNC (Pseudo K-tuple Nucleotide Composition), respectively. Rigorous cross-validations have indicated that the proposed predictor is remarkably superior to the existing state-of-the-art one in this area. A web server for the iEnhancer-EL has been established at http://bioinformatics.hitsz.edu.cn/iEnhancer-EL/, by which users can easily get their desired results without the need to go through the mathematical details. bliu@hit.edu.cn, dshuang@tongji.edu.cn or kcchou@gordonlifescience.org. Supplementary data are available at Bioinformatics online.
Poswar, Fabiano de Oliveira; Farias, Lucyana Conceição; Fraga, Carlos Alberto de Carvalho; Bambirra, Wilson; Brito-Júnior, Manoel; Sousa-Neto, Manoel Damião; Santos, Sérgio Henrique Souza; de Paula, Alfredo Maurício Batista; D'Angelo, Marcos Flávio Silveira Vasconcelos; Guimarães, André Luiz Sena
2015-06-01
Bioinformatics has emerged as an important tool to analyze the large amount of data generated by research in different diseases. In this study, gene expression for radicular cysts (RCs) and periapical granulomas (PGs) was characterized based on a leader gene approach. A validated bioinformatics algorithm was applied to identify leader genes for RCs and PGs. Genes related to RCs and PGs were first identified in PubMed, GenBank, GeneAtlas, and GeneCards databases. The Web-available STRING software (The European Molecular Biology Laboratory [EMBL], Heidelberg, Baden-Württemberg, Germany) was used in order to build the interaction map among the identified genes by a significance score named weighted number of links. Based on the weighted number of links, genes were clustered using k-means. The genes in the highest cluster were considered leader genes. Multilayer perceptron neural network analysis was used as a complementary supplement for gene classification. For RCs, the suggested leader genes were TP53 and EP300, whereas PGs were associated with IL2RG, CCL2, CCL4, CCL5, CCR1, CCR3, and CCR5 genes. Our data revealed different gene expression for RCs and PGs, suggesting that not only the inflammatory nature but also other biological processes might differentiate RCs and PGs. Copyright © 2015 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.
Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton
2013-01-01
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
Eisenhaber, Frank
2014-06-01
Remarkably, Singapore as one of today's hotspots for bioinformatics and computational biology research appeared de novo out of pioneering efforts of engaged local individuals in the early 90-s that, supported with increasing public funds from 1996 on, morphed into the present vibrant research community. This article brings to mind the pioneers, their first successes and early institutional developments.
RNAiFold2T: Constraint Programming design of thermo-IRES switches.
Garcia-Martin, Juan Antonio; Dotu, Ivan; Fernandez-Chamorro, Javier; Lozano, Gloria; Ramajo, Jorge; Martinez-Salas, Encarnacion; Clote, Peter
2016-06-15
RNA thermometers (RNATs) are cis-regulatory elements that change secondary structure upon temperature shift. Often involved in the regulation of heat shock, cold shock and virulence genes, RNATs constitute an interesting potential resource in synthetic biology, where engineered RNATs could prove to be useful tools in biosensors and conditional gene regulation. Solving the 2-temperature inverse folding problem is critical for RNAT engineering. Here we introduce RNAiFold2T, the first Constraint Programming (CP) and Large Neighborhood Search (LNS) algorithms to solve this problem. Benchmarking tests of RNAiFold2T against existent programs (adaptive walk and genetic algorithm) inverse folding show that our software generates two orders of magnitude more solutions, thus allowing ample exploration of the space of solutions. Subsequently, solutions can be prioritized by computing various measures, including probability of target structure in the ensemble, melting temperature, etc. Using this strategy, we rationally designed two thermosensor internal ribosome entry site (thermo-IRES) elements, whose normalized cap-independent translation efficiency is approximately 50% greater at 42 °C than 30 °C, when tested in reticulocyte lysates. Translation efficiency is lower than that of the wild-type IRES element, which on the other hand is fully resistant to temperature shift-up. This appears to be the first purely computational design of functional RNA thermoswitches, and certainly the first purely computational design of functional thermo-IRES elements. RNAiFold2T is publicly available as part of the new release RNAiFold3.0 at https://github.com/clotelab/RNAiFold and http://bioinformatics.bc.edu/clotelab/RNAiFold, which latter has a web server as well. The software is written in C ++ and uses OR-Tools CP search engine. clote@bc.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Unraveling transcriptional control and cis-regulatory codes using the software suite GeneACT
Cheung, Tom Hiu; Kwan, Yin Lam; Hamady, Micah; Liu, Xuedong
2006-01-01
Deciphering gene regulatory networks requires the systematic identification of functional cis-acting regulatory elements. We present a suite of web-based bioinformatics tools, called GeneACT , that can rapidly detect evolutionarily conserved transcription factor binding sites or microRNA target sites that are either unique or over-represented in differentially expressed genes from DNA microarray data. GeneACT provides graphic visualization and extraction of common regulatory sequence elements in the promoters and 3'-untranslated regions that are conserved across multiple mammalian species. PMID:17064417
Vu, Michael M. K.; Jameson, Nora E.; Masuda, Stuart J.; Lin, Dana; Larralde-Ridaura, Rosa; Lupták, Andrej
2012-01-01
SUMMARY Aptamers are structured macromolecules in vitro evolved to bind molecular targets, whereas in nature they form the ligand-binding domains of riboswitches. Adenosine aptamers of a single structural family were isolated several times from random pools but they have not been identified in genomic sequences. We used two unbiased methods, structure-based bioinformatics and human genome-based in vitro selection, to identify aptamers that form the same adenosine-binding structure in a bacterium, and several vertebrates, including humans. Two of the human aptamers map to introns of RAB3C and FGD3 genes. The RAB3C aptamer binds ATP with dissociation constants about ten times lower than physiological ATP concentration, while the minimal FGD3 aptamer binds ATP only co-transcriptionally. PMID:23102219
Patel, Ashokkumar A; Gilbertson, John R; Showe, Louise C; London, Jack W; Ross, Eric; Ochs, Michael F; Carver, Joseph; Lazarus, Andrea; Parwani, Anil V; Dhir, Rajiv; Beck, J Robert; Liebman, Michael; Garcia, Fernando U; Prichard, Jeff; Wilkerson, Myra; Herberman, Ronald B; Becich, Michael J
2007-06-08
The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC, http://www.pcabc.upmc.edu) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance that was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective of this was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing a statewide data model (1) for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring the opportunity for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, for the purpose of developing collaborative research programs across cancer research institutions in Pennsylvania. The Consortium's intention was to establish a virtual repository of many clinical specimens residing in various centers across the state, in order to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers. The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations, and plans to expand to partners outside the State. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners - all working together to accomplish a common mission. The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collections for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This was the foundational work that has led to the development of a centralized data warehouse that has met each of the institutions' IRB/HIPAA standards. Currently, this "virtual biorepository" has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. The clinical annotation of tissue samples is either done manually over the internet or semi-automated batch modes through mapping of local data elements with PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer and 1543 cases (including 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. Furthermore, the group has also developed a central web-based tool that allows investigators to share their translational (genomics/proteomics) experiment data on research evaluating potential biomarkers via a central location on the Consortium's web site. The technological achievements and the statewide informatics infrastructure that have been established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer. Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy.
Patel, Ashokkumar A.; Gilbertson, John R.; Showe, Louise C.; London, Jack W.; Ross, Eric; Ochs, Michael F.; Carver, Joseph; Lazarus, Andrea; Parwani, Anil V.; Dhir, Rajiv; Beck, J. Robert; Liebman, Michael; Garcia, Fernando U.; Prichard, Jeff; Wilkerson, Myra; Herberman, Ronald B.; Becich, Michael J.
2007-01-01
Background: The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC, http://www.pcabc.upmc.edu) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance that was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective of this was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing a statewide data model (1) for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring the opportunity for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, for the purpose of developing collaborative research programs across cancer research institutions in Pennsylvania. The Consortium’s intention was to establish a virtual repository of many clinical specimens residing in various centers across the state, in order to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers. Methods: The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations, and plans to expand to partners outside the State. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners - all working together to accomplish a common mission. The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collections for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This was the foundational work that has led to the development of a centralized data warehouse that has met each of the institutions’ IRB/HIPAA standards. Results: Currently, this “virtual biorepository” has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. The clinical annotation of tissue samples is either done manually over the internet or semi-automated batch modes through mapping of local data elements with PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer and 1543 cases (including 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. Furthermore, the group has also developed a central web-based tool that allows investigators to share their translational (genomics/proteomics) experiment data on research evaluating potential biomarkers via a central location on the Consortium’s web site. Conclusions: The technological achievements and the statewide informatics infrastructure that have been established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer. Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy. PMID:19455246
QMachine: commodity supercomputing in web browsers.
Wilkinson, Sean R; Almeida, Jonas S
2014-06-09
Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics' "Big Data" from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running "download and install" software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments.
Workflow and web application for annotating NCBI BioProject transcriptome data
Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A.; Barrero, Luz S.; Landsman, David
2017-01-01
Abstract The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/ PMID:28605765
DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows.
Paraskevopoulou, Maria D; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A G
2013-07-01
MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it is being widely used from the scientific community, since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA-gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned, to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines.
DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows
Paraskevopoulou, Maria D.; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S.; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A.G.
2013-01-01
MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it is being widely used from the scientific community, since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA–gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned, to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines. PMID:23680784
NASA Astrophysics Data System (ADS)
Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.
2017-11-01
A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in a solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH value. For the first time we have managed to get NEP dimers in solution, to analyze their structure, and to compare the models obtained using the method of the molecular tectonics with the spatial protein structure predicted by us using the bioinformatics methods. The results of the study provide a new insight into the structural features of nuclear export protein NS2 (NEP) of the influenza virus A, which is very important for viral infection development.
PanWeb: A web interface for pan-genomic analysis.
Pantoja, Yan; Pinheiro, Kenny; Veras, Allan; Araújo, Fabrício; Lopes de Sousa, Ailton; Guimarães, Luis Carlos; Silva, Artur; Ramos, Rommel T J
2017-01-01
With increased production of genomic data since the advent of next-generation sequencing (NGS), there has been a need to develop new bioinformatics tools and areas, such as comparative genomics. In comparative genomics, the genetic material of an organism is directly compared to that of another organism to better understand biological species. Moreover, the exponentially growing number of deposited prokaryote genomes has enabled the investigation of several genomic characteristics that are intrinsic to certain species. Thus, a new approach to comparative genomics, termed pan-genomics, was developed. In pan-genomics, various organisms of the same species or genus are compared. Currently, there are many tools that can perform pan-genomic analyses, such as PGAP (Pan-Genome Analysis Pipeline), Panseq (Pan-Genome Sequence Analysis Program) and PGAT (Prokaryotic Genome Analysis Tool). Among these software tools, PGAP was developed in the Perl scripting language and its reliance on UNIX platform terminals and its requirement for an extensive parameterized command line can become a problem for users without previous computational knowledge. Thus, the aim of this study was to develop a web application, known as PanWeb, that serves as a graphical interface for PGAP. In addition, using the output files of the PGAP pipeline, the application generates graphics using custom-developed scripts in the R programming language. PanWeb is freely available at http://www.computationalbiology.ufpa.br/panweb.
Oasis: online analysis of small RNA deep sequencing data.
Capece, Vincenzo; Garcia Vizcaino, Julio C; Vidal, Ramon; Rahman, Raza-Ur; Pena Centeno, Tonatiuh; Shomroni, Orr; Suberviola, Irantzu; Fischer, Andre; Bonn, Stefan
2015-07-01
Oasis is a web application that allows for the fast and flexible online analysis of small-RNA-seq (sRNA-seq) data. It was designed for the end user in the lab, providing an easy-to-use web frontend including video tutorials, demo data and best practice step-by-step guidelines on how to analyze sRNA-seq data. Oasis' exclusive selling points are a differential expression module that allows for the multivariate analysis of samples, a classification module for robust biomarker detection and an advanced programming interface that supports the batch submission of jobs. Both modules include the analysis of novel miRNAs, miRNA targets and functional analyses including GO and pathway enrichment. Oasis generates downloadable interactive web reports for easy visualization, exploration and analysis of data on a local system. Finally, Oasis' modular workflow enables for the rapid (re-) analysis of data. Oasis is implemented in Python, R, Java, PHP, C++ and JavaScript. It is freely available at http://oasis.dzne.de. stefan.bonn@dzne.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
deepTools2: a next generation web server for deep-sequencing data analysis.
Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas
2016-07-08
We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
deepTools: a flexible platform for exploring deep-sequencing data.
Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas
2014-07-01
We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
CBrowse: a SAM/BAM-based contig browser for transcriptome assembly visualization and analysis.
Li, Pei; Ji, Guoli; Dong, Min; Schmidt, Emily; Lenox, Douglas; Chen, Liangliang; Liu, Qi; Liu, Lin; Zhang, Jie; Liang, Chun
2012-09-15
To address the impending need for exploring rapidly increased transcriptomics data generated for non-model organisms, we developed CBrowse, an AJAX-based web browser for visualizing and analyzing transcriptome assemblies and contigs. Designed in a standard three-tier architecture with a data pre-processing pipeline, CBrowse is essentially a Rich Internet Application that offers many seamlessly integrated web interfaces and allows users to navigate, sort, filter, search and visualize data smoothly. The pre-processing pipeline takes the contig sequence file in FASTA format and its relevant SAM/BAM file as the input; detects putative polymorphisms, simple sequence repeats and sequencing errors in contigs and generates image, JSON and database-compatible CSV text files that are directly utilized by different web interfaces. CBowse is a generic visualization and analysis tool that facilitates close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors in transcriptome sequencing projects. CBrowse is distributed under the GNU General Public License, available at http://bioinfolab.muohio.edu/CBrowse/ liangc@muohio.edu or liangc.mu@gmail.com; glji@xmu.edu.cn Supplementary data are available at Bioinformatics online.
Tutorial videos of bioinformatics resources: online distribution trial in Japan named TogoTV.
Kawano, Shin; Ono, Hiromasa; Takagi, Toshihisa; Bono, Hidemasa
2012-03-01
In recent years, biological web resources such as databases and tools have become more complex because of the enormous amounts of data generated in the field of life sciences. Traditional methods of distributing tutorials include publishing textbooks and posting web documents, but these static contents cannot adequately describe recent dynamic web services. Due to improvements in computer technology, it is now possible to create dynamic content such as video with minimal effort and low cost on most modern computers. The ease of creating and distributing video tutorials instead of static content improves accessibility for researchers, annotators and curators. This article focuses on online video repositories for educational and tutorial videos provided by resource developers and users. It also describes a project in Japan named TogoTV (http://togotv.dbcls.jp/en/) and discusses the production and distribution of high-quality tutorial videos, which would be useful to viewer, with examples. This article intends to stimulate and encourage researchers who develop and use databases and tools to distribute how-to videos as a tool to enhance product usability.
Lopez-Doriga, Adriana; Feliubadaló, Lídia; Menéndez, Mireia; Lopez-Doriga, Sergio; Morón-Duran, Francisco D; del Valle, Jesús; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Campos, Olga; Gómez, Carolina; Pineda, Marta; González, Sara; Moreno, Victor; Capellá, Gabriel; Lázaro, Conxi
2014-03-01
Next-generation sequencing (NGS) has revolutionized genomic research and is set to have a major impact on genetic diagnostics thanks to the advent of benchtop sequencers and flexible kits for targeted libraries. Among the main hurdles in NGS are the difficulty of performing bioinformatic analysis of the huge volume of data generated and the high number of false positive calls that could be obtained, depending on the NGS technology and the analysis pipeline. Here, we present the development of a free and user-friendly Web data analysis tool that detects and filters sequence variants, provides coverage information, and allows the user to customize some basic parameters. The tool has been developed to provide accurate genetic analysis of targeted sequencing of common high-risk hereditary cancer genes using amplicon libraries run in a GS Junior System. The Web resource is linked to our own mutation database, to assist in the clinical classification of identified variants. We believe that this tool will greatly facilitate the use of the NGS approach in routine laboratories.
Tutorial videos of bioinformatics resources: online distribution trial in Japan named TogoTV
Kawano, Shin; Ono, Hiromasa; Takagi, Toshihisa
2012-01-01
In recent years, biological web resources such as databases and tools have become more complex because of the enormous amounts of data generated in the field of life sciences. Traditional methods of distributing tutorials include publishing textbooks and posting web documents, but these static contents cannot adequately describe recent dynamic web services. Due to improvements in computer technology, it is now possible to create dynamic content such as video with minimal effort and low cost on most modern computers. The ease of creating and distributing video tutorials instead of static content improves accessibility for researchers, annotators and curators. This article focuses on online video repositories for educational and tutorial videos provided by resource developers and users. It also describes a project in Japan named TogoTV (http://togotv.dbcls.jp/en/) and discusses the production and distribution of high-quality tutorial videos, which would be useful to viewer, with examples. This article intends to stimulate and encourage researchers who develop and use databases and tools to distribute how-to videos as a tool to enhance product usability. PMID:21803786
NASA Astrophysics Data System (ADS)
Jaenisch, Holger M.; Handley, James W.
2010-04-01
Malware are analogs of viruses. Viruses are comprised of large numbers of polypeptide proteins. The shape and function of the protein strands determines the functionality of the segment, similar to a subroutine in malware. The full combination of subroutines is the malware organism, in analogous fashion as a collection of polypeptides forms protein structures that are information bearing. We propose to apply the methods of Bioinformatics to analyze malware to provide a rich feature set for creating a unique and novel detection and classification scheme that is originally applied to Bioinformatics amino acid sequencing. Our proposed methods enable real time in situ (in contrast to in vivo) detection applications.
Persistence and availability of Web services in computational biology.
Schultheiss, Sebastian J; Münch, Marc-Christian; Andreeva, Gergana D; Rätsch, Gunnar
2011-01-01
We have conducted a study on the long-term availability of bioinformatics Web services: an observation of 927 Web services published in the annual Nucleic Acids Research Web Server Issues between 2003 and 2009. We found that 72% of Web sites are still available at the published addresses, only 9% of services are completely unavailable. Older addresses often redirect to new pages. We checked the functionality of all available services: for 33%, we could not test functionality because there was no example data or a related problem; 13% were truly no longer working as expected; we could positively confirm functionality only for 45% of all services. Additionally, we conducted a survey among 872 Web Server Issue corresponding authors; 274 replied. 78% of all respondents indicate their services have been developed solely by students and researchers without a permanent position. Consequently, these services are in danger of falling into disrepair after the original developers move to another institution, and indeed, for 24% of services, there is no plan for maintenance, according to the respondents. We introduce a Web service quality scoring system that correlates with the number of citations: services with a high score are cited 1.8 times more often than low-scoring services. We have identified key characteristics that are predictive of a service's survival, providing reviewers, editors, and Web service developers with the means to assess or improve Web services. A Web service conforming to these criteria receives more citations and provides more reliable service for its users. The most effective way of ensuring continued access to a service is a persistent Web address, offered either by the publishing journal, or created on the authors' own initiative, for example at http://bioweb.me. The community would benefit the most from a policy requiring any source code needed to reproduce results to be deposited in a public repository.
Persistence and Availability of Web Services in Computational Biology
Schultheiss, Sebastian J.; Münch, Marc-Christian; Andreeva, Gergana D.; Rätsch, Gunnar
2011-01-01
We have conducted a study on the long-term availability of bioinformatics Web services: an observation of 927 Web services published in the annual Nucleic Acids Research Web Server Issues between 2003 and 2009. We found that 72% of Web sites are still available at the published addresses, only 9% of services are completely unavailable. Older addresses often redirect to new pages. We checked the functionality of all available services: for 33%, we could not test functionality because there was no example data or a related problem; 13% were truly no longer working as expected; we could positively confirm functionality only for 45% of all services. Additionally, we conducted a survey among 872 Web Server Issue corresponding authors; 274 replied. 78% of all respondents indicate their services have been developed solely by students and researchers without a permanent position. Consequently, these services are in danger of falling into disrepair after the original developers move to another institution, and indeed, for 24% of services, there is no plan for maintenance, according to the respondents. We introduce a Web service quality scoring system that correlates with the number of citations: services with a high score are cited 1.8 times more often than low-scoring services. We have identified key characteristics that are predictive of a service's survival, providing reviewers, editors, and Web service developers with the means to assess or improve Web services. A Web service conforming to these criteria receives more citations and provides more reliable service for its users. The most effective way of ensuring continued access to a service is a persistent Web address, offered either by the publishing journal, or created on the authors' own initiative, for example at http://bioweb.me. The community would benefit the most from a policy requiring any source code needed to reproduce results to be deposited in a public repository. PMID:21966383
Opal web services for biomedical applications.
Ren, Jingyuan; Williams, Nadya; Clementi, Luca; Krishnan, Sriram; Li, Wilfred W
2010-07-01
Biomedical applications have become increasingly complex, and they often require large-scale high-performance computing resources with a large number of processors and memory. The complexity of application deployment and the advances in cluster, grid and cloud computing require new modes of support for biomedical research. Scientific Software as a Service (sSaaS) enables scalable and transparent access to biomedical applications through simple standards-based Web interfaces. Towards this end, we built a production web server (http://ws.nbcr.net) in August 2007 to support the bioinformatics application called MEME. The server has grown since to include docking analysis with AutoDock and AutoDock Vina, electrostatic calculations using PDB2PQR and APBS, and off-target analysis using SMAP. All the applications on the servers are powered by Opal, a toolkit that allows users to wrap scientific applications easily as web services without any modification to the scientific codes, by writing simple XML configuration files. Opal allows both web forms-based access and programmatic access of all our applications. The Opal toolkit currently supports SOAP-based Web service access to a number of popular applications from the National Biomedical Computation Resource (NBCR) and affiliated collaborative and service projects. In addition, Opal's programmatic access capability allows our applications to be accessed through many workflow tools, including Vision, Kepler, Nimrod/K and VisTrails. From mid-August 2007 to the end of 2009, we have successfully executed 239,814 jobs. The number of successfully executed jobs more than doubled from 205 to 411 per day between 2008 and 2009. The Opal-enabled service model is useful for a wide range of applications. It provides for interoperation with other applications with Web Service interfaces, and allows application developers to focus on the scientific tool and workflow development. Web server availability: http://ws.nbcr.net.
Havrila, Marek; Réblová, Kamila; Zirbel, Craig L.; Leontis, Neocles B.; Šponer, Jiří
2013-01-01
The Sarcin-Ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks that occurs in many different RNA contexts and folds autonomously, i.e., in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of SR motif. SHAPE probing experiment was also performed to confirm fidelity of MD simulations. We identified 57 instances of the SR motif in a non-redundant subset of the RNA X-ray structure database and analyzed their basepairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large ribosomal RNA alignments to determine frequency of occurrence for different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Non isosteric base substitutions led to unstable structures, but so did isosteric substitutions which were unable to make key base-phosphate interactions. MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that inability to form stable cWW geometry is an important factor in case of the first base pair of the flexible region of the SR motif. Comparison of structural, bioinformatics, SHAPE probing and MD simulation data reveals that explicit solvent MD simulations neatly reflect viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions. PMID:24144333
The Protein Data Bank in Europe (PDBe): bringing structure to biology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Velankar, Sameer; Kleywegt, Gerard J., E-mail: gerard@ebi.ac.uk
2011-04-01
Some future challenges for the PDB and its guardians are discussed and current and future activities in structural bioinformatics at the Protein Data Bank in Europe (PDBe) are described. The Protein Data Bank in Europe (PDBe) is the European partner in the Worldwide PDB and as such handles depositions of X-ray, NMR and EM data and structure models. PDBe also provides advanced bioinformatics services based on data from the PDB and related resources. Some of the challenges facing the PDB and its guardians are discussed, as well as some of the areas on which PDBe activities will focus in themore » future (advanced services, ligands, integration, validation and experimental data). Finally, some recent developments at PDBe are described.« less
BioShaDock: a community driven bioinformatics shared Docker-based tools registry
Moreews, François; Sallou, Olivier; Ménager, Hervé; Le bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier
2015-01-01
Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the Elixir registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community. PMID:26913191
BioShaDock: a community driven bioinformatics shared Docker-based tools registry.
Moreews, François; Sallou, Olivier; Ménager, Hervé; Le Bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier
2015-01-01
Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the Elixir registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community.
Sanchita; Singh, Swati; Sharma, Ashok
2014-11-01
Withania somnifera (Ashwagandha) is an affluent storehouse of large number of pharmacologically active secondary metabolites known as withanolides. These secondary metabolites are produced by withanolide biosynthetic pathway. Very less information is available on structural and functional aspects of enzymes involved in withanolides biosynthetic pathways of Withiana somnifera. We therefore performed a bioinformatics analysis to look at functional and structural properties of these important enzymes. The pathway enzymes taken for this study were 3-Hydroxy-3-methylglutaryl coenzyme A reductase, 1-Deoxy-D-xylulose-5-phosphate synthase, 1-Deoxy-D-xylulose-5-phosphate reductase, farnesyl pyrophosphate synthase, squalene synthase, squalene epoxidase, and cycloartenol synthase. The prediction of secondary structure was performed for basic structural information. Three-dimensional structures for these enzymes were predicted. The physico-chemical properties such as pI, AI, GRAVY and instability index were also studied. The current information will provide a platform to know the structural attributes responsible for the function of these protein until experimental structures become available.
Deep Learning Improves Antimicrobial Peptide Recognition.
Veltri, Daniel; Kamath, Uday; Shehu, Amarda
2018-03-24
Bacterial resistance to antibiotics is a growing concern. Antimicrobial peptides (AMPs), natural components of innate immunity, are popular targets for developing new drugs. Machine learning methods are now commonly adopted by wet-laboratory researchers to screen for promising candidates. In this work we utilize deep learning to recognize antimicrobial activity. We propose a neural network model with convolutional and recurrent layers that leverage primary sequence composition. Results show that the proposed model outperforms state-of-the-art classification models on a comprehensive data set. By utilizing the embedding weights, we also present a reduced-alphabet representation and show that reasonable AMP recognition can be maintained using nine amino-acid types. Models and data sets are made freely available through the Antimicrobial Peptide Scanner vr.2 web server at: www.ampscanner.com. amarda@gmu.edu for general inquiries and dan.veltri@gmail.com for web server information. Supplementary data are available at Bioinformatics online.
SLIDE - a web-based tool for interactive visualization of large-scale -omics data.
Ghosh, Soumita; Datta, Abhik; Tan, Kaisen; Choi, Hyungwon
2018-06-28
Data visualization is often regarded as a post hoc step for verifying statistically significant results in the analysis of high-throughput data sets. This common practice leaves a large amount of raw data behind, from which more information can be extracted. However, existing solutions do not provide capabilities to explore large-scale raw datasets using biologically sensible queries, nor do they allow user interaction based real-time customization of graphics. To address these drawbacks, we have designed an open-source, web-based tool called Systems-Level Interactive Data Exploration, or SLIDE to visualize large-scale -omics data interactively. SLIDE's interface makes it easier for scientists to explore quantitative expression data in multiple resolutions in a single screen. SLIDE is publicly available under BSD license both as an online version as well as a stand-alone version at https://github.com/soumitag/SLIDE. Supplementary Information are available at Bioinformatics online.
Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal
Gao, Jianjiong; Aksoy, Bülent Arman; Dogrusoz, Ugur; Dresdner, Gideon; Gross, Benjamin; Sumer, S. Onur; Sun, Yichao; Jacobsen, Anders; Sinha, Rileen; Larsson, Erik; Cerami, Ethan; Sander, Chris; Schultz, Nikolaus
2014-01-01
The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and software programmatic access. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics. PMID:23550210
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high throughput tool for proteomics based biomarker discovery. Until now, multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification, indexing; and high dimensional peak differential analysis with the concurrent statistical tests based false discovery rate (FDR). "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets to identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. Presented web application supplies large scale MS data online uploading and analysis with a simple user interface. This bioinformatic tool will facilitate the discovery of the potential protein biomarkers using MS.