Sample records for comparative structural bioinformatics

  1. Exploring DNA Structure with Cn3D

    ERIC Educational Resources Information Center

    Porter, Sandra G.; Day, Joseph; McCarty, Richard E.; Shearn, Allen; Shingles, Richard; Fletcher, Linnea; Murphy, Stephanie; Pearlman, Rebecca

    2007-01-01

    Researchers in the field of bioinformatics have developed a number of analytical programs and databases that are increasingly important for advancing biological research. Because bioinformatics programs are used to analyze, visualize, and/or compare biological data, it is likely that the use of these programs will have a positive impact on biology…

  2. Prospects and limitations of full-text index structures in genome analysis

    PubMed Central

    Vyverman, Michaël; De Baets, Bernard; Fack, Veerle; Dawyndt, Peter

    2012-01-01

    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared. PMID:22584621

  3. The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures.

    PubMed

    Kikuchi, Norihiro; Kameyama, Akihiko; Nakaya, Shuuichi; Ito, Hiromi; Sato, Takashi; Shikanai, Toshihide; Takahashi, Yoriko; Narimatsu, Hisashi

    2005-04-15

    Bioinformatics resources for glycomics are very poor as compared with those for genomics and proteomics. The complexity of carbohydrate sequences makes it difficult to define a common language to represent them, and the development of bioinformatics tools for glycomics has not progressed. In this study, we developed a carbohydrate sequence markup language (CabosML), an XML description of carbohydrate structures. The language definition (XML Schema) and an experimental database of carbohydrate structures using an XML database management system are available at http://www.phoenix.hydra.mki.co.jp/CabosDemo.html kikuchi@hydra.mki.co.jp.

  4. Computer Programming and Biomolecular Structure Studies: A Step beyond Internet Bioinformatics

    ERIC Educational Resources Information Center

    Likic, Vladimir A.

    2006-01-01

    This article describes the experience of teaching structural bioinformatics to third year undergraduate students in a subject titled "Biomolecular Structure and Bioinformatics." Students were introduced to computer programming and used this knowledge in a practical application as an alternative to the well established Internet bioinformatics…

  5. Comparative Bioinformatic Analysis of Active Site Structures in Evolutionarily Remote Homologues of α,β-Hydrolase Superfamily Enzymes.

    PubMed

    Suplatov, D A; Arzhanik, V K; Svedas, V K

    2011-01-01

    Comparative bioinformatic analysis is the cornerstone of the study of enzymes' structure-function relationship. However, numerous enzymes that derive from a common ancestor and have undergone substantial functional alterations during natural selection appear not to have a sequence similarity acceptable for a statistically reliable comparative analysis. At the same time, their active site structures, in general, can be conserved, while other parts may largely differ. Therefore, it sounds both plausible and appealing to implement a comparative analysis of the most functionally important structural elements - the active site structures; that is, the amino acid residues involved in substrate binding and the catalytic mechanism. A computer algorithm has been developed to create a library of enzyme active site structures based on the use of the PDB database, together with programs of structural analysis and identification of functionally important amino acid residues and cavities in the enzyme structure. The proposed methodology has been used to compare some α,β-hydrolase superfamily enzymes. The insight has revealed a high structural similarity of catalytic site areas, including the conservative organization of a catalytic triad and oxyanion hole residues, despite the wide functional diversity among the remote homologues compared. The methodology can be used to compare the structural organization of the catalytic and substrate binding sites of various classes of enzymes, as well as study enzymes' evolution and to create of a databank of enzyme active site structures.

  6. Small-angle X-Ray analysis of macromolecular structure: the structure of protein NS2 (NEP) in solution

    NASA Astrophysics Data System (ADS)

    Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.

    2017-11-01

    A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in a solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH value. For the first time we have managed to get NEP dimers in solution, to analyze their structure, and to compare the models obtained using the method of the molecular tectonics with the spatial protein structure predicted by us using the bioinformatics methods. The results of the study provide a new insight into the structural features of nuclear export protein NS2 (NEP) of the influenza virus A, which is very important for viral infection development.

  7. Ensemble-based evaluation for protein structure models.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2016-06-15

    Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. We propose a score named FlexScore for comparing protein structures that consider flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts' intuitive assessment of computational models and provides information of practical usefulness of models. https://bitbucket.org/mjamroz/flexscore dkihara@purdue.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  8. Ensemble-based evaluation for protein structure models

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2016-01-01

    Motivation: Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. Results: We propose a score named FlexScore for comparing protein structures that consider flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts’ intuitive assessment of computational models and provides information of practical usefulness of models. Availability and implementation: https://bitbucket.org/mjamroz/flexscore Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307633

  9. Introductory Bioinformatics Exercises Utilizing Hemoglobin and Chymotrypsin to Reinforce the Protein Sequence-Structure-Function Relationship

    ERIC Educational Resources Information Center

    Inlow, Jennifer K.; Miller, Paige; Pittman, Bethany

    2007-01-01

    We describe two bioinformatics exercises intended for use in a computer laboratory setting in an upper-level undergraduate biochemistry course. To introduce students to bioinformatics, the exercises incorporate several commonly used bioinformatics tools, including BLAST, that are freely available online. The exercises build upon the students'…

  10. A generally applicable lightweight method for calculating a value structure for tools and services in bioinformatics infrastructure projects.

    PubMed

    Mayer, Gerhard; Quast, Christian; Felden, Janine; Lange, Matthias; Prinz, Manuel; Pühler, Alfred; Lawerenz, Chris; Scholz, Uwe; Glöckner, Frank Oliver; Müller, Wolfgang; Marcus, Katrin; Eisenacher, Martin

    2017-10-30

    Sustainable noncommercial bioinformatics infrastructures are a prerequisite to use and take advantage of the potential of big data analysis for research and economy. Consequently, funders, universities and institutes as well as users ask for a transparent value model for the tools and services offered. In this article, a generally applicable lightweight method is described by which bioinformatics infrastructure projects can estimate the value of tools and services offered without determining exactly the total costs of ownership. Five representative scenarios for value estimation from a rough estimation to a detailed breakdown of costs are presented. To account for the diversity in bioinformatics applications and services, the notion of service-specific 'service provision units' is introduced together with the factors influencing them and the main underlying assumptions for these 'value influencing factors'. Special attention is given on how to handle personnel costs and indirect costs such as electricity. Four examples are presented for the calculation of the value of tools and services provided by the German Network for Bioinformatics Infrastructure (de.NBI): one for tool usage, one for (Web-based) database analyses, one for consulting services and one for bioinformatics training events. Finally, from the discussed values, the costs of direct funding and the costs of payment of services by funded projects are calculated and compared. © The Author 2017. Published by Oxford University Press.

  11. [Bioinformatics analysis of mosquito densovirus nostructure protein NS1].

    PubMed

    Dong, Yun-qiao; Ma, Wen-li; Gu, Jin-bao; Zheng, Wen-ling

    2009-12-01

    To analyze and predict the structure and function of mosquito densovirus (MDV) nostructual protein1 (NS1). Using different bioinformatics software, the EXPASY pmtparam tool, ClustalX1.83, Bioedit, MEGA3.1, ScanProsite, and Motifscan, respectively to comparatively analyze and predict the physic-chemical parameters, homology, evolutionary relation, secondary structure and main functional motifs of NS1. MDV NS1 protein was a unstable hydrophilic protein and the amino acid sequence was highly conserved which had a relatively closer evolutionary distance with infectious hypodermal and hematopoietic necrosis virus (IHHNV). MDV NS1 has a specific domain of superfamily 3 helicase of small DNA viruses. This domain contains the NTP-binding region with a metal ion-dependent ATPase activity. A virus replication roller rolling-circle replication(RCR) initiation domain was found near the N terminal of this protein. This protien has the biological function of single stranded incision enzyme. The bioinformatics prediction results suggest that MDV NS1 protein plays a key role in viral replication, packaging, and the other stages of viral life.

  12. p3d--Python module for structural bioinformatics.

    PubMed

    Fufezan, Christian; Specht, Michael

    2009-08-21

    High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this the Python scripting language is the optimal choice since its philosophy is to write an understandable source code. p3d is an object oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three dimensional protein structure files (PDB files). p3d's strength arises from the combination of a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, b) set theory and c) functions that allow to combine a and b and that use human readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.

  13. [Construction and application of bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer].

    PubMed

    Fang, Xiang; Li, Ning-qiu; Fu, Xiao-zhe; Li, Kai-bin; Lin, Qiang; Liu, Li-hui; Shi, Cun-bin; Wu, Shu-qin

    2015-07-01

    As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules, including genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on out membrane protein A of Aeromonas hydrophila, and the changes of system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogen has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platform for other subjects.

  14. Bioinformatics: indispensable, yet hidden in plain sight?

    PubMed

    Bartlett, Andrew; Penders, Bart; Lewis, Jamie

    2017-06-21

    Bioinformatics has multitudinous identities, organisational alignments and disciplinary links. This variety allows bioinformaticians and bioinformatic work to contribute to much (if not most) of life science research in profound ways. The multitude of bioinformatic work also translates into a multitude of credit-distribution arrangements, apparently dismissing that work. We report on the epistemic and social arrangements that characterise the relationship between bioinformatics and life science. We describe, in sociological terms, the character, power and future of bioinformatic work. The character of bioinformatic work is such that its cultural, institutional and technical structures allow for it to be black-boxed easily. The result is that bioinformatic expertise and contributions travel easily and quickly, yet remain largely uncredited. The power of bioinformatic work is shaped by its dependency on life science work, which combined with the black-boxed character of bioinformatic expertise further contributes to situating bioinformatics on the periphery of the life sciences. Finally, the imagined futures of bioinformatic work suggest that bioinformatics will become ever more indispensable without necessarily becoming more visible, forcing bioinformaticians into difficult professional and career choices. Bioinformatic expertise and labour is epistemically central but often institutionally peripheral. In part, this is a result of the ways in which the character, power distribution and potential futures of bioinformatics are constituted. However, alternative paths can be imagined.

  15. WIWS: a protein structure bioinformatics Web service collection.

    PubMed

    Hekkelman, M L; Te Beek, T A H; Pettifer, S R; Thorne, D; Attwood, T K; Vriend, G

    2010-07-01

    The WHAT IF molecular-modelling and drug design program is widely distributed in the world of protein structure bioinformatics. Although originally designed as an interactive application, its highly modular design and inbuilt control language have recently enabled its deployment as a collection of programmatically accessible web services. We report here a collection of WHAT IF-based protein structure bioinformatics web services: these relate to structure quality, the use of symmetry in crystal structures, structure correction and optimization, adding hydrogens and optimizing hydrogen bonds and a series of geometric calculations. The freely accessible web services are based on the industry standard WS-I profile and the EMBRACE technical guidelines, and are available via both REST and SOAP paradigms. The web services run on a dedicated computational cluster; their function and availability is monitored daily.

  16. Exploring Cystic Fibrosis Using Bioinformatics Tools: A Module Designed for the Freshman Biology Course

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2011-01-01

    We incorporated a bioinformatics component into the freshman biology course that allows students to explore cystic fibrosis (CF), a common genetic disorder, using bioinformatics tools and skills. Students learn about CF through searching genetic databases, analyzing genetic sequences, and observing the three-dimensional structures of proteins…

  17. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    ERIC Educational Resources Information Center

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  18. XML schemas for common bioinformatic data types and their application in workflow systems

    PubMed Central

    Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

    2006-01-01

    Background Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios. PMID:17087823

  19. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  20. Assimilating Text-Mining & Bio-Informatics Tools to Analyze Cellulase structures

    NASA Astrophysics Data System (ADS)

    Satyasree, K. P. N. V., Dr; Lalitha Kumari, B., Dr; Jyotsna Devi, K. S. N. V.; Choudri, S. M. Roy; Pratap Joshi, K.

    2017-08-01

    Text-mining is one of the best potential way of automatically extracting information from the huge biological literature. To exploit its prospective, the knowledge encrypted in the text should be converted to some semantic representation such as entities and relations, which could be analyzed by machines. But large-scale practical systems for this purpose are rare. But text mining could be helpful for generating or validating predictions. Cellulases have abundant applications in various industries. Cellulose degrading enzymes are cellulases and the same producing bacteria - Bacillus subtilis & fungus Pseudomonas putida were isolated from top soil of Guntur Dt. A.P. India. Absolute cultures were conserved on potato dextrose agar medium for molecular studies. In this paper, we presented how well the text mining concepts can be used to analyze cellulase producing bacteria and fungi, their comparative structures are also studied with the aid of well-establised, high quality standard bioinformatic tools such as Bioedit, Swissport, Protparam, EMBOSSwin with which a complete data on Cellulases like structure, constituents of the enzyme has been obtained.

  1. XML schemas for common bioinformatic data types and their application in workflow systems.

    PubMed

    Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

    2006-11-06

    Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data--therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net, the BioDOM library can be obtained at http://biodom.sourceforge.net. The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.

  2. Revealing biological information using data structuring and automated learning.

    PubMed

    Mohorianu, Irina; Moulton, Vincent

    2010-11-01

    The intermediary steps between a biological hypothesis, concretized in the input data, and meaningful results, validated using biological experiments, commonly employ bioinformatics tools. Starting with storage of the data and ending with a statistical analysis of the significance of the results, every step in a bioinformatics analysis has been intensively studied and the resulting methods and models patented. This review summarizes the bioinformatics patents that have been developed mainly for the study of genes, and points out the universal applicability of bioinformatics methods to other related studies such as RNA interference. More specifically, we overview the steps undertaken in the majority of bioinformatics analyses, highlighting, for each, various approaches that have been developed to reveal details from different perspectives. First we consider data warehousing, the first task that has to be performed efficiently, optimizing the structure of the database, in order to facilitate both the subsequent steps and the retrieval of information. Next, we review data mining, which occupies the central part of most bioinformatics analyses, presenting patents concerning differential expression, unsupervised and supervised learning. Last, we discuss how networks of interactions of genes or other players in the cell may be created, which help draw biological conclusions and have been described in several patents.

  3. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    PubMed Central

    Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  4. The structural bioinformatics library: modeling in biomolecular science and beyond.

    PubMed

    Cazals, Frédéric; Dreyfus, Tom

    2017-04-01

    Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library ( SBL , http://sbl.inria.fr ), a generic C ++/python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performances. The SBL involves four software components (1-4 thereafter). For end-users, the SBL provides ready to use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, to model macro-molecular assemblies. These applications can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C ++ toolbox with modular design, involving core (2) algorithms , (3) biophysical models and (4) modules , the latter being especially suited to develop novel applications. The SBL comes with a thorough documentation consisting of user and reference manuals, and a bugzilla platform to handle community feedback. The SBL is available from http://sbl.inria.fr. Frederic.Cazals@inria.fr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  5. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  6. What is bioinformatics? A proposed definition and overview of the field.

    PubMed

    Luscombe, N M; Greenbaum, D; Gerstein, M

    2001-01-01

    The recent flood of data from genome sequences and functional genomics has given rise to new field, bioinformatics, which combines elements of biology and computer science. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.

  7. A bioinformatics potpourri.

    PubMed

    Schönbach, Christian; Li, Jinyan; Ma, Lan; Horton, Paul; Sjaugi, Muhammad Farhan; Ranganathan, Shoba

    2018-01-19

    The 16th International Conference on Bioinformatics (InCoB) was held at Tsinghua University, Shenzhen from September 20 to 22, 2017. The annual conference of the Asia-Pacific Bioinformatics Network featured six keynotes, two invited talks, a panel discussion on big data driven bioinformatics and precision medicine, and 66 oral presentations of accepted research articles or posters. Fifty-seven articles comprising a topic assortment of algorithms, biomolecular networks, cancer and disease informatics, drug-target interactions and drug efficacy, gene regulation and expression, imaging, immunoinformatics, metagenomics, next generation sequencing for genomics and transcriptomics, ontologies, post-translational modification, and structural bioinformatics are the subject of this editorial for the InCoB2017 supplement issues in BMC Genomics, BMC Bioinformatics, BMC Systems Biology and BMC Medical Genomics. New Delhi will be the location of InCoB2018, scheduled for September 26-28, 2018.

  8. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315

  9. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

  10. ballaxy: web services for structural bioinformatics.

    PubMed

    Hildebrandt, Anna Katharina; Stöckel, Daniel; Fischer, Nina M; de la Garza, Luis; Krüger, Jens; Nickels, Stefan; Röttig, Marc; Schärfe, Charlotta; Schumann, Marcel; Thiel, Philipp; Lenhof, Hans-Peter; Kohlbacher, Oliver; Hildebrandt, Andreas

    2015-01-01

    Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large number of molecular data structures and algorithms implemented in BALL allows for elegant and sophisticated development of new approaches in the field. We hence connected the versatile BALL library and its visualization and editing front end BALLView with the Galaxy workflow framework. The result, which we call ballaxy, enables the user to simply and intuitively create sophisticated pipelines for applications in structure-based computational biology, integrated into a standard tool for molecular modelling.  ballaxy consists of three parts: some minor modifications to the Galaxy system, a collection of tools and an integration into the BALL framework and the BALLView application for molecular modelling. Modifications to Galaxy will be submitted to the Galaxy project, and the BALL and BALLView integrations will be integrated in the next major BALL release. After acceptance of the modifications into the Galaxy project, we will publish all ballaxy tools via the Galaxy toolshed. In the meantime, all three components are available from http://www.ball-project.org/ballaxy. Also, docker images for ballaxy are available at https://registry.hub.docker.com/u/anhi/ballaxy/dockerfile/. ballaxy is licensed under the terms of the GPL. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  11. An Introductory Bioinformatics Exercise to Reinforce Gene Structure and Expression and Analyze the Relationship between Gene and Protein Sequences

    ERIC Educational Resources Information Center

    Almeida, Craig A.; Tardiff, Daniel F.; De Luca, Jane P.

    2004-01-01

    We have developed an introductory bioinformatics exercise for sophomore biology and biochemistry students that reinforces the understanding of the structure of a gene and the principles and events involved in its expression. In addition, the activity illustrates the severe effect mutations in a gene sequence can have on the protein product.…

  12. Robust enzyme design: bioinformatic tools for improved protein stability.

    PubMed

    Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas

    2015-03-01

    The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. The construction and assessment of a statistical model for the prediction of protein assay data.

    PubMed

    Pittman, J; Sacks, J; Young, S Stanley

    2002-01-01

    The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparitive methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.

  14. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  15. LXtoo: an integrated live Linux distribution for the bioinformatics community.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu

    2012-07-19

    Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.

  16. Chapter 16: text mining for translational bioinformatics.

    PubMed

    Cohen, K Bretonnel; Hunter, Lawrence E

    2013-04-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

  17. Text mining meets workflow: linking U-Compare with Taverna

    PubMed Central

    Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

    2010-01-01

    Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690

  18. Evolving Strategies for the Incorporation of Bioinformatics Within the Undergraduate Cell Biology Curriculum

    PubMed Central

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489

  19. Bioinformatic pipelines in Python with Leaf

    PubMed Central

    2013-01-01

    Background An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315

  20. Indentification and Analysis of Occludin Phosphosites: A Combined Mass Spectroscoy and Bioinformatics Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sundstrom, J.; Tash, B; Murakami, T

    2009-01-01

    The molecular function of occludin, an integral membrane component of tight junctions, remains unclear. VEGF-induced phosphorylation sites were mapped on occludin by combining MS data analysis with bioinformatics. In vivo phosphorylation of Ser490 was validated and protein interaction studies combined with crystal structure analysis suggest that Ser490 phosphorylation attenuates the interaction between occludin and ZO-1. This study demonstrates that combining MS data and bioinformatics can successfully identify novel phosphorylation sites from limiting samples.

  1. Some of the most interesting CASP11 targets through the eyes of their authors.

    PubMed

    Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K; Edwards, Robert A; Fass, Deborah; Hartmann, Marcus D; Korycinski, Mateusz; Lewis, Richard J; Lorimer, Donald; Lupas, Andrei N; Newman, Janet; Peat, Thomas S; Piepenbrink, Kurt H; Prahlad, Janani; van Raaij, Mark J; Rohwer, Forest; Segall, Anca M; Seguritan, Victor; Sundberg, Eric J; Singh, Abhimanyu K; Wilson, Mark A; Schwede, Torsten

    2016-09-01

    The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34-50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

  2. [Bioinformatics Analysis of Clustered Regularly Interspaced Short Palindromic Repeats in the Genomes of Shigella].

    PubMed

    Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin

    2015-04-01

    This study was aimed to explore the features of clustered regularly interspaced short palindromic repeats (CRISPR) structures in Shigella by using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that the CRISPRs existed in the four groups of Shigella, and the flanking sequences of upstream CRISPRs could be classified into the same group with those of the downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences had the same group with corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain "stem" and "ring". Some spacers were found to homologize with part sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.

  3. Systematic bioinformatics and experimental validation of yeast complexes reduces the rate of attrition during structural investigations.

    PubMed

    Brooks, Mark A; Gewartowski, Kamil; Mitsiki, Eirini; Létoquart, Juliette; Pache, Roland A; Billier, Ysaline; Bertero, Michela; Corréa, Margot; Czarnocki-Cieciura, Mariusz; Dadlez, Michal; Henriot, Véronique; Lazar, Noureddine; Delbos, Lila; Lebert, Dorothée; Piwowarski, Jan; Rochaix, Pascal; Böttcher, Bettina; Serrano, Luis; Séraphin, Bertrand; van Tilbeurgh, Herman; Aloy, Patrick; Perrakis, Anastassis; Dziembowski, Andrzej

    2010-09-08

    For high-throughput structural studies of protein complexes of composition inferred from proteomics data, it is crucial that candidate complexes are selected accurately. Herein, we exemplify a procedure that combines a bioinformatics tool for complex selection with in vivo validation, to deliver structural results in a medium-throughout manner. We have selected a set of 20 yeast complexes, which were predicted to be feasible by either an automated bioinformatics algorithm, by manual inspection of primary data, or by literature searches. These complexes were validated with two straightforward and efficient biochemical assays, and heterologous expression technologies of complex components were then used to produce the complexes to assess their feasibility experimentally. Approximately one-half of the selected complexes were useful for structural studies, and we detail one particular success story. Our results underscore the importance of accurate target selection and validation in avoiding transient, unstable, or simply nonexistent complexes from the outset. Copyright © 2010 Elsevier Ltd. All rights reserved.

  4. Application of machine learning methods in bioinformatics

    NASA Astrophysics Data System (ADS)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technology have enabled biology to enter the era of big data. [1] Bioinformatics is an interdisciplinary, including the acquisition, management, analysis, interpretation and application of biological information, etc. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.

  5. CattleTickBase: An integrated Internet-based bioinformatics resource for Rhipicephalus (Boophilus) microplus

    USDA-ARS?s Scientific Manuscript database

    The Rhipicephalus microplus genome is large and complex in structure, making a genome sequence difficult to assemble and costly to resource the required bioinformatics. In light of this, a consortium of international collaborators was formed to pool resources to begin sequencing this genome. We have...

  6. Bioinformatics and peptidomics approaches to the discovery and analysis of food-derived bioactive peptides.

    PubMed

    Agyei, Dominic; Tsopmo, Apollinaire; Udenigwe, Chibuike C

    2018-06-01

    There are emerging advancements in the strategies used for the discovery and development of food-derived bioactive peptides because of their multiple food and health applications. Bioinformatics and peptidomics are two computational and analytical techniques that have the potential to speed up the development of bioactive peptides from bench to market. Structure-activity relationships observed in peptides form the basis for bioinformatics and in silico prediction of bioactive sequences encrypted in food proteins. Peptidomics, on the other hand, relies on "hyphenated" (liquid chromatography-mass spectrometry-based) techniques for the detection, profiling, and quantitation of peptides. Together, bioinformatics and peptidomics approaches provide a low-cost and effective means of predicting, profiling, and screening bioactive protein hydrolysates and peptides from food. This article discuses the basis, strengths, and limitations of bioinformatics and peptidomics approaches currently used for the discovery and analysis of food-derived bioactive peptides.

  7. Crimean-Congo Hemorrhagic Fever Virus Gn Bioinformatic Analysis and Construction of a Recombinant Bacmid in Order to Express Gn by Baculovirus Expression System.

    PubMed

    Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza

    2015-11-01

    Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the nairovirus, a genus in the Bunyaviridae family, which causes a life threatening disease in human. Currently, there is no vaccine against CCHFV and detailed structural analysis of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. The current study aimed to investigate structural bioinformatics of CCHFV Gn protein and design a construct to make a recombinant bacmid to express by baculovirus system. To express the Gn protein in insect cells that can be used as antigen in animal model vaccine studies. Bioinformatic analysis of CCHFV Gn protein was performed and designed a construct and cloned into pFastBacHTb vector and a recombinant Gn-bacmid was generated by Bac to Bac system. Primary, secondary, and 3D structure of CCHFV Gn were obtained and PCR reaction with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring Gn coding region under polyhedron promoter. Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV.

  8. Field of genes: using Apache Kafka as a bioinformatic data repository.

    PubMed

    Lawlor, Brendan; Lynch, Richard; Mac Aogáin, Micheál; Walsh, Paul

    2018-04-01

    Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is National Center for Biotechnology Information's (NCBI's) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI's RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data.

  9. Perspective: Role of structure prediction in materials discovery and design

    NASA Astrophysics Data System (ADS)

    Needs, Richard J.; Pickard, Chris J.

    2016-05-01

    Materials informatics owes much to bioinformatics and the Materials Genome Initiative has been inspired by the Human Genome Project. But there is more to bioinformatics than genomes, and the same is true for materials informatics. Here we describe the rapidly expanding role of searching for structures of materials using first-principles electronic-structure methods. Structure searching has played an important part in unraveling structures of dense hydrogen and in identifying the record-high-temperature superconducting component in hydrogen sulfide at high pressures. We suggest that first-principles structure searching has already demonstrated its ability to determine structures of a wide range of materials and that it will play a central and increasing part in materials discovery and design.

  10. PREFMD: a web server for protein structure refinement via molecular dynamics simulations.

    PubMed

    Heo, Lim; Feig, Michael

    2018-03-15

    Refinement of protein structure models is a long-standing problem in structural bioinformatics. Molecular dynamics-based methods have emerged as an avenue to achieve consistent refinement. The PREFMD web server implements an optimized protocol based on the method successfully tested in CASP11. Validation with recent CASP refinement targets shows consistent and more significant improvement in global structure accuracy over other state-of-the-art servers. PREFMD is freely available as a web server at http://feiglab.org/prefmd. Scripts for running PREFMD as a stand-alone package are available at https://github.com/feiglab/prefmd.git. feig@msu.edu. Supplementary data are available at Bioinformatics online.

  11. 2016 update on APBioNet's annual international conference on bioinformatics (InCoB).

    PubMed

    Schönbach, Christian; Verma, Chandra; Wee, Lawrence Jin Kiat; Bond, Peter John; Ranganathan, Shoba

    2016-12-22

    InCoB became since its inception in 2002 one of the largest annual bioinformatics conferences in the Asia-Pacific region with attendance ranging between 150 and 250 delegates depending on the venue location. InCoB 2016 in Singapore was attended by almost 220 delegates. This year, sessions on structural bioinformatics, sequence and sequencing, and next-generation sequencing fielded the highest number of oral presentation. Forty-four out 96 oral presentations were associated with an accepted manuscript in supplemental issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics or BMC Systems Biology. Articles with a genomics focus are reviewed in this editorial. Next year's InCoB will be held in Shenzen, China from September 20 to 22, 2017.

  12. Educational websites--Bioinformatics Tools II.

    PubMed

    Lomberk, Gwen

    2009-01-01

    In this issue, the highlighted websites are a continuation of a series of educational websites; this one in particular from a couple of years ago, Bioinformatics Tools [Pancreatology 2005;5:314-315]. These include sites that are valuable resources for many research needs in genomics and proteomics. Bioinformatics has become a laboratory tool to map sequences to databases, develop models of molecular interactions, evaluate structural compatibilities, describe differences between normal and disease-associated DNA, identify conserved motifs within proteins, and chart extensive signaling networks, all in silico. Copyright 2008 S. Karger AG, Basel and IAP.

  13. YAdumper: extracting and translating large information volumes from relational databases to structured flat files.

    PubMed

    Fernández, José M; Valencia, Alfonso

    2004-10-12

    Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.

  14. Bioinformatics in the orphan crops.

    PubMed

    Armstead, Ian; Huang, Lin; Ravagnani, Adriana; Robson, Paul; Ougham, Helen

    2009-11-01

    Orphan crops are those which are grown as food, animal feed or other crops of some importance in agriculture, but which have not yet received the investment of research effort or funding required to develop significant public bioinformatics resources. Where an orphan crop is related to a well-characterised model plant species, comparative genomics and bioinformatics can often, though not always, be exploited to assist research and crop improvement. This review addresses some challenges and opportunities presented by bioinformatics in the orphan crops, using three examples: forage grasses from the genera Lolium and Festuca, forage legumes and the second generation energy crop Miscanthus.

  15. [Application of bioinformatics in researches of industrial biocatalysis].

    PubMed

    Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao

    2004-05-01

    Industrial biocatalysis is currently attracting much attention to rebuild or substitute traditional producing process of chemicals and drugs. One of key focuses in industrial biocatalysis is biocatalyst, which is usually one kind of microbial enzyme. In the recent, new technologies of bioinformatics have played and will continue to play more and more significant roles in researches of industrial biocatalysis in response to the waves of genomic revolution. One of the key applications of bioinformatics in biocatalysis is the discovery and identification of the new biocatalyst through advanced DNA and protein sequence search, comparison and analyses in Internet database using different algorithm and software. The unknown genes of microbial enzymes can also be simply harvested by primer design on the basis of bioinformatics analyses. The other key applications of bioinformatics in biocatalysis are the modification and improvement of existing industrial biocatalyst. In this aspect, bioinformatics is of great importance in both rational design and directed evolution of microbial enzymes. Based on the successful prediction of tertiary structures of enzymes using the tool of bioinformatics, the undermentioned experiments, i.e. site-directed mutagenesis, fusion protein construction, DNA family shuffling and saturation mutagenesis, etc, are usually of very high efficiency. On all accounts, bioinformatics will be an essential tool for either biologist or biological engineer in the future researches of industrial biocatalysis, due to its significant function in guiding and quickening the step of discovery and/or improvement of novel biocatalysts.

  16. Evolving Strategies for the Incorporation of Bioinformatics within the Undergraduate Cell Biology Curriculum

    ERIC Educational Resources Information Center

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in…

  17. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology.

    PubMed

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-12-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.

  18. Analyzing the field of bioinformatics with the multi-faceted topic modeling technique.

    PubMed

    Heo, Go Eun; Kang, Keun Young; Song, Min; Lee, Jeong-Hoon

    2017-05-31

    Bioinformatics is an interdisciplinary field at the intersection of molecular biology and computing technology. To characterize the field as convergent domain, researchers have used bibliometrics, augmented with text-mining techniques for content analysis. In previous studies, Latent Dirichlet Allocation (LDA) was the most representative topic modeling technique for identifying topic structure of subject areas. However, as opposed to revealing the topic structure in relation to metadata such as authors, publication date, and journals, LDA only displays the simple topic structure. In this paper, we adopt the Tang et al.'s Author-Conference-Topic (ACT) model to study the field of bioinformatics from the perspective of keyphrases, authors, and journals. The ACT model is capable of incorporating the paper, author, and conference into the topic distribution simultaneously. To obtain more meaningful results, we use journals and keyphrases instead of conferences and bag-of-words.. For analysis, we use PubMed to collected forty-six bioinformatics journals from the MEDLINE database. We conducted time series topic analysis over four periods from 1996 to 2015 to further examine the interdisciplinary nature of bioinformatics. We analyze the ACT Model results in each period. Additionally, for further integrated analysis, we conduct a time series analysis among the top-ranked keyphrases, journals, and authors according to their frequency. We also examine the patterns in the top journals by simultaneously identifying the topical probability in each period, as well as the top authors and keyphrases. The results indicate that in recent years diversified topics have become more prevalent and convergent topics have become more clearly represented. The results of our analysis implies that overtime the field of bioinformatics becomes more interdisciplinary where there is a steady increase in peripheral fields such as conceptual, mathematical, and system biology. These results are confirmed by integrated analysis of topic distribution as well as top ranked keyphrases, authors, and journals.

  19. Designing and benchmarking the MULTICOM protein structure prediction system

    PubMed Central

    2013-01-01

    Background Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:23442819

  20. New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation.

    PubMed

    Antczak, Maciej; Popenda, Mariusz; Zok, Tomasz; Zurkowski, Michal; Adamiak, Ryszard W; Szachniuk, Marta

    2018-04-15

    Understanding the formation, architecture and roles of pseudoknots in RNA structures are one of the most difficult challenges in RNA computational biology and structural bioinformatics. Methods predicting pseudoknots typically perform this with poor accuracy, often despite experimental data incorporation. Existing bioinformatic approaches differ in terms of pseudoknots' recognition and revealing their nature. A few ways of pseudoknot classification exist, most common ones refer to a genus or order. Following the latter one, we propose new algorithms that identify pseudoknots in RNA structure provided in BPSEQ format, determine their order and encode in dot-bracket-letter notation. The proposed encoding aims to illustrate the hierarchy of RNA folding. New algorithms are based on dynamic programming and hybrid (combining exhaustive search and random walk) approaches. They evolved from elementary algorithm implemented within the workflow of RNA FRABASE 1.0, our database of RNA structure fragments. They use different scoring functions to rank dissimilar dot-bracket representations of RNA structure. Computational experiments show an advantage of new methods over the others, especially for large RNA structures. Presented algorithms have been implemented as new functionality of RNApdbee webserver and are ready to use at http://rnapdbee.cs.put.poznan.pl. mszachniuk@cs.put.poznan.pl. Supplementary data are available at Bioinformatics online.

  1. Prediction and Dissection of Protein-RNA Interactions by Molecular Descriptors.

    PubMed

    Liu, Zhi-Ping; Chen, Luonan

    2016-01-01

    Protein-RNA interactions play crucial roles in numerous biological processes. However, detecting the interactions and binding sites between protein and RNA by traditional experiments is still time consuming and labor costing. Thus, it is of importance to develop bioinformatics methods for predicting protein-RNA interactions and binding sites. Accurate prediction of protein-RNA interactions and recognitions will highly benefit to decipher the interaction mechanisms between protein and RNA, as well as to improve the RNA-related protein engineering and drug design. In this work, we summarize the current bioinformatics strategies of predicting protein-RNA interactions and dissecting protein-RNA interaction mechanisms from local structure binding motifs. In particular, we focus on the feature-based machine learning methods, in which the molecular descriptors of protein and RNA are extracted and integrated as feature vectors of representing the interaction events and recognition residues. In addition, the available methods are classified and compared comprehensively. The molecular descriptors are expected to elucidate the binding mechanisms of protein-RNA interaction and reveal the functional implications from structural complementary perspective.

  2. Bioinformatics Approaches to Classifying Allergens and Predicting Cross-Reactivity

    PubMed Central

    Schein, Catherine H.; Ivanciuc, Ovidiu; Braun, Werner

    2007-01-01

    The major advances in understanding why patients respond to several seemingly different stimuli have been through the isolation, sequencing and structural analysis of proteins that induce an IgE response. The most significant finding is that allergenic proteins from very different sources can have nearly identical sequences and structures, and that this similarity can account for clinically observed cross-reactivity. The increasing amount of information on the sequence, structure and IgE epitopes of allergens is now available in several databases and powerful bioinformatics search tools allow user access to relevant information. Here, we provide an overview of these databases and describe state-of-the art bioinformatics tools to identify the common proteins that may be at the root of multiple allergy syndromes. Progress has also been made in quantitatively defining characteristics that discriminate allergens from non-allergens. Search and software tools for this purpose have been developed and implemented in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/). SDAP contains information for over 800 allergens and extensive bibliographic references in a relational database with links to other publicly available databases. SDAP is freely available on the Web to clinicians and patients, and can be used to find structural and functional relations among known allergens and to identify potentially cross-reacting antigens. Here we illustrate how these bioinformatics tools can be used to group allergens, and to detect areas that may account for common patterns of IgE binding and cross-reactivity. Such results can be used to guide treatment regimens for allergy sufferers. PMID:17276876

  3. Field of genes: using Apache Kafka as a bioinformatic data repository

    PubMed Central

    Lynch, Richard; Walsh, Paul

    2018-01-01

    Abstract Background Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is National Center for Biotechnology Information's (NCBI’s) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI’s RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. Results The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Conclusions Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data. PMID:29635394

  4. Crimean-Congo Hemorrhagic Fever Virus Gn Bioinformatic Analysis and Construction of a Recombinant Bacmid in Order to Express Gn by Baculovirus Expression System

    PubMed Central

    Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza

    2015-01-01

    Background Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the nairovirus, a genus in the Bunyaviridae family, which causes a life threatening disease in human. Currently, there is no vaccine against CCHFV and detailed structural analysis of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. Objectives The current study aimed to investigate structural bioinformatics of CCHFV Gn protein and design a construct to make a recombinant bacmid to express by baculovirus system. Materials and Methods To express the Gn protein in insect cells that can be used as antigen in animal model vaccine studies. Bioinformatic analysis of CCHFV Gn protein was performed and designed a construct and cloned into pFastBacHTb vector and a recombinant Gn-bacmid was generated by Bac to Bac system. Results Primary, secondary, and 3D structure of CCHFV Gn were obtained and PCR reaction with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring Gn coding region under polyhedron promoter. Conclusions Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV. PMID:26862379

  5. A Linked Series of Laboratory Exercises in Molecular Biology Utilizing Bioinformatics and GFP

    ERIC Educational Resources Information Center

    Medin, Carey L.; Nolin, Katie L.

    2011-01-01

    Molecular biologists commonly use bioinformatics to map and analyze DNA and protein sequences and to align different DNA and protein sequences for comparison. Additionally, biologists can create and view 3D models of protein structures to further understand intramolecular interactions. The primary goal of this 10-week laboratory was to introduce…

  6. Improvement of Student Understanding of How Kinetic Data Facilitates the Determination of Amino Acid Catalytic Function through an Alkaline Phosphatase Structure/Mechanism Bioinformatics Exercise

    ERIC Educational Resources Information Center

    Grunwald, Sandra K.; Krueger, Katherine J.

    2008-01-01

    Laboratory exercises, which utilize alkaline phosphatase as a model enzyme, have been developed and used extensively in undergraduate biochemistry courses to illustrate enzyme steady-state kinetics. A bioinformatics laboratory exercise for the biochemistry laboratory, which complements the traditional alkaline phosphatase kinetics exercise, was…

  7. Ligation site in proteins recognized in silico

    PubMed Central

    Brylinski, Michal; Konieczny, Leszek; Roterman, Irena

    2006-01-01

    Recognition of a ligation site in a protein molecule is important for identifying its biological activity. The model for in silico recognition of ligation sites in proteins is presented. The idealized hydrophobic core stabilizing protein structure is represented by a three-dimensional Gaussian function. The experimentally observed distribution of hydrophobicity compared with the theoretical distribution reveals differences. The area of high differences indicates the ligation site. Availability http://bioinformatics.cm-uj.krakow.pl/activesite PMID:17597871

  8. A computer lab exploring evolutionary aspects of chromatin structure and dynamics for an undergraduate chromatin course*.

    PubMed

    Eirín-López, José M

    2013-01-01

    The study of chromatin constitutes one of the most active research fields in life sciences, being subject to constant revisions that continuously redefine the state of the art in its knowledge. As every other rapidly changing field, chromatin biology requires clear and straightforward educational strategies able to efficiently translate such a vast body of knowledge to the classroom. With this aim, the present work describes a multidisciplinary computer lab designed to introduce undergraduate students to the dynamic nature of chromatin, within the context of the one semester course "Chromatin: Structure, Function and Evolution." This exercise is organized in three parts including (a) molecular evolutionary biology of histone families (using the H1 family as example), (b) histone structure and variation across different animal groups, and (c) effect of histone diversity on nucleosome structure and chromatin dynamics. By using freely available bioinformatic tools that can be run on common computers, the concept of chromatin dynamics is interactively illustrated from a comparative/evolutionary perspective. At the end of this computer lab, students are able to translate the bioinformatic information into a biochemical context in which the relevance of histone primary structure on chromatin dynamics is exposed. During the last 8 years this exercise has proven to be a powerful approach for teaching chromatin structure and dynamics, allowing students a higher degree of independence during the processes of learning and self-assessment. Copyright © 2013 International Union of Biochemistry and Molecular Biology, Inc.

  9. Bioconductor: open software development for computational biology and bioinformatics

    PubMed Central

    Gentleman, Robert C; Carey, Vincent J; Bates, Douglas M; Bolstad, Ben; Dettling, Marcel; Dudoit, Sandrine; Ellis, Byron; Gautier, Laurent; Ge, Yongchao; Gentry, Jeff; Hornik, Kurt; Hothorn, Torsten; Huber, Wolfgang; Iacus, Stefano; Irizarry, Rafael; Leisch, Friedrich; Li, Cheng; Maechler, Martin; Rossini, Anthony J; Sawitzki, Gunther; Smith, Colin; Smyth, Gordon; Tierney, Luke; Yang, Jean YH; Zhang, Jianhua

    2004-01-01

    The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples. PMID:15461798

  10. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    PubMed Central

    2011-01-01

    Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538

  11. Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.

    PubMed

    Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A

    2005-04-07

    Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.

  12. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.

    PubMed

    Cieślik, Marcin; Mura, Cameron

    2011-02-25

    Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.

  13. Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

    PubMed

    Li, Sanshu; Breaker, Ronald R

    2017-10-13

    With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.

  14. Bioinformatics in the Netherlands: the value of a nationwide community.

    PubMed

    van Gelder, Celia W G; Hooft, Rob W W; van Rijswijk, Merlijn N; van den Berg, Linda; Kok, Ruben G; Reinders, Marcel; Mons, Barend; Heringa, Jaap

    2017-09-15

    This review provides a historical overview of the inception and development of bioinformatics research in the Netherlands. Rooted in theoretical biology by foundational figures such as Paulien Hogeweg (at Utrecht University since the 1970s), the developments leading to organizational structures supporting a relatively large Dutch bioinformatics community will be reviewed. We will show that the most valuable resource that we have built over these years is the close-knit national expert community that is well engaged in basic and translational life science research programmes. The Dutch bioinformatics community is accustomed to facing the ever-changing landscape of data challenges and working towards solutions together. In addition, this community is the stable factor on the road towards sustainability, especially in times where existing funding models are challenged and change rapidly. © The Author 2017. Published by Oxford University Press.

  15. A label distance maximum-based classifier for multi-label learning.

    PubMed

    Liu, Xiaoli; Bao, Hang; Zhao, Dazhe; Cao, Peng

    2015-01-01

    Multi-label classification is useful in many bioinformatics tasks such as gene function prediction and protein site localization. This paper presents an improved neural network algorithm, Max Label Distance Back Propagation Algorithm for Multi-Label Classification. The method was formulated by modifying the total error function of the standard BP by adding a penalty term, which was realized by maximizing the distance between the positive and negative labels. Extensive experiments were conducted to compare this method against state-of-the-art multi-label methods on three popular bioinformatic benchmark datasets. The results illustrated that this proposed method is more effective for bioinformatic multi-label classification compared to commonly used techniques.

  16. Establishing a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia.

    PubMed

    Schneider, Maria Victoria; Griffin, Philippa C; Tyagi, Sonika; Flannery, Madison; Dayalan, Saravanan; Gladman, Simon; Watson-Haigh, Nathan; Bayer, Philipp E; Charleston, Michael; Cooke, Ira; Cook, Rob; Edwards, Richard J; Edwards, David; Gorse, Dominique; McConville, Malcolm; Powell, David; Wilkins, Marc R; Lonie, Andrew

    2017-06-30

    EMBL Australia Bioinformatics Resource (EMBL-ABR) is a developing national research infrastructure, providing bioinformatics resources and support to life science and biomedical researchers in Australia. EMBL-ABR comprises 10 geographically distributed national nodes with one coordinating hub, with current funding provided through Bioplatforms Australia and the University of Melbourne for its initial 2-year development phase. The EMBL-ABR mission is to: (1) increase Australia's capacity in bioinformatics and data sciences; (2) contribute to the development of training in bioinformatics skills; (3) showcase Australian data sets at an international level and (4) enable engagement in international programs. The activities of EMBL-ABR are focussed in six key areas, aligning with comparable international initiatives such as ELIXIR, CyVerse and NIH Commons. These key areas-Tools, Data, Standards, Platforms, Compute and Training-are described in this article. © The Author 2017. Published by Oxford University Press.

  17. Unix interfaces, Kleisli, bucandin structure, etc. -- the heroic beginning of bioinformatics in Singapore.

    PubMed

    Eisenhaber, Frank

    2014-06-01

    Remarkably, Singapore as one of today's hotspots for bioinformatics and computational biology research appeared de novo out of pioneering efforts of engaged local individuals in the early 90-s that, supported with increasing public funds from 1996 on, morphed into the present vibrant research community. This article brings to mind the pioneers, their first successes and early institutional developments.

  18. Convergent evolution of adenosine aptamers spanning bacterial, human, and random sequences revealed by structure-based bioinformatics and genomic SELEX

    PubMed Central

    Vu, Michael M. K.; Jameson, Nora E.; Masuda, Stuart J.; Lin, Dana; Larralde-Ridaura, Rosa; Lupták, Andrej

    2012-01-01

    SUMMARY Aptamers are structured macromolecules in vitro evolved to bind molecular targets, whereas in nature they form the ligand-binding domains of riboswitches. Adenosine aptamers of a single structural family were isolated several times from random pools but they have not been identified in genomic sequences. We used two unbiased methods, structure-based bioinformatics and human genome-based in vitro selection, to identify aptamers that form the same adenosine-binding structure in a bacterium, and several vertebrates, including humans. Two of the human aptamers map to introns of RAB3C and FGD3 genes. The RAB3C aptamer binds ATP with dissociation constants about ten times lower than physiological ATP concentration, while the minimal FGD3 aptamer binds ATP only co-transcriptionally. PMID:23102219

  19. Entropyology: the application of bioinformatics and data modeling to digital virus and malware recognition

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger M.; Handley, James W.

    2010-04-01

    Malware are analogs of viruses. Viruses are comprised of large numbers of polypeptide proteins. The shape and function of the protein strands determines the functionality of the segment, similar to a subroutine in malware. The full combination of subroutines is the malware organism, in analogous fashion as a collection of polypeptides forms protein structures that are information bearing. We propose to apply the methods of Bioinformatics to analyze malware to provide a rich feature set for creating a unique and novel detection and classification scheme that is originally applied to Bioinformatics amino acid sequencing. Our proposed methods enable real time in situ (in contrast to in vivo) detection applications.

  20. Fast large-scale clustering of protein structures using Gauss integrals.

    PubMed

    Harder, Tim; Borg, Mikael; Boomsma, Wouter; Røgen, Peter; Hamelryck, Thomas

    2012-02-15

    Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors--which were introduced by Røgen and co-workers--and subsequently performing K-means clustering. Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50,000 structures, can be clustered within seconds to minutes.

  1. Isosteric And Non-Isosteric Base Pairs In RNA Motifs: Molecular Dynamics And Bioinformatics Study Of The Sarcin-Ricin Internal Loop

    PubMed Central

    Havrila, Marek; Réblová, Kamila; Zirbel, Craig L.; Leontis, Neocles B.; Šponer, Jiří

    2013-01-01

    The Sarcin-Ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks that occurs in many different RNA contexts and folds autonomously, i.e., in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of SR motif. SHAPE probing experiment was also performed to confirm fidelity of MD simulations. We identified 57 instances of the SR motif in a non-redundant subset of the RNA X-ray structure database and analyzed their basepairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large ribosomal RNA alignments to determine frequency of occurrence for different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Non isosteric base substitutions led to unstable structures, but so did isosteric substitutions which were unable to make key base-phosphate interactions. MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that inability to form stable cWW geometry is an important factor in case of the first base pair of the flexible region of the SR motif. Comparison of structural, bioinformatics, SHAPE probing and MD simulation data reveals that explicit solvent MD simulations neatly reflect viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions. PMID:24144333

  2. The Protein Data Bank in Europe (PDBe): bringing structure to biology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Velankar, Sameer; Kleywegt, Gerard J., E-mail: gerard@ebi.ac.uk

    2011-04-01

    Some future challenges for the PDB and its guardians are discussed and current and future activities in structural bioinformatics at the Protein Data Bank in Europe (PDBe) are described. The Protein Data Bank in Europe (PDBe) is the European partner in the Worldwide PDB and as such handles depositions of X-ray, NMR and EM data and structure models. PDBe also provides advanced bioinformatics services based on data from the PDB and related resources. Some of the challenges facing the PDB and its guardians are discussed, as well as some of the areas on which PDBe activities will focus in themore » future (advanced services, ligands, integration, validation and experimental data). Finally, some recent developments at PDBe are described.« less

  3. BioShaDock: a community driven bioinformatics shared Docker-based tools registry

    PubMed Central

    Moreews, François; Sallou, Olivier; Ménager, Hervé; Le bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier

    2015-01-01

    Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the Elixir registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community. PMID:26913191

  4. BioShaDock: a community driven bioinformatics shared Docker-based tools registry.

    PubMed

    Moreews, François; Sallou, Olivier; Ménager, Hervé; Le Bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier

    2015-01-01

    Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the Elixir registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community.

  5. Zebra: a web server for bioinformatic analysis of diverse protein families.

    PubMed

    Suplatov, Dmitry; Kirilin, Evgeny; Takhaveev, Vakil; Svedas, Vytas

    2014-01-01

    During evolution of proteins from a common ancestor, one functional property can be preserved while others can vary leading to functional diversity. A systematic study of the corresponding adaptive mutations provides a key to one of the most challenging problems of modern structural biology - understanding the impact of amino acid substitutions on protein function. The subfamily-specific positions (SSPs) are conserved within functional subfamilies but are different between them and, therefore, seem to be responsible for functional diversity in protein superfamilies. Consequently, a corresponding method to perform the bioinformatic analysis of sequence and structural data has to be implemented in the common laboratory practice to study the structure-function relationship in proteins and develop novel protein engineering strategies. This paper describes Zebra web server - a powerful remote platform that implements a novel bioinformatic analysis algorithm to study diverse protein families. It is the first application that provides specificity determinants at different levels of functional classification, therefore addressing complex functional diversity of large superfamilies. Statistical analysis is implemented to automatically select a set of highly significant SSPs to be used as hotspots for directed evolution or rational design experiments and analyzed studying the structure-function relationship. Zebra results are provided in two ways - (1) as a single all-in-one parsable text file and (2) as PyMol sessions with structural representation of SSPs. Zebra web server is available at http://biokinet.belozersky.msu.ru/zebra .

  6. PconsD: ultra rapid, accurate model quality assessment for protein structure prediction.

    PubMed

    Skwark, Marcin J; Elofsson, Arne

    2013-07-15

    Clustering methods are often needed for accurately assessing the quality of modeled protein structures. Recent blind evaluation of quality assessment methods in CASP10 showed that there is little difference between many different methods as far as ranking models and selecting best model are concerned. When comparing many models, the computational cost of the model comparison can become significant. Here, we present PconsD, a fast, stream-computing method for distance-driven model quality assessment that runs on consumer hardware. PconsD is at least one order of magnitude faster than other methods of comparable accuracy. The source code for PconsD is freely available at http://d.pcons.net/. Supplementary benchmarking data are also available there. arne@bioinfo.se Supplementary data are available at Bioinformatics online.

  7. Bioinformatics approaches for structural and functional analysis of proteins in secondary metabolism in Withania somnifera.

    PubMed

    Sanchita; Singh, Swati; Sharma, Ashok

    2014-11-01

    Withania somnifera (Ashwagandha) is an affluent storehouse of large number of pharmacologically active secondary metabolites known as withanolides. These secondary metabolites are produced by withanolide biosynthetic pathway. Very less information is available on structural and functional aspects of enzymes involved in withanolides biosynthetic pathways of Withiana somnifera. We therefore performed a bioinformatics analysis to look at functional and structural properties of these important enzymes. The pathway enzymes taken for this study were 3-Hydroxy-3-methylglutaryl coenzyme A reductase, 1-Deoxy-D-xylulose-5-phosphate synthase, 1-Deoxy-D-xylulose-5-phosphate reductase, farnesyl pyrophosphate synthase, squalene synthase, squalene epoxidase, and cycloartenol synthase. The prediction of secondary structure was performed for basic structural information. Three-dimensional structures for these enzymes were predicted. The physico-chemical properties such as pI, AI, GRAVY and instability index were also studied. The current information will provide a platform to know the structural attributes responsible for the function of these protein until experimental structures become available.

  8. Extracting patterns of database and software usage from the bioinformatics literature

    PubMed Central

    Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L.; Stevens, Robert

    2014-01-01

    Motivation: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. Results: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. Availability and implementation: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/ Contact: robert.stevens@manchester.ac.uk PMID:25161253

  9. Gaia: automated quality assessment of protein structure models.

    PubMed

    Kota, Pradeep; Ding, Feng; Ramachandran, Srinivas; Dokholyan, Nikolay V

    2011-08-15

    Increasing use of structural modeling for understanding structure-function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures. In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement. We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures. dokh@unc.edu Supplementary data are available at Bioinformatics online.

  10. 3DSDSCAR--a three dimensional structural database for sialic acid-containing carbohydrates through molecular dynamics simulation.

    PubMed

    Veluraja, Kasinadar; Selvin, Jeyasigamani F A; Venkateshwari, Selvakumar; Priyadarzini, Thanu R K

    2010-09-23

    The inherent flexibility and lack of strong intramolecular interactions of oligosaccharides demand the use of theoretical methods for their structural elucidation. In spite of the developments of theoretical methods, not much research on glycoinformatics is done so far when compared to bioinformatics research on proteins and nucleic acids. We have developed three dimensional structural database for a sialic acid-containing carbohydrates (3DSDSCAR). This is an open-access database that provides 3D structural models of a given sialic acid-containing carbohydrate. At present, 3DSDSCAR contains 60 conformational models, belonging to 14 different sialic acid-containing carbohydrates, deduced through 10 ns molecular dynamics (MD) simulations. The database is available at the URL: http://www.3dsdscar.org. Copyright 2010 Elsevier Ltd. All rights reserved.

  11. Mathematics and evolutionary biology make bioinformatics education comprehensible.

    PubMed

    Jungck, John R; Weisstein, Anton E

    2013-09-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes-the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software-the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a 'two-culture' problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses.

  12. Mathematics and evolutionary biology make bioinformatics education comprehensible

    PubMed Central

    Weisstein, Anton E.

    2013-01-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes—the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software—the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a ‘two-culture’ problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses. PMID:23821621

  13. Unsupervised learning of natural languages

    PubMed Central

    Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon

    2005-01-01

    We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics. PMID:16087885

  14. Unsupervised learning of natural languages.

    PubMed

    Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon

    2005-08-16

    We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.

  15. G2S: a web-service for annotating genomic variants on 3D protein structures.

    PubMed

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-06-01

    Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.

  16. Bioinformatics data distribution and integration via Web Services and XML.

    PubMed

    Li, Xiao; Zhang, Yizheng

    2003-11-01

    It is widely recognized that exchange, distribution, and integration of biological data are the keys to improve bioinformatics and genome biology in post-genomic era. However, the problem of exchanging and integrating biology data is not solved satisfactorily. The eXtensible Markup Language (XML) is rapidly spreading as an emerging standard for structuring documents to exchange and integrate data on the World Wide Web (WWW). Web service is the next generation of WWW and is founded upon the open standards of W3C (World Wide Web Consortium) and IETF (Internet Engineering Task Force). This paper presents XML and Web Services technologies and their use for an appropriate solution to the problem of bioinformatics data exchange and integration.

  17. Adapting bioinformatics curricula for big data.

    PubMed

    Greene, Anna C; Giffin, Kristine A; Greene, Casey S; Moore, Jason H

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. © The Author 2015. Published by Oxford University Press.

  18. Adapting bioinformatics curricula for big data

    PubMed Central

    Greene, Anna C.; Giffin, Kristine A.; Greene, Casey S.

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. PMID:25829469

  19. CAMBerVis: visualization software to support comparative analysis of multiple bacterial strains.

    PubMed

    Woźniak, Michał; Wong, Limsoon; Tiuryn, Jerzy

    2011-12-01

    A number of inconsistencies in genome annotations are documented among bacterial strains. Visualization of the differences may help biologists to make correct decisions in spurious cases. We have developed a visualization tool, CAMBerVis, to support comparative analysis of multiple bacterial strains. The software manages simultaneous visualization of multiple bacterial genomes, enabling visual analysis focused on genome structure annotations. The CAMBerVis software is freely available at the project website: http://bioputer.mimuw.edu.pl/camber. Input datasets for Mycobacterium tuberculosis and Staphylocacus aureus are integrated with the software as examples. m.wozniak@mimuw.edu.pl Supplementary data are available at Bioinformatics online.

  20. Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

    PubMed

    Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

    2016-02-27

    In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequence were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.

  1. Detection of novel coupled mutations in the katG gene (His276Met, Gln295His and Ser315Thr) in a multidrug-resistant Mycobacterium tuberculosis strain from Chennai, India, and insight into the molecular mechanism of isoniazid resistance using structural bioinformatics approaches.

    PubMed

    Ramasubban, Gayathri; Therese, Kulandai Lily; Vetrivel, Umashankar; Sivashanmugam, Muthukumaran; Rajan, Parvathy; Sridhar, R; Madhavan, Hajib N; Meenakshi, N

    2011-04-01

    This study reports on the structural basis of drug resistance targeting the katG gene in a multidrug-resistant Mycobacterium tuberculosis (MDR-TB) strain with two novel mutations (His276Met and Gln295His) in addition to the most commonly reported mutation (Ser315Thr). A structural bioinformatics approach was used to predict the structure of the mutant KatG enzyme (MT). Subsequent molecular dynamics and docking studies were performed to explain the mechanism of isoniazid (INH) resistance. The results show significant conformational changes in the structure of MT leading to a change in INH binding residues at the active site, with a significant increase in the inhibition constant (Ki) of 5.67 μm in the mutant KatG-isoniazid complex (MT-INH) compared with the wild-type KatG-isoniazid complex (WT-INH). In the case of molecular dynamics studies, root mean square deviation (RMSD) analysis of the protein backbone in simulated biological conditions revealed an unstable trajectory with higher deviations in MT throughout the simulation process (1 ns). Moreover, root mean square fluctuation (RMSF) analysis revealed an overall increase in residual fluctuations in MT compared with the wild-type KatG enzyme (WT), whilst the INH binding residues of MT showed a decreased fluctuation that can be observed as peak deviations. Hence, the present study suggests that His276Met, Gln295His and Ser315Thr mutations targeting the katG gene result in decreased stability and flexibility of the protein at INH binding residues leading to impaired enzyme function. Copyright © 2010 Elsevier B.V. and the International Society of Chemotherapy. All rights reserved.

  2. Bioinformatics of cardiovascular miRNA biology.

    PubMed

    Kunz, Meik; Xiao, Ke; Liang, Chunguang; Viereck, Janika; Pachel, Christina; Frantz, Stefan; Thum, Thomas; Dandekar, Thomas

    2015-12-01

    MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs and are highly conserved among species. Moreover, miRNAs regulate gene expression of a large number of genes associated with important biological functions and signaling pathways. Recently, several miRNAs have been found to be associated with cardiovascular diseases. Thus, investigating the complex regulatory effect of miRNAs may lead to a better understanding of their functional role in the heart. To achieve this, bioinformatics approaches have to be coupled with validation and screening experiments to understand the complex interactions of miRNAs with the genome. This will boost the subsequent development of diagnostic markers and our understanding of the physiological and therapeutic role of miRNAs in cardiac remodeling. In this review, we focus on and explain different bioinformatics strategies and algorithms for the identification and analysis of miRNAs and their regulatory elements to better understand cardiac miRNA biology. Starting with the biogenesis of miRNAs, we present approaches such as LocARNA and miRBase for combining sequence and structure analysis including phylogenetic comparisons as well as detailed analysis of RNA folding patterns, functional target prediction, signaling pathway as well as functional analysis. We also show how far bioinformatics helps to tackle the unprecedented level of complexity and systemic effects by miRNA, underlining the strong therapeutic potential of miRNA and miRNA target structures in cardiovascular disease. In addition, we discuss drawbacks and limitations of bioinformatics algorithms and the necessity of experimental approaches for miRNA target identification. This article is part of a Special Issue entitled 'Non-coding RNAs'. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Can all heritable biology really be reduced to a single dimension?

    PubMed

    Babbitt, Gregory A; Coppola, Erin E; Alawad, Mohammed A; Hudson, André O

    2016-03-10

    A long-held presupposition in the field of bioinformatics holds that genetic, and now even epigenetic 'information' can be abstracted from the physicochemical details of the macromolecular polymers in which it resides. It is perhaps rather ironic that this basic conjecture originated upon the first observations of DNA structure itself. This static model of DNA led very quickly to the conclusion that only the nucleobase sequence itself is rich enough in molecular complexity to replicate a complex biology. This idea has been pervasive throughout genomic science, higher education and popular culture ever since; to the point that most of us would accept it unquestioningly as fact. What is more alarming is that this conjecture is driving a significant portion of the technological development in modern genomics towards methods strongly rooted in DNA sequencing, thereby reducing a dynamic multi-dimensional biology into single-dimensional forms of data. Evidence countering this central tenet of bioinformatics has been quietly mounting over many decades, prompting some to propose that the genome must be studied from the perspective of its molecular reality, rather than as a body of information to be represented symbolically. Here, we explore the epistemological boundary between bioinformatics and molecular biology, and warn against an 'overtly' bioinformatic perspective. We review a selection of new bioinformatic methods that move beyond sequence-based approaches to include consideration of databased three dimensional structures. However, we also note that these hybrid methods still ignore the most important element of gene function when attempting to improve outcomes; the fourth dimension of molecular dynamics over time. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  4. Bioinformatics analysis of single and multi-hybrid epitopes of GRA-1, GRA-4, GRA-6 and GRA-7 proteins to improve DNA vaccine design against Toxoplasma gondii.

    PubMed

    Shaddel, Minoo; Ebrahimi, Mansour; Tabandeh, Mohammad Reza

    2018-06-01

    Toxoplasma gondii , is a causative agent of morbidity and mortality in immunocompromised and congenitally-infected individuals. Attempts to construct DNA vaccines against T. gondii using surface proteins are increasing. The dense granule antigens are highly expressed in the acute and chronic phases of T. gondii infection and considered as suitable DNA vaccine candidates to control toxoplasmosis. In the present study, bioinformatics tools and online software were used to predict, analyze and compare the structural, physical and chemical characters and immunogenicity of the GRA-1, GRA-4, GRA-6 and GRA-7 proteins. Sequence alignment results indicated that the GRA-1, GRA-4, GRA-6 and GRA-7 proteins had low similarity. The secondary structure prediction demonstrated that among the four proteins, GRA-1 and GRA-6 had similar secondary structure except for a little discrepancy. Hydrophilicity/hydrophobicity analysis showed multiple hydrophilic regions and some classical high hydrophilic domains for each protein sequence. Immunogenic epitope prediction results demonstrated that the GRA-1 and GRA-4 epitopes were stable and GRA-4 showed the highest degree of antigenicity. Although the GRA-7 epitope had the highest score of immunogenicity, this epitope was instable and had the lowest degree of antigenicity and half-time in eukaryotic cell. Also, the results indicated that GRA4-GRA7 epitope and GRA6-GRA7 had the highest degree of antigenicity and immunogenicity among multi-hybrid epitopes, respectively. Totally, in the present study, single epitopes showed the highest degree of antigenicity compared with multi-hybrid epitopes. Given the results, it can be concluded that GRA-4 and GRA-7 can be powerful DNA vaccine candidates against T. gondii .

  5. A review of bioinformatics platforms for comparative genomics. Recent developments of the EDGAR 2.0 platform and its utility for taxonomic and phylogenetic studies.

    PubMed

    Yu, J; Blom, J; Glaeser, S P; Jaenicke, S; Juhre, T; Rupp, O; Schwengers, O; Spänig, S; Goesmann, A

    2017-11-10

    The rapid development of next generation sequencing technology has greatly increased the amount of available microbial genomes. As a result of this development, there is a rising demand for fast and automated approaches in analyzing these genomes in a comparative way. Whole genome sequencing also bears a huge potential for obtaining a higher resolution in phylogenetic and taxonomic classification. During the last decade, several software tools and platforms have been developed in the field of comparative genomics. In this manuscript, we review the most commonly used platforms and approaches for ortholog group analyses with a focus on their potential for phylogenetic and taxonomic research. Furthermore, we describe the latest improvements of the EDGAR platform for comparative genome analyses and present recent examples of its application for the phylogenomic analysis of different taxa. Finally, we illustrate the role of the EDGAR platform as part of the BiGi Center for Microbial Bioinformatics within the German network on Bioinformatics Infrastructure (de.NBI). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  6. Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

    PubMed

    Mørk, Søren; Holmes, Ian

    2012-03-01

    Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures are best performing in terms of statistical information criteria or prediction performances, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.

  7. Pathgroups, a dynamic data structure for genome reconstruction problems.

    PubMed

    Zheng, Chunfang

    2010-07-01

    Ancestral gene order reconstruction problems, including the median problem, quartet construction, small phylogeny, guided genome halving and genome aliquoting, are NP hard. Available heuristics dedicated to each of these problems are computationally costly for even small instances. We present a data structure enabling rapid heuristic solution to all these ancestral genome reconstruction problems. A generic greedy algorithm with look-ahead based on an automatically generated priority system suffices for all the problems using this data structure. The efficiency of the algorithm is due to fast updating of the structure during run time and to the simplicity of the priority scheme. We illustrate with the first rapid algorithm for quartet construction and apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny. http://albuquerque.bioinformatics.uottawa.ca/pathgroup/Quartet.html chunfang313@gmail.com Supplementary data are available at Bioinformatics online.

  8. Bioinformatics and molecular modeling in glycobiology

    PubMed Central

    Schloissnig, Siegfried

    2010-01-01

    The field of glycobiology is concerned with the study of the structure, properties, and biological functions of the family of biomolecules called carbohydrates. Bioinformatics for glycobiology is a particularly challenging field, because carbohydrates exhibit a high structural diversity and their chains are often branched. Significant improvements in experimental analytical methods over recent years have led to a tremendous increase in the amount of carbohydrate structure data generated. Consequently, the availability of databases and tools to store, retrieve and analyze these data in an efficient way is of fundamental importance to progress in glycobiology. In this review, the various graphical representations and sequence formats of carbohydrates are introduced, and an overview of newly developed databases, the latest developments in sequence alignment and data mining, and tools to support experimental glycan analysis are presented. Finally, the field of structural glycoinformatics and molecular modeling of carbohydrates, glycoproteins, and protein–carbohydrate interaction are reviewed. PMID:20364395

  9. Bioinformatic Analysis of the Contribution of Primer Sequences to Aptamer Structures

    PubMed Central

    Ellington, Andrew D.

    2009-01-01

    Aptamers are nucleic acid molecules selected in vitro to bind a particular ligand. While numerous experimental studies have examined the sequences, structures, and functions of individual aptamers, considerably fewer studies have applied bioinformatics approaches to try to infer more general principles from these individual studies. We have used a large Aptamer Database to parse the contributions of both random and constant regions to the secondary structures of more than 2000 aptamers. We find that the constant, primer-binding regions do not, in general, contribute significantly to aptamer structures. These results suggest that (a) binding function is not contributed to nor constrained by constant regions; (b) in consequence, the landscape of functional binding sequences is sparse but robust, favoring scenarios for short, functional nucleic acid sequences near origins; and (c) many pool designs for the selection of aptamers are likely to prove robust. PMID:18594898

  10. DNA mimic proteins: functions, structures, and bioinformatic analysis.

    PubMed

    Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J

    2014-05-13

    DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.

  11. pocketZebra: a web-server for automated selection and classification of subfamily-specific binding sites by bioinformatic analysis of diverse protein families.

    PubMed

    Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Svedas, Vytas

    2014-07-01

    The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure-function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. RealityConvert: a tool for preparing 3D models of biochemical structures for augmented and virtual reality.

    PubMed

    Borrel, Alexandre; Fourches, Denis

    2017-12-01

    There is a growing interest for the broad use of Augmented Reality (AR) and Virtual Reality (VR) in the fields of bioinformatics and cheminformatics to visualize complex biological and chemical structures. AR and VR technologies allow for stunning and immersive experiences, offering untapped opportunities for both research and education purposes. However, preparing 3D models ready to use for AR and VR is time-consuming and requires a technical expertise that severely limits the development of new contents of potential interest for structural biologists, medicinal chemists, molecular modellers and teachers. Herein we present the RealityConvert software tool and associated website, which allow users to easily convert molecular objects to high quality 3D models directly compatible for AR and VR applications. For chemical structures, in addition to the 3D model generation, RealityConvert also generates image trackers, useful to universally call and anchor that particular 3D model when used in AR applications. The ultimate goal of RealityConvert is to facilitate and boost the development and accessibility of AR and VR contents for bioinformatics and cheminformatics applications. http://www.realityconvert.com. dfourch@ncsu.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  13. Statistical inference of protein structural alignments using information and compression.

    PubMed

    Collier, James H; Allison, Lloyd; Lesk, Arthur M; Stuckey, Peter J; Garcia de la Banda, Maria; Konagurthu, Arun S

    2017-04-01

    Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power-the amount of lossless compression achieved to explain the protein coordinates using that alignment. We have implemented this approach in MMLigner , the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner 's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner . arun.konagurthu@monash.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  14. Bioinformatics for transporter pharmacogenomics and systems biology: data integration and modeling with UML.

    PubMed

    Yan, Qing

    2010-01-01

    Bioinformatics is the rational study at an abstract level that can influence the way we understand biomedical facts and the way we apply the biomedical knowledge. Bioinformatics is facing challenges in helping with finding the relationships between genetic structures and functions, analyzing genotype-phenotype associations, and understanding gene-environment interactions at the systems level. One of the most important issues in bioinformatics is data integration. The data integration methods introduced here can be used to organize and integrate both public and in-house data. With the volume of data and the high complexity, computational decision support is essential for integrative transporter studies in pharmacogenomics, nutrigenomics, epigenetics, and systems biology. For the development of such a decision support system, object-oriented (OO) models can be constructed using the Unified Modeling Language (UML). A methodology is developed to build biomedical models at different system levels and construct corresponding UML diagrams, including use case diagrams, class diagrams, and sequence diagrams. By OO modeling using UML, the problems of transporter pharmacogenomics and systems biology can be approached from different angles with a more complete view, which may greatly enhance the efforts in effective drug discovery and development. Bioinformatics resources of membrane transporters and general bioinformatics databases and tools that are frequently used in transporter studies are also collected here. An informatics decision support system based on the models presented here is available at http://www.pharmtao.com/transporter . The methodology developed here can also be used for other biomedical fields.

  15. AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis

    PubMed Central

    Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud

    2003-01-01

    Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565

  16. Computational Lipidomics and Lipid Bioinformatics: Filling In the Blanks.

    PubMed

    Pauling, Josch; Klipp, Edda

    2016-12-22

    Lipids are highly diverse metabolites of pronounced importance in health and disease. While metabolomics is a broad field under the omics umbrella that may also relate to lipids, lipidomics is an emerging field which specializes in the identification, quantification and functional interpretation of complex lipidomes. Today, it is possible to identify and distinguish lipids in a high-resolution, high-throughput manner and simultaneously with a lot of structural detail. However, doing so may produce thousands of mass spectra in a single experiment which has created a high demand for specialized computational support to analyze these spectral libraries. The computational biology and bioinformatics community has so far established methodology in genomics, transcriptomics and proteomics but there are many (combinatorial) challenges when it comes to structural diversity of lipids and their identification, quantification and interpretation. This review gives an overview and outlook on lipidomics research and illustrates ongoing computational and bioinformatics efforts. These efforts are important and necessary steps to advance the lipidomics field alongside analytic, biochemistry, biomedical and biology communities and to close the gap in available computational methodology between lipidomics and other omics sub-branches.

  17. The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

    PubMed Central

    Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

    2006-01-01

    We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980

  18. Special Focus

    PubMed Central

    Nawrocki, Eric P.; Burge, Sarah W.

    2013-01-01

    The development of RNA bioinformatic tools began more than 30 y ago with the description of the Nussinov and Zuker dynamic programming algorithms for single sequence RNA secondary structure prediction. Since then, many tools have been developed for various RNA sequence analysis problems such as homology search, multiple sequence alignment, de novo RNA discovery, read-mapping, and many more. In this issue, we have collected a sampling of reviews and original research that demonstrate some of the many ways bioinformatics is integrated with current RNA biology research. PMID:23948768

  19. pocketZebra: a web-server for automated selection and classification of subfamily-specific binding sites by bioinformatic analysis of diverse protein families

    PubMed Central

    Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Švedas, Vytas

    2014-01-01

    The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure–function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. PMID:24852248

  20. eSBMTools 1.0: enhanced native structure-based modeling tools.

    PubMed

    Lutz, Benjamin; Sinner, Claude; Heuermann, Geertje; Verma, Abhinav; Schug, Alexander

    2013-11-01

    Molecular dynamics simulations provide detailed insights into the structure and function of biomolecular systems. Thus, they complement experimental measurements by giving access to experimentally inaccessible regimes. Among the different molecular dynamics techniques, native structure-based models (SBMs) are based on energy landscape theory and the principle of minimal frustration. Typically used in protein and RNA folding simulations, they coarse-grain the biomolecular system and/or simplify the Hamiltonian resulting in modest computational requirements while achieving high agreement with experimental data. eSBMTools streamlines running and evaluating SBM in a comprehensive package and offers high flexibility in adding experimental- or bioinformatics-derived restraints. We present a software package that allows setting up, modifying and evaluating SBM for both RNA and proteins. The implemented workflows include predicting protein complexes based on bioinformatics-derived inter-protein contact information, a standardized setup of protein folding simulations based on the common PDB format, calculating reaction coordinates and evaluating the simulation by free-energy calculations with weighted histogram analysis method or by phi-values. The modules interface with the molecular dynamics simulation program GROMACS. The package is open source and written in architecture-independent Python2. http://sourceforge.net/projects/esbmtools/. alexander.schug@kit.edu. Supplementary data are available at Bioinformatics online.

  1. Current challenges in genome annotation through structural biology and bioinformatics.

    PubMed

    Furnham, Nicholas; de Beer, Tjaart A P; Thornton, Janet M

    2012-10-01

    With the huge volume in genomic sequences being generated from high-throughout sequencing projects the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component to this process is to use experimentally determined structures, which provide a means to detect homology that is not discernable from just the sequence and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry. Copyright © 2012. Published by Elsevier Ltd.

  2. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    PubMed Central

    2002-01-01

    Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134

  3. Crowdsourcing for bioinformatics

    PubMed Central

    Good, Benjamin M.; Su, Andrew I.

    2013-01-01

    Motivation: Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Results: Here, we provide a framework for understanding and applying several different types of crowdsourcing. The framework considers two broad classes: systems for solving large-volume ‘microtasks’ and systems for solving high-difficulty ‘megatasks’. Within these classes, we discuss system types, including volunteer labor, games with a purpose, microtask markets and open innovation contests. We illustrate each system type with successful examples in bioinformatics and conclude with a guide for matching problems to crowdsourcing solutions that highlights the positives and negatives of different approaches. Contact: bgood@scripps.edu PMID:23782614

  4. Computational prediction and biochemical characterization of novel RNA aptamers to Rift Valley fever virus nucleocapsid protein.

    PubMed

    Ellenbecker, Mary; St Goddard, Jeremy; Sundet, Alec; Lanchy, Jean-Marc; Raiford, Douglas; Lodmell, J Stephen

    2015-10-01

    Rift Valley fever virus (RVFV) is a potent human and livestock pathogen endemic to sub-Saharan Africa and the Arabian Peninsula that has potential to spread to other parts of the world. Although there is no proven effective and safe treatment for RVFV infections, a potential therapeutic target is the virally encoded nucleocapsid protein (N). During the course of infection, N binds to viral RNA, and perturbation of this interaction can inhibit viral replication. To gain insight into how N recognizes viral RNA specifically, we designed an algorithm that uses a distance matrix and multidimensional scaling to compare the predicted secondary structures of known N-binding RNAs, or aptamers, that were isolated and characterized in previous in vitro evolution experiment. These aptamers did not exhibit overt sequence or predicted structure similarity, so we employed bioinformatic methods to propose novel aptamers based on analysis and clustering of secondary structures. We screened and scored the predicted secondary structures of novel randomly generated RNA sequences in silico and selected several of these putative N-binding RNAs whose secondary structures were similar to those of known N-binding RNAs. We found that overall the in silico generated RNA sequences bound well to N in vitro. Furthermore, introduction of these RNAs into cells prior to infection with RVFV inhibited viral replication in cell culture. This proof of concept study demonstrates how the predictive power of bioinformatics and the empirical power of biochemistry can be jointly harnessed to discover, synthesize, and test new RNA sequences that bind tightly to RVFV N protein. The approach would be easily generalizable to other applications. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Human, vector and parasite Hsp90 proteins: A comparative bioinformatics analysis.

    PubMed

    Faya, Ngonidzashe; Penkler, David L; Tastan Bishop, Özlem

    2015-01-01

    The treatment of protozoan parasitic diseases is challenging, and thus identification and analysis of new drug targets is important. Parasites survive within host organisms, and some need intermediate hosts to complete their life cycle. Changing host environment puts stress on parasites, and often adaptation is accompanied by the expression of large amounts of heat shock proteins (Hsps). Among Hsps, Hsp90 proteins play an important role in stress environments. Yet, there has been little computational research on Hsp90 proteins to analyze them comparatively as potential parasitic drug targets. Here, an attempt was made to gain detailed insights into the differences between host, vector and parasitic Hsp90 proteins by large-scale bioinformatics analysis. A total of 104 Hsp90 sequences were divided into three groups based on their cellular localizations; namely cytosolic, mitochondrial and endoplasmic reticulum (ER). Further, the parasitic proteins were divided according to the type of parasite (protozoa, helminth and ectoparasite). Primary sequence analysis, phylogenetic tree calculations, motif analysis and physicochemical properties of Hsp90 proteins suggested that despite the overall structural conservation of these proteins, parasitic Hsp90 proteins have unique features which differentiate them from human ones, thus encouraging the idea that protozoan Hsp90 proteins should be further analyzed as potential drug targets.

  6. Achievements and challenges in structural bioinformatics and computational biophysics.

    PubMed

    Samish, Ilan; Bourne, Philip E; Najmanovich, Rafael J

    2015-01-01

    The field of structural bioinformatics and computational biophysics has undergone a revolution in the last 10 years. Developments that are captured annually through the 3DSIG meeting, upon which this article reflects. An increase in the accessible data, computational resources and methodology has resulted in an increase in the size and resolution of studied systems and the complexity of the questions amenable to research. Concomitantly, the parameterization and efficiency of the methods have markedly improved along with their cross-validation with other computational and experimental results. The field exhibits an ever-increasing integration with biochemistry, biophysics and other disciplines. In this article, we discuss recent achievements along with current challenges within the field. © The Author 2014. Published by Oxford University Press.

  7. Achievements and challenges in structural bioinformatics and computational biophysics

    PubMed Central

    Samish, Ilan; Bourne, Philip E.; Najmanovich, Rafael J.

    2015-01-01

    Motivation: The field of structural bioinformatics and computational biophysics has undergone a revolution in the last 10 years. Developments that are captured annually through the 3DSIG meeting, upon which this article reflects. Results: An increase in the accessible data, computational resources and methodology has resulted in an increase in the size and resolution of studied systems and the complexity of the questions amenable to research. Concomitantly, the parameterization and efficiency of the methods have markedly improved along with their cross-validation with other computational and experimental results. Conclusion: The field exhibits an ever-increasing integration with biochemistry, biophysics and other disciplines. In this article, we discuss recent achievements along with current challenges within the field. Contact: Rafael.Najmanovich@USherbrooke.ca PMID:25488929

  8. Discovery of an Unexplored Protein Structural Scaffold of Serine Protease from Big Blue Octopus (Octopus cyanea): A New Prospective Lead Molecule.

    PubMed

    Panda, Subhamay; Kumari, Leena

    2017-01-01

    Serine proteases are a group of enzymes that hydrolyses the peptide bonds in proteins. In mammals, these enzymes help in the regulation of several major physiological functions such as digestion, blood clotting, responses of immune system, reproductive functions and the complement system. Serine proteases obtained from the venom of Octopodidae family is a relatively unexplored area of research. In the present work, we tried to effectively utilize comparative composite molecular modeling technique. Our key aim was to propose the first molecular model structure of unexplored serine protease 5 derived from big blue octopus. The other objective of this study was to analyze the distribution of negatively and positively charged amino acid over molecular modeled structure, distribution of secondary structural elements, hydrophobicity molecular surface analysis and electrostatic potential analysis with the aid of different bioinformatic tools. In the present study, molecular model has been generated with the help of I-TASSER suite. Afterwards the refined structural model was validated with standard methods. For functional annotation of protein molecule we used Protein Information Resource (PIR) database. Serine protease 5 of big blue octopus was analyzed with different bioinformatical algorithms for the distribution of negatively and positively charged amino acid over molecular modeled structure, distribution of secondary structural elements, hydrophobicity molecular surface analysis and electrostatic potential analysis. The functionally critical amino acids and ligand- binding site (LBS) of the proteins (modeled) were determined using the COACH program. The molecular model data in cooperation to other pertinent post model analysis data put forward molecular insight to proteolytic activity of serine protease 5, which helps in the clear understanding of procoagulant and anticoagulant characteristics of this natural lead molecule. Our approach was to investigate the octopus venom protein as a whole or a part of their structure that may result in the development of new lead molecule. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  9. PLI: a web-based tool for the comparison of protein-ligand interactions observed on PDB structures.

    PubMed

    Gallina, Anna Maria; Bisignano, Paola; Bergamino, Maurizio; Bordo, Domenico

    2013-02-01

    A large fraction of the entries contained in the Protein Data Bank describe proteins in complex with low molecular weight molecules such as physiological compounds or synthetic drugs. In many cases, the same molecule is found in distinct protein-ligand complexes. There is an increasing interest in Medicinal Chemistry in comparing protein binding sites to get insight on interactions that modulate the binding specificity, as this structural information can be correlated with other experimental data of biochemical or physiological nature and may help in rational drug design. The web service protein-ligand interaction presented here provides a tool to analyse and compare the binding pockets of homologous proteins in complex with a selected ligand. The information is deduced from protein-ligand complexes present in the Protein Data Bank and stored in the underlying database. Freely accessible at http://bioinformatics.istge.it/pli/.

  10. A high level interface to SCOP and ASTRAL implemented in python.

    PubMed

    Casbon, James A; Crooks, Gavin E; Saqi, Mansoor A S

    2006-01-10

    Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non redundant subsets of SCOP domains on the basis of sequence similarity such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small and easy to use API written in python to enable construction of datasets from these resources. We have designed a set of python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL. The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.

  11. StrBioLib: a Java library for development of custom computational structural biology applications.

    PubMed

    Chandonia, John-Marc

    2007-08-01

    StrBioLib is a library of Java classes useful for developing software for computational structural biology research. StrBioLib contains classes to represent and manipulate protein structures, biopolymer sequences, sets of biopolymer sequences, and alignments between biopolymers based on either sequence or structure. Interfaces are provided to interact with commonly used bioinformatics applications, including (psi)-blast, modeller, muscle and Primer3, and tools are provided to read and write many file formats used to represent bioinformatic data. The library includes a general-purpose neural network object with multiple training algorithms, the Hooke and Jeeves non-linear optimization algorithm, and tools for efficient C-style string parsing and formatting. StrBioLib is the basis for the Pred2ary secondary structure prediction program, is used to build the astral compendium for sequence and structure analysis, and has been extensively tested through use in many smaller projects. Examples and documentation are available at the site below. StrBioLib may be obtained under the terms of the GNU LGPL license from http://strbio.sourceforge.net/

  12. Hydra meiosis reveals unexpected conservation of structural synaptonemal complex proteins across metazoans.

    PubMed

    Fraune, Johanna; Alsheimer, Manfred; Volff, Jean-Nicolas; Busch, Karoline; Fraune, Sebastian; Bosch, Thomas C G; Benavente, Ricardo

    2012-10-09

    The synaptonemal complex (SC) is a key structure of meiosis, mediating the stable pairing (synapsis) of homologous chromosomes during prophase I. Its remarkable tripartite structure is evolutionarily well conserved and can be found in almost all sexually reproducing organisms. However, comparison of the different SC protein components in the common meiosis model organisms Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Mus musculus revealed no sequence homology. This discrepancy challenged the hypothesis that the SC arose only once in evolution. To pursue this matter we focused on the evolution of SYCP1 and SYCP3, the two major structural SC proteins of mammals. Remarkably, our comparative bioinformatic and expression studies revealed that SYCP1 and SYCP3 are also components of the SC in the basal metazoan Hydra. In contrast to previous assumptions, we therefore conclude that SYCP1 and SYCP3 form monophyletic groups of orthologous proteins across metazoans.

  13. Hydra meiosis reveals unexpected conservation of structural synaptonemal complex proteins across metazoans

    PubMed Central

    Fraune, Johanna; Alsheimer, Manfred; Volff, Jean-Nicolas; Busch, Karoline; Fraune, Sebastian; Bosch, Thomas C. G.; Benavente, Ricardo

    2012-01-01

    The synaptonemal complex (SC) is a key structure of meiosis, mediating the stable pairing (synapsis) of homologous chromosomes during prophase I. Its remarkable tripartite structure is evolutionarily well conserved and can be found in almost all sexually reproducing organisms. However, comparison of the different SC protein components in the common meiosis model organisms Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Mus musculus revealed no sequence homology. This discrepancy challenged the hypothesis that the SC arose only once in evolution. To pursue this matter we focused on the evolution of SYCP1 and SYCP3, the two major structural SC proteins of mammals. Remarkably, our comparative bioinformatic and expression studies revealed that SYCP1 and SYCP3 are also components of the SC in the basal metazoan Hydra. In contrast to previous assumptions, we therefore conclude that SYCP1 and SYCP3 form monophyletic groups of orthologous proteins across metazoans. PMID:23012415

  14. Bioinformatics analysis of the predicted polyprenol reductase genes in higher plants

    NASA Astrophysics Data System (ADS)

    Basyuni, M.; Wati, R.

    2018-03-01

    The present study evaluates the bioinformatics methods to analyze twenty-four predicted polyprenol reductase genes from higher plants on GenBank as well as predicted the structure, composition, similarity, subcellular localization, and phylogenetic. The physicochemical properties of plant polyprenol showed diversity among the observed genes. The percentage of the secondary structure of plant polyprenol genes followed the ratio order of α helix > random coil > extended chain structure. The values of chloroplast but not signal peptide were too low, indicated that few chloroplast transit peptide in plant polyprenol reductase genes. The possibility of the potential transit peptide showed variation among the plant polyprenol reductase, suggested the importance of understanding the variety of peptide components of plant polyprenol genes. To clarify this finding, a phylogenetic tree was drawn. The phylogenetic tree shows several branches in the tree, suggested that plant polyprenol reductase genes grouped into divergent clusters in the tree.

  15. Biopython: freely available Python tools for computational molecular biology and bioinformatics.

    PubMed

    Cock, Peter J A; Antao, Tiago; Chang, Jeffrey T; Chapman, Brad A; Cox, Cymon J; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J L

    2009-06-01

    The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macro molecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Biopython is freely available, with documentation and source code at (www.biopython.org) under the Biopython license.

  16. BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS.

    PubMed

    Fosso, Bruno; Santamaria, Monica; Marzano, Marinella; Alonso-Alemany, Daniel; Valiente, Gabriel; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-07-01

    Substantial advances in microbiology, molecular evolution and biodiversity have been carried out in recent years thanks to Metagenomics, which allows to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the microbiome taxonomic structure, a target-based metagenomic approach, here also referred as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. The accessibility to proper computational systems for the large-scale bioinformatic analysis of HTS data represents, currently, one of the major challenges in advanced Meta-barcoding projects. BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities by a completely automated workflow, comprehensive of all the fundamental steps, from raw sequence data upload and cleaning to final taxonomic identification, that are absolutely required in an appropriately designed Meta-barcoding HTS-based experiment. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data from either Roche 454 or Illumina HTS platforms, following two alternative paths, respectively. BioMaS is implemented into a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). BioMaS is a friendly pipeline for Meta-barcoding HTS data analysis specifically designed for users without particular computing skills. A comparative benchmark, carried out by using a simulated dataset suitably designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.

  17. Bioinformatics: Cheap and robust method to explore biomaterial from Indonesia biodiversity

    NASA Astrophysics Data System (ADS)

    Widodo

    2015-02-01

    Indonesia has a huge amount of biodiversity, which may contain many biomaterials for pharmaceutical application. These resources potency should be explored to discover new drugs for human wealth. However, the bioactive screening using conventional methods is very expensive and time-consuming. Therefore, we developed a methodology for screening the potential of natural resources based on bioinformatics. The method is developed based on the fact that organisms in the same taxon will have similar genes, metabolism and secondary metabolites product. Then we employ bioinformatics to explore the potency of biomaterial from Indonesia biodiversity by comparing species with the well-known taxon containing the active compound through published paper or chemical database. Then we analyze drug-likeness, bioactivity and the target proteins of the active compound based on their molecular structure. The target protein was examined their interaction with other proteins in the cell to determine action mechanism of the active compounds in the cellular level, as well as to predict its side effects and toxicity. By using this method, we succeeded to screen anti-cancer, immunomodulators and anti-inflammation from Indonesia biodiversity. For example, we found anticancer from marine invertebrate by employing the method. The anti-cancer was explore based on the isolated compounds of marine invertebrate from published article and database, and then identified the protein target, followed by molecular pathway analysis. The data suggested that the active compound of the invertebrate able to kill cancer cell. Further, we collect and extract the active compound from the invertebrate, and then examined the activity on cancer cell (MCF7). The MTT result showed that the methanol extract of marine invertebrate was highly potent in killing MCF7 cells. Therefore, we concluded that bioinformatics is cheap and robust way to explore bioactive from Indonesia biodiversity for source of drug and another pharmaceutical material.

  18. Structural insights into a key carotenogenesis related enzyme phytoene synthase of P. falciparum: a novel drug target for malaria.

    PubMed

    Agarwal, Shalini; Sharma, Vijeta; Phulera, Swastik; Abdin, M Z; Ayana, R; Singh, Shailja

    2015-12-01

    Carotenoids represent a diverse group of pigments derived from the common isoprenoid precursors and fulfill a variety of critical functions in plants and animals. Phytoene synthase (PSY), a transferase enzyme that catalyzes the first specific step in carotenoid biosynthesis plays a central role in the regulation of a number of essential functions mediated via carotenoids. PSYs have been deeply investigated in plants, bacteria and algae however in apicomplexans it is poorly studied. In an effort to characterize PSY in apicomplexans especially the malaria parasite Plasmodium falciparum (P. falciparum), a detailed bioinformatics analysis is undertaken. We have analysed the Phylogenetic relationship of PSY also referred to as octaprenyl pyrophosphate synthase (OPPS) in P. falciparum with other taxonomic groups. Further, we in silico characterized the secondary and tertiary structures of P. falciparum PSY/OPPS and compared the tertiary structures with crystal structure of Thermotoga maritima (T. maritima) OPPS. Our results evidenced the resemblance of P. falciparum PSY with the active site of T. maritima OPPS. Interestingly, the comparative structural analysis revealed an unconserved unique loop in P. falciparum OPPS/PSY. Such structural insights might contribute novel accessory functions to the protein thus, offering potential drug targets.

  19. Unipro UGENE: a unified bioinformatics toolkit.

    PubMed

    Okonechnikov, Konstantin; Golosova, Olga; Fursov, Mikhail

    2012-04-15

    Unipro UGENE is a multiplatform open-source software with the main goal of assisting molecular biologists without much expertise in bioinformatics to manage, analyze and visualize their data. UGENE integrates widely used bioinformatics tools within a common user interface. The toolkit supports multiple biological data formats and allows the retrieval of data from remote data sources. It provides visualization modules for biological objects such as annotated genome sequences, Next Generation Sequencing (NGS) assembly data, multiple sequence alignments, phylogenetic trees and 3D structures. Most of the integrated algorithms are tuned for maximum performance by the usage of multithreading and special processor instructions. UGENE includes a visual environment for creating reusable workflows that can be launched on local resources or in a High Performance Computing (HPC) environment. UGENE is written in C++ using the Qt framework. The built-in plugin system and structured UGENE API make it possible to extend the toolkit with new functionality. UGENE binaries are freely available for MS Windows, Linux and Mac OS X at http://ugene.unipro.ru/download.html. UGENE code is licensed under the GPLv2; the information about the code licensing and copyright of integrated tools can be found in the LICENSE.3rd_party file provided with the source bundle.

  20. Current and future methods for evaluating the allergenic potential of proteins: international workshop report 23-25 October 2007.

    PubMed

    Thomas, Karluss; Herouet-Guicheney, Corinne; Ladics, Gregory; McClain, Scott; MacIntosh, Susan; Privalle, Laura; Woolhiser, Mike

    2008-09-01

    The International Life Science Institute's Health and Environmental Sciences Institute's Protein Allergenicity Technical Committee hosted an international workshop October 23-25, 2007, in Nice, France, to review and discuss existing and emerging methods and techniques for improving the current weight-of-evidence approach for evaluating the potential allergenicity of novel proteins. The workshop included over 40 international experts from government, industry, and academia. Their expertise represented a range of disciplines including immunology, chemistry, molecular biology, bioinformatics, and toxicology. Among participants, there was consensus that (1) current bioinformatic approaches are highly conservative; (2) advances in bioinformatics using structural comparisons of proteins may be helpful as the availability of structural data increases; (3) proteomics may prove useful for monitoring the natural variability in a plant's proteome and assessing the impact of biotechnology transformations on endogenous levels of allergens, but only when analytical techniques have been standardized and additional data are available on the natural variation of protein expression in non-transgenic bred plants; (4) basophil response assays are promising techniques, but need additional evaluation around specificity, sensitivity, and reproducibility; (5) additional research is required to develop and validate an animal model for the purpose of predicting protein allergenicity.

  1. Findings

    MedlinePlus

    ... Issue All Issues Explore Findings by Topic Cell Biology Cellular Structures, Functions, Processes, Imaging, Stress Response Chemistry ... Glycobiology, Synthesis, Natural Products, Chemical Reactions Computers in Biology Bioinformatics, Modeling, Systems Biology, Data Visualization Diseases Cancer, ...

  2. CBESW: sequence alignment on the Playstation 3.

    PubMed

    Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil

    2008-09-17

    The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. For large datasets, our implementation on the PlayStation 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. The results from our experiments demonstrate that the PlayStation 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications.

  3. CBESW: Sequence Alignment on the Playstation 3

    PubMed Central

    Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil

    2008-01-01

    Background The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation® 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. Results For large datasets, our implementation on the PlayStation® 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. Conclusion The results from our experiments demonstrate that the PlayStation® 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications. PMID:18798993

  4. Accuracy of different bioinformatics methods in detecting antibiotic resistance and virulence factors from Staphylococcus aureus whole genome sequences.

    PubMed

    Mason, Amy; Foster, Dona; Bradley, Phelim; Golubchik, Tanya; Doumith, Michel; Gordon, N Claire; Pichon, Bruno; Iqbal, Zamin; Staves, Peter; Crook, Derrick; Walker, A Sarah; Kearns, Angela; Peto, Tim

    2018-06-20

    Background : In principle, whole genome sequencing (WGS) can predict phenotypic resistance directly from genotype, replacing laboratory-based tests. However, the contribution of different bioinformatics methods to genotype-phenotype discrepancies has not been systematically explored to date. Methods : We compared three WGS-based bioinformatics methods (Genefinder (read-based), Mykrobe (de Bruijn graph-based) and Typewriter (BLAST-based)) for predicting presence/absence of 83 different resistance determinants and virulence genes, and overall antimicrobial susceptibility, in 1379 Staphylococcus aureus isolates previously characterised by standard laboratory methods (disc diffusion, broth and/or agar dilution and PCR). Results : 99.5% (113830/114457) of individual resistance-determinant/virulence gene predictions were identical between all three methods, with only 627 (0.5%) discordant predictions, demonstrating high overall agreement (Fliess-Kappa=0.98, p<0.0001). Discrepancies when identified were in only one of the three methods for all genes except the cassette recombinase, ccrC(b ). Genotypic antimicrobial susceptibility prediction matched laboratory phenotype in 98.3% (14224/14464) cases (2720 (18.8%) resistant, 11504 (79.5%) susceptible). There was greater disagreement between the laboratory phenotypes and the combined genotypic predictions (97 (0.7%) phenotypically-susceptible but all bioinformatic methods reported resistance; 89 (0.6%) phenotypically-resistant, but all bioinformatics methods reported susceptible) than within the three bioinformatics methods (54 (0.4%) cases, 16 phenotypically-resistant, 38 phenotypically-susceptible). However, in 36/54 (67%), the consensus genotype matched the laboratory phenotype. Conclusions : In this study, the choice between these three specific bioinformatic methods to identify resistance-determinants or other genes in S. aureus did not prove critical, with all demonstrating high concordance with each other and phenotypic/molecular methods. However, each has some limitations and therefore consensus methods provide some assurance. Copyright © 2018 American Society for Microbiology.

  5. Learning nucleic acids solving by bioinformatics problems.

    PubMed

    Nunes, Rhewter; Barbosa de Almeida Júnior, Edivaldo; Pessoa Pinto de Menezes, Ivandilson; Malafaia, Guilherme

    2015-01-01

    The article describes the development of a new approach to teach molecular biology to undergraduate biology students. The 34 students who participated in this research belonged to the first period of the Biological Sciences teaching course of the Instituto Federal Goiano at Urutaí Campus, Brazil. They were registered in Cell Biology in the first semester of 2013. They received four 55 min-long expository/dialogued lectures that covered the content of "structure and functions of nucleic acids". Later the students were invited to attend four meetings (in a computer laboratory) in which some concepts of Bioinformatics were presented and some problems of the Rosalind platform were solved. The observations we report here are very useful as a broad groundwork to development new research. An interesting possibility is research into the effects of bioinformatics interventions that improve molecular biology learning. © 2015 The International Union of Biochemistry and Molecular Biology.

  6. Freshwater Metaviromics and Bacteriophages: A Current Assessment of the State of the Art in Relation to Bioinformatic Challenges

    PubMed Central

    Bruder, Katherine; Malki, Kema; Cooper, Alexandria; Sible, Emily; Shapiro, Jason W.; Watkins, Siobhan C.; Putonti, Catherine

    2016-01-01

    Advances in bioinformatics and sequencing technologies have allowed for the analysis of complex microbial communities at an unprecedented rate. While much focus is often placed on the cellular members of these communities, viruses play a pivotal role, particularly bacteria-infecting viruses (bacteriophages); phages mediate global biogeochemical processes and drive microbial evolution through bacterial grazing and horizontal gene transfer. Despite their importance and ubiquity in nature, very little is known about the diversity and structure of viral communities. Though the need for culture-based methods for viral identification has been somewhat circumvented through metagenomic techniques, the analysis of metaviromic data is marred with many unique issues. In this review, we examine the current bioinformatic approaches for metavirome analyses and the inherent challenges facing the field as illustrated by the ongoing efforts in the exploration of freshwater phage populations. PMID:27375355

  7. Bioinformatics Meets Virology: The European Virus Bioinformatics Center's Second Annual Meeting.

    PubMed

    Ibrahim, Bashar; Arkhipova, Ksenia; Andeweg, Arno C; Posada-Céspedes, Susana; Enault, François; Gruber, Arthur; Koonin, Eugene V; Kupczok, Anne; Lemey, Philippe; McHardy, Alice C; McMahon, Dino P; Pickett, Brett E; Robertson, David L; Scheuermann, Richard H; Zhernakova, Alexandra; Zwart, Mark P; Schönhuth, Alexander; Dutilh, Bas E; Marz, Manja

    2018-05-14

    The Second Annual Meeting of the European Virus Bioinformatics Center (EVBC), held in Utrecht, Netherlands, focused on computational approaches in virology, with topics including (but not limited to) virus discovery, diagnostics, (meta-)genomics, modeling, epidemiology, molecular structure, evolution, and viral ecology. The goals of the Second Annual Meeting were threefold: (i) to bring together virologists and bioinformaticians from across the academic, industrial, professional, and training sectors to share best practice; (ii) to provide a meaningful and interactive scientific environment to promote discussion and collaboration between students, postdoctoral fellows, and both new and established investigators; (iii) to inspire and suggest new research directions and questions. Approximately 120 researchers from around the world attended the Second Annual Meeting of the EVBC this year, including 15 renowned international speakers. This report presents an overview of new developments and novel research findings that emerged during the meeting.

  8. Cloning and bioinformatic analysis of lovastatin biosynthesis regulatory gene lovE.

    PubMed

    Huang, Xin; Li, Hao-ming

    2009-08-05

    Lovastatin is an effective drug for treatment of hyperlipidemia. This study aimed to clone lovastatin biosynthesis regulatory gene lovE and analyze the structure and function of its encoding protein. According to the lovastatin synthase gene sequence from genebank, primers were designed to amplify and clone the lovastatin biosynthesis regulatory gene lovE from Aspergillus terrus genomic DNA. Bioinformatic analysis of lovE and its encoding animo acid sequence was performed through internet resources and software like DNAMAN. Target fragment lovE, almost 1500 bp in length, was amplified from Aspergillus terrus genomic DNA and the secondary and three-dimensional structures of LovE protein were predicted. In the lovastatin biosynthesis process lovE is a regulatory gene and LovE protein is a GAL4-like transcriptional factor.

  9. KBWS: an EMBOSS associated package for accessing bioinformatics web services.

    PubMed

    Oshita, Kazuki; Arakawa, Kazuharu; Tomita, Masaru

    2011-04-29

    The availability of bioinformatics web-based services is rapidly proliferating, for their interoperability and ease of use. The next challenge is in the integration of these services in the form of workflows, and several projects are already underway, standardizing the syntax, semantics, and user interfaces. In order to deploy the advantages of web services with locally installed tools, here we describe a collection of proxy client tools for 42 major bioinformatics web services in the form of European Molecular Biology Open Software Suite (EMBOSS) UNIX command-line tools. EMBOSS provides sophisticated means for discoverability and interoperability for hundreds of tools, and our package, named the Keio Bioinformatics Web Service (KBWS), adds functionalities of local and multiple alignment of sequences, phylogenetic analyses, and prediction of cellular localization of proteins and RNA secondary structures. This software implemented in C is available under GPL from http://www.g-language.org/kbws/ and GitHub repository http://github.com/cory-ko/KBWS. Users can utilize the SOAP services implemented in Perl directly via WSDL file at http://soap.g-language.org/kbws.wsdl (RPC Encoded) and http://soap.g-language.org/kbws_dl.wsdl (Document/literal).

  10. Quantifying the Relationships among Drug Classes

    PubMed Central

    Hert, Jérôme; Keiser, Michael J.; Irwin, John J.; Oprea, Tudor I.; Shoichet, Brian K.

    2009-01-01

    The similarity of drug targets is typically measured using sequence or structural information. Here, we consider chemo-centric approaches that measure target similarity on the basis of their ligands, asking how chemoinformatics similarities differ from those derived bioinformatically, how stable the ligand networks are to changes in chemoinformatics metrics, and which network is the most reliable for prediction of pharmacology. We calculated the similarities between hundreds of drug targets and their ligands and mapped the relationship between them in a formal network. Bioinformatics networks were based on the BLAST similarity between sequences, while chemoinformatics networks were based on the ligand-set similarities calculated with either the Similarity Ensemble Approach (SEA) or a method derived from Bayesian statistics. By multiple criteria, bioinformatics and chemoinformatics networks differed substantially, and only occasionally did a high sequence similarity correspond to a high ligand-set similarity. In contrast, the chemoinformatics networks were stable to the method used to calculate the ligand-set similarities and to the chemical representation of the ligands. Also, the chemoinformatics networks were more natural and more organized, by network theory, than their bioinformatics counterparts: ligand-based networks were found to be small-world and broad-scale. PMID:18335977

  11. OpenStructure: a flexible software framework for computational structural biology.

    PubMed

    Biasini, Marco; Mariani, Valerio; Haas, Jürgen; Scheuber, Stefan; Schenk, Andreas D; Schwede, Torsten; Philippsen, Ansgar

    2010-10-15

    Developers of new methods in computational structural biology are often hampered in their research by incompatible software tools and non-standardized data formats. To address this problem, we have developed OpenStructure as a modular open source platform to provide a powerful, yet flexible general working environment for structural bioinformatics. OpenStructure consists primarily of a set of libraries written in C++ with a cleanly designed application programmer interface. All functionality can be accessed directly in C++ or in a Python layer, meeting both the requirements for high efficiency and ease of use. Powerful selection queries and the notion of entity views to represent these selections greatly facilitate the development and implementation of algorithms on structural data. The modular integration of computational core methods with powerful visualization tools makes OpenStructure an ideal working and development environment. Several applications, such as the latest versions of IPLT and QMean, have been implemented based on OpenStructure-demonstrating its value for the development of next-generation structural biology algorithms. Source code licensed under the GNU lesser general public license and binaries for MacOS X, Linux and Windows are available for download at http://www.openstructure.org. torsten.schwede@unibas.ch Supplementary data are available at Bioinformatics online.

  12. A case study of tuning MapReduce for efficient Bioinformatics in the cloud

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Lizhen; Wang, Zhong; Yu, Weikuan

    The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this paper, we aim to minimize the cloud computing cost spent on bioinformatics datamore » analysis by optimizing the extracted significant Hadoop parameters. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our study case. Experimental results show that, with the fine-tuned parameters, we achieve a total of 4× speedup compared with the original performance (using the default settings). Finally, this paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.« less

  13. DSSTOX STRUCTURE-SEARCHABLE PUBLIC TOXICITY DATABASE NETWORK: CURRENT PROGRESS AND NEW INITIATIVES TO IMPROVE CHEMO-BIOINFORMATICS CAPABILITIES

    EPA Science Inventory

    The EPA DSSTox website (http://www/epa.gov/nheerl/dsstox) publishes standardized, structure-annotated toxicity databases, covering a broad range of toxicity disciplines. Each DSSTox database features documentation written in collaboration with the source authors and toxicity expe...

  14. Computational intelligence techniques in bioinformatics.

    PubMed

    Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I

    2013-12-01

    Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machine, deep belief network, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function prediction and its structure. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. Bioinformatics of prokaryotic RNAs

    PubMed Central

    Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F

    2014-01-01

    The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also harbors dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to characterize also their non-coding components is heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for the use with bacterial and archaeal data. This review provides an overview on the state-of-the-art of RNA bioinformatics focusing on applications to prokaryotes. PMID:24755880

  16. Biological Databases for Behavioral Neurobiology

    PubMed Central

    Baker, Erich J.

    2014-01-01

    Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases, how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119

  17. XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data

    PubMed Central

    2015-01-01

    Background Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893

  18. Metagenomics approach to the study of the gut microbiome structure and function in zebrafish Danio rerio fed with gluten formulated diet.

    PubMed

    Koo, Hyunmin; Hakim, Joseph A; Powell, Mickie L; Kumar, Ranjit; Eipers, Peter G; Morrow, Casey D; Crowley, Michael; Lefkowitz, Elliot J; Watts, Stephen A; Bej, Asim K

    2017-04-01

    In this study, we report the gut microbial composition and predictive functional profiles of zebrafish, Danio rerio, fed with a control formulated diet (CFD), and a gluten formulated diet (GFD) using a metagenomics approach and bioinformatics tools. The microbial communities of the GFD-fed D. rerio displayed heightened abundances of Legionellales, Rhizobiaceae, and Rhodobacter, as compared to the CFD-fed counterparts. Predicted metagenomics of microbial communities (PICRUSt) in GFD-fed D. rerio showed KEGG functional categories corresponding to bile secretion, secondary bile acid biosynthesis, and the metabolism of glycine, serine, and threonine. The CFD-fed D. rerio exhibited KEGG functional categories of bacteria-mediated cobalamin biosynthesis, which was supported by the presence of cobalamin synthesizers such as Bacteroides and Lactobacillus. Though these bacteria were absent in GFD-fed D. rerio, a comparable level of the cobalamin biosynthesis KEGG functional category was observed, which could be contributed by the compensatory enrichment of Cetobacterium. Based on these results, we conclude D. rerio to be a suitable alternative animal model for the use of a targeted metagenomics approach along with bioinformatics tools to further investigate the relationship between the gluten diet and microbiome profile in the gut ecosystem leading to gastrointestinal diseases and other undesired adverse health effects. Copyright © 2017. Published by Elsevier B.V.

  19. Trace Elements and Healthcare: A Bioinformatics Perspective.

    PubMed

    Zhang, Yan

    2017-01-01

    Biological trace elements are essential for human health. Imbalance in trace element metabolism and homeostasis may play an important role in a variety of diseases and disorders. While the majority of previous researches focused on experimental verification of genes involved in trace element metabolism and those encoding trace element-dependent proteins, bioinformatics study on trace elements is relatively rare and still at the starting stage. This chapter offers an overview of recent progress in bioinformatics analyses of trace element utilization, metabolism, and function, especially comparative genomics of several important metals. The relationship between individual elements and several diseases based on recent large-scale systematic studies such as genome-wide association studies and case-control studies is discussed. Lastly, developments of ionomics and its recent application in human health are also introduced.

  20. [Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

    PubMed

    Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

    2009-11-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to genome-widely analyze the CRISPR in extreme halophilic archaea, of which the whole genome sequences are available at present time. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were greatly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.

  1. Cascaded bidirectional recurrent neural networks for protein secondary structure prediction.

    PubMed

    Chen, Jinmiao; Chaudhari, Narendra

    2007-01-01

    Protein secondary structure (PSS) prediction is an important topic in bioinformatics. Our study on a large set of non-homologous proteins shows that long-range interactions commonly exist and negatively affect PSS prediction. Besides, we also reveal strong correlations between secondary structure (SS) elements. In order to take into account the long-range interactions and SS-SS correlations, we propose a novel prediction system based on cascaded bidirectional recurrent neural network (BRNN). We compare the cascaded BRNN against another two BRNN architectures, namely the original BRNN architecture used for speech recognition as well as Pollastri's BRNN that was proposed for PSS prediction. Our cascaded BRNN achieves an overall three state accuracy Q3 of 74.38\\%, and reaches a high Segment OVerlap (SOV) of 66.0455. It outperforms the original BRNN and Pollastri's BRNN in both Q3 and SOV. Specifically, it improves the SOV score by 4-6%.

  2. Information Theory and Voting Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures.

    PubMed

    Saeed, Faisal; Salim, Naomie; Abdo, Ammar

    2013-07-01

    Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for chemical compounds clustering. In this paper, an information theory and voting based algorithm (Adaptive Cumulative Voting-based Aggregation Algorithm A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward's method. The chemical dataset MDL Drug Data Report (MDDR) and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Characterization of the amino acid contribution to the folding degree of proteins.

    PubMed

    Estrada, Ernesto

    2004-03-01

    The folding degree index (Estrada, Bioinformatics 2002;18:697-704) is extended to account for the contribution of amino acids to folding. First, the mathematical formalism for extending the folding degree index is presented. Then, the amino acid contributions to folding degree of several proteins are used to analyze its relation to secondary structure. The possibilities of using these contributions in helping or checking the assignation of secondary structure to amino acids are also introduced. The influence of external factors to the amino acids contribution to folding degree is studied through the temperature effect on ribonuclease A. Finally, the analysis of 3D protein similarity through the use of amino acid contributions to folding degree is studied by selecting a series of lysozymes. These results are compared to that obtained by sequence alignment (2D similarity) and 3D superposition of the structures, showing the uniqueness of the current approach. Copyright 2004 Wiley-Liss, Inc.

  4. Reactome diagram viewer: data structures and strategies to boost performance.

    PubMed

    Fabregat, Antonio; Sidiropoulos, Konstantinos; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

    2018-04-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. For web-based pathway visualization, Reactome uses a custom pathway diagram viewer that has been evolved over the past years. Here, we present comprehensive enhancements in usability and performance based on extensive usability testing sessions and technology developments, aiming to optimize the viewer towards the needs of the community. The pathway diagram viewer version 3 achieves consistently better performance, loading and rendering of 97% of the diagrams in Reactome in less than 1 s. Combining the multi-layer html5 canvas strategy with a space partitioning data structure minimizes CPU workload, enabling the introduction of new features that further enhance user experience. Through the use of highly optimized data structures and algorithms, Reactome has boosted the performance and usability of the new pathway diagram viewer, providing a robust, scalable and easy-to-integrate solution to pathway visualization. As graph-based visualization of complex data is a frequent challenge in bioinformatics, many of the individual strategies presented here are applicable to a wide range of web-based bioinformatics resources. Reactome is available online at: https://reactome.org. The diagram viewer is part of the Reactome pathway browser (https://reactome.org/PathwayBrowser/) and also available as a stand-alone widget at: https://reactome.org/dev/diagram/. The source code is freely available at: https://github.com/reactome-pwp/diagram. fabregat@ebi.ac.uk or hhe@ebi.ac.uk. Supplementary data are available at Bioinformatics online.

  5. Structural bioinformatics of the human spliceosomal proteome

    PubMed Central

    Korneta, Iga; Magnus, Marcin; Bujnicki, Janusz M.

    2012-01-01

    In this work, we describe the results of a comprehensive structural bioinformatics analysis of the spliceosomal proteome. We used fold recognition analysis to complement prior data on the ordered domains of 252 human splicing proteins. Examples of newly identified domains include a PWI domain in the U5 snRNP protein 200K (hBrr2, residues 258–338), while examples of previously known domains with a newly determined fold include the DUF1115 domain of the U4/U6 di-snRNP protein 90K (hPrp3, residues 540–683). We also established a non-redundant set of experimental models of spliceosomal proteins, as well as constructed in silico models for regions without an experimental structure. The combined set of structural models is available for download. Altogether, over 90% of the ordered regions of the spliceosomal proteome can be represented structurally with a high degree of confidence. We analyzed the reduced spliceosomal proteome of the intron-poor organism Giardia lamblia, and as a result, we proposed a candidate set of ordered structural regions necessary for a functional spliceosome. The results of this work will aid experimental and structural analyses of the spliceosomal proteins and complexes, and can serve as a starting point for multiscale modeling of the structure of the entire spliceosome. PMID:22573172

  6. Application of bioinformatics tools and databases in microbial dehalogenation research (a review).

    PubMed

    Satpathy, R; Konkimalla, V B; Ratha, J

    2015-01-01

    Microbial dehalogenation is a biochemical process in which the halogenated substances are catalyzed enzymatically in to their non-halogenated form. The microorganisms have a wide range of organohalogen degradation ability both explicit and non-specific in nature. Most of these halogenated organic compounds being pollutants need to be remediated; therefore, the current approaches are to explore the potential of microbes at a molecular level for effective biodegradation of these substances. Several microorganisms with dehalogenation activity have been identified and characterized. In this aspect, the bioinformatics plays a key role to gain deeper knowledge in this field of dehalogenation. To facilitate the data mining, many tools have been developed to annotate these data from databases. Therefore, with the discovery of a microorganism one can predict a gene/protein, sequence analysis, can perform structural modelling, metabolic pathway analysis, biodegradation study and so on. This review highlights various methods of bioinformatics approach that describes the application of various databases and specific tools in the microbial dehalogenation fields with special focus on dehalogenase enzymes. Attempts have also been made to decipher some recent applications of in silico modeling methods that comprise of gene finding, protein modelling, Quantitative Structure Biodegradibility Relationship (QSBR) study and reconstruction of metabolic pathways employed in dehalogenation research area.

  7. Application of virtual phase-shifting speckle-interferometry for detection of polymorphism in the Chlamydia trachomatis omp1 gene

    NASA Astrophysics Data System (ADS)

    Feodorova, Valentina A.; Saltykov, Yury V.; Zaytsev, Sergey S.; Ulyanov, Sergey S.; Ulianova, Onega V.

    2018-04-01

    Method of phase-shifting speckle-interferometry has been used as a new tool with high potency for modern bioinformatics. Virtual phase-shifting speckle-interferometry has been applied for detection of polymorphism in the of Chlamydia trachomatis omp1 gene. It has been shown, that suggested method is very sensitive to natural genetic mutations as single nucleotide polymorphism (SNP). Effectiveness of proposed method has been compared with effectiveness of the newest bioinformatic tools, based on nucleotide sequence alignment.

  8. Advantages and disadvantages in usage of bioinformatic programs in promoter region analysis

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena E.; Skarzyńska, Agnieszka; Posyniak, Kacper; ZiÄ bska, Karolina; PlÄ der, Wojciech; Przybecki, Zbigniew

    2015-09-01

    An important computational challenge is finding the regulatory elements across the promotor region. In this work we present the advantages and disadvantages from the application of different bioinformatics programs for localization of transcription factor binding sites in the upstream region of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promotor regions. The results have been compared and possible function of chosen motifs has been described.

  9. E-MSD: an integrated data resource for bioinformatics.

    PubMed

    Golovin, A; Oldfield, T J; Tate, J G; Velankar, S; Barton, G J; Boutselakis, H; Dimitropoulos, D; Fillon, J; Hussain, A; Ionides, J M C; John, M; Keller, P A; Krissinel, E; McNeil, P; Naim, A; Newman, R; Pajon, A; Pineda, J; Rachedi, A; Copeland, J; Sitnov, A; Sobhany, S; Suarez-Uruena, A; Swaminathan, G J; Tagari, M; Tromm, S; Vranken, W; Henrick, K

    2004-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD 'atlas pages' show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www. ebi.ac.uk/msd-srv/docs/roadshow_tutorial/).

  10. E-MSD: an integrated data resource for bioinformatics

    PubMed Central

    Golovin, A.; Oldfield, T. J.; Tate, J. G.; Velankar, S.; Barton, G. J.; Boutselakis, H.; Dimitropoulos, D.; Fillon, J.; Hussain, A.; Ionides, J. M. C.; John, M.; Keller, P. A.; Krissinel, E.; McNeil, P.; Naim, A.; Newman, R.; Pajon, A.; Pineda, J.; Rachedi, A.; Copeland, J.; Sitnov, A.; Sobhany, S.; Suarez-Uruena, A.; Swaminathan, G. J.; Tagari, M.; Tromm, S.; Vranken, W.; Henrick, K.

    2004-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD ‘atlas pages’ show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www.ebi.ac.uk/msd-srv/docs/roadshow_tutorial/). PMID:14681397

  11. Navigating the changing learning landscape: perspective from bioinformatics.ca

    PubMed Central

    Ouellette, B. F. Francis

    2013-01-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468

  12. Bioinformatic Workflows for Generating Complete Plastid Genome Sequences-An Example from Cabomba (Cabombaceae) in the Context of the Phylogenomic Analysis of the Water-Lily Clade.

    PubMed

    Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas

    2018-06-21

    The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.

  13. Towards a career in bioinformatics

    PubMed Central

    2009-01-01

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection to tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to go impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508

  14. Towards a career in bioinformatics.

    PubMed

    Ranganathan, Shoba

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection to tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to go impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010.

  15. Web-ware bioinformatical analysis and structure modelling of N-terminus of human multisynthetase complex auxiliary component protein p43.

    PubMed

    Deineko, Viktor

    2006-01-01

    Human multisynthetase complex auxiliary component, protein p43 is an endothelial monocyte-activating polypeptide II precursor. In this study, comprehensive sequence analysis of N-terminus has been performed to identify structural domains, motifs, sites of post-translation modification and other functionally important parameters. The spatial structure model of full-chain protein p43 is obtained.

  16. GALT protein database, a bioinformatics resource for the management and analysis of structural features of a galactosemia-related protein and its mutants.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2009-06-01

    We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and the analysis to each mutant was made with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.

  17. PCDDB: new developments at the Protein Circular Dichroism Data Bank.

    PubMed

    Whitmore, Lee; Miles, Andrew John; Mavridis, Lazaros; Janes, Robert W; Wallace, B A

    2017-01-04

    The Protein Circular Dichroism Data Bank (PCDDB) has been in operation for more than 5 years as a public repository for archiving circular dichroism spectroscopic data and associated bioinformatics and experimental metadata. Since its inception, many improvements and new developments have been made in data display, searching algorithms, data formats, data content, auxillary information, and validation techniques, as well as, of course, an increase in the number of holdings. It provides a site (http://pcddb.cryst.bbk.ac.uk) for authors to deposit experimental data as well as detailed information on methods and calculations associated with published work. It also includes links for each entry to bioinformatics databases. The data are freely available to accessors either as single files or as complete data bank downloads. The PCDDB has found broad usage by the structural biology, bioinformatics, analytical and pharmaceutical communities, and has formed the basis for new software and methods developments. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. MetCCS predictor: a web server for predicting collision cross-section values of metabolites in ion mobility-mass spectrometry based metabolomics.

    PubMed

    Zhou, Zhiwei; Xiong, Xin; Zhu, Zheng-Jiang

    2017-07-15

    In metabolomics, rigorous structural identification of metabolites presents a challenge for bioinformatics. The use of collision cross-section (CCS) values of metabolites derived from ion mobility-mass spectrometry effectively increases the confidence of metabolite identification, but this technique suffers from the limit number of available CCS values. Currently, there is no software available for rapidly generating the metabolites' CCS values. Here, we developed the first web server, namely, MetCCS Predictor, for predicting CCS values. It can predict the CCS values of metabolites using molecular descriptors within a few seconds. Common users with limited background on bioinformatics can benefit from this software and effectively improve the metabolite identification in metabolomics. The web server is freely available at: http://www.metabolomics-shanghai.org/MetCCS/ . jiangzhu@sioc.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  19. Bioinformatics by Example: From Sequence to Target

    NASA Astrophysics Data System (ADS)

    Kossida, Sophia; Tahri, Nadia; Daizadeh, Iraj

    2002-12-01

    With the completion of the human genome, and the imminent completion of other large-scale sequencing and structure-determination projects, computer-assisted bioscience is aimed to become the new paradigm for conducting basic and applied research. The presence of these additional bioinformatics tools stirs great anxiety for experimental researchers (as well as for pedagogues), since they are now faced with a wider and deeper knowledge of differing disciplines (biology, chemistry, physics, mathematics, and computer science). This review targets those individuals who are interested in using computational methods in their teaching or research. By analyzing a real-life, pharmaceutical, multicomponent, target-based example the reader will experience this fascinating new discipline.

  20. FY09 Final Report for LDRD Project: Understanding Viral Quasispecies Evolution through Computation and Experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, C

    2009-11-12

    In FY09 they will (1) complete the implementation, verification, calibration, and sensitivity and scalability analysis of the in-cell virus replication model; (2) complete the design of the cell culture (cell-to-cell infection) model; (3) continue the research, design, and development of their bioinformatics tools: the Web-based structure-alignment-based sequence variability tool and the functional annotation of the genome database; (4) collaborate with the University of California at San Francisco on areas of common interest; and (5) submit journal articles that describe the in-cell model with simulations and the bioinformatics approaches to evaluation of genome variability and fitness.

  1. Applications of Support Vector Machines In Chemo And Bioinformatics

    NASA Astrophysics Data System (ADS)

    Jayaraman, V. K.; Sundararajan, V.

    2010-10-01

    Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.

  2. Promoting synergistic research and education in genomics and bioinformatics.

    PubMed

    Yang, Jack Y; Yang, Mary Qu; Zhu, Mengxia Michelle; Arabnia, Hamid R; Deng, Youping

    2008-01-01

    Bioinformatics and Genomics are closely related disciplines that hold great promises for the advancement of research and development in complex biomedical systems, as well as public health, drug design, comparative genomics, personalized medicine and so on. Research and development in these two important areas are impacting the science and technology.High throughput sequencing and molecular imaging technologies marked the beginning of a new era for modern translational medicine and personalized healthcare. The impact of having the human sequence and personalized digital images in hand has also created tremendous demands of developing powerful supercomputing, statistical learning and artificial intelligence approaches to handle the massive bioinformatics and personalized healthcare data, which will obviously have a profound effect on how biomedical research will be conducted toward the improvement of human health and prolonging of human life in the future. The International Society of Intelligent Biological Medicine (http://www.isibm.org) and its official journals, the International Journal of Functional Informatics and Personalized Medicine (http://www.inderscience.com/ijfipm) and the International Journal of Computational Biology and Drug Design (http://www.inderscience.com/ijcbdd) in collaboration with International Conference on Bioinformatics and Computational Biology (Biocomp), touch tomorrow's bioinformatics and personalized medicine throughout today's efforts in promoting the research, education and awareness of the upcoming integrated inter/multidisciplinary field. The 2007 international conference on Bioinformatics and Computational Biology (BIOCOMP07) was held in Las Vegas, the United States of American on June 25-28, 2007. The conference attracted over 400 papers, covering broad research areas in the genomics, biomedicine and bioinformatics. The Biocomp 2007 provides a common platform for the cross fertilization of ideas, and to help shape knowledge and scientific achievements by bridging these two very important disciplines into an interactive and attractive forum. Keeping this objective in mind, Biocomp 2007 aims to promote interdisciplinary and multidisciplinary education and research. 25 high quality peer-reviewed papers were selected from 400+ submissions for this supplementary issue of BMC Genomics. Those papers contributed to a wide-range of important research fields including gene expression data analysis and applications, high-throughput genome mapping, sequence analysis, gene regulation, protein structure prediction, disease prediction by machine learning techniques, systems biology, database and biological software development. We always encourage participants submitting proposals for genomics sessions, special interest research sessions, workshops and tutorials to Professor Hamid R. Arabnia (hra@cs.uga.edu) in order to ensure that Biocomp continuously plays the leadership role in promoting inter/multidisciplinary research and education in the fields. Biocomp received top conference ranking with a high score of 0.95/1.00. Biocomp is academically co-sponsored by the International Society of Intelligent Biological Medicine and the Research Laboratories and Centers of Harvard University--Massachusetts Institute of Technology, Indiana University--Purdue University, Georgia Tech--Emory University, UIUC, UCLA, Columbia University, University of Texas at Austin and University of Iowa etc. Biocomp--Worldcomp brings leading scientists together across the nation and all over the world and aims to promote synergistic components such as keynote lectures, special interest sessions, workshops and tutorials in response to the advances of cutting-edge research.

  3. The effects of different representations on static structure analysis of computer malware signatures.

    PubMed

    Narayanan, Ajit; Chen, Yi; Pang, Shaoning; Tao, Ban

    2013-01-01

    The continuous growth of malware presents a problem for internet computing due to increasingly sophisticated techniques for disguising malicious code through mutation and the time required to identify signatures for use by antiviral software systems (AVS). Malware modelling has focused primarily on semantics due to the intended actions and behaviours of viral and worm code. The aim of this paper is to evaluate a static structure approach to malware modelling using the growing malware signature databases now available. We show that, if malware signatures are represented as artificial protein sequences, it is possible to apply standard sequence alignment techniques in bioinformatics to improve accuracy of distinguishing between worm and virus signatures. Moreover, aligned signature sequences can be mined through traditional data mining techniques to extract metasignatures that help to distinguish between viral and worm signatures. All bioinformatics and data mining analysis were performed on publicly available tools and Weka.

  4. The Effects of Different Representations on Static Structure Analysis of Computer Malware Signatures

    PubMed Central

    Narayanan, Ajit; Chen, Yi; Pang, Shaoning; Tao, Ban

    2013-01-01

    The continuous growth of malware presents a problem for internet computing due to increasingly sophisticated techniques for disguising malicious code through mutation and the time required to identify signatures for use by antiviral software systems (AVS). Malware modelling has focused primarily on semantics due to the intended actions and behaviours of viral and worm code. The aim of this paper is to evaluate a static structure approach to malware modelling using the growing malware signature databases now available. We show that, if malware signatures are represented as artificial protein sequences, it is possible to apply standard sequence alignment techniques in bioinformatics to improve accuracy of distinguishing between worm and virus signatures. Moreover, aligned signature sequences can be mined through traditional data mining techniques to extract metasignatures that help to distinguish between viral and worm signatures. All bioinformatics and data mining analysis were performed on publicly available tools and Weka. PMID:23983644

  5. ArachnoServer 3.0: an online resource for automated discovery, analysis and annotation of spider toxins.

    PubMed

    Pineda, Sandy S; Chaumeil, Pierre-Alain; Kunert, Anne; Kaas, Quentin; Thang, Mike W C; Le, Lien; Nuhn, Michael; Herzig, Volker; Saez, Natalie J; Cristofori-Armstrong, Ben; Anangi, Raveendra; Senff, Sebastian; Gorse, Dominique; King, Glenn F

    2018-03-15

    ArachnoServer is a manually curated database that consolidates information on the sequence, structure, function and pharmacology of spider-venom toxins. Although spider venoms are complex chemical arsenals, the primary constituents are small disulfide-bridged peptides that target neuronal ion channels and receptors. Due to their high potency and selectivity, these peptides have been developed as pharmacological tools, bioinsecticides and drug leads. A new version of ArachnoServer (v3.0) has been developed that includes a bioinformatics pipeline for automated detection and analysis of peptide toxin transcripts in assembled venom-gland transcriptomes. ArachnoServer v3.0 was updated with the latest sequence, structure and functional data, the search-by-mass feature has been enhanced, and toxin cards provide additional information about each mature toxin. http://arachnoserver.org. support@arachnoserver.org. Supplementary data are available at Bioinformatics online.

  6. Experimental identification of specificity determinants in the domain linker of a LacI/GalR protein: bioinformatics-based predictions generate true positives and false negatives.

    PubMed

    Meinhardt, Sarah; Swint-Kruse, Liskin

    2008-12-01

    In protein families, conserved residues often contribute to a common general function, such as DNA-binding. However, unique attributes for each homolog (e.g. recognition of alternative DNA sequences) must arise from variation in other functionally-important positions. The locations of these "specificity determinant" positions are obscured amongst the background of varied residues that do not make significant contributions to either structure or function. To isolate specificity determinants, a number of bioinformatics algorithms have been developed. When applied to the LacI/GalR family of transcription regulators, several specificity determinants are predicted in the 18 amino acids that link the DNA-binding and regulatory domains. However, results from alternative algorithms are only in partial agreement with each other. Here, we experimentally evaluate these predictions using an engineered repressor comprising the LacI DNA-binding domain, the LacI linker, and the GalR regulatory domain (LLhG). "Wild-type" LLhG has altered DNA specificity and weaker lacO(1) repression compared to LacI or a similar LacI:PurR chimera. Next, predictions of linker specificity determinants were tested, using amino acid substitution and in vivo repression assays to assess functional change. In LLhG, all predicted sites are specificity determinants, as well as three sites not predicted by any algorithm. Strategies are suggested for diminishing the number of false negative predictions. Finally, individual substitutions at LLhG specificity determinants exhibited a broad range of functional changes that are not predicted by bioinformatics algorithms. Results suggest that some variants have altered affinity for DNA, some have altered allosteric response, and some appear to have changed specificity for alternative DNA ligands.

  7. Characterization of the dextran-binding domain in the glucan-binding protein C of Streptococcus mutans.

    PubMed

    Takashima, Y; Fujita, K; Ardin, A C; Nagayama, K; Nomura, R; Nakano, K; Matsumoto-Nakano, M

    2015-10-01

    Streptococcus mutans produces multiple glucan-binding proteins (Gbps), among which GbpC encoded by the gbpC gene is known to be a cell-surface-associated protein involved in dextran-induced aggregation. The purpose of the present study was to characterize the dextran-binding domain of GbpC using bioinformatics analysis and molecular techniques. Bioinformatics analysis specified five possible regions containing molecular binding sites termed GB1 through GB5. Next, truncated recombinant GbpC (rGbpC) encoding each region was produced using a protein expression vector and five deletion mutant strains were generated, termed CDGB1 through CDGB5 respectively. The dextran-binding rates of truncated rGbpC that included the GB1, GB3, GB4 and GB5 regions in the upstream sequences were higher than that of the construct containing GB2 in the downstream region. In addition, the rates of dextran-binding for strains CDGB4 and CD1, which was entire gbpC deletion mutant, were significantly lower than for the other strains, while those of all other deletion mutants were quite similar to that of the parental strain MT8148. Biofilm structures formed by CDGB4 and CD1 were not as pronounced as that of MT8148, while those formed by other strains had greater density as compared to that of CD1. Our results suggest that the dextran-binding domain may be located in the GB4 region in the interior of the gbpC gene. Bioinformatics analysis is useful for determination of functional domains in many bacterial species. © 2015 The Society for Applied Microbiology.

  8. In Silico Analysis for the Study of Botulinum Toxin Structure

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2010-01-01

    Protein-protein interactions play many important roles in biological function. Knowledge of protein-protein complex structure is required for understanding the function. The determination of protein-protein complex structure by experimental studies remains difficult, therefore computational prediction of protein structures by structure modeling and docking studies is valuable method. In addition, MD simulation is also one of the most popular methods for protein structure modeling and characteristics. Here, we attempt to predict protein-protein complex structure and property using some of bioinformatic methods, and we focus botulinum toxin complex as target structure.

  9. E-MSD: an integrated data resource for bioinformatics.

    PubMed

    Velankar, S; McNeil, P; Mittard-Runte, V; Suarez, A; Barrell, D; Apweiler, R; Henrick, K

    2005-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.

  10. Measuring and comparing structural fluctuation patterns in large protein datasets.

    PubMed

    Fuglebakk, Edvin; Echave, Julián; Reuter, Nathalie

    2012-10-01

    The function of a protein depends not only on its structure but also on its dynamics. This is at the basis of a large body of experimental and theoretical work on protein dynamics. Further insight into the dynamics-function relationship can be gained by studying the evolutionary divergence of protein motions. To investigate this, we need appropriate comparative dynamics methods. The most used dynamical similarity score is the correlation between the root mean square fluctuations (RMSF) of aligned residues. Despite its usefulness, RMSF is in general less evolutionarily conserved than the native structure. A fundamental issue is whether RMSF is not as conserved as structure because dynamics is less conserved or because RMSF is not the best property to use to study its conservation. We performed a systematic assessment of several scores that quantify the (dis)similarity between protein fluctuation patterns. We show that the best scores perform as well as or better than structural dissimilarity, as assessed by their consistency with the SCOP classification. We conclude that to uncover the full extent of the evolutionary conservation of protein fluctuation patterns, it is important to measure the directions of fluctuations and their correlations between sites. Nathalie.Reuter@mbi.uib.no Supplementary data are available at Bioinformatics Online.

  11. Bioinformatics programs are 31-fold over-represented among the highest impact scientific papers of the past two decades.

    PubMed

    Wren, Jonathan D

    2016-09-01

    To analyze the relative proportion of bioinformatics papers and their non-bioinformatics counterparts in the top 20 most cited papers annually for the past two decades. When defining bioinformatics papers as encompassing both those that provide software for data analysis or methods underlying data analysis software, we find that over the past two decades, more than a third (34%) of the most cited papers in science were bioinformatics papers, which is approximately a 31-fold enrichment relative to the total number of bioinformatics papers published. More than half of the most cited papers during this span were bioinformatics papers. Yet, the average 5-year JIF of top 20 bioinformatics papers was 7.7, whereas the average JIF for top 20 non-bioinformatics papers was 25.8, significantly higher (P < 4.5 × 10(-29)). The 20-year trend in the average JIF between the two groups suggests the gap does not appear to be significantly narrowing. For a sampling of the journals producing top papers, bioinformatics journals tended to have higher Gini coefficients, suggesting that development of novel bioinformatics resources may be somewhat 'hit or miss'. That is, relative to other fields, bioinformatics produces some programs that are extremely widely adopted and cited, yet there are fewer of intermediate success. jdwren@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. PatGen--a consolidated resource for searching genetic patent sequences.

    PubMed

    Rouse, Richard J D; Castagnetto, Jesus; Niedner, Roland H

    2005-04-15

    Compared to the wealth of online resources covering genomic, proteomic and derived data the Bioinformatics community is rather underserved when it comes to patent information related to biological sequences. The current online resources are either incomplete or rather expensive. This paper describes, PatGen, an integrated database containing data from bioinformatic and patent resources. This effort addresses the inconsistency of publicly available genetic patent data coverage by providing access to a consolidated dataset. PatGen can be searched at http://www.patgendb.com rjdrouse@patentinformatics.com.

  13. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

    PubMed

    Tan, Yen Hock; Huang, He; Kihara, Daisuke

    2006-08-15

    Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

  14. Biological and functional relevance of CASP predictions.

    PubMed

    Liu, Tianyun; Ish-Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D; Altman, Russ B

    2018-03-01

    Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo-sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo-sites), and Ten sites containing important motifs, loops, or key residues with important disease-associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best-ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand-binding sites, most prediction methods have higher performance on apo-sites than holo-sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein-protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein-protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. © 2017 The Authors Proteins: Structure, Function and Bioinformatics Published by Wiley Periodicals, Inc.

  15. A comparison of common programming languages used in bioinformatics.

    PubMed

    Fourment, Mathieu; Gillings, Michael R

    2008-02-05

    The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.

  16. Better bioinformatics through usability analysis.

    PubMed

    Bolchini, Davide; Finkelstein, Anthony; Perrone, Vito; Nagl, Sylvia

    2009-02-01

    Improving the usability of bioinformatics resources enables researchers to find, interact with, share, compare and manipulate important information more effectively and efficiently. It thus enables researchers to gain improved insights into biological processes with the potential, ultimately, of yielding new scientific results. Usability 'barriers' can pose significant obstacles to a satisfactory user experience and force researchers to spend unnecessary time and effort to complete their tasks. The number of online biological databases available is growing and there is an expanding community of diverse users. In this context there is an increasing need to ensure the highest standards of usability. Using 'state-of-the-art' usability evaluation methods, we have identified and characterized a sample of usability issues potentially relevant to web bioinformatics resources, in general. These specifically concern the design of the navigation and search mechanisms available to the user. The usability issues we have discovered in our substantial case studies are undermining the ability of users to find the information they need in their daily research activities. In addition to characterizing these issues, specific recommendations for improvements are proposed leveraging proven practices from web and usability engineering. The methods and approach we exemplify can be readily adopted by the developers of bioinformatics resources.

  17. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  18. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-05

    Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined, glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or less monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. X-ray crystallography over the past decade for novel drug discovery - where are we heading next?

    PubMed

    Zheng, Heping; Handing, Katarzyna B; Zimmerman, Matthew D; Shabalin, Ivan G; Almo, Steven C; Minor, Wladek

    2015-01-01

    Macromolecular X-ray crystallography has been the primary methodology for determining the three-dimensional structures of proteins, nucleic acids and viruses. Structural information has paved the way for structure-guided drug discovery and laid the foundations for structural bioinformatics. However, X-ray crystallography still has a few fundamental limitations, some of which may be overcome and complemented using emerging methods and technologies in other areas of structural biology. This review describes how structural knowledge gained from X-ray crystallography has been used to advance other biophysical methods for structure determination (and vice versa). This article also covers current practices for integrating data generated by other biochemical and biophysical methods with those obtained from X-ray crystallography. Finally, the authors articulate their vision about how a combination of structural and biochemical/biophysical methods may improve our understanding of biological processes and interactions. X-ray crystallography has been, and will continue to serve as, the central source of experimental structural biology data used in the discovery of new drugs. However, other structural biology techniques are useful not only to overcome the major limitation of X-ray crystallography, but also to provide complementary structural data that is useful in drug discovery. The use of recent advancements in biochemical, spectroscopy and bioinformatics methods may revolutionize drug discovery, albeit only when these data are combined and analyzed with effective data management systems. Accurate and complete data management is crucial for developing experimental procedures that are robust and reproducible.

  20. Improved, ACMG-Compliant, in silico prediction of pathogenicity for missense substitutions encoded by TP53 variants.

    PubMed

    Fortuno, Cristina; James, Paul A; Young, Erin L; Feng, Bing; Olivier, Magali; Pesaran, Tina; Tavtigian, Sean V; Spurdle, Amanda B

    2018-05-18

    Clinical interpretation of germline missense variants represents a major challenge, including those in the TP53 Li-Fraumeni syndrome gene. Bioinformatic prediction is a key part of variant classification strategies. We aimed to optimize the performance of the Align-GVGD tool used for p53 missense variant prediction, and compare its performance to other bioinformatic tools (SIFT, PolyPhen-2) and ensemble methods (REVEL, BayesDel). Reference sets of assumed pathogenic and assumed benign variants were defined using functional and/or clinical data. Area under the curve and Matthews correlation coefficient (MCC) values were used as objective functions to select an optimized protein multi-sequence alignment with best performance for Align-GVGD. MCC comparison of tools using binary categories showed optimized Align-GVGD (C15 cut-off) combined with BayesDel (0.16 cut-off), or with REVEL (0.5 cut-off), to have the best overall performance. Further, a semi-quantitative approach using multiple tiers of bioinformatic prediction, validated using an independent set of non-functional and functional variants, supported use of Align-GVGD and BayesDel prediction for different strength of evidence levels in ACMG/AMP rules. We provide rationale for bioinformatic tool selection for TP53 variant classification, and have also computed relevant bioinformatic predictions for every possible p53 missense variant to facilitate their use by the scientific and medical community. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.

  1. A review of bioinformatics training applied to research in molecular medicine, agriculture and biodiversity in Costa Rica and Central America.

    PubMed

    Orozco, Allan; Morera, Jessica; Jiménez, Sergio; Boza, Ricardo

    2013-09-01

    Today, Bioinformatics has become a scientific discipline with great relevance for the Molecular Biosciences and for the Omics sciences in general. Although developed countries have progressed with large strides in Bioinformatics education and research, in other regions, such as Central America, the advances have occurred in a gradual way and with little support from the Academia, either at the undergraduate or graduate level. To address this problem, the University of Costa Rica's Medical School, a regional leader in Bioinformatics in Central America, has been conducting a series of Bioinformatics workshops, seminars and courses, leading to the creation of the region's first Bioinformatics Master's Degree. The recent creation of the Central American Bioinformatics Network (BioCANET), associated to the deployment of a supporting computational infrastructure (HPC Cluster) devoted to provide computing support for Molecular Biology in the region, is providing a foundational stone for the development of Bioinformatics in the area. Central American bioinformaticians have participated in the creation of as well as co-founded the Iberoamerican Bioinformatics Society (SOIBIO). In this article, we review the most recent activities in education and research in Bioinformatics from several regional institutions. These activities have resulted in further advances for Molecular Medicine, Agriculture and Biodiversity research in Costa Rica and the rest of the Central American countries. Finally, we provide summary information on the first Central America Bioinformatics International Congress, as well as the creation of the first Bioinformatics company (Indromics Bioinformatics), spin-off the Academy in Central America and the Caribbean.

  2. Bioinformatics and systems biology research update from the 15th International Conference on Bioinformatics (InCoB2016).

    PubMed

    Schönbach, Christian; Verma, Chandra; Bond, Peter J; Ranganathan, Shoba

    2016-12-22

    The International Conference on Bioinformatics (InCoB) has been publishing peer-reviewed conference papers in BMC Bioinformatics since 2006. Of the 44 articles accepted for publication in supplement issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics and BMC Systems Biology, 24 articles with a bioinformatics or systems biology focus are reviewed in this editorial. InCoB2017 is scheduled to be held in Shenzen, China, September 20-22, 2017.

  3. Rapid phylogenetic dissection of prokaryotic community structure in tidal flat using pyrosequencing.

    PubMed

    Kim, Bong-Soo; Kim, Byung Kwon; Lee, Jae-Hak; Kim, Myungjin; Lim, Young Woon; Chun, Jongsik

    2008-08-01

    Dissection of prokaryotic community structure is prerequisite to understand their ecological roles. Various methods are available for such a purpose which amplification and sequencing of 16S rRNA genes gained its popularity. However, conventional methods based on Sanger sequencing technique require cloning process prior to sequencing, and are expensive and labor-intensive. We investigated prokaryotic community structure in tidal flat sediments, Korea, using pyrosequencing and a subsequent automated bioinformatic pipeline for the rapid and accurate taxonomic assignment of each amplicon. The combination of pyrosequencing and bioinformatic analysis showed that bacterial and archaeal communities were more diverse than previously reported in clone library studies. Pyrosequencing analysis revealed 21 bacterial divisions and 37 candidate divisions. Proteobacteria was the most abundant division in the bacterial community, of which Gamma-and Delta-Proteobacteria were the most abundant. Similarly, 4 archaeal divisions were found in tidal flat sediments. Euryarchaeota was the most abundant division in the archaeal sequences, which were further divided into 8 classes and 11 unclassified euryarchaeota groups. The system developed here provides a simple, in-depth and automated way of dissecting a prokaryotic community structure without extensive pretreatment such as cloning.

  4. KISS for STRAP: user extensions for a protein alignment editor.

    PubMed

    Gille, Christoph; Lorenzen, Stephan; Michalsky, Elke; Frömmel, Cornelius

    2003-12-12

    The Structural Alignment Program STRAP is a comfortable comprehensive editor and analyzing tool for protein alignments. A wide range of functions related to protein sequences and protein structures are accessible with an intuitive graphical interface. Recent features include mapping of mutations and polymorphisms onto structures and production of high quality figures for publication. Here we address the general problem of multi-purpose program packages to keep up with the rapid development of bioinformatical methods and the demand for specific program functions. STRAP was remade implementing a novel design which aims at Keeping Interfaces in STRAP Simple (KISS). KISS renders STRAP extendable to bio-scientists as well as to bio-informaticians. Scientists with basic computer skills are capable of implementing statistical methods or embedding existing bioinformatical tools in STRAP themselves. For bio-informaticians STRAP may serve as an environment for rapid prototyping and testing of complex algorithms such as automatic alignment algorithms or phylogenetic methods. Further, STRAP can be applied as an interactive web applet to present data related to a particular protein family and as a teaching tool. JAVA-1.4 or higher. http://www.charite.de/bioinf/strap/

  5. Target-Pathogen: a structural bioinformatic approach to prioritize drug targets in pathogens

    PubMed Central

    Sosa, Ezequiel J; Burguener, Germán; Lanzarotti, Esteban; Radusky, Leandro; Pardo, Agustín M; Marti, Marcelo

    2018-01-01

    Abstract Available genomic data for pathogens has created new opportunities for drug discovery and development to fight them, including new resistant and multiresistant strains. In particular structural data must be integrated with both, gene information and experimental results. In this sense, there is a lack of an online resource that allows genome wide-based data consolidation from diverse sources together with thorough bioinformatic analysis that allows easy filtering and scoring for fast target selection for drug discovery. Here, we present Target-Pathogen database (http://target.sbg.qb.fcen.uba.ar/patho), designed and developed as an online resource that allows the integration and weighting of protein information such as: function, metabolic role, off-targeting, structural properties including druggability, essentiality and omic experiments, to facilitate the identification and prioritization of candidate drug targets in pathogens. We include in the database 10 genomes of some of the most relevant microorganisms for human health (Mycobacterium tuberculosis, Mycobacterium leprae, Klebsiella pneumoniae, Plasmodium vivax, Toxoplasma gondii, Leishmania major, Wolbachia bancrofti, Trypanosoma brucei, Shigella dysenteriae and Schistosoma Smanosoni) and show its applicability. New genomes can be uploaded upon request. PMID:29106651

  6. Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

    PubMed

    Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S

    2017-01-01

    Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.

  7. MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers.

    PubMed

    Ohue, Masahito; Shimoda, Takehiro; Suzuki, Shuji; Matsuzaki, Yuri; Ishida, Takashi; Akiyama, Yutaka

    2014-11-15

    The application of protein-protein docking in large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance of >97% strong scaling. MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at: http://www.bi.cs.titech.ac.jp/megadock. akiyama@cs.titech.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  8. Biopython: freely available Python tools for computational molecular biology and bioinformatics

    PubMed Central

    Cock, Peter J. A.; Antao, Tiago; Chang, Jeffrey T.; Chapman, Brad A.; Cox, Cymon J.; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J. L.

    2009-01-01

    Summary: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macro molecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists, see www.biopython.org/wiki/_Mailing_listspeter.cock@scri.ac.uk. PMID:19304878

  9. Is there room for ethics within bioinformatics education?

    PubMed

    Taneri, Bahar

    2011-07-01

    When bioinformatics education is considered, several issues are addressed. At the undergraduate level, the main issue revolves around conveying information from two main and different fields: biology and computer science. At the graduate level, the main issue is bridging the gap between biology students and computer science students. However, there is an educational component that is rarely addressed within the context of bioinformatics education: the ethics component. Here, a different perspective is provided on bioinformatics education, and the current status of ethics is analyzed within the existing bioinformatics programs. Analysis of the existing undergraduate and graduate programs, in both Europe and the United States, reveals the minimal attention given to ethics within bioinformatics education. Given that bioinformaticians speedily and effectively shape the biomedical sciences and hence their implications for society, here redesigning of the bioinformatics curricula is suggested in order to integrate the necessary ethics education. Unique ethical problems awaiting bioinformaticians and bioinformatics ethics as a separate field of study are discussed. In addition, a template for an "Ethics in Bioinformatics" course is provided.

  10. Cancer Bioinformatics for Updating Anticancer Drug Developments and Personalized Therapeutics.

    PubMed

    Lu, Da-Yong; Qu, Rong-Xin; Lu, Ting-Ren; Wu, Hong-Ying

    2017-01-01

    Last two to three decades, this world witnesses a rapid progress of biomarkers and bioinformatics technologies. Cancer bioinformatics is one of such important omics branches for experimental/clinical studies and applications. Same as other biological techniques or systems, bioinformatics techniques will be widely used. But they are presently not omni-potent. Despite great popularity and improvements, cancer bioinformatics has its own limitations and shortcomings at this stage of technical advancements. This article will offer a panorama of bioinformatics in cancer researches and clinical therapeutic applications-possible advantages and limitations relating to cancer therapeutics. A lot of beneficial capabilities and outcomes have been described. As a result, a successful new era for cancer bioinformatics is waiting for us if we can adhere on scientific studies of cancer bioinformatics in malignant- origin mining, medical verifications and clinical diagnostic applications. Cancer bioinformatics gave a great significance in disease diagnosis and therapeutic predictions. Many creative ideas and future perspectives are highlighted. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  11. BioTextQuest: a web-based biomedical text mining suite for concept discovery.

    PubMed

    Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J

    2011-12-01

    BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. A tag-cloud-based illustration of terms labeling each document cluster are semantically annotated according to the biological entity, and a list of document titles enable users to simultaneously compare terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms, exclusion of documents/significant terms, to better match the biological question addressed. http://biotextquest.biol.ucy.ac.cy vprobon@ucy.ac.cy; iliopj@med.uoc.gr Supplementary data are available at Bioinformatics online.

  12. Cluster ensemble based on Random Forests for genetic data.

    PubMed

    Alhusain, Luluah; Hafez, Alaaeldin M

    2017-01-01

    Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes, RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes, RFcluE, a cluster ensemble approach based on RF clustering to address the problem of population structure analysis and demonstrate the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.

  13. From cheminformatics to structure-based design: Web services and desktop applications based on the NAOMI library.

    PubMed

    Bietz, Stefan; Inhester, Therese; Lauck, Florian; Sommer, Kai; von Behren, Mathias M; Fährrolfes, Rainer; Flachsenberg, Florian; Meyder, Agnes; Nittinger, Eva; Otto, Thomas; Hilbig, Matthias; Schomburg, Karen T; Volkamer, Andrea; Rarey, Matthias

    2017-11-10

    Nowadays, computational approaches are an integral part of life science research. Problems related to interpretation of experimental results, data analysis, or visualization tasks highly benefit from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we will give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  14. Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

    PubMed Central

    del Val, Coral; White, Stephen H.

    2014-01-01

    We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667

  15. PDBe: Protein Data Bank in Europe

    PubMed Central

    Velankar, Sameer; Alhroub, Younes; Alili, Anaëlle; Best, Christoph; Boutselakis, Harry C.; Caboche, Ségolène; Conroy, Matthew J.; Dana, Jose M.; van Ginkel, Glen; Golovin, Adel; Gore, Swanand P.; Gutmanas, Aleksandras; Haslam, Pauline; Hirshberg, Miriam; John, Melford; Lagerstedt, Ingvar; Mir, Saqib; Newman, Laurence E.; Oldfield, Tom J.; Penkett, Chris J.; Pineda-Castillo, Jorge; Rinaldi, Luana; Sahni, Gaurav; Sawka, Grégoire; Sen, Sanchayita; Slowley, Robert; Sousa da Silva, Alan Wilter; Suarez-Uruena, Antonio; Swaminathan, G. Jawahar; Symmons, Martyn F.; Vranken, Wim F.; Wainwright, Michael; Kleywegt, Gerard J.

    2011-01-01

    The Protein Data Bank in Europe (PDBe; pdbe.org) is actively involved in managing the international archive of biomacromolecular structure data as one of the partners in the Worldwide Protein Data Bank (wwPDB; wwpdb.org). PDBe also develops new tools to make structural data more widely and more easily available to the biomedical community. PDBe has developed a browser to access and analyze the structural archive using classification systems that are familiar to chemists and biologists. The PDBe web pages that describe individual PDB entries have been enhanced through the introduction of plain-English summary pages and iconic representations of the contents of an entry (PDBprints). In addition, the information available for structures determined by means of NMR spectroscopy has been expanded. Finally, the entire web site has been redesigned to make it substantially easier to use for expert and novice users alike. PDBe works closely with other teams at the European Bioinformatics Institute (EBI) and in the international scientific community to develop new resources with value-added information. The SIFTS initiative is an example of such a collaboration—it provides extensive mapping data between proteins whose structures are available from the PDB and a host of other biomedical databases. SIFTS is widely used by major bioinformatics resources. PMID:21045060

  16. The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation.

    PubMed

    McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D; Rodionov, Dmitry A; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick

    2007-01-01

    The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infections Disease (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes in common or that differentiates two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.

  17. Metabolomics, Standards, and Metabolic Modeling for Synthetic Biology in Plants

    PubMed Central

    Hill, Camilla Beate; Czauderna, Tobias; Klapperstück, Matthias; Roessner, Ute; Schreiber, Falk

    2015-01-01

    Life on earth depends on dynamic chemical transformations that enable cellular functions, including electron transfer reactions, as well as synthesis and degradation of biomolecules. Biochemical reactions are coordinated in metabolic pathways that interact in a complex way to allow adequate regulation. Biotechnology, food, biofuel, agricultural, and pharmaceutical industries are highly interested in metabolic engineering as an enabling technology of synthetic biology to exploit cells for the controlled production of metabolites of interest. These approaches have only recently been extended to plants due to their greater metabolic complexity (such as primary and secondary metabolism) and highly compartmentalized cellular structures and functions (including plant-specific organelles) compared with bacteria and other microorganisms. Technological advances in analytical instrumentation in combination with advances in data analysis and modeling have opened up new approaches to engineer plant metabolic pathways and allow the impact of modifications to be predicted more accurately. In this article, we review challenges in the integration and analysis of large-scale metabolic data, present an overview of current bioinformatics methods for the modeling and visualization of metabolic networks, and discuss approaches for interfacing bioinformatics approaches with metabolic models of cellular processes and flux distributions in order to predict phenotypes derived from specific genetic modifications or subjected to different environmental conditions. PMID:26557642

  18. Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization.

    PubMed

    Jung, Sang-Kyu; McDonald, Karen

    2011-08-16

    Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net.

  19. Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization

    PubMed Central

    2011-01-01

    Background Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. Results The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Conclusion Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net. PMID:21846353

  20. PATRIC, the bacterial bioinformatics database and analysis resource.

    PubMed

    Wattam, Alice R; Abraham, David; Dalay, Oral; Disz, Terry L; Driscoll, Timothy; Gabbard, Joseph L; Gillespie, Joseph J; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K; Olson, Robert; Overbeek, Ross; Pusch, Gordon D; Shukla, Maulik; Schulman, Julie; Stevens, Rick L; Sullivan, Daniel E; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J C; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W

    2014-01-01

    The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.

  1. PATRIC, the bacterial bioinformatics database and analysis resource

    PubMed Central

    Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J.C.; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W.

    2014-01-01

    The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue. PMID:24225323

  2. Correspondence regarding Zhong et al., BMC Bioinformatics 2013 Mar 7;14:89.

    PubMed

    Kuhn, Alexandre

    2014-11-28

    Computational expression deconvolution aims to estimate the contribution of individual cell populations to expression profiles measured in samples of heterogeneous composition. Zhong et al. recently proposed Digital Sorting Algorithm (BMC Bioinformatics 2013 Mar 7;14:89) and showed that they could accurately estimate population-specific expression levels and expression differences between two populations. They compared DSA with Population-Specific Expression Analysis (PSEA), a previous deconvolution method that we developed to detect expression changes occurring within the same population between two conditions (e.g. disease versus non-disease). However, Zhong et al. compared PSEA-derived specific expression levels across different cell populations. Specific expression levels obtained with PSEA cannot be directly compared across different populations as they are on a relative scale. They are accurate as we demonstrate by deconvolving the same dataset used by Zhong et al. and, importantly, allow for comparison of population-specific expression across conditions.

  3. Developing library bioinformatics services in context: the Purdue University Libraries bioinformationist program

    PubMed Central

    Rein, Diane C.

    2006-01-01

    Setting: Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. Program Components: After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. Evaluation Mechanisms: The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Next Steps/Future Directions: Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making. PMID:16888666

  4. Developing library bioinformatics services in context: the Purdue University Libraries bioinformationist program.

    PubMed

    Rein, Diane C

    2006-07-01

    Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making.

  5. DSSR-enhanced visualization of nucleic acid structures in Jmol

    PubMed Central

    Hanson, Robert M.

    2017-01-01

    Abstract Sophisticated and interactive visualizations are essential for making sense of the intricate 3D structures of macromolecules. For proteins, secondary structural components are routinely featured in molecular graphics visualizations. However, the field of RNA structural bioinformatics is still lagging behind; for example, current molecular graphics tools lack built-in support even for base pairs, double helices, or hairpin loops. DSSR (Dissecting the Spatial Structure of RNA) is an integrated and automated command-line tool for the analysis and annotation of RNA tertiary structures. It calculates a comprehensive and unique set of features for characterizing RNA, as well as DNA structures. Jmol is a widely used, open-source Java viewer for 3D structures, with a powerful scripting language. JSmol, its reincarnation based on native JavaScript, has a predominant position in the post Java-applet era for web-based visualization of molecular structures. The DSSR-Jmol integration presented here makes salient features of DSSR readily accessible, either via the Java-based Jmol application itself, or its HTML5-based equivalent, JSmol. The DSSR web service accepts 3D coordinate files (in mmCIF or PDB format) initiated from a Jmol or JSmol session and returns DSSR-derived structural features in JSON format. This seamless combination of DSSR and Jmol/JSmol brings the molecular graphics of 3D RNA structures to a similar level as that for proteins, and enables a much deeper analysis of structural characteristics. It fills a gap in RNA structural bioinformatics, and is freely accessible (via the Jmol application or the JSmol-based website http://jmol.x3dna.org). PMID:28472503

  6. DWARF – a data warehouse system for analyzing protein families

    PubMed Central

    Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

    2006-01-01

    Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from public available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801

  7. Functional artificial luciferases as an optical readout for bioassays.

    PubMed

    Kim, Sung Bae; Izumi, Hiroshi

    2014-06-13

    This study elucidates functional artificial luciferases (ALucs) wholly synthesized for bioassays and molecular imaging. The ALucs bearing epitopes were newly created by amending the sequences of our previously reported ALucs in light of a multi-sequence alignment and hydrophobicity search. The synthesized ALucs are survived in live cells and stable in culture media for 25 days after secretion. The epitopes in ALucs are exposed during the secretion process and indeed valid for column purification and immunological assays. The ALucs exerted a 9400-times stronger optical intensity with a coelenterazine derivative (CTZ i), when compared with Renilla reniformis luciferase 8.6-535. A supersecondary structure of ALuc30 was predicted with respect to the X-ray crystallographic information of the coelenterazine-binding protein (CBP). The structure revealed that ALuc30 has a room for accommodating the iodide of CTZ i. This study guides on how to create functional artificial luciferases and predicts the structural details with the current bioinformatics technologies. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    PubMed Central

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727

  9. Bioinformatic perspectives on NRPS/PKS megasynthases: advances and challenges.

    PubMed

    Jenke-Kodama, Holger; Dittmann, Elke

    2009-07-01

    The increased understanding of both fundamental principles and mechanistic variations of NRPS/PKS megasynthases along with the unprecedented availability of microbial sequences has inspired a number of in silico studies of both enzyme families. The insights that can be extracted from these analyses go far beyond a rough classification of data and have turned bioinformatics into a frontier field of natural products research. As databases are flooded with NRPS/PKS gene sequence of microbial genomes and metagenomes, increasingly reliable structural prediction methods can help to uncover hidden treasures. Already, phylogenetic analyses have revealed that NRPS/PKS pathways should not simply be regarded as enzyme complexes, specifically evolved to product a selected natural product. Rather, they represent a collection of genetic opinions, allowing biosynthetic pathways to be shuffled in a process of perpetual chemical innovations and pathways diversification in nature can give impulses for specificities, protein interactions and genetic engineering of libraries of novel peptides and polyketides. The successful translation of the knowledge obtained from bioinformatic dissection of NRPS/PKS megasynthases into new techniques for drug discovery and design remain challenges for the future.

  10. OralCard: a bioinformatic tool for the study of oral proteome.

    PubMed

    Arrais, Joel P; Rosa, Nuno; Melo, José; Coelho, Edgar D; Amaral, Diana; Correia, Maria José; Barros, Marlene; Oliveira, José Luís

    2013-07-01

    The molecular complexity of the human oral cavity can only be clarified through identification of components that participate within it. However current proteomic techniques produce high volumes of information that are dispersed over several online databases. Collecting all of this data and using an integrative approach capable of identifying unknown associations is still an unsolved problem. This is the main motivation for this work. We present the online bioinformatic tool OralCard, which comprises results from 55 manually curated articles reflecting the oral molecular ecosystem (OralPhysiOme). It comprises experimental information available from the oral proteome both of human (OralOme) and microbial origin (MicroOralOme) structured in protein, disease and organism. This tool is a key resource for researchers to understand the molecular foundations implicated in biology and disease mechanisms of the oral cavity. The usefulness of this tool is illustrated with the analysis of the oral proteome associated with diabetes melitus type 2. OralCard is available at http://bioinformatics.ua.pt/oralcard. Copyright © 2013 Elsevier Ltd. All rights reserved.

  11. blastjs: a BLAST+ wrapper for Node.js.

    PubMed

    Page, Martin; MacLean, Dan; Schudoma, Christian

    2016-02-27

    To cope with the ever-increasing amount of sequence data generated in the field of genomics, the demand for efficient and fast database searches that drive functional and structural annotation in both large- and small-scale genome projects is on the rise. The tools of the BLAST+ suite are the most widely employed bioinformatic method for these database searches. Recent trends in bioinformatics application development show an increasing number of JavaScript apps that are based on modern frameworks such as Node.js. Until now, there is no way of using database searches with the BLAST+ suite from a Node.js codebase. We developed blastjs, a Node.js library that wraps the search tools of the BLAST+ suite and thus allows to easily add significant functionality to any Node.js-based application. blastjs is a library that allows the incorporation of BLAST+ functionality into bioinformatics applications based on JavaScript and Node.js. The library was designed to be as user-friendly as possible and therefore requires only a minimal amount of code in the client application. The library is freely available under the MIT license at https://github.com/teammaclean/blastjs.

  12. Significant Natural Product Biosynthetic Potential of Actinorhizal Symbionts of the Genus Frankia, as Revealed by Comparative Genomic and Proteomic Analyses▿

    PubMed Central

    Udwary, Daniel W.; Gontang, Erin A.; Jones, Adam C.; Jones, Carla S.; Schultz, Andrew W.; Winter, Jaclyn M.; Yang, Jane Y.; Beauchemin, Nicholas; Capson, Todd L.; Clark, Benjamin R.; Esquenazi, Eduardo; Eustáquio, Alessandra S.; Freel, Kelle; Gerwick, Lena; Gerwick, William H.; Gonzalez, David; Liu, Wei-Ting; Malloy, Karla L.; Maloney, Katherine N.; Nett, Markus; Nunnery, Joshawna K.; Penn, Kevin; Prieto-Davo, Alejandra; Simmons, Thomas L.; Weitz, Sara; Wilson, Micheal C.; Tisa, Louis S.; Dorrestein, Pieter C.; Moore, Bradley S.

    2011-01-01

    Bacteria of the genus Frankia are mycelium-forming actinomycetes that are found as nitrogen-fixing facultative symbionts of actinorhizal plants. Although soil-dwelling actinomycetes are well-known producers of bioactive compounds, the genus Frankia has largely gone uninvestigated for this potential. Bioinformatic analysis of the genome sequences of Frankia strains ACN14a, CcI3, and EAN1pec revealed an unexpected number of secondary metabolic biosynthesis gene clusters. Our analysis led to the identification of at least 65 biosynthetic gene clusters, the vast majority of which appear to be unique and for which products have not been observed or characterized. More than 25 secondary metabolite structures or structure fragments were predicted, and these are expected to include cyclic peptides, siderophores, pigments, signaling molecules, and specialized lipids. Outside the hopanoid gene locus, no cluster could be convincingly demonstrated to be responsible for the few secondary metabolites previously isolated from other Frankia strains. Few clusters were shared among the three species, demonstrating species-specific biosynthetic diversity. Proteomic analysis of Frankia sp. strains CcI3 and EAN1pec showed that significant and diverse secondary metabolic activity was expressed in laboratory cultures. In addition, several prominent signals in the mass range of peptide natural products were observed in Frankia sp. CcI3 by intact-cell matrix-assisted laser desorption-ionization mass spectrometry (MALDI-MS). This work supports the value of bioinformatic investigation in natural products biosynthesis using genomic information and presents a clear roadmap for natural products discovery in the Frankia genus. PMID:21498757

  13. ADVANCED COMPUTATIONAL METHODS IN DOSE MODELING: APPLICATION OF COMPUTATIONAL BIOPHYSICAL TRANSPORT, COMPUTATIONAL CHEMISTRY, AND COMPUTATIONAL BIOLOGY

    EPA Science Inventory

    Computational toxicology (CompTox) leverages the significant gains in computing power and computational techniques (e.g., numerical approaches, structure-activity relationships, bioinformatics) realized over the last few years, thereby reducing costs and increasing efficiency i...

  14. Integration of bioinformatics into an undergraduate biology curriculum and the impact on development of mathematical skills.

    PubMed

    Wightman, Bruce; Hark, Amy T

    2012-01-01

    The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this study, we deliberately integrated bioinformatics instruction at multiple course levels into an existing biology curriculum. Students in an introductory biology course, intermediate lab courses, and advanced project-oriented courses all participated in new course components designed to sequentially introduce bioinformatics skills and knowledge, as well as computational approaches that are common to many bioinformatics applications. In each course, bioinformatics learning was embedded in an existing disciplinary instructional sequence, as opposed to having a single course where all bioinformatics learning occurs. We designed direct and indirect assessment tools to follow student progress through the course sequence. Our data show significant gains in both student confidence and ability in bioinformatics during individual courses and as course level increases. Despite evidence of substantial student learning in both bioinformatics and mathematics, students were skeptical about the link between learning bioinformatics and learning mathematics. While our approach resulted in substantial learning gains, student "buy-in" and engagement might be better in longer project-based activities that demand application of skills to research problems. Nevertheless, in situations where a concentrated focus on project-oriented bioinformatics is not possible or desirable, our approach of integrating multiple smaller components into an existing curriculum provides an alternative. Copyright © 2012 Wiley Periodicals, Inc.

  15. Customized workflow development and data modularization concepts for RNA-Sequencing and metatranscriptome experiments.

    PubMed

    Lott, Steffen C; Wolfien, Markus; Riege, Konstantin; Bagnacani, Andrea; Wolkenhauer, Olaf; Hoffmann, Steve; Hess, Wolfgang R

    2017-11-10

    RNA-Sequencing (RNA-Seq) has become a widely used approach to study quantitative and qualitative aspects of transcriptome data. The variety of RNA-Seq protocols, experimental study designs and the characteristic properties of the organisms under investigation greatly affect downstream and comparative analyses. In this review, we aim to explain the impact of structured pre-selection, classification and integration of best-performing tools within modularized data analysis workflows and ready-to-use computing infrastructures towards experimental data analyses. We highlight examples for workflows and use cases that are presented for pro-, eukaryotic and mixed dual RNA-Seq (meta-transcriptomics) experiments. In addition, we are summarizing the expertise of the laboratories participating in the project consortium "Structured Analysis and Integration of RNA-Seq experiments" (de.STAIR) and its integration with the Galaxy-workbench of the RNA Bioinformatics Center (RBC). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  16. Binding ligand prediction for proteins using partial matching of local surface patches.

    PubMed

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.

  17. Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

    PubMed Central

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group. PMID:21614188

  18. CORAL: aligning conserved core regions across domain families.

    PubMed

    Fong, Jessica H; Marchler-Bauer, Aron

    2009-08-01

    Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile-profile method CORAL that aligns individual core regions as gap-free units. CORAL computes optimal local alignment of two profiles with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved 'readability' that facilitate manual refinement. CORAL will be included in future versions of the NCBI Cn3D/CDTree software, which can be downloaded at http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml. Supplementary data are available at Bioinformatics online.

  19. Cloning, characterisation and comparative analysis of a starch synthase IV gene in wheat: functional and evolutionary implications

    PubMed Central

    Leterrier, Marina; Holappa, Lynn D; Broglie, Karen E; Beckles, Diane M

    2008-01-01

    Background Starch is of great importance to humans as a food and biomaterial, and the amount and structure of starch made in plants is determined in part by starch synthase (SS) activity. Five SS isoforms, SSI, II, III, IV and Granule Bound SSI, have been identified, each with a unique catalytic role in starch synthesis. The basic mode of action of SSs is known; however our knowledge of several aspects of SS enzymology at the structural and mechanistic level is incomplete. To gain a better understanding of the differences in SS sequences that underscore their specificity, the previously uncharacterised SSIVb from wheat was cloned and extensive bioinformatics analyses of this and other SSs sequences were done. Results The wheat SSIV cDNA is most similar to rice SSIVb with which it shows synteny and shares a similar exon-intron arrangement. The wheat SSIVb gene was preferentially expressed in leaf and was not regulated by a circadian clock. Phylogenetic analysis showed that in plants, SSIV is closely related to SSIII, while SSI, SSII and Granule Bound SSI clustered together and distinctions between the two groups can be made at the genetic level and included chromosomal location and intron conservation. Further, identified differences at the amino acid level in their glycosyltransferase domains, predicted secondary structures, global conformations and conserved residues might be indicative of intragroup functional associations. Conclusion Based on bioinformatics analysis of the catalytic region of 36 SSs and 3 glycogen synthases (GSs), it is suggested that the valine residue in the highly conserved K-X-G-G-L motif in SSIII and SSIV may be a determining feature of primer specificity of these SSs as compared to GBSSI, SSI and SSII. In GBSSI, the Ile485 residue may partially explain that enzyme's unique catalytic features. The flexible 380s Loop in the starch catalytic domain may be important in defining the specificity of action for each different SS and the G-X-G in motif VI could define SSIV and SSIII action particularly. PMID:18826586

  20. Quality assessment of protein model-structures using evolutionary conservation.

    PubMed

    Kalman, Matan; Ben-Tal, Nir

    2010-05-15

    Programs that evaluate the quality of a protein structural model are important both for validating the structure determination procedure and for guiding the model-building process. Such programs are based on properties of native structures that are generally not expected for faulty models. One such property, which is rarely used for automatic structure quality assessment, is the tendency for conserved residues to be located at the structural core and for variable residues to be located at the surface. We present ConQuass, a novel quality assessment program based on the consistency between the model structure and the protein's conservation pattern. We show that it can identify problematic structural models, and that the scores it assigns to the server models in CASP8 correlate with the similarity of the models to the native structure. We also show that when the conservation information is reliable, the method's performance is comparable and complementary to that of the other single-structure quality assessment methods that participated in CASP8 and that do not use additional structural information from homologs. A perl implementation of the method, as well as the various perl and R scripts used for the analysis are available at http://bental.tau.ac.il/ConQuass/. nirb@tauex.tau.ac.il Supplementary data are available at Bioinformatics online.

  1. An Inquiry into Protein Structure and Genetic Disease: Introducing Undergraduates to Bioinformatics in a Large Introductory Course

    ERIC Educational Resources Information Center

    Bednarski, April E.; Elgin, Sarah C. R.; Pakrasi, Himadri B.

    2005-01-01

    This inquiry-based lab is designed around genetic diseases with a focus on protein structure and function. To allow students to work on their own investigatory projects, 10 projects on 10 different proteins were developed. Students are grouped in sections of 20 and work in pairs on each of the projects. To begin their investigation, students are…

  2. Unfoldomics of human diseases: linking protein intrinsic disorder with diseases

    PubMed Central

    Uversky, Vladimir N; Oldfield, Christopher J; Midic, Uros; Xie, Hongbo; Xue, Bin; Vucetic, Slobodan; Iakoucheva, Lilia M; Obradovic, Zoran; Dunker, A Keith

    2009-01-01

    Background Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) lack stable tertiary and/or secondary structure yet fulfills key biological functions. The recent recognition of IDPs and IDRs is leading to an entire field aimed at their systematic structural characterization and at determination of their mechanisms of action. Bioinformatics studies showed that IDPs and IDRs are highly abundant in different proteomes and carry out mostly regulatory functions related to molecular recognition and signal transduction. These activities complement the functions of structured proteins. IDPs and IDRs were shown to participate in both one-to-many and many-to-one signaling. Alternative splicing and posttranslational modifications are frequently used to tune the IDP functionality. Several individual IDPs were shown to be associated with human diseases, such as cancer, cardiovascular disease, amyloidoses, diabetes, neurodegenerative diseases, and others. This raises questions regarding the involvement of IDPs and IDRs in various diseases. Results IDPs and IDRs were shown to be highly abundant in proteins associated with various human maladies. As the number of IDPs related to various diseases was found to be very large, the concepts of the disease-related unfoldome and unfoldomics were introduced. Novel bioinformatics tools were proposed to populate and characterize the disease-associated unfoldome. Structural characterization of the members of the disease-related unfoldome requires specialized experimental approaches. IDPs possess a number of unique structural and functional features that determine their broad involvement into the pathogenesis of various diseases. Conclusion Proteins associated with various human diseases are enriched in intrinsic disorder. These disease-associated IDPs and IDRs are real, abundant, diversified, vital, and dynamic. These proteins and regions comprise the disease-related unfoldome, which covers a significant part of the human proteome. Profound association between intrinsic disorder and various human diseases is determined by a set of unique structural and functional characteristics of IDPs and IDRs. Unfoldomics of human diseases utilizes unrivaled bioinformatics and experimental techniques, paves the road for better understanding of human diseases, their pathogenesis and molecular mechanisms, and helps develop new strategies for the analysis of disease-related proteins. PMID:19594884

  3. MultiSeq: unifying sequence and structure data for evolutionary analysis

    PubMed Central

    Roberts, Elijah; Eargle, John; Wright, Dan; Luthey-Schulten, Zaida

    2006-01-01

    Background Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. Results Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. Conclusion MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software: PMID:16914055

  4. Improved genomic resources and new bioinformatic workflow for the carcinogenic parasite Clonorchis sinensis: Biotechnological implications.

    PubMed

    Wang, Daxi; Korhonen, Pasi K; Gasser, Robin B; Young, Neil D

    Clonorchis sinensis (family Opisthorchiidae) is an important foodborne parasite that has a major socioeconomic impact on ~35 million people predominantly in China, Vietnam, Korea and the Russian Far East. In humans, infection with C. sinensis causes clonorchiasis, a complex hepatobiliary disease that can induce cholangiocarcinoma (CCA), a malignant cancer of the bile ducts. Central to understanding the epidemiology of this disease is knowledge of genetic variation within and among populations of this parasite. Although most published molecular studies seem to suggest that C. sinensis represents a single species, evidence of karyotypic variation within C. sinensis and cryptic species within a related opisthorchiid fluke (Opisthorchis viverrini) emphasise the importance of studying and comparing the genes and genomes of geographically distinct isolates of C. sinensis. Recently, we sequenced, assembled and characterised a draft nuclear genome of a C. sinensis isolate from Korea and compared it with a published draft genome of a Chinese isolate of this species using a bioinformatic workflow established for comparing draft genome assemblies and their gene annotations. We identified that 50.6% and 51.3% of the Korean and Chinese C. sinensis genomic scaffolds were syntenic, respectively. Within aligned syntenic blocks, the genomes had a high level of nucleotide identity (99.1%) and encoded 15 variable proteins likely to be involved in diverse biological processes. Here, we review current technical challenges of using draft genome assemblies to undertake comparative genomic analyses to quantify genetic variation between isolates of the same species. Using a workflow that overcomes these challenges, we report on a high-quality draft genome for C. sinensis from Korea and comparative genomic analyses, as a basis for future investigations of the genetic structures of C. sinensis populations, and discuss the biotechnological implications of these explorations. Copyright © 2018 Elsevier Inc. All rights reserved.

  5. E-MSD: an integrated data resource for bioinformatics

    PubMed Central

    Velankar, S.; McNeil, P.; Mittard-Runte, V.; Suarez, A.; Barrell, D.; Apweiler, R.; Henrick, K.

    2005-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the ‘Structure Integration with Function, Taxonomy and Sequences (SIFTS)’ initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group. PMID:15608192

  6. Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.

    PubMed

    Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread--a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.

  7. Structure prediction, expression, and antigenicity of c-terminal of GRP78.

    PubMed

    Aghamollaei, Hossein; Mousavi Gargari, Seyed Latif; Ghanei, Mostafa; Rasaee, Mohamad Javad; Amani, Jafar; Bakherad, Hamid; Farnoosh, Gholamreza

    2017-01-01

    Glucose-regulated protein 78 (GRP78) is a typical endoplasmic reticulum luminal chaperone having a main role in the activation of the unfolded protein response. Because of hypoxia and nutrient deprivation in the tumor microenvironment, expression of GRP78 in these cells becomes higher than the native cells, which makes it a suitable candidate for cancer targeting. Suppression of survival signals by antibody production against C-terminal domain of GR78 (CGRP) can induce apoptosis of cancer cells. The aim of this study was in silico analysis, recombinant production, and characterization of CGRP in Escherichia coli. Structural prediction of CGRP by bioinformatics tools was done and the construct containing optimized sequence was transferred to E. coli T7 shuffle. Expression was induced by isopropyl-β-d-thiogalactoside, and recombinant protein was purified by Ni-NTA agarose resin. The content of secondary structures was obtained by circular dichroism (CD) spectrum. CGRP immunogenicity was evaluated from the immunized mouse sera. SDS-PAGE analysis showed CGRP expression in E. coli. CD spectrum also confirmed prediction of structures by bioinformatics tools. The enzyme-linked immunosorbent assay using sera from immunized mice revealed CGRP as a good immunogen. The results obtained in this study showed that the structure of truncated CGRP is very similar to its structure in the whole protein context. This protein can be used in cancer researches. © 2015 International Union of Biochemistry and Molecular Biology, Inc.

  8. Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

    PubMed Central

    Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread—a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285

  9. Bioinformatics study of the mangrove actin genes

    NASA Astrophysics Data System (ADS)

    Basyuni, M.; Wasilah, M.; Sumardi

    2017-01-01

    This study describes the bioinformatics methods to analyze eight actin genes from mangrove plants on DDBJ/EMBL/GenBank as well as predicted the structure, composition, subcellular localization, similarity, and phylogenetic. The physical and chemical properties of eight mangroves showed variation among the genes. The percentage of the secondary structure of eight mangrove actin genes followed the order of a helix > random coil > extended chain structure for BgActl, KcActl, RsActl, and A. corniculatum Act. In contrast to this observation, the remaining actin genes were random coil > extended chain structure > a helix. This study, therefore, shown the prediction of secondary structure was performed for necessary structural information. The values of chloroplast or signal peptide or mitochondrial target were too small, indicated that no chloroplast or mitochondrial transit peptide or signal peptide of secretion pathway in mangrove actin genes. These results suggested the importance of understanding the diversity and functional of properties of the different amino acids in mangrove actin genes. To clarify the relationship among the mangrove actin gene, a phylogenetic tree was constructed. Three groups of mangrove actin genes were formed, the first group contains B. gymnorrhiza BgAct and R. stylosa RsActl. The second cluster which consists of 5 actin genes the largest group, and the last branch consist of one gene, B. sexagula Act. The present study, therefore, supported the previous results that plant actin genes form distinct clusters in the tree.

  10. Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT

    PubMed Central

    Wang, Dantong; Fong, Simon; Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Wong, Kelvin K. L.

    2017-01-01

    Outlier detection in bioinformatics data streaming mining has received significant attention by research communities in recent years. The problems of how to distinguish noise from an exception and deciding whether to discard it or to devise an extra decision path for accommodating it are causing dilemma. In this paper, we propose a novel algorithm called ODR with incrementally Optimized Very Fast Decision Tree (ODR-ioVFDT) for taking care of outliers in the progress of continuous data learning. By using an adaptive interquartile-range based identification method, a tolerance threshold is set. It is then used to judge if a data of exceptional value should be included for training or otherwise. This is different from the traditional outlier detection/removal approaches which are two separate steps in processing through the data. The proposed algorithm is tested using datasets of five bioinformatics scenarios and comparing the performance of our model and other ones without ODR. The results show that ODR-ioVFDT has better performance in classification accuracy, kappa statistics, and time consumption. The ODR-ioVFDT applied onto bioinformatics streaming data processing for detecting and quantifying the information of life phenomena, states, characters, variables and components of the organism can help to diagnose and treat disease more effectively. PMID:28230161

  11. X-ray crystallography over the past decade for novel drug discovery – where are we heading next?

    PubMed Central

    Zheng, Heping; Handing, Katarzyna B; Zimmerman, Matthew D; Shabalin, Ivan G; Almo, Steven C; Minor, Wladek

    2015-01-01

    Introduction Macromolecular X-ray crystallography has been the primary methodology for determining the three-dimensional structures of proteins, nucleic acids and viruses. Structural information has paved the way for structure-guided drug discovery and laid the foundations for structural bioinformatics. However, X-ray crystallography still has a few fundamental limitations, some of which may be overcome and complemented using emerging methods and technologies in other areas of structural biology. Areas covered This review describes how structural knowledge gained from X-ray crystallography has been used to advance other biophysical methods for structure determination (and vice versa). This article also covers current practices for integrating data generated by other biochemical and biophysical methods with those obtained from X-ray crystallography. Finally, the authors articulate their vision about how a combination of structural and biochemical/biophysical methods may improve our understanding of biological processes and interactions. Expert opinion X-ray crystallography has been, and will continue to serve as, the central source of experimental structural biology data used in the discovery of new drugs. However, other structural biology techniques are useful not only to overcome the major limitation of X-ray crystallography, but also to provide complementary structural data that is useful in drug discovery. The use of recent advancements in biochemical, spectroscopy and bioinformatics methods may revolutionize drug discovery, albeit only when these data are combined and analyzed with effective data management systems. Accurate and complete data management is crucial for developing experimental procedures that are robust and reproducible. PMID:26177814

  12. Insights into Antimicrobial Peptides from Spiders and Scorpions

    PubMed Central

    Wang, Xiuqing; Wang, Guangshun

    2015-01-01

    The venoms of spiders and scorpions contain a variety of chemical compounds. Antimicrobial peptides (AMPs) from these organisms were first discovered in the 1990s. As of May 2015, there were 42 spider’s and 63 scorpion’s AMPs in the Antimicrobial Peptide Database (http://aps.unmc.edu/AP). These peptides have demonstrated broad or narrow-spectrum activities against bacteria, fungi, viruses, and parasites. In addition, they can be toxic to cancer cells, insects and erythrocytes. To provide insight into such an activity spectrum, this article discusses the discovery, classification, structure and activity relationships, bioinformatics analysis, and potential applications of spider and scorpion AMPs. Our analysis reveals that, in the case of linear peptides, spiders use both glycine-rich and helical peptide models for defense, whereas scorpions use two distinct helical peptide models with different amino acid compositions to exert the observed antimicrobial activities and hemolytic toxicity. Our structural bioinformatics study improves the knowledge in the field and can be used to design more selective peptides to combat tumors, parasites, and viruses. PMID:27165405

  13. Accelerating large-scale protein structure alignments with graphics processing units

    PubMed Central

    2012-01-01

    Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132

  14. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks.

    PubMed

    Hanson, Jack; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi

    2017-03-01

    Capturing long-range interactions between structural but not sequence neighbors of proteins is a long-standing challenging problem in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification problems by remembering useful past information in long sequential events. Here, we have implemented deep bidirectional LSTM recurrent neural networks in the problem of protein intrinsic disorder prediction. The new method, named SPOT-Disorder, has steadily improved over a similar method using a traditional, window-based neural network (SPINE-D) in all datasets tested without separate training on short and long disordered regions. Independent tests on four other datasets including the datasets from critical assessment of structure prediction (CASP) techniques and >10 000 annotated proteins from MobiDB, confirmed SPOT-Disorder as one of the best methods in disorder prediction. Moreover, initial studies indicate that the method is more accurate in predicting functional sites in disordered regions. These results highlight the usefulness combining LSTM with deep bidirectional recurrent neural networks in capturing non-local, long-range interactions for bioinformatics applications. SPOT-disorder is available as a web server and as a standalone program at: http://sparks-lab.org/server/SPOT-disorder/index.php . j.hanson@griffith.edu.au or yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au. Supplementary data is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  15. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers

    PubMed Central

    Brazas, Michelle D.; Ouellette, B. F. Francis

    2016-01-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression. PMID:27281025

  16. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers.

    PubMed

    Brazas, Michelle D; Ouellette, B F Francis

    2016-06-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression.

  17. Bioinformatics research in the Asia Pacific: a 2007 update.

    PubMed

    Ranganathan, Shoba; Gribskov, Michael; Tan, Tin Wee

    2008-01-01

    We provide a 2007 update on the bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB) bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2007 Conference was organized as the 6th annual conference of the Asia-Pacific Bioinformatics Network, on Aug. 27-30, 2007 at Hong Kong, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea) and New Delhi (India). Besides a scientific meeting at Hong Kong, satellite events organized are a pre-conference training workshop at Hanoi, Vietnam and a post-conference workshop at Nansha, China. This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. We have organized the papers into thematic areas, highlighting the growing contribution of research excellence from this region, to global bioinformatics endeavours.

  18. Meeting review: Bioinformatics and Medicine - from molecules to humans, virtual and real.

    PubMed

    Russell, Roslin

    2002-01-01

    The Industrialization Workshop Series aims to promote and discuss integration, automation, simulation, quality, availability and standards in the high-throughput life sciences. The main issues addressed being the transformation of bioinformatics and bioinformaticsbased drug design into a robust discipline in industry, the government, research institutes and academia. The latest workshop emphasized the influence of the post-genomic era on medicine and healthcare with reference to advanced biological systems modeling and simulation, protein structure research, protein-protein interactions, metabolism and physiology. Speakers included Michael Ashburner, Kenneth Buetow, Francois Cambien, Cyrus Chothia, Jean Garnier, Francois Iris, Matthias Mann, Maya Natarajan, Peter Murray-Rust, Richard Mushlin, Barry Robson, David Rubin, Kosta Steliou, John Todd, Janet Thornton, Pim van der Eijk, Michael Vieth and Richard Ward.

  19. The European Bioinformatics Institute's data resources 2014.

    PubMed

    Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

    2014-01-01

    Molecular Biology has been at the heart of the 'big data' revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff's 'Atlas of Protein Sequence and Structure' through the Human Genome Project in the late 1990s and early 2000s to today's population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI's database collection to complement the reviews of individual databases provided elsewhere in this issue.

  20. High-throughput protein analysis integrating bioinformatics and experimental assays

    PubMed Central

    del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan

    2004-01-01

    The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins. PMID:14762202

  1. Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees.

    PubMed

    Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.

  2. Community annotation and bioinformatics workforce development in concert—Little Skate Genome Annotation Workshops and Jamborees

    PubMed Central

    Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832

  3. The Enzyme Portal: a case study in applying user-centred design methods in bioinformatics.

    PubMed

    de Matos, Paula; Cham, Jennifer A; Cao, Hong; Alcántara, Rafael; Rowland, Francis; Lopez, Rodrigo; Steinbeck, Christoph

    2013-03-20

    User-centred design (UCD) is a type of user interface design in which the needs and desires of users are taken into account at each stage of the design process for a service or product; often for software applications and websites. Its goal is to facilitate the design of software that is both useful and easy to use. To achieve this, you must characterise users' requirements, design suitable interactions to meet their needs, and test your designs using prototypes and real life scenarios.For bioinformatics, there is little practical information available regarding how to carry out UCD in practice. To address this we describe a complete, multi-stage UCD process used for creating a new bioinformatics resource for integrating enzyme information, called the Enzyme Portal (http://www.ebi.ac.uk/enzymeportal). This freely-available service mines and displays data about proteins with enzymatic activity from public repositories via a single search, and includes biochemical reactions, biological pathways, small molecule chemistry, disease information, 3D protein structures and relevant scientific literature.We employed several UCD techniques, including: persona development, interviews, 'canvas sort' card sorting, user workflows, usability testing and others. Our hope is that this case study will motivate the reader to apply similar UCD approaches to their own software design for bioinformatics. Indeed, we found the benefits included more effective decision-making for design ideas and technologies; enhanced team-working and communication; cost effectiveness; and ultimately a service that more closely meets the needs of our target audience.

  4. Emerging strengths in Asia Pacific bioinformatics.

    PubMed

    Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee

    2008-12-12

    The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20-23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts.

  5. Emerging strengths in Asia Pacific bioinformatics

    PubMed Central

    Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee

    2008-01-01

    The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20–23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts. PMID:19091008

  6. Protein Folding and Structure Prediction from the Ground Up: The Atomistic Associative Memory, Water Mediated, Structure and Energy Model.

    PubMed

    Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G

    2016-08-25

    The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local in sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences that we call memories. The memory information from the protein data bank (PDB) database guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to frustrated free energy landscapes locally. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach first put forward by Kwac and Wolynes in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM) amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide the correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural predictions may well be optimal for many practical applications.

  7. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    PubMed

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.

  8. Extending Asia Pacific bioinformatics into new realms in the "-omics" era.

    PubMed

    Ranganathan, Shoba; Eisenhaber, Frank; Tong, Joo Chuan; Tan, Tin Wee

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation dating back to 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 7-11, 2009 at Biopolis, Singapore. Besides bringing together scientists from the field of bioinformatics in this region, InCoB has actively engaged clinicians and researchers from the area of systems biology, to facilitate greater synergy between these two groups. InCoB2009 followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India), Hong Kong and Taipei (Taiwan), with InCoB2010 scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. The Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and symposia on Clinical Bioinformatics (CBAS), the Singapore Symposium on Computational Biology (SYMBIO) and training tutorials were scheduled prior to the scientific meeting, and provided ample opportunity for in-depth learning and special interest meetings for educators, clinicians and students. We provide a brief overview of the peer-reviewed bioinformatics manuscripts accepted for publication in this supplement, grouped into thematic areas. In order to facilitate scientific reproducibility and accountability, we have, for the first time, introduced minimum information criteria for our pubilcations, including compliance to a Minimum Information about a Bioinformatics Investigation (MIABi). As the regional research expertise in bioinformatics matures, we have delineated a minimum set of bioinformatics skills required for addressing the computational challenges of the "-omics" era.

  9. PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species ▿ ‡ #

    PubMed Central

    Gillespie, Joseph J.; Wattam, Alice R.; Cammer, Stephen A.; Gabbard, Joseph L.; Shukla, Maulik P.; Dalay, Oral; Driscoll, Timothy; Hix, Deborah; Mane, Shrinivasrao P.; Mao, Chunhong; Nordberg, Eric K.; Scott, Mark; Schulman, Julie R.; Snyder, Eric E.; Sullivan, Daniel E.; Wang, Chunxia; Warren, Andrew; Williams, Kelly P.; Xue, Tian; Seung Yoo, Hyun; Zhang, Chengdong; Zhang, Yan; Will, Rebecca; Kenyon, Ronald W.; Sobral, Bruno W.

    2011-01-01

    Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided. PMID:21896772

  10. A comparative proteomic strategy for subcellular proteome research: ICAT approach coupled with bioinformatics prediction to ascertain rat liver mitochondrial proteins and indication of mitochondrial localization for catalase.

    PubMed

    Jiang, Xiao-Sheng; Dai, Jie; Sheng, Quan-Hu; Zhang, Lei; Xia, Qi-Chang; Wu, Jia-Rui; Zeng, Rong

    2005-01-01

    Subcellular proteomics, as an important step to functional proteomics, has been a focus in proteomic research. However, the co-purification of "contaminating" proteins has been the major problem in all the subcellular proteomic research including all kinds of mitochondrial proteome research. It is often difficult to conclude whether these "contaminants" represent true endogenous partners or artificial associations induced by cell disruption or incomplete purification. To solve such a problem, we applied a high-throughput comparative proteome experimental strategy, ICAT approach performed with two-dimensional LC-MS/MS analysis, coupled with combinational usage of different bioinformatics tools, to study the proteome of rat liver mitochondria prepared with traditional centrifugation (CM) or further purified with a Nycodenz gradient (PM). A total of 169 proteins were identified and quantified convincingly in the ICAT analysis, in which 90 proteins have an ICAT ratio of PM:CM>1.0, while another 79 proteins have an ICAT ratio of PM:CM<1.0. Almost all the proteins annotated as mitochondrial according to Swiss-Prot annotation, bioinformatics prediction, and literature reports have a ratio of PM:CM>1.0, while proteins annotated as extracellular or secreted, cytoplasmic, endoplasmic reticulum, ribosomal, and so on have a ratio of PM:CM<1.0. Catalase and AP endonuclease 1, which have been known as peroxisomal and nuclear, respectively, have shown a ratio of PM:CM>1.0, confirming the reports about their mitochondrial location. Moreover, the 125 proteins with subcellular location annotation have been used as a testing dataset to evaluate the efficiency for ascertaining mitochondrial proteins by ICAT analysis and the bioinformatics tools such as PSORT, TargetP, SubLoc, MitoProt, and Predotar. The results indicated that ICAT analysis coupled with combinational usage of different bioinformatics tools could effectively ascertain mitochondrial proteins and distinguish contaminant proteins and even multilocation proteins. Using such a strategy, many novel proteins, known proteins without subcellular location annotation, and even known proteins that have been annotated as other locations have been strongly indicated for their mitochondrial location.

  11. PONDEROSA-C/S: client-server based software package for automated protein 3D structure determination.

    PubMed

    Lee, Woonghee; Stark, Jaime L; Markley, John L

    2014-11-01

    Peak-picking Of Noe Data Enabled by Restriction Of Shift Assignments-Client Server (PONDEROSA-C/S) builds on the original PONDEROSA software (Lee et al. in Bioinformatics 27:1727-1728. doi: 10.1093/bioinformatics/btr200, 2011) and includes improved features for structure calculation and refinement. PONDEROSA-C/S consists of three programs: Ponderosa Server, Ponderosa Client, and Ponderosa Analyzer. PONDEROSA-C/S takes as input the protein sequence, a list of assigned chemical shifts, and nuclear Overhauser data sets ((13)C- and/or (15)N-NOESY). The output is a set of assigned NOEs and 3D structural models for the protein. Ponderosa Analyzer supports the visualization, validation, and refinement of the results from Ponderosa Server. These tools enable semi-automated NMR-based structure determination of proteins in a rapid and robust fashion. We present examples showing the use of PONDEROSA-C/S in solving structures of four proteins: two that enable comparison with the original PONDEROSA package, and two from the Critical Assessment of automated Structure Determination by NMR (Rosato et al. in Nat Methods 6:625-626. doi: 10.1038/nmeth0909-625 , 2009) competition. The software package can be downloaded freely in binary format from http://pine.nmrfam.wisc.edu/download_packages.html. Registered users of the National Magnetic Resonance Facility at Madison can submit jobs to the PONDEROSA-C/S server at http://ponderosa.nmrfam.wisc.edu, where instructions, tutorials, and instructions can be found. Structures are normally returned within 1-2 days.

  12. Structural bioinformatics: methods, concepts and applications to blood coagulation proteins.

    PubMed

    Villoutreix, Bruno O

    2002-06-01

    Structural and theoretical analyses of proteins are central to the understanding of complex molecular mechanisms and are fundamental to the drug discovery process. Computational techniques yield useful insights into an ever-wider range of biomolecular systems. Protein three-dimensional structures and molecular functions can be predicted in some circumstances, while experimental structures can be analyzed in depth via such computational approaches. Non-covalent binding of biomolecules can be understood by considering structural, thermodynamic and kinetic issues, and theoretical simulations of such events can be attempted. The central role of electrostatic interactions with regard to protein function, structure and stability has been investigated and some electrostatic properties can be modeled theoretically. Computer methods thus help to prioritize, design, analyze and rationalize biochemical experiments. Cardiovascular diseases and associated blood coagulation disorders are leading causes of death worldwide. Blood coagulation involves more than 30 proteins that interact specifically with various degrees of affinity. Many of these molecules can also bind transiently to phospholipid surfaces. Numerous point mutations in the genes of coagulation proteins and regulators have been identified. Understanding the coagulation cascade, its regulation and the impact of mutations is required for the development of new therapies and diagnostic tools. In this review, we describe concepts and methods pertaining to the field of structural bioinformatics. We provide examples of applications of these approaches to blood coagulation proteins and show that such studies can give insights about molecular mechanisms contributing to cardiovascular disease susceptibility.

  13. CSB: a Python framework for structural bioinformatics.

    PubMed

    Kalev, Ivan; Mechelke, Martin; Kopec, Klaus O; Holder, Thomas; Carstens, Simeon; Habeck, Michael

    2012-11-15

    Computational Structural Biology Toolbox (CSB) is a cross-platform Python class library for reading, storing and analyzing biomolecular structures with rich support for statistical analyses. CSB is designed for reusability and extensibility and comes with a clean, well-documented API following good object-oriented engineering practice. Stable release packages are available for download from the Python Package Index (PyPI) as well as from the project's website http://csb.codeplex.com. ivan.kalev@gmail.com or michael.habeck@tuebingen.mpg.de

  14. Identifying intrinsically disordered protein regions likely to undergo binding-induced helical transitions.

    PubMed

    Glover, Karen; Mei, Yang; Sinha, Sangita C

    2016-10-01

    Many proteins contain intrinsically disordered regions (IDRs) lacking stable secondary and ordered tertiary structure. IDRs are often implicated in macromolecular interactions, and may undergo structural transitions upon binding to interaction partners. However, as binding partners of many protein IDRs are unknown, these structural transitions are difficult to verify and often are poorly understood. In this study we describe a method to identify IDRs that are likely to undergo helical transitions upon binding. This method combines bioinformatics analyses followed by circular dichroism spectroscopy to monitor 2,2,2-trifluoroethanol (TFE)-induced changes in secondary structure content of these IDRs. Our results demonstrate that there is no significant change in the helicity of IDRs that are not predicted to fold upon binding. IDRs that are predicted to fold fall into two groups: one group does not become helical in the presence of TFE and includes examples of IDRs that form β-strands upon binding, while the other group becomes more helical and includes examples that are known to fold into helices upon binding. Therefore, we propose that bioinformatics analyses combined with experimental evaluation using TFE may provide a general method to identify IDRs that undergo binding-induced disorder-to-helix transitions. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Ensembl 2002: accommodating comparative genomics.

    PubMed

    Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

    2003-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focusing on developing automatic comparative genome analysis and visualisation.

  16. Molecular dynamics simulations: advances and applications

    PubMed Central

    Hospital, Adam; Goñi, Josep Ramon; Orozco, Modesto; Gelpí, Josep L

    2015-01-01

    Molecular dynamics simulations have evolved into a mature technique that can be used effectively to understand macromolecular structure-to-function relationships. Present simulation times are close to biologically relevant ones. Information gathered about the dynamic properties of macromolecules is rich enough to shift the usual paradigm of structural bioinformatics from studying single structures to analyze conformational ensembles. Here, we describe the foundations of molecular dynamics and the improvements made in the direction of getting such ensemble. Specific application of the technique to three main issues (allosteric regulation, docking, and structure refinement) is discussed. PMID:26604800

  17. DSSR-enhanced visualization of nucleic acid structures in Jmol.

    PubMed

    Hanson, Robert M; Lu, Xiang-Jun

    2017-07-03

    Sophisticated and interactive visualizations are essential for making sense of the intricate 3D structures of macromolecules. For proteins, secondary structural components are routinely featured in molecular graphics visualizations. However, the field of RNA structural bioinformatics is still lagging behind; for example, current molecular graphics tools lack built-in support even for base pairs, double helices, or hairpin loops. DSSR (Dissecting the Spatial Structure of RNA) is an integrated and automated command-line tool for the analysis and annotation of RNA tertiary structures. It calculates a comprehensive and unique set of features for characterizing RNA, as well as DNA structures. Jmol is a widely used, open-source Java viewer for 3D structures, with a powerful scripting language. JSmol, its reincarnation based on native JavaScript, has a predominant position in the post Java-applet era for web-based visualization of molecular structures. The DSSR-Jmol integration presented here makes salient features of DSSR readily accessible, either via the Java-based Jmol application itself, or its HTML5-based equivalent, JSmol. The DSSR web service accepts 3D coordinate files (in mmCIF or PDB format) initiated from a Jmol or JSmol session and returns DSSR-derived structural features in JSON format. This seamless combination of DSSR and Jmol/JSmol brings the molecular graphics of 3D RNA structures to a similar level as that for proteins, and enables a much deeper analysis of structural characteristics. It fills a gap in RNA structural bioinformatics, and is freely accessible (via the Jmol application or the JSmol-based website http://jmol.x3dna.org). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Edge Bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Chien-Chi

    2015-08-03

    Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human-resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed based on following USE CASES and specific to illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen ormore » co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance« less

  19. Gene expression patterns combined with bioinformatics analysis identify genes associated with cholangiocarcinoma.

    PubMed

    Li, Chen; Shen, Weixing; Shen, Sheng; Ai, Zhilong

    2013-12-01

    To explore the molecular mechanisms of cholangiocarcinoma (CC), microarray technology was used to find biomarkers for early detection and diagnosis. The gene expression profiles from 6 patients with CC and 5 normal controls were downloaded from Gene Expression Omnibus and compared. As a result, 204 differentially co-expressed genes (DCGs) in CC patients compared to normal controls were identified using a computational bioinformatics analysis. These genes were mainly involved in coenzyme metabolic process, peptidase activity and oxidation reduction. A regulatory network was constructed by mapping the DCGs to known regulation data. Four transcription factors, FOXC1, ZIC2, NKX2-2 and GCGR, were hub nodes in the network. In conclusion, this study provides a set of targets useful for future investigations into molecular biomarker studies. Copyright © 2013 Elsevier Ltd. All rights reserved.

  20. Validation of Methods to Assess the Immunoglobulin Gene Repertoire in Tissues Obtained from Mice on the International Space Station.

    PubMed

    Rettig, Trisha A; Ward, Claire; Pecaut, Michael J; Chapes, Stephen K

    2017-07-01

    Spaceflight is known to affect immune cell populations. In particular, splenic B cell numbers decrease during spaceflight and in ground-based physiological models. Although antibody isotype changes have been assessed during and after space flight, an extensive characterization of the impact of spaceflight on antibody composition has not been conducted in mice. Next Generation Sequencing and bioinformatic tools are now available to assess antibody repertoires. We can now identify immunoglobulin gene- segment usage, junctional regions, and modifications that contribute to specificity and diversity. Due to limitations on the International Space Station, alternate sample collection and storage methods must be employed. Our group compared Illumina MiSeq sequencing data from multiple sample preparation methods in normal C57Bl/6J mice to validate that sample preparation and storage would not bias the outcome of antibody repertoire characterization. In this report, we also compared sequencing techniques and a bioinformatic workflow on the data output when we assessed the IgH and Igκ variable gene usage. This included assessments of our bioinformatic workflow on Illumina HiSeq and MiSeq datasets and is specifically designed to reduce bias, capture the most information from Ig sequences, and produce a data set that provides other data mining options. We validated our workflow by comparing our normal mouse MiSeq data to existing murine antibody repertoire studies validating it for future antibody repertoire studies.

  1. Bioinformatics goes back to the future.

    PubMed

    Miller, Crispin J; Attwood, Teresa K

    2003-02-01

    The need to turn raw data into knowledge has led the bioinformatics field to focus increasingly on the manipulation of information. By drawing parallels with both cryptography and artificial intelligence, we can develop an understanding of the changes that are occurring in bioinformatics, and how these changes are likely to influence the bioinformatics job market.

  2. Design and Implementation of an Interdepartmental Bioinformatics Program across Life Science Curricula

    ERIC Educational Resources Information Center

    Miskowski, Jennifer A.; Howard, David R.; Abler, Michael L.; Grunwald, Sandra K.

    2007-01-01

    Over the past 10 years, there has been a technical revolution in the life sciences leading to the emergence of a new discipline called bioinformatics. In response, bioinformatics-related topics have been incorporated into various undergraduate courses along with the development of new courses solely focused on bioinformatics. This report describes…

  3. Component-Based Approach for Educating Students in Bioinformatics

    ERIC Educational Resources Information Center

    Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.

    2009-01-01

    There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…

  4. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    ERIC Educational Resources Information Center

    Shachak, Aviv; Ophir, Ron; Rubin, Eitan

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of…

  5. Vertical and Horizontal Integration of Bioinformatics Education: A Modular, Interdisciplinary Approach

    ERIC Educational Resources Information Center

    Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D. Blaine; Langeland, James A.

    2009-01-01

    Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option.…

  6. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

    PubMed

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians.

  7. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software

    PubMed Central

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054

  8. Computational biology and bioinformatics in Nigeria.

    PubMed

    Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-04-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  9. Computational Biology and Bioinformatics in Nigeria

    PubMed Central

    Fatumo, Segun A.; Adoga, Moses P.; Ojo, Opeolu O.; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-01-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries. PMID:24763310

  10. Target-Pathogen: a structural bioinformatic approach to prioritize drug targets in pathogens.

    PubMed

    Sosa, Ezequiel J; Burguener, Germán; Lanzarotti, Esteban; Defelipe, Lucas; Radusky, Leandro; Pardo, Agustín M; Marti, Marcelo; Turjanski, Adrián G; Fernández Do Porto, Darío

    2018-01-04

    Available genomic data for pathogens has created new opportunities for drug discovery and development to fight them, including new resistant and multiresistant strains. In particular structural data must be integrated with both, gene information and experimental results. In this sense, there is a lack of an online resource that allows genome wide-based data consolidation from diverse sources together with thorough bioinformatic analysis that allows easy filtering and scoring for fast target selection for drug discovery. Here, we present Target-Pathogen database (http://target.sbg.qb.fcen.uba.ar/patho), designed and developed as an online resource that allows the integration and weighting of protein information such as: function, metabolic role, off-targeting, structural properties including druggability, essentiality and omic experiments, to facilitate the identification and prioritization of candidate drug targets in pathogens. We include in the database 10 genomes of some of the most relevant microorganisms for human health (Mycobacterium tuberculosis, Mycobacterium leprae, Klebsiella pneumoniae, Plasmodium vivax, Toxoplasma gondii, Leishmania major, Wolbachia bancrofti, Trypanosoma brucei, Shigella dysenteriae and Schistosoma Smanosoni) and show its applicability. New genomes can be uploaded upon request. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Ready to use bioinformatics analysis as a tool to predict immobilisation strategies for protein direct electron transfer (DET).

    PubMed

    Cazelles, R; Lalaoui, N; Hartmann, T; Leimkühler, S; Wollenberger, U; Antonietti, M; Cosnier, S

    2016-11-15

    Direct electron transfer (DET) to proteins is of considerable interest for the development of biosensors and bioelectrocatalysts. While protein structure is mainly used as a method of attaching the protein to the electrode surface, we employed bioinformatics analysis to predict the suitable orientation of the enzymes to promote DET. Structure similarity and secondary structure prediction were combined underlying localized amino-acids able to direct one of the enzyme's electron relays toward the electrode surface by creating a suitable bioelectrocatalytic nanostructure. The electro-polymerization of pyrene pyrrole onto a fluorine-doped tin oxide (FTO) electrode allowed the targeted orientation of the formate dehydrogenase enzyme from Rhodobacter capsulatus (RcFDH) by means of hydrophobic interactions. Its electron relays were directed to the FTO surface, thus promoting DET. The reduction of nicotinamide adenine dinucleotide (NAD(+)) generating a maximum current density of 1μAcm(-2) with 10mM NAD(+) leads to a turnover number of 0.09electron/s/molRcFDH. This work represents a practical approach to evaluate electrode surface modification strategies in order to create valuable bioelectrocatalysts. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.

    PubMed

    Radauer, Christian

    2017-01-01

    The increasing number of available data on allergenic proteins demanded the establishment of structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.

  13. Large-scale modelling of the divergent spectrin repeats in nesprins: giant modular proteins.

    PubMed

    Autore, Flavia; Pfuhl, Mark; Quan, Xueping; Williams, Aisling; Roberts, Roland G; Shanahan, Catherine M; Fraternali, Franca

    2013-01-01

    Nesprin-1 and nesprin-2 are nuclear envelope (NE) proteins characterized by a common structure of an SR (spectrin repeat) rod domain and a C-terminal transmembrane KASH [Klarsicht-ANC-Syne-homology] domain and display N-terminal actin-binding CH (calponin homology) domains. Mutations in these proteins have been described in Emery-Dreifuss muscular dystrophy and attributed to disruptions of interactions at the NE with nesprins binding partners, lamin A/C and emerin. Evolutionary analysis of the rod domains of the nesprins has shown that they are almost entirely composed of unbroken SR-like structures. We present a bioinformatical approach to accurate definition of the boundaries of each SR by comparison with canonical SR structures, allowing for a large-scale homology modelling of the 74 nesprin-1 and 56 nesprin-2 SRs. The exposed and evolutionary conserved residues identify important pbs for protein-protein interactions that can guide tailored binding experiments. Most importantly, the bioinformatics analyses and the 3D models have been central to the design of selected constructs for protein expression. 1D NMR and CD spectra have been performed of the expressed SRs, showing a folded, stable, high content α-helical structure, typical of SRs. Molecular Dynamics simulations have been performed to study the structural and elastic properties of consecutive SRs, revealing insights in the mechanical properties adopted by these modules in the cell.

  14. Technosciences in Academia: Rethinking a Conceptual Framework for Bioinformatics Undergraduate Curricula

    NASA Astrophysics Data System (ADS)

    Symeonidis, Iphigenia Sofia

    This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatics objectives rather than enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty in regards to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinfonnatics educational needs and goals as expressed by different authorities, five undergraduate bioinformatics degrees case-studies, educational implications of bioinformatics as a technoscience and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.

  15. Bioinformatics core competencies for undergraduate life sciences education.

    PubMed

    Wilson Sayres, Melissa A; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G; Smith, Todd M; Triplett, Eric W; Williams, Jason J; Dinsdale, Elizabeth; Morgan, William R; Burnette, James M; Donovan, Samuel S; Drew, Jennifer C; Elgin, Sarah C R; Fowlks, Edison R; Galindo-Gonzalez, Sebastian; Goodman, Anya L; Grandgenett, Nealy F; Goller, Carlos C; Jungck, John R; Newman, Jeffrey D; Pearson, William; Ryder, Elizabeth F; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C; Toro-Martínez, Arlín; Welch, Lonnie R; Wright, Robin; Barone, Lindsay; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C; Pauley, Mark A

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent's degree of training, time since degree earned, and/or the Carnegie Classification of the respondent's institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.

  16. Bioinformatics core competencies for undergraduate life sciences education

    PubMed Central

    Wilson Sayres, Melissa A.; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G.; Smith, Todd M.; Triplett, Eric W.; Williams, Jason J.; Dinsdale, Elizabeth; Morgan, William R.; Burnette, James M.; Donovan, Samuel S.; Drew, Jennifer C.; Elgin, Sarah C. R.; Fowlks, Edison R.; Galindo-Gonzalez, Sebastian; Goodman, Anya L.; Grandgenett, Nealy F.; Goller, Carlos C.; Jungck, John R.; Newman, Jeffrey D.; Pearson, William; Ryder, Elizabeth F.; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C.; Toro-Martínez, Arlín; Welch, Lonnie R.; Wright, Robin; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C.

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent’s degree of training, time since degree earned, and/or the Carnegie Classification of the respondent’s institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula. PMID:29870542

  17. Comparative promoter analysis allows de novo identification of specialized cell junction-associated proteins.

    PubMed

    Cohen, Clemens D; Klingenhoff, Andreas; Boucherot, Anissa; Nitsche, Almut; Henger, Anna; Brunner, Bodo; Schmid, Holger; Merkle, Monika; Saleem, Moin A; Koller, Klaus-Peter; Werner, Thomas; Gröne, Hermann-Josef; Nelson, Peter J; Kretzler, Matthias

    2006-04-11

    Shared transcription factor binding sites that are conserved in distance and orientation help control the expression of gene products that act together in the same biological context. New bioinformatics approaches allow the rapid characterization of shared promoter structures and can be used to find novel interacting molecules. Here, these principles are demonstrated by using molecules linked to the unique functional unit of the glomerular slit diaphragm. An evolutionarily conserved promoter model was generated by comparative genomics in the proximal promoter regions of the slit diaphragm-associated molecule nephrin. Phylogenetic promoter fingerprints of known elements of the slit diaphragm complex identified the nephrin model in the promoter region of zonula occludens-1 (ZO-1). Genome-wide scans using this promoter model effectively predicted a previously unrecognized slit diaphragm molecule, cadherin-5. Nephrin, ZO-1, and cadherin-5 mRNA showed stringent coexpression across a diverse set of human glomerular diseases. Comparative promoter analysis can identify regulatory pathways at work in tissue homeostasis and disease processes.

  18. MPID-T2: a database for sequence-structure-function analyses of pMHC and TR/pMHC structures.

    PubMed

    Khan, Javed Mohammed; Cheruku, Harish Reddy; Tong, Joo Chuan; Ranganathan, Shoba

    2011-04-15

    Sequence-structure-function information is critical in understanding the mechanism of pMHC and TR/pMHC binding and recognition. A database for sequence-structure-function information on pMHC and TR/pMHC interactions, MHC-Peptide Interaction Database-TR version 2 (MPID-T2), is now available augmented with the latest PDB and IMGT/3Dstructure-DB data, advanced features and new parameters for the analysis of pMHC and TR/pMHC structures. http://biolinfo.org/mpid-t2. shoba.ranganathan@mq.edu.au Supplementary data are available at Bioinformatics online.

  19. Use of 2D chemical structure and bioactivity profiles to generate chemical categories within an Adverse Outcome Pathway network

    EPA Science Inventory

    The Adverse Outcome Pathway (AOP) framework has emerged to capitalise on the vast quantity of mechanistic data generated by alternative techniques, as well as advances in systems biology, cheminformatics, and bioinformatics. AOPs provide a scaffold onto which mechanistic data can...

  20. The growing need for microservices in bioinformatics.

    PubMed

    Williams, Christopher L; Sica, Jeffrey C; Killen, Robert T; Balis, Ulysses G J

    2016-01-01

    Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Bioinformatics relies on nimble IT framework which can adapt to changing requirements. To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting.

  1. The growing need for microservices in bioinformatics

    PubMed Central

    Williams, Christopher L.; Sica, Jeffrey C.; Killen, Robert T.; Balis, Ulysses G. J.

    2016-01-01

    Objective: Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Context: Bioinformatics relies on nimble IT framework which can adapt to changing requirements. Aims: To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics Conclusions: Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting. PMID:27994937

  2. Report on the EMBER Project--A European Multimedia Bioinformatics Educational Resource

    ERIC Educational Resources Information Center

    Attwood, Terri K.; Selimas, Ioannis; Buis, Rob; Altenburg, Ruud; Herzog, Robert; Ledent, Valerie; Ghita, Viorica; Fernandes, Pedro; Marques, Isabel; Brugman, Marc

    2005-01-01

    EMBER was a European project aiming to develop bioinformatics teaching materials on the Web and CD-ROM to help address the recognised skills shortage in bioinformatics. The project grew out of pilot work on the development of an interactive web-based bioinformatics tutorial and the desire to repackage that resource with the help of a professional…

  3. The 2017 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather

    2017-01-01

    The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year’s theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest. PMID:29118973

  4. The 2017 Bioinformatics Open Source Conference (BOSC).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather

    2017-01-01

    The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year's theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest.

  5. Rising Strengths Hong Kong SAR in Bioinformatics.

    PubMed

    Chakraborty, Chiranjib; George Priya Doss, C; Zhu, Hailong; Agoramoorthy, Govindasamy

    2017-06-01

    Hong Kong's bioinformatics sector is attaining new heights in combination with its economic boom and the predominance of the working-age group in its population. Factors such as a knowledge-based and free-market economy have contributed towards a prominent position on the world map of bioinformatics. In this review, we have considered the educational measures, landmark research activities and the achievements of bioinformatics companies and the role of the Hong Kong government in the establishment of bioinformatics as strength. However, several hurdles remain. New government policies will assist computational biologists to overcome these hurdles and further raise the profile of the field. There is a high expectation that bioinformatics in Hong Kong will be a promising area for the next generation.

  6. Using Bioinformatics Approach to Explore the Pharmacological Mechanisms of Multiple Ingredients in Shuang-Huang-Lian

    PubMed Central

    Zhang, Bai-xia; Li, Jian; Gu, Hao; Li, Qiang; Zhang, Qi; Zhang, Tian-jiao; Wang, Yun; Cai, Cheng-ke

    2015-01-01

    Due to the proved clinical efficacy, Shuang-Huang-Lian (SHL) has developed a variety of dosage forms. However, the in-depth research on targets and pharmacological mechanisms of SHL preparations was scarce. In the presented study, the bioinformatics approaches were adopted to integrate relevant data and biological information. As a result, a PPI network was built and the common topological parameters were characterized. The results suggested that the PPI network of SHL exhibited a scale-free property and modular architecture. The drug target network of SHL was structured with 21 functional modules. According to certain modules and pharmacological effects distribution, an antitumor effect and potential drug targets were predicted. A biological network which contained 26 subnetworks was constructed to elucidate the antipneumonia mechanism of SHL. We also extracted the subnetwork to explicitly display the pathway where one effective component acts on the pneumonia related targets. In conclusions, a bioinformatics approach was established for exploring the drug targets, pharmacological activity distribution, effective components of SHL, and its mechanism of antipneumonia. Above all, we identified the effective components and disclosed the mechanism of SHL from the view of system. PMID:26495421

  7. Great interactions: How binding incorrect partners can teach us about protein recognition and function.

    PubMed

    Vamparys, Lydie; Laurent, Benoist; Carbone, Alessandra; Sacquin-Mora, Sophie

    2016-10-01

    Protein-protein interactions play a key part in most biological processes and understanding their mechanism is a fundamental problem leading to numerous practical applications. The prediction of protein binding sites in particular is of paramount importance since proteins now represent a major class of therapeutic targets. Amongst others methods, docking simulations between two proteins known to interact can be a useful tool for the prediction of likely binding patches on a protein surface. From the analysis of the protein interfaces generated by a massive cross-docking experiment using the 168 proteins of the Docking Benchmark 2.0, where all possible protein pairs, and not only experimental ones, have been docked together, we show that it is also possible to predict a protein's binding residues without having any prior knowledge regarding its potential interaction partners. Evaluating the performance of cross-docking predictions using the area under the specificity-sensitivity ROC curve (AUC) leads to an AUC value of 0.77 for the complete benchmark (compared to the 0.5 AUC value obtained for random predictions). Furthermore, a new clustering analysis performed on the binding patches that are scattered on the protein surface show that their distribution and growth will depend on the protein's functional group. Finally, in several cases, the binding-site predictions resulting from the cross-docking simulations will lead to the identification of an alternate interface, which corresponds to the interaction with a biomolecular partner that is not included in the original benchmark. Proteins 2016; 84:1408-1421. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

  8. Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens

    PubMed Central

    Thomas, Reuben; Phuong, Jimmy; McHale, Cliona M.; Zhang, Luoping

    2012-01-01

    We have applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens could be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens that were associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using data on gene/protein targets available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we analyzed for enrichment of all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling pathway, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters comprising 1 to 3 chemicals that did not correlate with known mechanism of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, based on the pathway data, were unable to distinguish the 29 leukemogens from 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of distinguishing a random leukemogen/non-leukemogen pair from each other. PMID:22851955

  9. Application of proteomics to ecology and population biology.

    PubMed

    Karr, T L

    2008-02-01

    Proteomics is a relatively new scientific discipline that merges protein biochemistry, genome biology and bioinformatics to determine the spatial and temporal expression of proteins in cells, tissues and whole organisms. There has been very little application of proteomics to the fields of behavioral genetics, evolution, ecology and population dynamics, and has only recently been effectively applied to the closely allied fields of molecular evolution and genetics. However, there exists considerable potential for proteomics to impact in areas related to functional ecology; this review will introduce the general concepts and methodologies that define the field of proteomics and compare and contrast the advantages and disadvantages with other methods. Examples of how proteomics can aid, complement and indeed extend the study of functional ecology will be discussed including the main tool of ecological studies, population genetics with an emphasis on metapopulation structure analysis. Because proteomic analyses provide a direct measure of gene expression, it obviates some of the limitations associated with other genomic approaches, such as microarray and EST analyses. Likewise, in conjunction with associated bioinformatics and molecular evolutionary tools, proteomics can provide the foundation of a systems-level integration approach that can enhance ecological studies. It can be envisioned that proteomics will provide important new information on issues specific to metapopulation biology and adaptive processes in nature. A specific example of the application of proteomics to sperm ageing is provided to illustrate the potential utility of the approach.

  10. Bioinformatic Analysis of Strawberry GSTF12 Gene

    NASA Astrophysics Data System (ADS)

    Wang, Xiran; Jiang, Leiyu; Tang, Haoru

    2018-01-01

    GSTF12 has always been known as a key factor of proanthocyanins accumulate in plant testa. Through bioinformatics analysis of the nucleotide and encoded protein sequence of GSTF12, it is more advantageous to the study of genes related to anthocyanin biosynthesis accumulation pathway. Therefore, we chosen GSTF12 gene of 11 kinds species, downloaded their nucleotide and protein sequence from NCBI as the research object, found strawberry GSTF12 gene via bioinformation analyse, constructed phylogenetic tree. At the same time, we analysed the strawberry GSTF12 gene of physical and chemical properties and its protein structure and so on. The phylogenetic tree showed that Strawberry and petunia were closest relative. By the protein prediction, we found that the protein owed one proper signal peptide without obvious transmembrane regions.

  11. Establishing a master's degree programme in bioinformatics: challenges and opportunities.

    PubMed

    Sahinidis, N V; Harandi, M T; Heath, M T; Murphy, L; Snir, M; Wheeler, R P; Zukoski, C F

    2005-12-01

    The development of the Bioinformatics MS degree program at the University of Illinois, the challenges and opportunities associated with such a process, and the current structure of the program is described. This program has departed from earlier University practice in significant ways. Despite the existence of several interdisciplinary programs at the University, a few of which grant degrees, this is the first interdisciplinary program that grants degrees and formally recognises departmental specialisation areas. The program, which is not owned by any particular department but by the Graduate College itself, is operated in a franchise-like fashion via several departmental concentrations. With four different colleges and many more departments involved in establishing and operating the program, the logistics of the operation are of considerable complexity but result in significant interactions across the entire campus.

  12. Evolving from bioinformatics in-the-small to bioinformatics in-the-large.

    PubMed

    Parker, D Stott; Gorlick, Michael M; Lee, Christopher J

    2003-01-01

    We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small-constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples, and extract lessons from them showing how it has helped. These examples both suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.

  13. Bioinformatics education dissemination with an evolutionary problem solving perspective.

    PubMed

    Jungck, John R; Donovan, Samuel S; Weisstein, Anton E; Khiripet, Noppadon; Everse, Stephen J

    2010-11-01

    Bioinformatics is central to biology education in the 21st century. With the generation of terabytes of data per day, the application of computer-based tools to stored and distributed data is fundamentally changing research and its application to problems in medicine, agriculture, conservation and forensics. In light of this 'information revolution,' undergraduate biology curricula must be redesigned to prepare the next generation of informed citizens as well as those who will pursue careers in the life sciences. The BEDROCK initiative (Bioinformatics Education Dissemination: Reaching Out, Connecting and Knitting together) has fostered an international community of bioinformatics educators. The initiative's goals are to: (i) Identify and support faculty who can take leadership roles in bioinformatics education; (ii) Highlight and distribute innovative approaches to incorporating evolutionary bioinformatics data and techniques throughout undergraduate education; (iii) Establish mechanisms for the broad dissemination of bioinformatics resource materials and teaching models; (iv) Emphasize phylogenetic thinking and problem solving; and (v) Develop and publish new software tools to help students develop and test evolutionary hypotheses. Since 2002, BEDROCK has offered more than 50 faculty workshops around the world, published many resources and supported an environment for developing and sharing bioinformatics education approaches. The BEDROCK initiative builds on the established pedagogical philosophy and academic community of the BioQUEST Curriculum Consortium to assemble the diverse intellectual and human resources required to sustain an international reform effort in undergraduate bioinformatics education.

  14. DockoMatic: automated peptide analog creation for high throughput virtual screening.

    PubMed

    Jacob, Reed B; Bullock, Casey W; Andersen, Tim; McDougal, Owen M

    2011-10-01

    The purpose of this manuscript is threefold: (1) to describe an update to DockoMatic that allows the user to generate cyclic peptide analog structure files based on protein database (pdb) files, (2) to test the accuracy of the peptide analog structure generation utility, and (3) to evaluate the high throughput capacity of DockoMatic. The DockoMatic graphical user interface interfaces with the software program Treepack to create user defined peptide analogs. To validate this approach, DockoMatic produced cyclic peptide analogs were tested for three-dimensional structure consistency and binding affinity against four experimentally determined peptide structure files available in the Research Collaboratory for Structural Bioinformatics database. The peptides used to evaluate this new functionality were alpha-conotoxins ImI, PnIA, and their published analogs. Peptide analogs were generated by DockoMatic and tested for their ability to bind to X-ray crystal structure models of the acetylcholine binding protein originating from Aplysia californica. The results, consisting of more than 300 simulations, demonstrate that DockoMatic predicts the binding energy of peptide structures to within 3.5 kcal mol(-1), and the orientation of bound ligand compares to within 1.8 Å root mean square deviation for ligand structures as compared to experimental data. Evaluation of high throughput virtual screening capacity demonstrated that Dockomatic can collect, evaluate, and summarize the output of 10,000 AutoDock jobs in less than 2 hours of computational time, while 100,000 jobs requires approximately 15 hours and 1,000,000 jobs is estimated to take up to a week. Copyright © 2011 Wiley Periodicals, Inc.

  15. Bioinformatics education in India.

    PubMed

    Kulkarni-Kale, Urmila; Sawant, Sangeeta; Chavan, Vishwas

    2010-11-01

    An account of bioinformatics education in India is presented along with future prospects. Establishment of BTIS network by Department of Biotechnology (DBT), Government of India in the 1980s had been a systematic effort in the development of bioinformatics infrastructure in India to provide services to scientific community. Advances in the field of bioinformatics underpinned the need for well-trained professionals with skills in information technology and biotechnology. As a result, programmes for capacity building in terms of human resource development were initiated. Educational programmes gradually evolved from the organisation of short-term workshops to the institution of formal diploma/degree programmes. A case study of the Master's degree course offered at the Bioinformatics Centre, University of Pune is discussed. Currently, many universities and institutes are offering bioinformatics courses at different levels with variations in the course contents and degree of detailing. BioInformatics National Certification (BINC) examination initiated in 2005 by DBT provides a common yardstick to assess the knowledge and skill sets of students passing out of various institutions. The potential for broadening the scope of bioinformatics to transform it into a data intensive discovery discipline is discussed. This necessitates introduction of amendments in the existing curricula to accommodate the upcoming developments.

  16. A Web-based assessment of bioinformatics end-user support services at US universities.

    PubMed

    Messersmith, Donna J; Benson, Dennis A; Geer, Renata C

    2006-07-01

    This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group.

  17. Bioinformatics and genomic analysis of transposable elements in eukaryotic genomes.

    PubMed

    Janicki, Mateusz; Rooke, Rebecca; Yang, Guojun

    2011-08-01

    A major portion of most eukaryotic genomes are transposable elements (TEs). During evolution, TEs have introduced profound changes to genome size, structure, and function. As integral parts of genomes, the dynamic presence of TEs will continue to be a major force in reshaping genomes. Early computational analyses of TEs in genome sequences focused on filtering out "junk" sequences to facilitate gene annotation. When the high abundance and diversity of TEs in eukaryotic genomes were recognized, these early efforts transformed into the systematic genome-wide categorization and classification of TEs. The availability of genomic sequence data reversed the classical genetic approaches to discovering new TE families and superfamilies. Curated TE databases and their accurate annotation of genome sequences in turn facilitated the studies on TEs in a number of frontiers including: (1) TE-mediated changes of genome size and structure, (2) the influence of TEs on genome and gene functions, (3) TE regulation by host, (4) the evolution of TEs and their population dynamics, and (5) genomic scale studies of TE activity. Bioinformatics and genomic approaches have become an integral part of large-scale studies on TEs to extract information with pure in silico analyses or to assist wet lab experimental studies. The current revolution in genome sequencing technology facilitates further progress in the existing frontiers of research and emergence of new initiatives. The rapid generation of large-sequence datasets at record low costs on a routine basis is challenging the computing industry on storage capacity and manipulation speed and the bioinformatics community for improvement in algorithms and their implementations.

  18. Bioinformatics analysis of ROP8 protein to improve vaccine design against Toxoplasma gondii.

    PubMed

    Foroutan, Masoud; Ghaffarifar, Fatemeh; Sharifi, Zohreh; Dalimi, Abdolhosein; Pirestani, Majid

    2018-04-26

    Rhoptry proteins (ROPs) are involved in the different stages of Toxoplasma gondii (T. gondii) invasion and are also critical for survival within host cells. ROP8 is expressed in the early stages of infection and have a key role in the parasitophorous vacuole (PV) formation. In this paper, we have combined several bioinformatics online servers for immunogenicity prediction of ROP8 protein. In this study, several bioinformatics approaches were used to analyze the different aspects of ROP8 protein, including the physico-chemical properties, transmembrane domain, subcellular localization, secondary and tertiary structure, B and T-cell potential epitopes, and other important characteristics of this protein. The findings showed that ROP8 protein had 60 potential post-translational modification sites. Also, only one transmembrane domain was recognized for this protein. The secondary structure of ROP8 protein comprises 33.04% alpha-helix, 18.26% extended strand, and 48.70% random coil. Moreover, several potential B and T-cell epitopes were identified for ROP8. In addition, the obtained findings from antigenicity and allergenicity evaluation remarked that this protein is immunogenic and non-allergen. Based on the results of Ramachandran plot, 94.8%, 4.1%, and 1.1% of amino acid residues were incorporated in the favored, allowed, and outlier regions, respectively. This paper provides a foundation for further investigations, and laid a theoretical basis for the development of an appropriate vaccine against toxoplasmosis. More studies are needed experimentally using the ROP8 alone or in combination with other antigens in the future. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Bioinformatics Goes to School—New Avenues for Teaching Contemporary Biology

    PubMed Central

    Wood, Louisa; Gebhardt, Philipp

    2013-01-01

    Since 2010, the European Molecular Biology Laboratory's (EMBL) Heidelberg laboratory and the European Bioinformatics Institute (EMBL-EBI) have jointly run bioinformatics training courses developed specifically for secondary school science teachers within Europe and EMBL member states. These courses focus on introducing bioinformatics, databases, and data-intensive biology, allowing participants to explore resources and providing classroom-ready materials to support them in sharing this new knowledge with their students. In this article, we chart our progress made in creating and running three bioinformatics training courses, including how the course resources are received by participants and how these, and bioinformatics in general, are subsequently used in the classroom. We assess the strengths and challenges of our approach, and share what we have learned through our interactions with European science teachers. PMID:23785266

  20. The 2016 Bioinformatics Open Source Conference (BOSC).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science.

  1. Bioinformatics clouds for big data manipulation.

    PubMed

    Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang

    2012-11-28

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.

  2. MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics.

    PubMed

    Egea, Jose A; Henriques, David; Cokelaer, Thomas; Villaverde, Alejandro F; MacNamara, Aidan; Danciu, Diana-Patricia; Banga, Julio R; Saez-Rodriguez, Julio

    2014-05-10

    Optimization is the key to solving many problems in computational biology. Global optimization methods, which provide a robust methodology, and metaheuristics in particular have proven to be the most efficient methods for many applications. Despite their utility, there is a limited availability of metaheuristic tools. We present MEIGO, an R and Matlab optimization toolbox (also available in Python via a wrapper of the R version), that implements metaheuristics capable of solving diverse problems arising in systems biology and bioinformatics. The toolbox includes the enhanced scatter search method (eSS) for continuous nonlinear programming (cNLP) and mixed-integer programming (MINLP) problems, and variable neighborhood search (VNS) for Integer Programming (IP) problems. Additionally, the R version includes BayesFit for parameter estimation by Bayesian inference. The eSS and VNS methods can be run on a single-thread or in parallel using a cooperative strategy. The code is supplied under GPLv3 and is available at http://www.iim.csic.es/~gingproc/meigo.html. Documentation and examples are included. The R package has been submitted to BioConductor. We evaluate MEIGO against optimization benchmarks, and illustrate its applicability to a series of case studies in bioinformatics and systems biology where it outperforms other state-of-the-art methods. MEIGO provides a free, open-source platform for optimization that can be applied to multiple domains of systems biology and bioinformatics. It includes efficient state of the art metaheuristics, and its open and modular structure allows the addition of further methods.

  3. MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics

    PubMed Central

    2014-01-01

    Background Optimization is the key to solving many problems in computational biology. Global optimization methods, which provide a robust methodology, and metaheuristics in particular have proven to be the most efficient methods for many applications. Despite their utility, there is a limited availability of metaheuristic tools. Results We present MEIGO, an R and Matlab optimization toolbox (also available in Python via a wrapper of the R version), that implements metaheuristics capable of solving diverse problems arising in systems biology and bioinformatics. The toolbox includes the enhanced scatter search method (eSS) for continuous nonlinear programming (cNLP) and mixed-integer programming (MINLP) problems, and variable neighborhood search (VNS) for Integer Programming (IP) problems. Additionally, the R version includes BayesFit for parameter estimation by Bayesian inference. The eSS and VNS methods can be run on a single-thread or in parallel using a cooperative strategy. The code is supplied under GPLv3 and is available at http://www.iim.csic.es/~gingproc/meigo.html. Documentation and examples are included. The R package has been submitted to BioConductor. We evaluate MEIGO against optimization benchmarks, and illustrate its applicability to a series of case studies in bioinformatics and systems biology where it outperforms other state-of-the-art methods. Conclusions MEIGO provides a free, open-source platform for optimization that can be applied to multiple domains of systems biology and bioinformatics. It includes efficient state of the art metaheuristics, and its open and modular structure allows the addition of further methods. PMID:24885957

  4. The Enzyme Portal: a case study in applying user-centred design methods in bioinformatics

    PubMed Central

    2013-01-01

    User-centred design (UCD) is a type of user interface design in which the needs and desires of users are taken into account at each stage of the design process for a service or product; often for software applications and websites. Its goal is to facilitate the design of software that is both useful and easy to use. To achieve this, you must characterise users’ requirements, design suitable interactions to meet their needs, and test your designs using prototypes and real life scenarios. For bioinformatics, there is little practical information available regarding how to carry out UCD in practice. To address this we describe a complete, multi-stage UCD process used for creating a new bioinformatics resource for integrating enzyme information, called the Enzyme Portal (http://www.ebi.ac.uk/enzymeportal). This freely-available service mines and displays data about proteins with enzymatic activity from public repositories via a single search, and includes biochemical reactions, biological pathways, small molecule chemistry, disease information, 3D protein structures and relevant scientific literature. We employed several UCD techniques, including: persona development, interviews, ‘canvas sort’ card sorting, user workflows, usability testing and others. Our hope is that this case study will motivate the reader to apply similar UCD approaches to their own software design for bioinformatics. Indeed, we found the benefits included more effective decision-making for design ideas and technologies; enhanced team-working and communication; cost effectiveness; and ultimately a service that more closely meets the needs of our target audience. PMID:23514033

  5. Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm.

    PubMed

    Skinnider, Michael A; Dejong, Chris A; Franczak, Brian C; McNicholas, Paul D; Magarvey, Nathan A

    2017-08-16

    Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these molecules have been extensively optimized by natural selection. The large and structurally complex scaffolds of natural products distinguish their physical and chemical properties from those of synthetic compounds. However, no analysis of the performance of existing methods for molecular similarity calculation specific to natural products has been reported to date. Here, we present LEMONS, an algorithm for the enumeration of hypothetical modular natural product structures. We leverage this algorithm to conduct a comparative analysis of molecular similarity methods within the unique chemical space occupied by modular natural products using controlled synthetic data, and comprehensively investigate the impact of diverse biosynthetic parameters on similarity search. We additionally investigate a recently described algorithm for natural product retrobiosynthesis and alignment, and find that when rule-based retrobiosynthesis can be applied, this approach outperforms conventional two-dimensional fingerprints, suggesting it may represent a valuable approach for the targeted exploration of natural product chemical space and microbial genome mining. Our open-source algorithm is an extensible method of enumerating hypothetical natural product structures with diverse potential applications in bioinformatics.

  6. Comparative BioInformatics and Computational Toxicology

    EPA Science Inventory

    Reflecting the numerous changes in the field since the publication of the previous edition, this third edition of Developmental Toxicology focuses on the mechanisms of developmental toxicity and incorporates current technologies for testing in the risk assessment process.

  7. Occurrence of lignin degradation genotypes and phenotypes among prokaryotes.

    PubMed

    Tian, Jiang-Hao; Pourcher, Anne-Marie; Bouchez, Théodore; Gelhaye, Eric; Peu, Pascal

    2014-12-01

    A number of prokaryotes actively contribute to lignin degradation in nature and their activity could be of interest for many applications including the production of biogas/biofuel from lignocellulosic biomass and biopulping. This review compares the reliability and efficiency of the culture-dependent screening methods currently used for the isolation of ligninolytic prokaryotes. Isolated prokaryotes exhibiting lignin-degrading potential are presented according to their phylogenetic groups. With the development of bioinformatics, culture-independent techniques are emerging that allow larger-scale data mining for ligninolytic prokaryotic functions but today, these techniques still have some limits. In this work, two phylogenetic affiliations of isolated prokaryotes exhibiting ligninolytic potential and laccase-encoding prokaryotes were determined on the basis of 16S rDNA sequences, providing a comparative view of results obtained by the two types of screening techniques. The combination of laboratory culture and bioinformatics approaches is a promising way to explore lignin-degrading prokaryotes.

  8. Chondrocyte channel transcriptomics

    PubMed Central

    Lewis, Rebecca; May, Hannah; Mobasheri, Ali; Barrett-Jolley, Richard

    2013-01-01

    To date, a range of ion channels have been identified in chondrocytes using a number of different techniques, predominantly electrophysiological and/or biomolecular; each of these has its advantages and disadvantages. Here we aim to compare and contrast the data available from biophysical and microarray experiments. This letter analyses recent transcriptomics datasets from chondrocytes, accessible from the European Bioinformatics Institute (EBI). We discuss whether such bioinformatic analysis of microarray datasets can potentially accelerate identification and discovery of ion channels in chondrocytes. The ion channels which appear most frequently across these microarray datasets are discussed, along with their possible functions. We discuss whether functional or protein data exist which support the microarray data. A microarray experiment comparing gene expression in osteoarthritis and healthy cartilage is also discussed and we verify the differential expression of 2 of these genes, namely the genes encoding large calcium-activated potassium (BK) and aquaporin channels. PMID:23995703

  9. Bioinformatics in the secondary science classroom: A study of state content standards and students' perceptions of, and performance in, bioinformatics lessons

    NASA Astrophysics Data System (ADS)

    Wefer, Stephen H.

    The proliferation of bioinformatics in modern Biology marks a new revolution in science, which promises to influence science education at all levels. This thesis examined state standards for content that articulated bioinformatics, and explored secondary students' affective and cognitive perceptions of, and performance in, a bioinformatics mini-unit. The results are presented as three studies. The first study analyzed secondary science standards of 49 U.S States (Iowa has no science framework) and the District of Columbia for content related to bioinformatics at the introductory high school biology level. The bionformatics content of each state's Biology standards were categorized into nine areas and the prevalence of each area documented. The nine areas were: The Human Genome Project, Forensics, Evolution, Classification, Nucleotide Variations, Medicine, Computer Use, Agriculture/Food Technology, and Science Technology and Society/Socioscientific Issues (STS/SSI). Findings indicated a generally low representation of bioinformatics related content, which varied substantially across the different areas. Recommendations are made for reworking existing standards to incorporate bioinformatics and to facilitate the goal of promoting science literacy in this emerging new field among secondary school students. The second study examined thirty-two students' affective responses to, and content mastery of, a two-week bioinformatics mini-unit. The findings indicate that the students generally were positive relative to their interest level, the usefulness of the lessons, the difficulty level of the lessons, likeliness to engage in additional bioinformatics, and were overall successful on the assessments. A discussion of the results and significance is followed by suggestions for future research and implementation for transferability. The third study presents a case study of individual differences among ten secondary school students, whose cognitive and affective percepts were analyzed in relation to their experience in learning a bioinformatics mini-unit. There were distinct individual differences among the participants, especially in the way they processed information and integrated procedural and analytical thought during bioinformatics learning. These differences may provide insights into some of the specific needs of students that educators and curriculum designers should consider when designing bioinformatics learning experiences. Implications for teacher education and curriculum design are presented in addition to some suggestions for further research.

  10. Search for 5'-leader regulatory RNA structures based on gene annotation aided by the RiboGap database.

    PubMed

    Naghdi, Mohammad Reza; Smail, Katia; Wang, Joy X; Wade, Fallou; Breaker, Ronald R; Perreault, Jonathan

    2017-03-15

    The discovery of noncoding RNAs (ncRNAs) and their importance for gene regulation led us to develop bioinformatics tools to pursue the discovery of novel ncRNAs. Finding ncRNAs de novo is challenging, first due to the difficulty of retrieving large numbers of sequences for given gene activities, and second due to exponential demands on calculation needed for comparative genomics on a large scale. Recently, several tools for the prediction of conserved RNA secondary structure were developed, but many of them are not designed to uncover new ncRNAs, or are too slow for conducting analyses on a large scale. Here we present various approaches using the database RiboGap as a primary tool for finding known ncRNAs and for uncovering simple sequence motifs with regulatory roles. This database also can be used to easily extract intergenic sequences of eubacteria and archaea to find conserved RNA structures upstream of given genes. We also show how to extend analysis further to choose the best candidate ncRNAs for experimental validation. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Socrates: identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads

    PubMed Central

    Schröder, Jan; Hsu, Arthur; Boyle, Samantha E.; Macintyre, Geoff; Cmero, Marek; Tothill, Richard W.; Johnstone, Ricky W.; Shackleton, Mark; Papenfuss, Anthony T.

    2014-01-01

    Motivation: Methods for detecting somatic genome rearrangements in tumours using next-generation sequencing are vital in cancer genomics. Available algorithms use one or more sources of evidence, such as read depth, paired-end reads or split reads to predict structural variants. However, the problem remains challenging due to the significant computational burden and high false-positive or false-negative rates. Results: In this article, we present Socrates (SOft Clip re-alignment To idEntify Structural variants), a highly efficient and effective method for detecting genomic rearrangements in tumours that uses only split-read data. Socrates has single-nucleotide resolution, identifies micro-homologies and untemplated sequence at break points, has high sensitivity and high specificity and takes advantage of parallelism for efficient use of resources. We demonstrate using simulated and real data that Socrates performs well compared with a number of existing structural variant detection tools. Availability and implementation: Socrates is released as open source and available from http://bioinf.wehi.edu.au/socrates. Contact: papenfuss@wehi.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24389656

  12. CABS-flex predictions of protein flexibility compared with NMR ensembles

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian

    2014-01-01

    Motivation: Identification of flexible regions of protein structures is important for understanding of their biological functions. Recently, we have developed a fast approach for predicting protein structure fluctuations from a single protein model: the CABS-flex. CABS-flex was shown to be an efficient alternative to conventional all-atom molecular dynamics (MD). In this work, we evaluate CABS-flex and MD predictions by comparison with protein structural variations within NMR ensembles. Results: Based on a benchmark set of 140 proteins, we show that the relative fluctuations of protein residues obtained from CABS-flex are well correlated to those of NMR ensembles. On average, this correlation is stronger than that between MD and NMR ensembles. In conclusion, CABS-flex is useful and complementary to MD in predicting protein regions that undergo conformational changes as well as the extent of such changes. Availability and implementation: The CABS-flex is freely available to all users at http://biocomp.chem.uw.edu.pl/CABSflex. Contact: sekmi@chem.uw.edu.pl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24735558

  13. An in-silico insight into the characteristics of β-propeller phytase.

    PubMed

    Mathew, Akash; Verma, Anukriti; Gaur, Smriti

    2014-06-01

    Phytase is an enzyme that is found extensively in the plant kingdom and in some species of bacteria and fungi. This paper identifies and analyses the available full length sequences of β-propeller phytases (BPP). BPP was chosen due to its potential applicability in the field of aquaculture. The sequences were obtained from the Uniprot database and subject to various online bioinformatics tools to elucidate the physio-chemical characteristics, secondary structures and active site compositions of BPP. Protparam and SOPMA were used to analyse the physiochemical and secondary structure characteristics, while the Expasy online modelling tool and CASTp were used to model the 3-D structure and identify the active sites of the BPP sequences. The amino acid compositions of the four sequences were compared and composed in a graphical format to identify similarities and highlight the potentially important amino acids that form the active site of BPP. This study aims to analyse BPP and contribute to the clarification of the molecular mechanism involved in the enzyme activity of BPP and contribute in part to the possibility of constructing a synthetic version of BPP.

  14. CABS-flex predictions of protein flexibility compared with NMR ensembles.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian

    2014-08-01

    Identification of flexible regions of protein structures is important for understanding of their biological functions. Recently, we have developed a fast approach for predicting protein structure fluctuations from a single protein model: the CABS-flex. CABS-flex was shown to be an efficient alternative to conventional all-atom molecular dynamics (MD). In this work, we evaluate CABS-flex and MD predictions by comparison with protein structural variations within NMR ensembles. Based on a benchmark set of 140 proteins, we show that the relative fluctuations of protein residues obtained from CABS-flex are well correlated to those of NMR ensembles. On average, this correlation is stronger than that between MD and NMR ensembles. In conclusion, CABS-flex is useful and complementary to MD in predicting protein regions that undergo conformational changes as well as the extent of such changes. The CABS-flex is freely available to all users at http://biocomp.chem.uw.edu.pl/CABSflex. sekmi@chem.uw.edu.pl Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  15. Some of the most interesting CASP11 targets through the eyes of their authors

    PubMed Central

    Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K.; Edwards, Robert A.; Fass, Deborah; Hartmann, Marcus D.; Korycinski, Mateusz; Lewis, Richard J.; Lorimer, Donald; Lupas, Andrei N.; Newman, Janet; Peat, Thomas S.; Piepenbrink, Kurt H.; Prahlad, Janani; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca M.; Seguritan, Victor; Sundberg, Eric J.; Singh, Abhimanyu K.; Wilson, Mark A.

    2015-01-01

    ABSTRACT The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34–50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:26473983

  16. Modeling Human Serum Albumin Tertiary Structure to Teach Upper-Division Chemistry Students Bioinformatics and Homology Modeling Basics

    ERIC Educational Resources Information Center

    Petrovic, Dus?an; Zlatovic´, Mario

    2015-01-01

    A homology modeling laboratory experiment has been developed for an introductory molecular modeling course for upper-division undergraduate chemistry students. With this experiment, students gain practical experience in homology model preparation and assessment as well as in protein visualization using the educational version of PyMOL…

  17. Protein Engineering: Development of a Metal Ion Dependent Switch

    DTIC Science & Technology

    2017-05-22

    Society of Chemistry Royal Society of Chemistry Biochemistry PNAS Escherichia coli Journal of Biotechnology Biochemistry Nature Protocols Journal of...Molecular Biology Biochemistry Royal Society of Chemistry Proteins: Structure, Function, and Bioinformatics Journal of Molecular Biology Biophysical...Biophysical Journal Protein Science Journal of Computational Chemistry Current Opinion in Chemical Biology Royal Society of Chemistry

  18. Identification and characterization of long non-coding RNAs in rainbow trout eggs

    USDA-ARS?s Scientific Manuscript database

    Long non-coding RNAs (lncRNAs) are in general considered as a diverse class of transcripts longer than 200 nucleotides that structurally resemble mRNAs but do not encode proteins. Recent advances in RNA sequencing (RNA-Seq) and bioinformatics methods have provided an opportunity to indentify and ana...

  19. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.

  20. Bioinformatics clouds for big data manipulation

    PubMed Central

    2012-01-01

    Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475

  1. The 2016 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science. PMID:27781083

  2. The 2015 Bioinformatics Open Source Conference (BOSC 2015).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Lapp, Hilmar; Chapman, Brad; Davey, Rob; Fields, Christopher; Hokamp, Karsten; Munoz-Torres, Monica

    2016-02-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule.

  3. Voroprot: an interactive tool for the analysis and visualization of complex geometric features of protein structure.

    PubMed

    Olechnovic, Kliment; Margelevicius, Mindaugas; Venclovas, Ceslovas

    2011-03-01

    We present Voroprot, an interactive cross-platform software tool that provides a unique set of capabilities for exploring geometric features of protein structure. Voroprot allows the construction and visualization of the Apollonius diagram (also known as the additively weighted Voronoi diagram), the Apollonius graph, protein alpha shapes, interatomic contact surfaces, solvent accessible surfaces, pockets and cavities inside protein structure. Voroprot is available for Windows, Linux and Mac OS X operating systems and can be downloaded from http://www.ibt.lt/bioinformatics/voroprot/.

  4. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes.

    PubMed

    Parker, Brian J; Moltke, Ida; Roth, Adam; Washietl, Stefan; Wen, Jiayu; Kellis, Manolis; Breaker, Ronald; Pedersen, Jakob Skou

    2011-11-01

    Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.

  5. Buying in to bioinformatics: an introduction to commercial sequence analysis software

    PubMed Central

    2015-01-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. PMID:25183247

  6. Buying in to bioinformatics: an introduction to commercial sequence analysis software.

    PubMed

    Smith, David Roy

    2015-07-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. © The Author 2014. Published by Oxford University Press.

  7. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa

    PubMed Central

    Mulder, Nicola J.; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M.; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C. Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-01-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985

  8. InCoB2012 Conference: from biological data to knowledge to technological breakthroughs

    PubMed Central

    2012-01-01

    Ten years ago when Asia-Pacific Bioinformatics Network held the first International Conference on Bioinformatics (InCoB) in Bangkok its theme was North-South Networking. At that time InCoB aimed to provide biologists and bioinformatics researchers in the Asia-Pacific region a forum to meet, interact with, and disseminate knowledge about the burgeoning field of bioinformatics. Meanwhile InCoB has evolved into a major regional bioinformatics conference that attracts not only talented and established scientists from the region but increasingly also from East Asia, North America and Europe. Since 2006 InCoB yielded 114 articles in BMC Bioinformatics supplement issues that have been cited nearly 1,000 times to date. In part, these developments reflect the success of bioinformatics education and continuous efforts to integrate and utilize bioinformatics in biotechnology and biosciences in the Asia-Pacific region. A cross-section of research leading from biological data to knowledge and to technological applications, the InCoB2012 theme, is introduced in this editorial. Other highlights included sessions organized by the Pan-Asian Pacific Genome Initiative and a Machine Learning in Immunology competition. InCoB2013 is scheduled for September 18-21, 2013 at Suzhou, China. PMID:23281929

  9. Large Scale Comparative Visualisation of Regulatory Networks with TRNDiff

    DOE PAGES

    Chua, Xin-Yi; Buckingham, Lawrence; Hogan, James M.; ...

    2015-06-01

    The advent of Next Generation Sequencing (NGS) technologies has seen explosive growth in genomic datasets, and dense coverage of related organisms, supporting study of subtle, strain-specific variations as a determinant of function. Such data collections present fresh and complex challenges for bioinformatics, those of comparing models of complex relationships across hundreds and even thousands of sequences. Transcriptional Regulatory Network (TRN) structures document the influence of regulatory proteins called Transcription Factors (TFs) on associated Target Genes (TGs). TRNs are routinely inferred from model systems or iterative search, and analysis at these scales requires simultaneous displays of multiple networks well beyond thosemore » of existing network visualisation tools [1]. In this paper we describe TRNDiff, an open source system supporting the comparative analysis and visualization of TRNs (and similarly structured data) from many genomes, allowing rapid identification of functional variations within species. The approach is demonstrated through a small scale multiple TRN analysis of the Fur iron-uptake system of Yersinia, suggesting a number of candidate virulence factors; and through a larger study exploiting integration with the RegPrecise database (http://regprecise.lbl.gov; [2]) - a collection of hundreds of manually curated and predicted transcription factor regulons drawn from across the entire spectrum of prokaryotic organisms.« less

  10. OpenHelix: bioinformatics education outside of a different box.

    PubMed

    Williams, Jennifer M; Mangan, Mary E; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C

    2010-11-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review.

  11. OpenHelix: bioinformatics education outside of a different box

    PubMed Central

    Mangan, Mary E.; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C.

    2010-01-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review. PMID:20798181

  12. Translational bioinformatics: linking the molecular world to the clinical world.

    PubMed

    Altman, R B

    2012-06-01

    Translational bioinformatics represents the union of translational medicine and bioinformatics. Translational medicine moves basic biological discoveries from the research bench into the patient-care setting and uses clinical observations to inform basic biology. It focuses on patient care, including the creation of new diagnostics, prognostics, prevention strategies, and therapies based on biological discoveries. Bioinformatics involves algorithms to represent, store, and analyze basic biological data, including DNA sequence, RNA expression, and protein and small-molecule abundance within cells. Translational bioinformatics spans these two fields; it involves the development of algorithms to analyze basic molecular and cellular data with an explicit goal of affecting clinical care.

  13. CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications.

    PubMed

    Lei, Guoqing; Dou, Yong; Wan, Wen; Xia, Fei; Li, Rongchun; Ma, Meng; Zou, Dan

    2012-01-01

    Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications.

  14. Classification of protein quaternary structure by functional domain composition

    PubMed Central

    Yu, Xiaojing; Wang, Chuan; Li, Yixue

    2006-01-01

    Background The number and the arrangement of subunits that form a protein are referred to as quaternary structure. Quaternary structure is an important protein attribute that is closely related to its function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Thus, it is highly desirable to develop some computational methods to automatically classify the quaternary structure of proteins from their sequences. Results To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used for classifying the quaternary structure of proteins from this information. The jackknife cross-validation test was performed on the non-redundant protein dataset in which the sequence identity was less than 25%. The overall success rate obtained is 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11% Conclusion Compared with the amino acid composition method and Blast, the results indicate that the domain composition approach may be a more effective and promising high-throughput method in dealing with this complicated problem in bioinformatics. PMID:16584572

  15. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome.

    PubMed

    Allali, Imane; Arnold, Jason W; Roach, Jeffrey; Cadenas, Maria Belen; Butz, Natasha; Hassan, Hosni M; Koci, Matthew; Ballou, Anne; Mendoza, Mary; Ali, Rizwana; Azcarate-Peril, M Andrea

    2017-09-13

    Advancements in Next Generation Sequencing (NGS) technologies regarding throughput, read length and accuracy had a major impact on microbiome research by significantly improving 16S rRNA amplicon sequencing. As rapid improvements in sequencing platforms and new data analysis pipelines are introduced, it is essential to evaluate their capabilities in specific applications. The aim of this study was to assess whether the same project-specific biological conclusions regarding microbiome composition could be reached using different sequencing platforms and bioinformatics pipelines. Chicken cecum microbiome was analyzed by 16S rRNA amplicon sequencing using Illumina MiSeq, Ion Torrent PGM, and Roche 454 GS FLX Titanium platforms, with standard and modified protocols for library preparation. We labeled the bioinformatics pipelines included in our analysis QIIME1 and QIIME2 (de novo OTU picking [not to be confused with QIIME version 2 commonly referred to as QIIME2]), QIIME3 and QIIME4 (open reference OTU picking), UPARSE1 and UPARSE2 (each pair differs only in the use of chimera depletion methods), and DADA2 (for Illumina data only). GS FLX+ yielded the longest reads and highest quality scores, while MiSeq generated the largest number of reads after quality filtering. Declines in quality scores were observed starting at bases 150-199 for GS FLX+ and bases 90-99 for MiSeq. Scores were stable for PGM-generated data. Overall microbiome compositional profiles were comparable between platforms; however, average relative abundance of specific taxa varied depending on sequencing platform, library preparation method, and bioinformatics analysis. Specifically, QIIME with de novo OTU picking yielded the highest number of unique species and alpha diversity was reduced with UPARSE and DADA2 compared to QIIME. The three platforms compared in this study were capable of discriminating samples by treatment, despite differences in diversity and abundance, leading to similar biological conclusions. Our results demonstrate that while there were differences in depth of coverage and phylogenetic diversity, all workflows revealed comparable treatment effects on microbial diversity. To increase reproducibility and reliability and to retain consistency between similar studies, it is important to consider the impact on data quality and relative abundance of taxa when selecting NGS platforms and analysis tools for microbiome studies.

  16. Deep learning in bioinformatics.

    PubMed

    Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh

    2017-09-01

    In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  17. A Web-based assessment of bioinformatics end-user support services at US universities

    PubMed Central

    Messersmith, Donna J.; Benson, Dennis A.; Geer, Renata C.

    2006-01-01

    Objectives: This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Methods: Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Results: Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. Conclusions: This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group. PMID:16888663

  18. RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest.

    PubMed

    Ismail, Hamid D; Jones, Ahoi; Kim, Jung H; Newman, Robert H; Kc, Dukka B

    2016-01-01

    Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.

  19. Building macromolecular assemblies by information-driven docking: introducing the HADDOCK multibody docking server.

    PubMed

    Karaca, Ezgi; Melquiond, Adrien S J; de Vries, Sjoerd J; Kastritis, Panagiotis L; Bonvin, Alexandre M J J

    2010-08-01

    Over the last years, large scale proteomics studies have generated a wealth of information of biomolecular complexes. Adding the structural dimension to the resulting interactomes represents a major challenge that classical structural experimental methods alone will have difficulties to confront. To meet this challenge, complementary modeling techniques such as docking are thus needed. Among the current docking methods, HADDOCK (High Ambiguity-Driven DOCKing) distinguishes itself from others by the use of experimental and/or bioinformatics data to drive the modeling process and has shown a strong performance in the critical assessment of prediction of interactions (CAPRI), a blind experiment for the prediction of interactions. Although most docking programs are limited to binary complexes, HADDOCK can deal with multiple molecules (up to six), a capability that will be required to build large macromolecular assemblies. We present here a novel web interface of HADDOCK that allows the user to dock up to six biomolecules simultaneously. This interface allows the inclusion of a large variety of both experimental and/or bioinformatics data and supports several types of cyclic and dihedral symmetries in the docking of multibody assemblies. The server was tested on a benchmark of six cases, containing five symmetric homo-oligomeric protein complexes and one symmetric protein-DNA complex. Our results reveal that, in the presence of either bioinformatics and/or experimental data, HADDOCK shows an excellent performance: in all cases, HADDOCK was able to generate good to high quality solutions and ranked them at the top, demonstrating its ability to model symmetric multicomponent assemblies. Docking methods can thus play an important role in adding the structural dimension to interactomes. However, although the current docking methodologies were successful for a vast range of cases, considering the variety and complexity of macromolecular assemblies, inclusion of some kind of experimental information (e.g. from mass spectrometry, nuclear magnetic resonance, cryoelectron microscopy, etc.) will remain highly desirable to obtain reliable results.

  20. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.

    PubMed

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian

    2017-04-27

    The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.

  1. Virtual Interactomics of Proteins from Biochemical Standpoint

    PubMed Central

    Kubrycht, Jaroslav; Sigler, Karel; Souček, Pavel

    2012-01-01

    Virtual interactomics represents a rapidly developing scientific area on the boundary line of bioinformatics and interactomics. Protein-related virtual interactomics then comprises instrumental tools for prediction, simulation, and networking of the majority of interactions important for structural and individual reproduction, differentiation, recognition, signaling, regulation, and metabolic pathways of cells and organisms. Here, we describe the main areas of virtual protein interactomics, that is, structurally based comparative analysis and prediction of functionally important interacting sites, mimotope-assisted and combined epitope prediction, molecular (protein) docking studies, and investigation of protein interaction networks. Detailed information about some interesting methodological approaches and online accessible programs or databases is displayed in our tables. Considerable part of the text deals with the searches for common conserved or functionally convergent protein regions and subgraphs of conserved interaction networks, new outstanding trends and clinically interesting results. In agreement with the presented data and relationships, virtual interactomic tools improve our scientific knowledge, help us to formulate working hypotheses, and they frequently also mediate variously important in silico simulations. PMID:22928109

  2. Memory-efficient RNA energy landscape exploration

    PubMed Central

    Mann, Martin; Kucharík, Marcel; Flamm, Christoph; Wolfinger, Michael T.

    2014-01-01

    Motivation: Energy landscapes provide a valuable means for studying the folding dynamics of short RNA molecules in detail by modeling all possible structures and their transitions. Higher abstraction levels based on a macro-state decomposition of the landscape enable the study of larger systems; however, they are still restricted by huge memory requirements of exact approaches. Results: We present a highly parallelizable local enumeration scheme that enables the computation of exact macro-state transition models with highly reduced memory requirements. The approach is evaluated on RNA secondary structure landscapes using a gradient basin definition for macro-states. Furthermore, we demonstrate the need for exact transition models by comparing two barrier-based approaches, and perform a detailed investigation of gradient basins in RNA energy landscapes. Availability and implementation: Source code is part of the C++ Energy Landscape Library available at http://www.bioinf.uni-freiburg.de/Software/. Contact: mmann@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24833804

  3. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers.

    PubMed

    Campbell, Kieran R; Yau, Christopher

    2017-03-15

    Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov-Chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an Empirical-Bayes like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply or model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context practical bioinformatics analyses.

  4. Expanding roles in a library-based bioinformatics service program: a case study

    PubMed Central

    Li, Meng; Chen, Yi-Bu; Clintworth, William A

    2013-01-01

    Question: How can a library-based bioinformatics support program be implemented and expanded to continuously support the growing and changing needs of the research community? Setting: A program at a health sciences library serving a large academic medical center with a strong research focus is described. Methods: The bioinformatics service program was established at the Norris Medical Library in 2005. As part of program development, the library assessed users' bioinformatics needs, acquired additional funds, established and expanded service offerings, and explored additional roles in promoting on-campus collaboration. Results: Personnel and software have increased along with the number of registered software users and use of the provided services. Conclusion: With strategic efforts and persistent advocacy within the broader university environment, library-based bioinformatics service programs can become a key part of an institution's comprehensive solution to researchers' ever-increasing bioinformatics needs. PMID:24163602

  5. 4273π: Bioinformatics education on low cost ARM hardware

    PubMed Central

    2013-01-01

    Background Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. Results We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012–2013. Conclusions 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost. PMID:23937194

  6. 4273π: bioinformatics education on low cost ARM hardware.

    PubMed

    Barker, Daniel; Ferrier, David Ek; Holland, Peter Wh; Mitchell, John Bo; Plaisier, Heleen; Ritchie, Michael G; Smart, Steven D

    2013-08-12

    Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012-2013. 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.

  7. A decade of Web Server updates at the Bioinformatics Links Directory: 2003-2012.

    PubMed

    Brazas, Michelle D; Yim, David; Yeung, Winston; Ouellette, B F Francis

    2012-07-01

    The 2012 Bioinformatics Links Directory update marks the 10th special Web Server issue from Nucleic Acids Research. Beginning with content from their 2003 publication, the Bioinformatics Links Directory in collaboration with Nucleic Acids Research has compiled and published a comprehensive list of freely accessible, online tools, databases and resource materials for the bioinformatics and life science research communities. The past decade has exhibited significant growth and change in the types of tools, databases and resources being put forth, reflecting both technology changes and the nature of research over that time. With the addition of 90 web server tools and 12 updates from the July 2012 Web Server issue of Nucleic Acids Research, the Bioinformatics Links Directory at http://bioinformatics.ca/links_directory/ now contains an impressive 134 resources, 455 databases and 1205 web server tools, mirroring the continued activity and efforts of our field.

  8. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs.

    PubMed

    Lim, Chun Shen; Brown, Chris M

    2017-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community.

  9. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs

    PubMed Central

    Lim, Chun Shen; Brown, Chris M.

    2018-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community. PMID:29354101

  10. Molecular docking of Glycine max and Medicago truncatula ureases with urea; bioinformatics approaches.

    PubMed

    Filiz, Ertugrul; Vatansever, Recep; Ozyigit, Ibrahim Ilker

    2016-03-01

    Urease (EC 3.5.1.5) is a nickel-dependent metalloenzyme catalyzing the hydrolysis of urea into ammonia and carbon dioxide. It is present in many bacteria, fungi, yeasts and plants. Most species, with few exceptions, use nickel metalloenzyme urease to hydrolyze urea, which is one of the commonly used nitrogen fertilizer in plant growth thus its enzymatic hydrolysis possesses vital importance in agricultural practices. Considering the essentiality and importance of urea and urease activity in most plants, this study aimed to comparatively investigate the ureases of two important legume species such as Glycine max (soybean) and Medicago truncatula (barrel medic) from Fabaceae family. With additional plant species, primary and secondary structures of 37 plant ureases were comparatively analyzed using various bioinformatics tools. A structure based phylogeny was constructed using predicted 3D models of G. max and M. truncatula, whose crystallographic structures are not available, along with three additional solved urease structures from Canavalia ensiformis (PDB: 4GY7), Bacillus pasteurii (PDB: 4UBP) and Klebsiella aerogenes (PDB: 1FWJ). In addition, urease structures of these species were docked with urea to analyze the binding affinities, interacting amino acids and atom distances in urease-urea complexes. Furthermore, mutable amino acids which could potentially affect the protein active site, stability and flexibility as well as overall protein stability were analyzed in urease structures of G. max and M. truncatula. Plant ureases demonstrated similar physico-chemical properties with 833-878 amino acid residues and 89.39-90.91 kDa molecular weight with mainly acidic (5.15-6.10 pI) nature. Four protein domain structures such as urease gamma, urease beta, urease alpha and amidohydro 1 characterized the plant ureases. Secondary structure of plant ureases also demonstrated conserved protein architecture, with predominantly α-helix and random coil structures. In structure-based phylogeny, plant ureases from G. max, M. truncatula and C. ensiformis were clearly diverged from bacterial ureases of B. pasteurii and K. aerogenes. Glu, Thr, His and Gly were commonly found as interacting residues in most urease-urea docking complexes while Glu was available in all docked structures. Besides, Ala and Arg residues, which are reported in active-site architecture of plant and bacterial ureases were present in G. max urea-urease complex but not present in others. Moreover, Arg435 and Arg437 in M. truncatula and G. max, respectively were identified as highly mutable hotspot residues residing in amidohydro 1 domain of enzyme. In addition, a number of stabilizing residues were predicted upon mutation of these hotspot residues however Cys and Thr made strong implications since they were also found in codon-aligned sequences as substitutions of hotspot residues. Comparative analyses of primary sequence and secondary structure in 37 different plants demonstrated quite conserved natures of ureases in plant kingdom. Structure-based phylogeny indicated the presence of a possible prokaryote-eukaryote split and implicated the subjection of bacterial ureases to heavy selection in prokaryotic evolution compared to plants. Urea-urease docking complexes suggested that different species could share common interacting residues as well as may have some other uncommon residues at species-dependent way. In silico mutation analyses identified mutable amino acids, which were predicted to reside in catalytic site of enzyme therefore mutagenesis at these sites seemed to have adverse effects on enzyme efficiency or function. This study findings will become valuable preliminary resource for future studies to further understand the primary, secondary and tertiary structures of urease sequences in plants as well as it will provide insights about various binding features of urea-urease complexes.

  11. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920

  12. Integrating protein structural dynamics and evolutionary analysis with Bio3D.

    PubMed

    Skjærven, Lars; Yao, Xin-Qiu; Scarabelli, Guido; Grant, Barry J

    2014-12-10

    Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution. Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case. The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/ .

  13. Generation of a consensus protein domain dictionary

    PubMed Central

    Schaeffer, R. Dustin; Jonsson, Amanda L.; Simms, Andrew M.; Daggett, Valerie

    2011-01-01

    Motivation: The discovery of new protein folds is a relatively rare occurrence even as the rate of protein structure determination increases. This rarity reinforces the concept of folds as reusable units of structure and function shared by diverse proteins. If the folding mechanism of proteins is largely determined by their topology, then the folding pathways of members of existing folds could encompass the full set used by globular protein domains. Results: We have used recent versions of three common protein domain dictionaries (SCOP, CATH and Dali) to generate a consensus domain dictionary (CDD). Surprisingly, 40% of the metafolds in the CDD are not composed of autonomous structural domains, i.e. they are not plausible independent folding units. This finding has serious ramifications for bioinformatics studies mining these domain dictionaries for globular protein properties. However, our main purpose in deriving this CDD was to generate an updated CDD to choose targets for MD simulation as part of our dynameomics effort, which aims to simulate the native and unfolding pathways of representatives of all globular protein consensus folds (metafolds). Consequently, we also compiled a list of representative protein targets of each metafold in the CDD. Availability and implementation: This domain dictionary is available at www.dynameomics.org. Contact: daggett@u.washington.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21068000

  14. Bioinformatics analysis of the oxidosqualene cyclase gene and the amino acid sequence in mangrove plants

    NASA Astrophysics Data System (ADS)

    Basyuni, M.; Wati, R.

    2017-01-01

    This study described the bioinformatics methods to analyze seven oxidosqualene cyclase (OSC) genes from mangrove plants on DDBJ/EMBL/GenBank as well as predicted the structure, composition, similarity, subcellular localization and phylogenetic. The physical and chemical properties of seven mangrove OSC showed variation among the genes. The percentage of the secondary structure of seven mangrove OSC genes followed the order of a helix > random coil > extended chain structure. The values of chloroplast or signal peptide were too low, indicated that no chloroplast transit peptide or signal peptide of secretion pathway in mangrove OSC genes. The target peptide value of mitochondria varied from 0.163 to 0.430, indicated it was possible to exist. These results suggested the importance of understanding the diversity and functional of properties of the different amino acids in mangrove OSC genes. To clarify the relationship among the mangrove OSC gene, a phylogenetic tree was constructed. The phylogenetic tree shows that there are three clusters, Kandelia KcMS join with Bruguiera BgLUS, Rhizophora RsM1 was close to Bruguiera BgbAS, and Rhizophora RcCAS join with Kandelia KcCAS. The present study, therefore, supported the previous results that plant OSC genes form distinct clusters in the tree.

  15. DNA secondary structures: stability and function of G-quadruplex structures

    PubMed Central

    Bochman, Matthew L.; Paeschke, Katrin; Zakian, Virginia A.

    2013-01-01

    In addition to the canonical double helix, DNA can fold into various other inter- and intramolecular secondary structures. Although many such structures were long thought to be in vitro artefacts, bioinformatics demonstrates that DNA sequences capable of forming these structures are conserved throughout evolution, suggesting the existence of non-B-form DNA in vivo. In addition, genes whose products promote formation or resolution of these structures are found in diverse organisms, and a growing body of work suggests that the resolution of DNA secondary structures is critical for genome integrity. This Review focuses on emerging evidence relating to the characteristics of G-quadruplex structures and the possible influence of such structures on genomic stability and cellular processes, such as transcription. PMID:23032257

  16. Making authentic science accessible—the benefits and challenges of integrating bioinformatics into a high-school science curriculum

    PubMed Central

    Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled ‘Bioinformatics in the Service of Biotechnology’. Students’ learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students’ difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students’ cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students’ scientific ‘toolbox’. For students, questions stemming from the ‘old world’ biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers’ prediction. Analysis of students’ affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher’s role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. PMID:26801769

  17. The 2015 Bioinformatics Open Source Conference (BOSC 2015)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J. A.; Lapp, Hilmar

    2016-01-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included “Data Science;” “Standards and Interoperability;” “Open Science and Reproducibility;” “Translational Bioinformatics;” “Visualization;” and “Bioinformatics Open Source Project Updates”. In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled “Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community,” that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule. PMID:26914653

  18. Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

    PubMed

    Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

    2011-06-01

    Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will ''become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 CodexAlimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth. Copyright © 2011 Elsevier Inc. All rights reserved.

  19. Making authentic science accessible-the benefits and challenges of integrating bioinformatics into a high-school science curriculum.

    PubMed

    Machluf, Yossy; Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled 'Bioinformatics in the Service of Biotechnology'. Students' learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students' difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students' cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students' scientific 'toolbox'. For students, questions stemming from the 'old world' biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers' prediction. Analysis of students' affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher's role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. © The Author 2016. Published by Oxford University Press.

  20. Systems Bioinformatics: increasing precision of computational diagnostics and therapeutics through network-based approaches.

    PubMed

    Oulas, Anastasis; Minadakis, George; Zachariou, Margarita; Sokratous, Kleitos; Bourdakou, Marilena M; Spyrou, George M

    2017-11-27

    Systems Bioinformatics is a relatively new approach, which lies in the intersection of systems biology and classical bioinformatics. It focuses on integrating information across different levels using a bottom-up approach as in systems biology with a data-driven top-down approach as in bioinformatics. The advent of omics technologies has provided the stepping-stone for the emergence of Systems Bioinformatics. These technologies provide a spectrum of information ranging from genomics, transcriptomics and proteomics to epigenomics, pharmacogenomics, metagenomics and metabolomics. Systems Bioinformatics is the framework in which systems approaches are applied to such data, setting the level of resolution as well as the boundary of the system of interest and studying the emerging properties of the system as a whole rather than the sum of the properties derived from the system's individual components. A key approach in Systems Bioinformatics is the construction of multiple networks representing each level of the omics spectrum and their integration in a layered network that exchanges information within and between layers. Here, we provide evidence on how Systems Bioinformatics enhances computational therapeutics and diagnostics, hence paving the way to precision medicine. The aim of this review is to familiarize the reader with the emerging field of Systems Bioinformatics and to provide a comprehensive overview of its current state-of-the-art methods and technologies. Moreover, we provide examples of success stories and case studies that utilize such methods and tools to significantly advance research in the fields of systems biology and systems medicine. © The Author 2017. Published by Oxford University Press.

  1. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    PubMed

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.

  2. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

  3. Interdisciplinary Introductory Course in Bioinformatics

    ERIC Educational Resources Information Center

    Kortsarts, Yana; Morris, Robert W.; Utell, Janine M.

    2010-01-01

    Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical and biophysical information. We present our experience in teaching an interdisciplinary course, Introduction to Bioinformatics, which was developed…

  4. The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2011-01-01

    Many of biological system mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding the function. The determination of huge size and flexible protein-protein complex structure by experimental studies remains difficult, costly and five-consuming, therefore computational prediction of protein structures by homolog modeling and docking studies is valuable method. In addition, MD simulation is also one of the most powerful methods allowing to see the real dynamics of proteins. Here, we predict protein-protein complex structure of botulinum toxin to analyze its property. These bioinformatics methods are useful to report the relation between the flexibility of backbone structure and the activity.

  5. Scalability and Validation of Big Data Bioinformatics Software.

    PubMed

    Yang, Andrian; Troup, Michael; Ho, Joshua W K

    2017-01-01

    This review examines two important aspects that are central to modern big data bioinformatics analysis - software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.

  6. ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis

    PubMed Central

    Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas

    2016-01-01

    Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475

  7. A bioinformatics prediction approach towards analyzing the glycosylation, co-expression and interaction patterns of epithelial membrane antigen (EMA/MUC1)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kalra, Rajkumar S., E-mail: renu-wadhwa@aist.go.jp; Wadhwa, Renu, E-mail: renu-wadhwa@aist.go.jp

    2015-02-27

    Epithelial membrane antigen (EMA or MUC1) is a heavily glycosylated, type I transmembrane glycoprotein commonly expressed by epithelial cells of duct organs. It has been shown to be aberrantly glycosylated in several diseases including cancer. Protein sequence based annotation and analysis of glycosylation profile of glycoproteins by robust computational and comprehensive algorithms provides possible insights to the mechanism(s) of anomalous glycosylation. In present report, by using a number of bioinformatics applications we studied EMA/MUC1 and explored its trans-membrane structural domain sequence that is widely subjected to glycosylation. Exploration of different extracellular motifs led to prediction of N and O-linked glycosylationmore » target sites. Based on the putative O-linked target sites, glycosylated moieties and pathways were envisaged. Furthermore, Protein network analysis demonstrated physical interaction of EMA with a number of proteins and confirmed its functional involvement in cell growth and proliferation pathways. Gene Ontology analysis suggested an involvement of EMA in a number of functions including signal transduction, protein binding, processing and transport along with glycosylation. Thus, present study explored potential of bioinformatics prediction approach in analyzing glycosylation, co-expression and interaction patterns of EMA/MUC1 glycoprotein.« less

  8. Genetic and bioinformatics analysis of four novel GCK missense variants detected in Caucasian families with GCK-MODY phenotype.

    PubMed

    Costantini, S; Malerba, G; Contreas, G; Corradi, M; Marin Vargas, S P; Giorgetti, A; Maffeis, C

    2015-05-01

    Heterozygous loss-of-function mutations in the glucokinase (GCK) gene cause maturity-onset diabetes of the young (MODY) subtype GCK (GCK-MODY/MODY2). GCK sequencing revealed 16 distinct mutations (13 missense, 1 nonsense, 1 splice site, and 1 frameshift-deletion) co-segregating with hyperglycaemia in 23 GCK-MODY families. Four missense substitutions (c.718A>G/p.Asn240Asp, c.757G>T/p.Val253Phe, c.872A>C/p.Lys291Thr, and c.1151C>T/p.Ala384Val) were novel and a founder effect for the nonsense mutation (c.76C>T/p.Gln26*) was supposed. We tested whether an accurate bioinformatics approach could strengthen family-genetic evidence for missense variant pathogenicity in routine diagnostics, where wet-lab functional assays are generally unviable. In silico analyses of the novel missense variants, including orthologous sequence conservation, amino acid substitution (AAS)-pathogenicity predictors, structural modeling and splicing predictors, suggested that the AASs and/or the underlying nucleotide changes are likely to be pathogenic. This study shows how a careful bioinformatics analysis could provide effective suggestions to help molecular-genetic diagnosis in absence of wet-lab validations. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  9. RNA-Rocket: an RNA-Seq analysis resource for infectious disease research

    PubMed Central

    Warren, Andrew S.; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I.; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B.; Wattam, Alice R.; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno

    2015-01-01

    Motivation: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. Results: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. Availability and implementation: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: anwarren@vt.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:25573919

  10. RNA-Rocket: an RNA-Seq analysis resource for infectious disease research.

    PubMed

    Warren, Andrew S; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B; Wattam, Alice R; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno

    2015-05-01

    RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. anwarren@vt.edu Supplementary materials are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  11. Federation in genomics pipelines: techniques and challenges.

    PubMed

    Chaterji, Somali; Koo, Jinkyu; Li, Ninghui; Meyer, Folker; Grama, Ananth; Bagchi, Saurabh

    2017-08-29

    Federation is a popular concept in building distributed cyberinfrastructures, whereby computational resources are provided by multiple organizations through a unified portal, decreasing the complexity of moving data back and forth among multiple organizations. Federation has been used in bioinformatics only to a limited extent, namely, federation of datastores, e.g. SBGrid Consortium for structural biology and Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, with fast-increasing data volumes and increasing processing requirements. A prime example, and one that we discuss here, is in genomics and metagenomics. It is critical that the processing of the data be done without having to transport the data across large network distances. We exemplify our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. Currently, it is hosted completely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Being a widely used resource, we have to move toward federation without disrupting 50 K annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in federating such infrastructures. It is hoped that our manuscript can serve to spur greater federation of bioinformatics infrastructures by showing the steps involved, and thus, allow them to scale to support larger user bases. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  12. Rapid Identification of Cell-Specific, Internalizing RNA Aptamers with Bioinformatics Analyses of a Cell-Based Aptamer Selection

    PubMed Central

    Thiel, William H.; Bair, Thomas; Peek, Andrew S.; Liu, Xiuying; Dassie, Justin; Stockdale, Katie R.; Behlke, Mark A.; Miller, Francis J.; Giangrande, Paloma H.

    2012-01-01

    Background The broad applicability of RNA aptamers as cell-specific delivery tools for therapeutic reagents depends on the ability to identify aptamer sequences that selectively access the cytoplasm of distinct cell types. Towards this end, we have developed a novel approach that combines a cell-based selection method (cell-internalization SELEX) with high-throughput sequencing (HTS) and bioinformatics analyses to rapidly identify cell-specific, internalization-competent RNA aptamers. Methodology/Principal Findings We demonstrate the utility of this approach by enriching for RNA aptamers capable of selective internalization into vascular smooth muscle cells (VSMCs). Several rounds of positive (VSMCs) and negative (endothelial cells; ECs) selection were performed to enrich for aptamer sequences that preferentially internalize into VSMCs. To identify candidate RNA aptamer sequences, HTS data from each round of selection were analyzed using bioinformatics methods: (1) metrics of selection enrichment; and (2) pairwise comparisons of sequence and structural similarity, termed edit and tree distance, respectively. Correlation analyses of experimentally validated aptamers or rounds revealed that the best cell-specific, internalizing aptamers are enriched as a result of the negative selection step performed against ECs. Conclusions and Significance We describe a novel approach that combines cell-internalization SELEX with HTS and bioinformatics analysis to identify cell-specific, cell-internalizing RNA aptamers. Our data highlight the importance of performing a pre-clear step against a non-target cell in order to select for cell-specific aptamers. We expect the extended use of this approach to enable the identification of aptamers to a multitude of different cell types, thereby facilitating the broad development of targeted cell therapies. PMID:22962591

  13. Computer Simulation of Embryonic Systems: What can a virtual embryo teach us about developmental toxicity? (LA Conference on Computational Biology & Bioinformatics)

    EPA Science Inventory

    This presentation will cover work at EPA under the CSS program for: (1) Virtual Tissue Models built from the known biology of an embryological system and structured to recapitulate key cell signals and responses; (2) running the models with real (in vitro) or synthetic (in silico...

  14. Recent developments in MrBUMP: better search-model preparation, graphical interaction with search models, and solution improvement and assessment.

    PubMed

    Keegan, Ronan M; McNicholas, Stuart J; Thomas, Jens M H; Simpkin, Adam J; Simkovic, Felix; Uski, Ville; Ballard, Charles C; Winn, Martyn D; Wilson, Keith S; Rigden, Daniel J

    2018-03-01

    Increasing sophistication in molecular-replacement (MR) software and the rapid expansion of the PDB in recent years have allowed the technique to become the dominant method for determining the phases of a target structure in macromolecular X-ray crystallography. In addition, improvements in bioinformatic techniques for finding suitable homologous structures for use as MR search models, combined with developments in refinement and model-building techniques, have pushed the applicability of MR to lower sequence identities and made weak MR solutions more amenable to refinement and improvement. MrBUMP is a CCP4 pipeline which automates all stages of the MR procedure. Its scope covers everything from the sourcing and preparation of suitable search models right through to rebuilding of the positioned search model. Recent improvements to the pipeline include the adoption of more sensitive bioinformatic tools for sourcing search models, enhanced model-preparation techniques including better ensembling of homologues, and the use of phase improvement and model building on the resulting solution. The pipeline has also been deployed as an online service through CCP4 online, which allows its users to exploit large bioinformatic databases and coarse-grained parallelism to speed up the determination of a possible solution. Finally, the molecular-graphics application CCP4mg has been combined with MrBUMP to provide an interactive visual aid to the user during the process of selecting and manipulating search models for use in MR. Here, these developments in MrBUMP are described with a case study to explore how some of the enhancements to the pipeline and to CCP4mg can help to solve a difficult case.

  15. Recent developments in MrBUMP: better search-model preparation, graphical interaction with search models, and solution improvement and assessment

    PubMed Central

    Keegan, Ronan M.; McNicholas, Stuart J.; Thomas, Jens M. H.; Simpkin, Adam J.; Uski, Ville; Ballard, Charles C.

    2018-01-01

    Increasing sophistication in molecular-replacement (MR) software and the rapid expansion of the PDB in recent years have allowed the technique to become the dominant method for determining the phases of a target structure in macromolecular X-ray crystallography. In addition, improvements in bioinformatic techniques for finding suitable homologous structures for use as MR search models, combined with developments in refinement and model-building techniques, have pushed the applicability of MR to lower sequence identities and made weak MR solutions more amenable to refinement and improvement. MrBUMP is a CCP4 pipeline which automates all stages of the MR procedure. Its scope covers everything from the sourcing and preparation of suitable search models right through to rebuilding of the positioned search model. Recent improvements to the pipeline include the adoption of more sensitive bioinformatic tools for sourcing search models, enhanced model-preparation techniques including better ensembling of homologues, and the use of phase improvement and model building on the resulting solution. The pipeline has also been deployed as an online service through CCP4 online, which allows its users to exploit large bioinformatic databases and coarse-grained parallelism to speed up the determination of a possible solution. Finally, the molecular-graphics application CCP4mg has been combined with MrBUMP to provide an interactive visual aid to the user during the process of selecting and manipulating search models for use in MR. Here, these developments in MrBUMP are described with a case study to explore how some of the enhancements to the pipeline and to CCP4mg can help to solve a difficult case. PMID:29533225

  16. [Clustered regularly interspaced short palindromic repeats: structure, function and application--a review].

    PubMed

    Cui, Yujun; Li, Yanjun; Yan, Yanfeng; Yang, Ruifu

    2008-11-01

    CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats), the basis of spoligotyping technology, can provide prokaryotes with heritable adaptive immunity against phages' invasion. Studies on CRISPR loci and their associated elements, including various CAS (CRISPR-associated) proteins and leader sequences, are still in its infant period. We introduce the brief history', structure, function, bioinformatics research and application of this amazing immunity system in prokaryotic organism for inspiring more scientists to find their interest in this developing topic.

  17. Influenza research database: an integrated bioinformatics resource for influenza virus research

    USDA-ARS?s Scientific Manuscript database

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics, an...

  18. Rapid Development of Bioinformatics Education in China

    ERIC Educational Resources Information Center

    Zhong, Yang; Zhang, Xiaoyan; Ma, Jian; Zhang, Liang

    2003-01-01

    As the Human Genome Project experiences remarkable success and a flood of biological data is produced, bioinformatics becomes a very "hot" cross-disciplinary field, yet experienced bioinformaticians are urgently needed worldwide. This paper summarises the rapid development of bioinformatics education in China, especially related…

  19. Bioinformatics.

    PubMed

    Moore, Jason H

    2007-11-01

    Bioinformatics is an interdisciplinary field that blends computer science and biostatistics with biological and biomedical sciences such as biochemistry, cell biology, developmental biology, genetics, genomics, and physiology. An important goal of bioinformatics is to facilitate the management, analysis, and interpretation of data from biological experiments and observational studies. The goal of this review is to introduce some of the important concepts in bioinformatics that must be considered when planning and executing a modern biological research study. We review database resources as well as data mining software tools.

  20. Sockeye: A 3D Environment for Comparative Genomics

    PubMed Central

    Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

    2004-01-01

    Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592

  1. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package

    PubMed Central

    2012-01-01

    Background Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941

  2. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.

    PubMed

    El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter

    2012-01-01

    Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.

  3. A roadmap of clustering algorithms: finding a match for a biomedical application.

    PubMed

    Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

    2009-05-01

    Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.

  4. Two interactive Bioinformatics courses at the Bielefeld University Bioinformatics Server.

    PubMed

    Sczyrba, Alexander; Konermann, Susanne; Giegerich, Robert

    2008-05-01

    Conferences in computational biology continue to provide tutorials on classical and new methods in the field. This can be taken as an indicator that education is still a bottleneck in our field's process of becoming an established scientific discipline. Bielefeld University has been one of the early providers of bioinformatics education, both locally and via the internet. The Bielefeld Bioinformatics Server (BiBiServ) offers a variety of older and new materials. Here, we report on two online courses made available recently, one introductory and one on the advanced level: (i) SADR: Sequence Analysis with Distributed Resources (http://bibiserv.techfak.uni-bielefeld.de/sadr/) and (ii) ADP: Algebraic Dynamic Programming in Bioinformatics (http://bibiserv.techfak.uni-bielefeld.de/dpcourse/).

  5. Bioinformatics-based tools in drug discovery: the cartography from single gene to integrative biological networks.

    PubMed

    Ramharack, Pritika; Soliman, Mahmoud E S

    2018-06-01

    Originally developed for the analysis of biological sequences, bioinformatics has advanced into one of the most widely recognized domains in the scientific community. Despite this technological evolution, there is still an urgent need for nontoxic and efficient drugs. The onus now falls on the 'omics domain to meet this need by implementing bioinformatics techniques that will allow for the introduction of pioneering approaches in the rational drug design process. Here, we categorize an updated list of informatics tools and explore the capabilities of integrative bioinformatics in disease control. We believe that our review will serve as a comprehensive guide toward bioinformatics-oriented disease and drug discovery research. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. Using Comparative Genomics for Inquiry-Based Learning to Dissect Virulence of "Escherichia coli" O157:H7 and "Yersinia pestis"

    ERIC Educational Resources Information Center

    Baumler, David J.; Banta, Lois M.; Hung, Kai F.; Schwarz, Jodi A.; Cabot, Eric L.; Glasner, Jeremy D.; Perna, Nicole T.

    2012-01-01

    Genomics and bioinformatics are topics of increasing interest in undergraduate biological science curricula. Many existing exercises focus on gene annotation and analysis of a single genome. In this paper, we present two educational modules designed to enable students to learn and apply fundamental concepts in comparative genomics using examples…

  7. RNA-sequencing data analysis of uterus in ovariectomized rats fed with soy protein isolate,17B-estradiol and casein

    USDA-ARS?s Scientific Manuscript database

    This data file describes the bioinformatics analysis of uterine RNA-seq data comparing genome wide effects of feeding soy protein isolate compared to casein to ovariectomized female rats age 64 days relative to treatment of casein fed rats with 5 ug/kg/d estradiol and relative to rats treated with e...

  8. Bioinformatic Analysis of Pathogenic Missense Mutations of Activin Receptor Like Kinase 1 Ectodomain

    PubMed Central

    Scotti, Claudia; Olivieri, Carla; Boeri, Laura; Canzonieri, Cecilia; Ornati, Federica; Buscarini, Elisabetta; Pagella, Fabio; Danesino, Cesare

    2011-01-01

    Activin A receptor, type II-like kinase 1 (also called ALK1), is a serine-threonine kinase predominantly expressed on endothelial cells surface. Mutations in its ACVRL1 encoding gene (12q11-14) cause type 2 Hereditary Haemorrhagic Telangiectasia (HHT2), an autosomal dominant multisystem vascular dysplasia. The study of the structural effects of mutations is crucial to understand their pathogenic mechanism. However, while an X-ray structure of ALK1 intracellular domain has recently become available (PDB ID: 3MY0), structure determination of ALK1 ectodomain (ALK1EC) has been elusive so far. We here describe the building of a homology model for ALK1EC, followed by an extensive bioinformatic analysis, based on a set of 38 methods, of the effect of missense mutations at the sequence and structural level. ALK1EC potential interaction mode with its ligand BMP9 was then predicted combining modelling and docking data. The calculated model of the ALK1EC allowed mapping and a preliminary characterization of HHT2 associated mutations. Major structural changes and loss of stability of the protein were predicted for several mutations, while others were found to interfere mainly with binding to BMP9 or other interactors, like Endoglin (CD105), whose encoding ENG gene (9q34) mutations are known to cause type 1 HHT. This study gives a preliminary insight into the potential structure of ALK1EC and into the structural effects of HHT2 associated mutations, which can be useful to predict the potential effect of each single mutation, to devise new biological experiments and to interpret the biological significance of new mutations, private mutations, or non-synonymous polymorphisms. PMID:22028876

  9. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…

  10. BioStar: an online question & answer resource for the bioinformatics community

    USDA-ARS?s Scientific Manuscript database

    Although the era of big data has produced many bioinformatics tools and databases, using them effectively often requires specialized knowledge. Many groups lack bioinformatics expertise, and frequently find that software documentation is inadequate and local colleagues may be overburdened or unfamil...

  11. The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community

    PubMed Central

    D'Elia, Domenica; Gisel, Andreas; Eriksson, Nils-Einar; Kossida, Sophia; Mattila, Kimmo; Klucar, Lubos; Bongcam-Rudloff, Erik

    2009-01-01

    The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has been working to promote collaborative development of bioinformatics services and tools to serve the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses such as high-throughput sequencing and data managing, text and data-mining, ontologies and Grid technologies. Papers selected for publication, in this supplement to BMC Bioinformatics, cover a broad range of the topics treated, providing also an overview of the main bioinformatics research fields that the EMBnet community is involved in. PMID:19534734

  12. Web tools for predictive toxicology model building.

    PubMed

    Jeliazkova, Nina

    2012-07-01

    The development and use of web tools in chemistry has accumulated more than 15 years of history already. Powered by the advances in the Internet technologies, the current generation of web systems are starting to expand into areas, traditional for desktop applications. The web platforms integrate data storage, cheminformatics and data analysis tools. The ease of use and the collaborative potential of the web is compelling, despite the challenges. The topic of this review is a set of recently published web tools that facilitate predictive toxicology model building. The focus is on software platforms, offering web access to chemical structure-based methods, although some of the frameworks could also provide bioinformatics or hybrid data analysis functionalities. A number of historical and current developments are cited. In order to provide comparable assessment, the following characteristics are considered: support for workflows, descriptor calculations, visualization, modeling algorithms, data management and data sharing capabilities, availability of GUI or programmatic access and implementation details. The success of the Web is largely due to its highly decentralized, yet sufficiently interoperable model for information access. The expected future convergence between cheminformatics and bioinformatics databases provides new challenges toward management and analysis of large data sets. The web tools in predictive toxicology will likely continue to evolve toward the right mix of flexibility, performance, scalability, interoperability, sets of unique features offered, friendly user interfaces, programmatic access for advanced users, platform independence, results reproducibility, curation and crowdsourcing utilities, collaborative sharing and secure access.

  13. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    NASA Astrophysics Data System (ADS)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction is belong to bioinformatics,and it's important in research area. In this paper, we propose a new prediction way of protein using bayes classifier and autoEncoder network. Our experiments show some algorithms including the construction of the model, the classification of parameters and so on. The data set is a typical CB513 data set for protein. In terms of accuracy, the method is the cross validation based on the 3-fold. Then we can get the Q3 accuracy. Paper results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.

  14. Online Bioinformatics Tutorials | Office of Cancer Genomics

    Cancer.gov

    Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.

  15. Evaluating an Inquiry-Based Bioinformatics Course Using Q Methodology

    ERIC Educational Resources Information Center

    Ramlo, Susan E.; McConnell, David; Duan, Zhong-Hui; Moore, Francisco B.

    2008-01-01

    Faculty at a Midwestern metropolitan public university recently developed a course on bioinformatics that emphasized collaboration and inquiry. Bioinformatics, essentially the application of computational tools to biological data, is inherently interdisciplinary. Thus part of the challenge of creating this course was serving the needs and…

  16. Experimental Design and Bioinformatics Analysis for the Application of Metagenomics in Environmental Sciences and Biotechnology.

    PubMed

    Ju, Feng; Zhang, Tong

    2015-11-03

    Recent advances in DNA sequencing technologies have prompted the widespread application of metagenomics for the investigation of novel bioresources (e.g., industrial enzymes and bioactive molecules) and unknown biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. This review discusses the rigorous experimental design and sample preparation in the context of applying metagenomics in environmental sciences and biotechnology. Moreover, this review summarizes the principles, methodologies, and state-of-the-art bioinformatics procedures, tools and database resources for metagenomics applications and discusses two popular strategies (analysis of unassembled reads versus assembled contigs/draft genomes) for quantitative or qualitative insights of microbial community structure and functions. Overall, this review aims to facilitate more extensive application of metagenomics in the investigation of uncultured microorganisms, novel enzymes, microbe-environment interactions, and biohazards in biotechnological applications where microbial communities are engineered for bioenergy production, wastewater treatment, and bioremediation.

  17. Clustering algorithms for identifying core atom sets and for assessing the precision of protein structure ensembles.

    PubMed

    Snyder, David A; Montelione, Gaetano T

    2005-06-01

    An important open question in the field of NMR-based biomolecular structure determination is how best to characterize the precision of the resulting ensemble of structures. Typically, the RMSD, as minimized in superimposing the ensemble of structures, is the preferred measure of precision. However, the presence of poorly determined atomic coordinates and multiple "RMSD-stable domains"--locally well-defined regions that are not aligned in global superimpositions--complicate RMSD calculations. In this paper, we present a method, based on a novel, structurally defined order parameter, for identifying a set of core atoms to use in determining superimpositions for RMSD calculations. In addition we present a method for deciding whether to partition that core atom set into "RMSD-stable domains" and, if so, how to determine partitioning of the core atom set. We demonstrate our algorithm and its application in calculating statistically sound RMSD values by applying it to a set of NMR-derived structural ensembles, superimposing each RMSD-stable domain (or the entire core atom set, where appropriate) found in each protein structure under consideration. A parameter calculated by our algorithm using a novel, kurtosis-based criterion, the epsilon-value, is a measure of precision of the superimposition that complements the RMSD. In addition, we compare our algorithm with previously described algorithms for determining core atom sets. The methods presented in this paper for biomolecular structure superimposition are quite general, and have application in many areas of structural bioinformatics and structural biology.

  18. Functional insights from proteome-wide structural modeling of Treponema pallidum subspecies pallidum, the causative agent of syphilis.

    PubMed

    Houston, Simon; Lithgow, Karen Vivien; Osbak, Kara Krista; Kenyon, Chris Richard; Cameron, Caroline E

    2018-05-16

    Syphilis continues to be a major global health threat with 11 million new infections each year, and a global burden of 36 million cases. The causative agent of syphilis, Treponema pallidum subspecies pallidum, is a highly virulent bacterium, however the molecular mechanisms underlying T. pallidum pathogenesis remain to be definitively identified. This is due to the fact that T. pallidum is currently uncultivatable, inherently fragile and thus difficult to work with, and phylogenetically distinct with no conventional virulence factor homologs found in other pathogens. In fact, approximately 30% of its predicted protein-coding genes have no known orthologs or assigned functions. Here we employed a structural bioinformatics approach using Phyre2-based tertiary structure modeling to improve our understanding of T. pallidum protein function on a proteome-wide scale. Phyre2-based tertiary structure modeling generated high-confidence predictions for 80% of the T. pallidum proteome (780/978 predicted proteins). Tertiary structure modeling also inferred the same function as primary structure-based annotations from genome sequencing pipelines for 525/605 proteins (87%), which represents 54% (525/978) of all T. pallidum proteins. Of the 175 T. pallidum proteins modeled with high confidence that were not assigned functions in the previously annotated published proteome, 167 (95%) were able to be assigned predicted functions. Twenty-one of the 175 hypothetical proteins modeled with high confidence were also predicted to exhibit significant structural similarity with proteins experimentally confirmed to be required for virulence in other pathogens. Phyre2-based structural modeling is a powerful bioinformatics tool that has provided insight into the potential structure and function of the majority of T. pallidum proteins and helped validate the primary structure-based annotation of more than 50% of all T. pallidum proteins with high confidence. This work represents the first T. pallidum proteome-wide structural modeling study and is one of few studies to apply this approach for the functional annotation of a whole proteome.

  19. Generative Topic Modeling in Image Data Mining and Bioinformatics Studies

    ERIC Educational Resources Information Center

    Chen, Xin

    2012-01-01

    Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval and computer vision and bioinformatics domain. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…

  20. A Portable Bioinformatics Course for Upper-Division Undergraduate Curriculum in Sciences

    ERIC Educational Resources Information Center

    Floraino, Wely B.

    2008-01-01

    This article discusses the challenges that bioinformatics education is facing and describes a bioinformatics course that is successfully taught at the California State Polytechnic University, Pomona, to the fourth year undergraduate students in biological sciences, chemistry, and computer science. Information on lecture and computer practice…

  1. Incorporating a Collaborative Web-Based Virtual Laboratory in an Undergraduate Bioinformatics Course

    ERIC Educational Resources Information Center

    Weisman, David

    2010-01-01

    Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a…

  2. A Mathematical Optimization Problem in Bioinformatics

    ERIC Educational Resources Information Center

    Heyer, Laurie J.

    2008-01-01

    This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…

  3. Biology in 'silico': The Bioinformatics Revolution.

    ERIC Educational Resources Information Center

    Bloom, Mark

    2001-01-01

    Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and considers it the genetics Swiss Army Knife, which has many different uses, for use in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…

  4. Green Fluorescent Protein-Focused Bioinformatics Laboratory Experiment Suitable for Undergraduates in Biochemistry Courses

    ERIC Educational Resources Information Center

    Rowe, Laura

    2017-01-01

    An introductory bioinformatics laboratory experiment focused on protein analysis has been developed that is suitable for undergraduate students in introductory biochemistry courses. The laboratory experiment is designed to be potentially used as a "stand-alone" activity in which students are introduced to basic bioinformatics tools and…

  5. Virtual Bioinformatics Distance Learning Suite

    ERIC Educational Resources Information Center

    Tolvanen, Martti; Vihinen, Mauno

    2004-01-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…

  6. A Summer Program Designed to Educate College Students for Careers in Bioinformatics

    ERIC Educational Resources Information Center

    Krilowicz, Beverly; Johnston, Wendie; Sharp, Sandra B.; Warter-Perez, Nancy; Momand, Jamil

    2007-01-01

    A summer program was created for undergraduates and graduate students that teaches bioinformatics concepts, offers skills in professional development, and provides research opportunities in academic and industrial institutions. We estimate that 34 of 38 graduates (89%) are in a career trajectory that will use bioinformatics. Evidence from…

  7. Assessment of a Bioinformatics across Life Science Curricula Initiative

    ERIC Educational Resources Information Center

    Howard, David R.; Miskowski, Jennifer A.; Grunwald, Sandra K.; Abler, Michael L.

    2007-01-01

    At the University of Wisconsin-La Crosse, we have undertaken a program to integrate the study of bioinformatics across the undergraduate life science curricula. Our efforts have included incorporating bioinformatics exercises into courses in the biology, microbiology, and chemistry departments, as well as coordinating the efforts of faculty within…

  8. Teaching Bioinformatics and Neuroinformatics by Using Free Web-Based Tools

    ERIC Educational Resources Information Center

    Grisham, William; Schottler, Natalie A.; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson

    2010-01-01

    This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with…

  9. When cloud computing meets bioinformatics: a review.

    PubMed

    Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong

    2013-10-01

    In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.

  10. 5th HUPO BPP Bioinformatics Meeting at the European Bioinformatics Institute in Hinxton, UK--Setting the analysis frame.

    PubMed

    Stephan, Christian; Hamacher, Michael; Blüggel, Martin; Körting, Gerhard; Chamrad, Daniel; Scheer, Christian; Marcus, Katrin; Reidegeld, Kai A; Lohaus, Christiane; Schäfer, Heike; Martens, Lennart; Jones, Philip; Müller, Michael; Auyeung, Kevin; Taylor, Chris; Binz, Pierre-Alain; Thiele, Herbert; Parkinson, David; Meyer, Helmut E; Apweiler, Rolf

    2005-09-01

    The Bioinformatics Committee of the HUPO Brain Proteome Project (HUPO BPP) meets regularly to execute the post-lab analyses of the data produced in the HUPO BPP pilot studies. On July 7, 2005 the members came together for the 5th time at the European Bioinformatics Institute (EBI) in Hinxton, UK, hosted by Rolf Apweiler. As a main result, the parameter set of the semi-automated data re-analysis of MS/MS spectra has been elaborated and the subsequent work steps have been defined.

  11. Preliminary Study of Bioinformatics Patents and Their Classifications Registered in the KIPRIS Database.

    PubMed

    Park, Hyun-Seok

    2012-12-01

    Whereas a vast amount of new information on bioinformatics is made available to the public through patents, only a small set of patents are cited in academic papers. A detailed analysis of registered bioinformatics patents, using the existing patent search system, can provide valuable information links between science and technology. However, it is extremely difficult to select keywords to capture bioinformatics patents, reflecting the convergence of several underlying technologies. No single word or even several words are sufficient to identify such patents. The analysis of patent subclasses can provide valuable information. In this paper, I did a preliminary study of the current status of bioinformatics patents and their International Patent Classification (IPC) groups registered in the Korea Intellectual Property Rights Information Service (KIPRIS) database.

  12. GLAD: a system for developing and deploying large-scale bioinformatics grid.

    PubMed

    Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

    2005-03-01

    Grid computing is used to solve large-scale bioinformatics problems with gigabytes database by distributing the computation across multiple platforms. Until now in developing bioinformatics grid applications, it is extremely tedious to design and implement the component algorithms and parallelization techniques for different classes of problems, and to access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware, which exploits the task-based parallelism. Two bioinformatics benchmark applications, such as distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

  13. BIAS: Bioinformatics Integrated Application Software.

    PubMed

    Finak, G; Godin, N; Hallett, M; Pepin, F; Rajabi, Z; Srivastava, V; Tang, Z

    2005-04-15

    We introduce a development platform especially tailored to Bioinformatics research and software development. BIAS (Bioinformatics Integrated Application Software) provides the tools necessary for carrying out integrative Bioinformatics research requiring multiple datasets and analysis tools. It follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system and supports standards and data-exchange protocols common to Bioinformatics. BIAS is an OpenSource project and is freely available to all interested users at http://www.mcb.mcgill.ca/~bias/. This website also contains a paper containing a more detailed description of BIAS and a sample implementation of a Bayesian network approach for the simultaneous prediction of gene regulation events and of mRNA expression from combinations of gene regulation events. hallett@mcb.mcgill.ca.

  14. Practical applications of the bioinformatics toolbox for narrowing quantitative trait loci.

    PubMed

    Burgess-Herbert, Sarah L; Cox, Allison; Tsaih, Shirng-Wern; Paigen, Beverly

    2008-12-01

    Dissecting the genes involved in complex traits can be confounded by multiple factors, including extensive epistatic interactions among genes, the involvement of epigenetic regulators, and the variable expressivity of traits. Although quantitative trait locus (QTL) analysis has been a powerful tool for localizing the chromosomal regions underlying complex traits, systematically identifying the causal genes remains challenging. Here, through its application to plasma levels of high-density lipoprotein cholesterol (HDL) in mice, we demonstrate a strategy for narrowing QTL that utilizes comparative genomics and bioinformatics techniques. We show how QTL detected in multiple crosses are subjected to both combined cross analysis and haplotype block analysis; how QTL from one species are mapped to the concordant regions in another species; and how genomewide scans associating haplotype groups with their phenotypes can be used to prioritize the narrowed regions. Then we illustrate how these individual methods for narrowing QTL can be systematically integrated for mouse chromosomes 12 and 15, resulting in a significantly reduced number of candidate genes, often from hundreds to <10. Finally, we give an example of how additional bioinformatics resources can be combined with experiments to determine the most likely quantitative trait genes.

  15. A comparison of common programming languages used in bioinformatics

    PubMed Central

    Fourment, Mathieu; Gillings, Michael R

    2008-01-01

    Background The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Results Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from Conclusion This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language. PMID:18251993

  16. Ergatis: a web interface and scalable software system for bioinformatics workflows

    PubMed Central

    Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

    2010-01-01

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634

  17. A new method to improve network topological similarity search: applied to fold recognition

    PubMed Central

    Lhota, John; Hauptman, Ruth; Hart, Thomas; Ng, Clara; Xie, Lei

    2015-01-01

    Motivation: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may be not able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework—Enrichment of Network Topological Similarity (ENTS)—to improve the performance of large scale similarity searches in bioinformatics. Results: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. Availability and implementation: Source code freely available upon request Contact: lxie@iscb.org PMID:25717198

  18. CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications

    PubMed Central

    2012-01-01

    Background Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. Results In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Conclusions Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications. PMID:22369626

  19. Composition-related structural transition of random peptides: insight into the boundary between intrinsically disordered proteins and folded proteins.

    PubMed

    Kang, Wen-Bin; He, Chuan; Liu, Zhen-Xing; Wang, Jun; Wang, Wei

    2018-05-16

    Previous studies based on bioinformatics showed that there is a sharp distinction of structural features and residue composition between the intrinsically disordered proteins and the folded proteins. What induces such a composition-related structural transition? How do various kinds of interactions work in such processes? In this work, we investigate these problems based on a survey on peptides randomly composed of charged residues (including glutamic acids and lysines) and the residues with different hydrophobicity, such as alanines, glycines, or phenylalanines. Based on simulations using all-atom model and replica-exchange Monte Carlo method, a coil-globule transition is observed for each peptide. The corresponding transition temperature is found to be dependent on the contents of the hydrophobic and charged residues. For several cases, when the mean hydrophobicity is larger than a certain threshold, the transition temperature is higher than the room temperature, and vise versa. These thresholds of hydrophobicity and net charge are quantitatively consistent with the border line observed from the study of bioinformatics. These results outline the basic physical reasons for the compositional distinction between the intrinsically disordered proteins and the folded proteins. Furthermore, the contributions of various interactions to the structural variation of peptides are analyzed based on the contact statistics and the charge-pattern dependence of the gyration radii of the peptides. Our observations imply that the hydrophobicity contributes essentially to such composition-related transitions. Thus, we achieve a better understanding on composition-structure relation of the natural proteins and the underlying physics.

  20. Bioinformatics analyses of Shigella CRISPR structure and spacer classification.

    PubMed

    Wang, Pengfei; Zhang, Bing; Duan, Guangcai; Wang, Yingfang; Hong, Lijuan; Wang, Linlin; Guo, Xiangjiao; Xi, Yuanlin; Yang, Haiyan

    2016-03-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) are inheritable genetic elements of a variety of archaea and bacteria and indicative of the bacterial ecological adaptation, conferring acquired immunity against invading foreign nucleic acids. Shigella is an important pathogen for anthroponosis. This study aimed to analyze the features of Shigella CRISPR structure and classify the spacers through bioinformatics approach. Among 107 Shigella, 434 CRISPR structure loci were identified with two to seven loci in different strains. CRISPR-Q1, CRISPR-Q4 and CRISPR-Q5 were widely distributed in Shigella strains. Comparison of the first and last repeats of CRISPR1, CRISPR2 and CRISPR3 revealed several base variants and different stem-loop structures. A total of 259 cas genes were found among these 107 Shigella strains. The cas gene deletions were discovered in 88 strains. However, there is one strain that does not contain cas gene. Intact clusters of cas genes were found in 19 strains. From comprehensive analysis of sequence signature and BLAST and CRISPRTarget score, the 708 spacers were classified into three subtypes: Type I, Type II and Type III. Of them, Type I spacer referred to those linked with one gene segment, Type II spacer linked with two or more different gene segments, and Type III spacer undefined. This study examined the diversity of CRISPR/cas system in Shigella strains, demonstrated the main features of CRISPR structure and spacer classification, which provided critical information for elucidation of the mechanisms of spacer formation and exploration of the role the spacers play in the function of the CRISPR/cas system.

  1. A Novel Partial Sequence Alignment Tool for Finding Large Deletions

    PubMed Central

    Aruk, Taner; Ustek, Duran; Kursun, Olcay

    2012-01-01

    Finding large deletions in genome sequences has become increasingly more useful in bioinformatics, such as in clinical research and diagnosis. Although there are a number of publically available next generation sequencing mapping and sequence alignment programs, these software packages do not correctly align fragments containing deletions larger than one kb. We present a fast alignment software package, BinaryPartialAlign, that can be used by wet lab scientists to find long structural variations in their experiments. For BinaryPartialAlign, we make use of the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps that we called partial alignment. BinaryPartialAlign implementation is compared with other straight-forward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (runtime and accuracy) of the proposed method. PMID:22566777

  2. Vision & Strategy: Predictive Ecotoxicology in the 21st Century

    DTIC Science & Technology

    2011-01-01

    their relative abundance or modifications QSARs —Correlation of ecological or toxicological activity with chemical structure to understand or predict...data and collection methods. The dramatic increase in the amount of toxicological data we can collect and analyze is complemented by our improved...diverse disciplines such as biochemistry, ecology, molecular biology, toxicology , bioinformatics, and health and environmental risk assess- ment

  3. Detection of the "cp4 epsps" Gene in Maize Line NK603 and Comparison of Related Protein Structures: An Advanced Undergraduate Experiment

    ERIC Educational Resources Information Center

    Swope, Nicole K.; Fryfogle, Patrick J.; Sivy, Tami L.

    2015-01-01

    A flexible, rigorous laboratory experiment for upper-level biochemistry undergraduates is described that focuses on the Roundup Ready maize line. The work is appropriate for undergraduate laboratory courses that integrate biochemistry, molecular biology, or bioinformatics. In this experiment, DNA is extracted and purified from maize kernel and…

  4. New horizons in Biophysics

    PubMed Central

    2011-01-01

    This editorial celebrates the re-launch of PMC Biophysics previously published by PhysMath Central, in its new format as BMC Biophysics published by BioMed Central with an expanded scope and Editorial Board. BMC Biophysics will fill its own niche in the BMC series alongside complementary companion journals including BMC Bioinformatics, BMC Medical Physics, BMC Structural Biology and BMC Systems Biology. PMID:21595996

  5. Bioinformatics analysis and construction of phylogenetic tree of aquaporins from Echinococcus granulosus.

    PubMed

    Wang, Fen; Ye, Bin

    2016-09-01

    Cyst echinococcosis caused by the matacestodal larvae of Echinococcus granulosus (Eg), is a chronic, worldwide, and severe zoonotic parasitosis. The treatment of cyst echinococcosis is still difficult since surgery cannot fit the needs of all patients, and drugs can lead to serious adverse events as well as resistance. The screen of target proteins interacted with new anti-hydatidosis drugs is urgently needed to meet the prevailing challenges. Here, we analyzed the sequences and structure properties, and constructed a phylogenetic tree by bioinformatics methods. The MIP family signature and Protein kinase C phosphorylation sites were predicted in all nine EgAQPs. α-helix and random coil were the main secondary structures of EgAQPs. The numbers of transmembrane regions were three to six, which indicated that EgAQPs contained multiple hydrophobic regions. A neighbor-joining tree indicated that EgAQPs were divided into two branches, seven EgAQPs formed a clade with AQP1 from human, a "strict" aquaporins, other two EgAQPs formed a clade with AQP9 from human, an aquaglyceroporins. Unfortunately, homology modeling of EgAQPs was aborted. These results provide a foundation for understanding and researches of the biological function of E. granulosus.

  6. Structural elucidation and molecular characterization of Marinobacter sp. α-amylase.

    PubMed

    Kumar, Sumit; Khan, Rizwan Hasan; Khare, S K

    2016-01-01

    Halophiles have been perceived as potential source of novel enzymes in recent years. The interest emanates from their ability to catalyze efficiently under high salt and organic solvents. Marinobacter sp. EMB8 α-amylase was found to be active and stable in salt and organic solvents. A study was carried out using circular dichroism (CD), fluorescence spectroscopy, and bioinformatics analysis of similar protein sequence to ascertain molecular basis of salt and solvent adaptability of α-amylase. Structural changes recorded in the presence of varying amounts of NaCl exhibited an increase in negative ellipticity as a function of salt, confirming that salt stabilizes the protein and increases the secondary structure, making it catalytically functional. The data of intrinsic and extrinsic fluorescence (using 1-anilinonaphthalene 8-sulfonate [ANS] as probe) further confirmed the role of salt. The α-amylase was active in the presence of nonpolar solvents, namely, hexane and decane, but inactivated by ethanol. The decrease in the activity was correlated with the loss of tertiary structure in the presence of ethanol. Guanidine hydrochloride and pH denaturation indicated the molten globule state at pH 4.0. Partial N-terminal amino acid sequence of the purified α-amylase revealed the relatedness to Pseudoalteromonas sp. α-amylase. "FVHLFEW" was found as the N-terminal signature sequence. Bioinformatics analysis was done using M. algicola α-amylase protein having the same N-terminal signature sequence. The three-dimensional structure of Marinobacter α-amylase was deduced using the I-TASSER server, which reflected the enrichment of acidic amino acids on the surface, imparting the stability in the presence of salt. Our study clearly indicate that salt is necessary for maintaining the secondary and tertiary structure of halophilic protein, which is a necessary prerequisite for catalysis.

  7. Bioinformatics in High School Biology Curricula: A Study of State Science Standards

    ERIC Educational Resources Information Center

    Wefer, Stephen H.; Sheppard, Keith

    2008-01-01

    The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics…

  8. Implementing a Web-Based Introductory Bioinformatics Course for Non-Bioinformaticians That Incorporates Practical Exercises

    ERIC Educational Resources Information Center

    Vincent, Antony T.; Bourbonnais, Yves; Brouard, Jean-Simon; Deveau, Hélène; Droit, Arnaud; Gagné, Stéphane M.; Guertin, Michel; Lemieux, Claude; Rathier, Louis; Charette, Steve J.; Lagüe, Patrick

    2018-01-01

    A recent scientific discipline, bioinformatics, defined as using informatics for the study of biological problems, is now a requirement for the study of biological sciences. Bioinformatics has become such a powerful and popular discipline that several academic institutions have created programs in this field, allowing students to become…

  9. Vignettes: diverse library staff offering diverse bioinformatics services*

    PubMed Central

    Osterbur, David L.; Alpi, Kristine; Canevari, Catharine; Corley, Pamela M.; Devare, Medha; Gaedeke, Nicola; Jacobs, Donna K.; Kirlew, Peter; Ohles, Janet A.; Vaughan, K.T.L.; Wang, Lili; Wu, Yongchun; Geer, Renata C.

    2006-01-01

    Objectives: The paper gives examples of the bioinformatics services provided in a variety of different libraries by librarians with a broad range of educational background and training. Methods: Two investigators sent an email inquiry to attendees of the “National Center for Biotechnology Information's (NCBI) Introduction to Molecular Biology Information Resources” or “NCBI Advanced Workshop for Bioinformatics Information Specialists (NAWBIS)” courses. The thirty-five-item questionnaire addressed areas such as educational background, library setting, types and numbers of users served, and bioinformatics training and support services provided. Answers were compiled into program vignettes. Discussion: The bioinformatics support services addressed in the paper are based in libraries with academic and clinical settings. Services have been established through different means: in collaboration with biology faculty as part of formal courses, through teaching workshops in the library, through one-on-one consultations, and by other methods. Librarians with backgrounds from art history to doctoral degrees in genetics have worked to establish these programs. Conclusion: Successful bioinformatics support programs can be established in libraries in a variety of different settings and by staff with a variety of different backgrounds and approaches. PMID:16888664

  10. Vertical and horizontal integration of bioinformatics education: A modular, interdisciplinary approach.

    PubMed

    Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D Blaine; Langeland, James A

    2009-01-01

    Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option. Furthermore, we believe that a true interdisciplinary science experience would be best served by introduction of bioinformatics modules within existing courses in biology and chemistry and other complementary departments. To that end, with support from the Howard Hughes Medical Institute, we have developed over a dozen independent bioinformatics modules for our students that are incorporated into courses ranging from general chemistry and biology, advanced specialty courses, and classes in complementary disciplines such as computer science, mathematics, and physics. These activities have largely promoted active learning in our classrooms and have enhanced student understanding of course materials. Herein, we describe our program, the activities we have developed, and assessment of our endeavors in this area. Copyright © 2009 International Union of Biochemistry and Molecular Biology, Inc.

  11. Generalized Centroid Estimators in Bioinformatics

    PubMed Central

    Hamada, Michiaki; Kiryu, Hisanori; Iwasaki, Wataru; Asai, Kiyoshi

    2011-01-01

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accuracy measures (e.g. sensitivity, PPV, MCC and F-score) as well as it can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. Not only the concept presented in this paper gives a useful framework to design MEA-based estimators but also it is highly extendable and sheds new light on many problems in bioinformatics. PMID:21365017

  12. The Online Bioinformatics Resources Collection at the University of Pittsburgh Health Sciences Library System--a one-stop gateway to online bioinformatics databases and software tools.

    PubMed

    Chen, Yi-Bu; Chattopadhyay, Ansuman; Bergen, Phillip; Gadd, Cynthia; Tannery, Nancy

    2007-01-01

    To bridge the gap between the rising information needs of biological and medical researchers and the rapidly growing number of online bioinformatics resources, we have created the Online Bioinformatics Resources Collection (OBRC) at the Health Sciences Library System (HSLS) at the University of Pittsburgh. The OBRC, containing 1542 major online bioinformatics databases and software tools, was constructed using the HSLS content management system built on the Zope Web application server. To enhance the output of search results, we further implemented the Vivísimo Clustering Engine, which automatically organizes the search results into categories created dynamically based on the textual information of the retrieved records. As the largest online collection of its kind and the only one with advanced search results clustering, OBRC is aimed at becoming a one-stop guided information gateway to the major bioinformatics databases and software tools on the Web. OBRC is available at the University of Pittsburgh's HSLS Web site (http://www.hsls.pitt.edu/guides/genetics/obrc).

  13. Carving a niche: establishing bioinformatics collaborations

    PubMed Central

    Lyon, Jennifer A.; Tennant, Michele R.; Messner, Kevin R.; Osterbur, David L.

    2006-01-01

    Objectives: The paper describes collaborations and partnerships developed between library bioinformatics programs and other bioinformatics-related units at four academic institutions. Methods: A call for information on bioinformatics partnerships was made via email to librarians who have participated in the National Center for Biotechnology Information's Advanced Workshop for Bioinformatics Information Specialists. Librarians from Harvard University, the University of Florida, the University of Minnesota, and Vanderbilt University responded and expressed willingness to contribute information on their institutions, programs, services, and collaborating partners. Similarities and differences in programs and collaborations were identified. Results: The four librarians have developed partnerships with other units on their campuses that can be categorized into the following areas: knowledge management, instruction, and electronic resource support. All primarily support freely accessible electronic resources, while other campus units deal with fee-based ones. These demarcations are apparent in resource provision as well as in subsequent support and instruction. Conclusions and Recommendations: Through environmental scanning and networking with colleagues, librarians who provide bioinformatics support can develop fruitful collaborations. Visibility is key to building collaborations, as is broad-based thinking in terms of potential partners. PMID:16888668

  14. PLAZA 3.0: an access point for plant comparative genomics

    PubMed Central

    Proost, Sebastian; Van Bel, Michiel; Vaneechoutte, Dries; Van de Peer, Yves; Inzé, Dirk; Mueller-Roeber, Bernd; Vandepoele, Klaas

    2015-01-01

    Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms. PMID:25324309

  15. Cys Site-Directed Mutagenesis of the Human SLC1A5 (ASCT2) Transporter: Structure/Function Relationships and Crucial Role of Cys467 for Redox Sensing and Glutamine Transport

    PubMed Central

    Scalise, Mariafrancesca; Pochini, Lorena; Console, Lara; Pappacoda, Gilda; Pingitore, Piero; Hedfalk, Kristina; Indiveri, Cesare

    2018-01-01

    The human plasma membrane transporter ASCT2 is responsible for mediating Na- dependent antiport of neutral amino acids. New insights into structure/function relationships were unveiled by a combined approach of recombinant over-expression, site-directed mutagenesis, transport assays in proteoliposomes and bioinformatics. WT and Cys mutants of hASCT2 were produced in P. pastoris and purified for functional assay. The reactivity towards SH reducing and oxidizing agents of WT protein was investigated and opposite effects were revealed; transport activity increased upon treatment with the Cys reducing agent DTE, i.e., when Cys residues were in thiol (reduced) state. Methyl-Hg, which binds to SH groups, was able to inhibit WT and seven out of eight Cys to Ala mutants. On the contrary, C467A loses the sensitivity to both DTE activation and Methyl-Hg inhibition. The C467A mutant showed a Km for Gln one order of magnitude higher than that of WT. Moreover, the C467 residue is localized in the substrate binding region of the protein, as suggested by bioinformatics on the basis of the EAAT1 structure comparison. Taken together, the experimental data allowed identifying C467 residue as crucial for substrate binding and for transport activity modulation of hASCT2. PMID:29495336

  16. Structural and Phylogenetic Analysis of Laccases from Trichoderma: A Bioinformatic Approach

    PubMed Central

    Cázares-García, Saila Viridiana; Vázquez-Garcidueñas, Ma. Soledad; Vázquez-Marrufo, Gerardo

    2013-01-01

    The genus Trichoderma includes species of great biotechnological value, both for their mycoparasitic activities and for their ability to produce extracellular hydrolytic enzymes. Although activity of extracellular laccase has previously been reported in Trichoderma spp., the possible number of isoenzymes is still unknown, as are the structural and functional characteristics of both the genes and the putative proteins. In this study, the system of laccases sensu stricto in the Trichoderma species, the genomes of which are publicly available, were analyzed using bioinformatic tools. The intron/exon structure of the genes and the identification of specific motifs in the sequence of amino acids of the proteins generated in silico allow for clear differentiation between extracellular and intracellular enzymes. Phylogenetic analysis suggests that the common ancestor of the genus possessed a functional gene for each one of these enzymes, which is a characteristic preserved in T. atroviride and T. virens. This analysis also reveals that T. harzianum and T. reesei only retained the intracellular activity, whereas T. asperellum added an extracellular isoenzyme acquired through horizontal gene transfer during the mycoparasitic process. The evolutionary analysis shows that in general, extracellular laccases are subjected to purifying selection, and intracellular laccases show neutral evolution. The data provided by the present study will enable the generation of experimental approximations to better understand the physiological role of laccases in the genus Trichoderma and to increase their biotechnological potential. PMID:23383142

  17. Confirmation of the pathogenicity of a mutation p.G337C in the COL1A2 gene associated with osteogenesis imperfecta

    PubMed Central

    Jia, Mingrui; Shi, Ranran; Zhao, Xuli; Fu, Zhijian; Bai, Zhijing; Sun, Tao; Zhao, Xuejun; Wang, Wenbo; Xu, Chao; Yan, Fang

    2017-01-01

    Abstract Mutation analysis as the gold standard is particularly important in diagnosis of osteogenesis imperfecta (OI) and it may be preventable upon early diagnosis. In this study, we aimed to analyze the clinical and genetic materials of an OI pedigree as well as to confirm the deleterious property of the mutation. A pedigree with OI was identified. All family members received careful clinical examinations and blood was drawn for genetic analyses. Genes implicated in OI were screened for mutation. The function and structure of the mutant protein were predicted using bioinformatics analysis. The proband, a 9-month fetus, showed abnormal sonographic images. Disproportionately short and triangular face with blue sclera was noticed at birth. She can barely walk and suffered multiple fractures till 2-year old. Her mother appeared small stature, frequent fractures, blue sclera, and deformity of extremities. A heterozygous missense mutation c.1009G>T (p.G337C) in the COL1A2 gene was identified in her mother and her. Bioinformatics analysis showed p.G337 was well-conserved among multiple species and the mutation probably changed the structure and damaged the function of collagen. We suggest that the mutation p.G337C in the COL1A2 gene is pathogenic for OI by affecting the protein structure and the function of collagen. PMID:28953610

  18. Development of a cloud-based Bioinformatics Training Platform.

    PubMed

    Revote, Jerico; Watson-Haigh, Nathan S; Quenette, Steve; Bethwaite, Blair; McGrath, Annette; Shang, Catherine A

    2017-05-01

    The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. © The Author 2016. Published by Oxford University Press.

  19. Development of a cloud-based Bioinformatics Training Platform

    PubMed Central

    Revote, Jerico; Watson-Haigh, Nathan S.; Quenette, Steve; Bethwaite, Blair; McGrath, Annette

    2017-01-01

    Abstract The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. PMID:27084333

  20. The Methionine-aromatic Motif Plays a Unique Role in Stabilizing Protein Structure*

    PubMed Central

    Valley, Christopher C.; Cembran, Alessandro; Perlmutter, Jason D.; Lewis, Andrew K.; Labello, Nicholas P.; Gao, Jiali; Sachs, Jonathan N.

    2012-01-01

    Of the 20 amino acids, the precise function of methionine (Met) remains among the least well understood. To establish a determining characteristic of methionine that fundamentally differentiates it from purely hydrophobic residues, we have used in vitro cellular experiments, molecular simulations, quantum calculations, and a bioinformatics screen of the Protein Data Bank. We show that approximately one-third of all known protein structures contain an energetically stabilizing Met-aromatic motif and, remarkably, that greater than 10,000 structures contain this motif more than 10 times. Critically, we show that as compared with a purely hydrophobic interaction, the Met-aromatic motif yields an additional stabilization of 1–1.5 kcal/mol. To highlight its importance and to dissect the energetic underpinnings of this motif, we have studied two clinically relevant TNF ligand-receptor complexes, namely TRAIL-DR5 and LTα-TNFR1. In both cases, we show that the motif is necessary for high affinity ligand binding as well as function. Additionally, we highlight previously overlooked instances of the motif in several disease-related Met mutations. Our results strongly suggest that the Met-aromatic motif should be exploited in the rational design of therapeutics targeting a range of proteins. PMID:22859300

  1. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community

    PubMed Central

    Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius

    2016-01-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418

  2. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community.

    PubMed

    Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J

    2016-09-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.

  3. An ontology-based framework for bioinformatics workflows.

    PubMed

    Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer

    2007-01-01

    The proliferation of bioinformatics activities brings new challenges - how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and to provide interoperability among data and tools. This paper describes an effort toward these directions. It is based on combining research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.

  4. MAPI: towards the integrated exploitation of bioinformatics Web Services.

    PubMed

    Ramirez, Sergio; Karlsson, Johan; Trelles, Oswaldo

    2011-10-27

    Bioinformatics is commonly featured as a well assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and heterogeneity complicate the integrated exploitation of such data processing capacity. To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for uniform representation of Web Services metadata descriptors including their management and invocation protocols of the services which they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed for the client have to be installed, and that the module functionality can be extended without the need for re-writing the software client. The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation with advanced features such as workflows composition and asynchronous services calls to multiple types of Web Services including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).

  5. Discovery of novel bacterial toxins by genomics and computational biology.

    PubMed

    Doxey, Andrew C; Mansfield, Michael J; Montecucco, Cesare

    2018-06-01

    Hundreds and hundreds of bacterial protein toxins are presently known. Traditionally, toxin identification begins with pathological studies of bacterial infectious disease. Following identification and cultivation of a bacterial pathogen, the protein toxin is purified from the culture medium and its pathogenic activity is studied using the methods of biochemistry and structural biology, cell biology, tissue and organ biology, and appropriate animal models, supplemented by bioimaging techniques. The ongoing and explosive development of high-throughput DNA sequencing and bioinformatic approaches have set in motion a revolution in many fields of biology, including microbiology. One consequence is that genes encoding novel bacterial toxins can be identified by bioinformatic and computational methods based on previous knowledge accumulated from studies of the biology and pathology of thousands of known bacterial protein toxins. Starting from the paradigmatic cases of diphtheria toxin, tetanus and botulinum neurotoxins, this review discusses traditional experimental approaches as well as bioinformatics and genomics-driven approaches that facilitate the discovery of novel bacterial toxins. We discuss recent work on the identification of novel botulinum-like toxins from genera such as Weissella, Chryseobacterium, and Enteroccocus, and the implications of these computationally identified toxins in the field. Finally, we discuss the promise of metagenomics in the discovery of novel toxins and their ecological niches, and present data suggesting the existence of uncharacterized, botulinum-like toxin genes in insect gut metagenomes. Copyright © 2018. Published by Elsevier Ltd.

  6. Agonist Binding to Chemosensory Receptors: A Systematic Bioinformatics Analysis

    PubMed Central

    Fierro, Fabrizio; Suku, Eda; Alfonso-Prieto, Mercedes; Giorgetti, Alejandro; Cichon, Sven; Carloni, Paolo

    2017-01-01

    Human G-protein coupled receptors (hGPCRs) constitute a large and highly pharmaceutically relevant membrane receptor superfamily. About half of the hGPCRs' family members are chemosensory receptors, involved in bitter taste and olfaction, along with a variety of other physiological processes. Hence these receptors constitute promising targets for pharmaceutical intervention. Molecular modeling has been so far the most important tool to get insights on agonist binding and receptor activation. Here we investigate both aspects by bioinformatics-based predictions across all bitter taste and odorant receptors for which site-directed mutagenesis data are available. First, we observe that state-of-the-art homology modeling combined with previously used docking procedures turned out to reproduce only a limited fraction of ligand/receptor interactions inferred by experiments. This is most probably caused by the low sequence identity with available structural templates, which limits the accuracy of the protein model and in particular of the side-chains' orientations. Methods which transcend the limited sampling of the conformational space of docking may improve the predictions. As an example corroborating this, we review here multi-scale simulations from our lab and show that, for the three complexes studied so far, they significantly enhance the predictive power of the computational approach. Second, our bioinformatics analysis provides support to previous claims that several residues, including those at positions 1.50, 2.50, and 7.52, are involved in receptor activation. PMID:28932739

  7. SYMBIOmatics: synergies in Medical Informatics and Bioinformatics--exploring current scientific literature for emerging topics.

    PubMed

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-03-08

    The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to http://www.symbiomatics.org). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000-2005 ("recent") and 1990-1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science.

  8. SYMBIOmatics: Synergies in Medical Informatics and Bioinformatics – exploring current scientific literature for emerging topics

    PubMed Central

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-01-01

    Background The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to ). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. Results This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000–2005 ("recent") and 1990–1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. Conclusion We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science. PMID:17430562

  9. The implementation of e-learning tools to enhance undergraduate bioinformatics teaching and learning: a case study in the National University of Singapore

    PubMed Central

    2009-01-01

    Background The rapid advancement of computer and information technology in recent years has resulted in the rise of e-learning technologies to enhance and complement traditional classroom teaching in many fields, including bioinformatics. This paper records the experience of implementing e-learning technology to support problem-based learning (PBL) in the teaching of two undergraduate bioinformatics classes in the National University of Singapore. Results Survey results further established the efficiency and suitability of e-learning tools to supplement PBL in bioinformatics education. 63.16% of year three bioinformatics students showed a positive response regarding the usefulness of the Learning Activity Management System (LAMS) e-learning tool in guiding the learning and discussion process involved in PBL and in enhancing the learning experience by breaking down PBL activities into a sequential workflow. On the other hand, 89.81% of year two bioinformatics students indicated that their revision process was positively impacted with the use of LAMS for guiding the learning process, while 60.19% agreed that the breakdown of activities into a sequential step-by-step workflow by LAMS enhances the learning experience Conclusion We show that e-learning tools are useful for supplementing PBL in bioinformatics education. The results suggest that it is feasible to develop and adopt e-learning tools to supplement a variety of instructional strategies in the future. PMID:19958511

  10. Parallel protein secondary structure prediction based on neural networks.

    PubMed

    Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

    2004-01-01

    Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are experimented separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. New binary classifier for Helix versus not Helix ( approximately H) for DBNN produces prediction accuracy of 87% when PSSM is used for the input profile. The performance of DBNN binary classifier is comparable to other best prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Due to the time consuming task of training the neural networks, Pthread and OpenMP are employed to parallelize DBNN in the hyperthreading enabled Intel architecture. Speedup for 16 Pthreads is 4.9 and speedup for 16 OpenMP threads is 4 in the 4 processors shared memory architecture. Both speedup performance of OpenMP and Pthread is superior to that of other research. With the new parallel training algorithm, thousands of amino acids can be processed in reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.

  11. Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins

    PubMed Central

    Yang, Jing; He, Bao-Ji; Jang, Richard; Zhang, Yang; Shen, Hong-Bin

    2015-01-01

    Abstract Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations. Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ Contact: zhng@umich.edu or hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254435

  12. Integration of Bioinformatics into an Undergraduate Biology Curriculum and the Impact on Development of Mathematical Skills

    ERIC Educational Resources Information Center

    Wightman, Bruce; Hark, Amy T.

    2012-01-01

    The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this…

  13. Bioinformatics in Middle East Program Curricula--A Focus on the Arabian Gulf

    ERIC Educational Resources Information Center

    Loucif, Samia

    2014-01-01

    The purpose of this paper is to investigate the inclusion of bioinformatics in program curricula in the Middle East, focusing on educational institutions in the Arabian Gulf. Bioinformatics is a multidisciplinary field which has emerged in response to the need for efficient data storage and retrieval, and accurate and fast computational and…

  14. Making Bioinformatics Projects a Meaningful Experience in an Undergraduate Biotechnology or Biomedical Science Programme

    ERIC Educational Resources Information Center

    Sutcliffe, Iain C.; Cummings, Stephen P.

    2007-01-01

    Bioinformatics has emerged as an important discipline within the biological sciences that allows scientists to decipher and manage the vast quantities of data (such as genome sequences) that are now available. Consequently, there is an obvious need to provide graduates in biosciences with generic, transferable skills in bioinformatics. We present…

  15. Evaluating the Effectiveness of a Practical Inquiry-Based Learning Bioinformatics Module on Undergraduate Student Engagement and Applied Skills

    ERIC Educational Resources Information Center

    Brown, James A. L.

    2016-01-01

    A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion,…

  16. The S-Star Trial Bioinformatics Course: An On-line Learning Success

    ERIC Educational Resources Information Center

    Lim, Yun Ping; Hoog, Jan-Olov; Gardner, Phyllis; Ranganathan, Shoba; Andersson, Siv; Subbiah, Subramanian; Tan, Tin Wee; Hide, Winston; Weiss, Anthony S.

    2003-01-01

    The S-Star Trial Bioinformatics on-line course (www.s-star.org) is a global experiment in bioinformatics distance education. Six universities from five continents have participated in this project. One hundred and fifty students participated in the first trial course of which 96 followed through the entire course and 70 fulfilled the overall…

  17. Prediction of beta-turns and beta-turn types by a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN).

    PubMed

    Kirschner, Andreas; Frishman, Dmitrij

    2008-10-01

    Prediction of beta-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics due to their frequent occurrence as well as their structural and functional significance. Because various structural features of proteins are intercorrelated, secondary structure information has been often employed as an additional input for machine learning algorithms while predicting beta-turns. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs and demonstrate its efficiency in recognizing three aspects of protein structure: beta-turns, beta-turn types, and secondary structure. The advantage of our method compared to other predictors is that it does not require any external input except for sequence profiles because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset our method exhibits the total prediction accuracy of 77.9% and the Mathew's Correlation Coefficient of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually depending target classes need to be predicted. http://webclu.bio.wzw.tum.de/predator-web/.

  18. Iterative refinement of structure-based sequence alignments by Seed Extension

    PubMed Central

    Kim, Changhoon; Tai, Chin-Hsien; Lee, Byungkook

    2009-01-01

    Background Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. Results RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. Conclusion RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs. PMID:19589133

  19. Phylogenetic and Protein Sequence Analysis of Bacterial Chemoreceptors.

    PubMed

    Ortega, Davi R; Zhulin, Igor B

    2018-01-01

    Identifying chemoreceptors in sequenced bacterial genomes, revealing their domain architecture, inferring their evolutionary relationships, and comparing them to chemoreceptors of known function become important steps in genome annotation and chemotaxis research. Here, we describe bioinformatics procedures that enable such analyses, using two closely related bacterial genomes as examples.

  20. Identification and characterization of large DNA deletions affecting oil quality traits in soybean seeds through transcriptome sequencing analysis

    USDA-ARS?s Scientific Manuscript database

    Understanding the molecular and genetic mechanisms underlying variation in seed composition and contents among different genotypes is important for soybean oil quality improvement. We designed a bioinformatics approach to compare seed transcriptomes of 9 soybean genotypes varying in oil composition ...

  1. BiofOmics: a Web platform for the systematic and standardized collection of high-throughput biofilm data.

    PubMed

    Lourenço, Anália; Ferreira, Andreia; Veiga, Nuno; Machado, Idalina; Pereira, Maria Olivia; Azevedo, Nuno F

    2012-01-01

    Consortia of microorganisms, commonly known as biofilms, are attracting much attention from the scientific community due to their impact in human activity. As biofilm research grows to be a data-intensive discipline, the need for suitable bioinformatics approaches becomes compelling to manage and validate individual experiments, and also execute inter-laboratory large-scale comparisons. However, biofilm data is widespread across ad hoc, non-standardized individual files and, thus, data interchange among researchers, or any attempt of cross-laboratory experimentation or analysis, is hardly possible or even attempted. This paper presents BiofOmics, the first publicly accessible Web platform specialized in the management and analysis of data derived from biofilm high-throughput studies. The aim is to promote data interchange across laboratories, implementing collaborative experiments, and enable the development of bioinformatics tools in support of the processing and analysis of the increasing volumes of experimental biofilm data that are being generated. BiofOmics' data deposition facility enforces data structuring and standardization, supported by controlled vocabulary. Researchers are responsible for the description of the experiments, their results and conclusions. BiofOmics' curators interact with submitters only to enforce data structuring and the use of controlled vocabulary. Then, BiofOmics' search facility makes publicly available the profile and data associated with a submitted study so that any researcher can profit from these standardization efforts to compare similar studies, generate new hypotheses to be tested or even extend the conditions experimented in the study. BiofOmics' novelty lies in its support to standardized data deposition, the availability of computerizable data files and the free-of-charge dissemination of biofilm studies across the community. Hopefully, this will open promising research possibilities, namely the comparison of results between different laboratories, the reproducibility of methods within and between laboratories, and the development of guidelines and standardized protocols for biofilm formation operating procedures and analytical methods.

  2. CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.

    PubMed

    Hallin, Peter F; Ussery, David W

    2004-12-12

    Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations like curvature and stacking energy, DNA compositions like base skews, oligo skews and repeats at the local and global level are just a few of the analysis that are presented on the CBS Genome Atlas Web page. Complex analysis, changing methods and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results counts to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4, to present a dynamic web content for users outside the center. This solution is tightly fitted to existing server infrastructure and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues. A web based user interface which is dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/. This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.

  3. Microsoft Biology Initiative: .NET Bioinformatics Platform and Tools

    PubMed Central

    Diaz Acosta, B.

    2011-01-01

    The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative is comprised of two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework—initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, documentation and training materials are freely downloadable from http://research.microsoft.com/bio. MBT is a collection of tools that enable biology and bioinformatics researchers to be more productive in making scientific discoveries.

  4. Bioinformatics in high school biology curricula: a study of state science standards.

    PubMed

    Wefer, Stephen H; Sheppard, Keith

    2008-01-01

    The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students.

  5. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters.

    PubMed

    Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.

  6. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

    PubMed Central

    Cheng, Gong; Zhang, Guocai; Xu, Liang

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317

  7. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.

    PubMed

    Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru

    2016-09-29

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.

  8. [Integration of clinical and biological data in clinical practice using bioinformatics].

    PubMed

    Coltell, Oscar; Arregui, María; Fabregat, Antonio; Portolés, Olga

    2008-05-01

    The aim of our work is to describe essential aspects of Medical Informatics, Bioinformatics and Biomedical Informatics, that are used in biomedical research and clinical practice. These disciplines have emerged from the need to find new scientific and technical approaches to manage, store, analyze and report data generated in clinical practice and molecular biology and other medical specialties. It can be also useful to integrate research information generated in different areas of health care. Moreover, these disciplines are interdisciplinary and integrative, two key features not shared by other areas of medical knowledge. Finally, when Bioinformatics and Biomedical Informatics approach to medical investigation and practice are applied, a new discipline, called Clinical Bioinformatics, emerges. The latter requires a specific training program to create a new professional profile. We have not been able to find a specific training program in Clinical Bioinformatics in Spain.

  9. Bioinformatics in High School Biology Curricula: A Study of State Science Standards

    PubMed Central

    Sheppard, Keith

    2008-01-01

    The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students. PMID:18316818

  10. MACBenAbim: A Multi-platform Mobile Application for searching keyterms in Computational Biology and Bioinformatics.

    PubMed

    Oluwagbemi, Olugbenga O; Adewumi, Adewole; Esuruoso, Abimbola

    2012-01-01

    Computational biology and bioinformatics are gradually gaining grounds in Africa and other developing nations of the world. However, in these countries, some of the challenges of computational biology and bioinformatics education are inadequate infrastructures, and lack of readily-available complementary and motivational tools to support learning as well as research. This has lowered the morale of many promising undergraduates, postgraduates and researchers from aspiring to undertake future study in these fields. In this paper, we developed and described MACBenAbim (Multi-platform Mobile Application for Computational Biology and Bioinformatics), a flexible user-friendly tool to search for, define and describe the meanings of keyterms in computational biology and bioinformatics, thus expanding the frontiers of knowledge of the users. This tool also has the capability of achieving visualization of results on a mobile multi-platform context. MACBenAbim is available from the authors for non-commercial purposes.

  11. Chloroplast genomes of Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea: Structures and comparative analysis.

    PubMed

    Asaf, Sajjad; Khan, Abdul Latif; Khan, Muhammad Aaqil; Waqas, Muhammad; Kang, Sang-Mo; Yun, Byung-Wook; Lee, In-Jung

    2017-08-08

    We investigated the complete chloroplast (cp) genomes of non-model Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea using Illumina paired-end sequencing to understand their genetic organization and structure. Detailed bioinformatics analysis revealed genome sizes of both subspecies ranging between 154.4~154.5 kbp, with a large single-copy region (84,197~84,158 bp), a small single-copy region (17,738~17,813 bp) and pair of inverted repeats (IRa/IRb; 26,264~26,259 bp). Both cp genomes encode 130 genes, including 85 protein-coding genes, eight ribosomal RNA genes and 37 transfer RNA genes. Whole cp genome comparison of A. halleri ssp. gemmifera and A. lyrata ssp. petraea, along with ten other Arabidopsis species, showed an overall high degree of sequence similarity, with divergence among some intergenic spacers. The location and distribution of repeat sequences were determined, and sequence divergences of shared genes were calculated among related species. Comparative phylogenetic analysis of the entire genomic data set and 70 shared genes between both cp genomes confirmed the previous phylogeny and generated phylogenetic trees with the same topologies. The sister species of A. halleri ssp. gemmifera is A. umezawana, whereas the closest relative of A. lyrata spp. petraea is A. arenicola.

  12. Comparative bioinformatics analyses and profiling of lysosome-related organelle proteomes

    NASA Astrophysics Data System (ADS)

    Hu, Zhang-Zhi; Valencia, Julio C.; Huang, Hongzhan; Chi, An; Shabanowitz, Jeffrey; Hearing, Vincent J.; Appella, Ettore; Wu, Cathy

    2007-01-01

    Complete and accurate profiling of cellular organelle proteomes, while challenging, is important for the understanding of detailed cellular processes at the organelle level. Mass spectrometry technologies coupled with bioinformatics analysis provide an effective approach for protein identification and functional interpretation of organelle proteomes. In this study, we have compiled human organelle reference datasets from large-scale proteomic studies and protein databases for seven lysosome-related organelles (LROs), as well as the endoplasmic reticulum and mitochondria, for comparative organelle proteome analysis. Heterogeneous sources of human organelle proteins and rodent homologs are mapped to human UniProtKB protein entries based on ID and/or peptide mappings, followed by functional annotation and categorization using the iProXpress proteomic expression analysis system. Cataloging organelle proteomes allows close examination of both shared and unique proteins among various LROs and reveals their functional relevance. The proteomic comparisons show that LROs are a closely related family of organelles. The shared proteins indicate the dynamic and hybrid nature of LROs, while the unique transmembrane proteins may represent additional candidate marker proteins for LROs. This comparative analysis, therefore, provides a basis for hypothesis formulation and experimental validation of organelle proteins and their functional roles.

  13. Towards barcode markers in Fungi: an intron map of Ascomycota mitochondria.

    PubMed

    Santamaria, Monica; Vicario, Saverio; Pappadà, Graziano; Scioscia, Gaetano; Scazzocchio, Claudio; Saccone, Cecilia

    2009-06-16

    A standardized and cost-effective molecular identification system is now an urgent need for Fungi owing to their wide involvement in human life quality. In particular the potential use of mitochondrial DNA species markers has been taken in account. Unfortunately, a serious difficulty in the PCR and bioinformatic surveys is due to the presence of mobile introns in almost all the fungal mitochondrial genes. The aim of this work is to verify the incidence of this phenomenon in Ascomycota, testing, at the same time, a new bioinformatic tool for extracting and managing sequence databases annotations, in order to identify the mitochondrial gene regions where introns are missing so as to propose them as species markers. The general trend towards a large occurrence of introns in the mitochondrial genome of Fungi has been confirmed in Ascomycota by an extensive bioinformatic analysis, performed on all the entries concerning 11 mitochondrial protein coding genes and 2 mitochondrial rRNA (ribosomal RNA) specifying genes, belonging to this phylum, available in public nucleotide sequence databases. A new query approach has been developed to retrieve effectively introns information included in these entries. After comparing the new query-based approach with a blast-based procedure, with the aim of designing a faithful Ascomycota mitochondrial intron map, the first method appeared clearly the most accurate. Within this map, despite the large pervasiveness of introns, it is possible to distinguish specific regions comprised in several genes, including the full NADH dehydrogenase subunit 6 (ND6) gene, which could be considered as barcode candidates for Ascomycota due to their paucity of introns and to their length, above 400 bp, comparable to the lower end size of the length range of barcodes successfully used in animals. The development of the new query system described here would answer the pressing requirement to improve drastically the bioinformatics support to the DNA Barcode Initiative. The large scale investigation of Ascomycota mitochondrial introns performed through this tool, allowing to exclude the introns-rich sequences from the barcode candidates exploration, could be the first step towards a mitochondrial barcoding strategy for these organisms, similar to the standard approach employed in metazoans.

  14. Rational engineering of the Neurospora VS ribozyme to allow substrate recognition via different kissing-loop interactions.

    PubMed

    Lacroix-Labonté, Julie; Girard, Nicolas; Dagenais, Pierre; Legault, Pascale

    2016-08-19

    The Neurospora VS ribozyme is a catalytic RNA that has the unique ability to specifically recognize and cleave a stem-loop substrate through formation of a highly stable kissing-loop interaction (KLI). In order to explore the engineering potential of the VS ribozyme to cleave alternate substrates, we substituted the wild-type KLI by other known KLIs using an innovative engineering method that combines rational and combinatorial approaches. A bioinformatic search of the protein data bank was initially performed to identify KLIs that are structurally similar to the one found in the VS ribozyme. Next, substrate/ribozyme (S/R) pairs that incorporate these alternative KLIs were kinetically and structurally characterized. Interestingly, several of the resulting S/R pairs allowed substrate cleavage with substantial catalytic efficiency, although with reduced activity compared to the reference S/R pair. Overall, this study describes an innovative approach for RNA engineering and establishes that the KLI of the trans VS ribozyme can be adapted to cleave other folded RNA substrates. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Delta: a new web-based 3D genome visualization and analysis platform.

    PubMed

    Tang, Bixia; Li, Feifei; Li, Jing; Zhao, Wenming; Zhang, Zhihua

    2018-04-15

    Delta is an integrative visualization and analysis platform to facilitate visually annotating and exploring the 3D physical architecture of genomes. Delta takes Hi-C or ChIA-PET contact matrix as input and predicts the topologically associating domains and chromatin loops in the genome. It then generates a physical 3D model which represents the plausible consensus 3D structure of the genome. Delta features a highly interactive visualization tool which enhances the integration of genome topology/physical structure with extensive genome annotation by juxtaposing the 3D model with diverse genomic assay outputs. Finally, by visually comparing the 3D model of the β-globin gene locus and its annotation, we speculated a plausible transitory interaction pattern in the locus. Experimental evidence was found to support this speculation by literature survey. This served as an example of intuitive hypothesis testing with the help of Delta. Delta is freely accessible from http://delta.big.ac.cn, and the source code is available at https://github.com/zhangzhwlab/delta. zhangzhihua@big.ac.cn. Supplementary data are available at Bioinformatics online.

  16. No-boundary thinking in bioinformatics research

    PubMed Central

    2013-01-01

    Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT). PMID:24192339

  17. University-Level Practical Activities in Bioinformatics Benefit Voluntary Groups of Pupils in the Last 2 Years of School

    ERIC Educational Resources Information Center

    Barker, Daniel; Alderson, Rosanna G.; McDonagh, James L.; Plaisier, Heleen; Comrie, Muriel M.; Duncan, Leigh; Muirhead, Gavin T. P.; Sweeney, Stuart D.

    2015-01-01

    Background: Bioinformatics--the use of computers in biology--is of major and increasing importance to biological sciences and medicine. We conducted a preliminary investigation of the value of bringing practical, university-level bioinformatics education to the school level. We conducted voluntary activities for pupils at two schools in Scotland…

  18. The Air Force In Silico -- Computational Biology in 2025

    DTIC Science & Technology

    2007-11-01

    and chromosome) these new fields are commonly referred to as “~omics.” Proteomics , transcriptomics, metabolomics , epigenomics, physiomics... Bioinformatics , 2006, http://journal.imbio.de/ http://www-bm.ipk-gatersleben.de/stable/php/ journal /articles/pdf/jib-22.pdf (accessed 30 September...Chirino, G. Tansley and I. Dryden, “The implications for Bioinformatics of integration across physical scales,” Journal of Integrative Bioinformatics

  19. Online Tools for Bioinformatics Analyses in Nutrition Sciences12

    PubMed Central

    Malkaram, Sridhar A.; Hassan, Yousef I.; Zempleni, Janos

    2012-01-01

    Recent advances in “omics” research have resulted in the creation of large datasets that were generated by consortiums and centers, small datasets that were generated by individual investigators, and bioinformatics tools for mining these datasets. It is important for nutrition laboratories to take full advantage of the analysis tools to interrogate datasets for information relevant to genomics, epigenomics, transcriptomics, proteomics, and metabolomics. This review provides guidance regarding bioinformatics resources that are currently available in the public domain, with the intent to provide a starting point for investigators who want to take advantage of the opportunities provided by the bioinformatics field. PMID:22983844

  20. Biotool2Web: creating simple Web interfaces for bioinformatics applications.

    PubMed

    Shahid, Mohammad; Alam, Intikhab; Fuellen, Georg

    2006-01-01

    Currently there are many bioinformatics applications being developed, but there is no easy way to publish them on the World Wide Web. We have developed a Perl script, called Biotool2Web, which makes the task of creating web interfaces for simple ('home-made') bioinformatics applications quick and easy. Biotool2Web uses an XML document containing the parameters to run the tool on the Web, and generates the corresponding HTML and common gateway interface (CGI) files ready to be published on a web server. This tool is available for download at URL http://www.uni-muenster.de/Bioinformatics/services/biotool2web/ Georg Fuellen (fuellen@alum.mit.edu).

Top