Science.gov

Sample records for comparative structural bioinformatics

  1. A Comparative Structural Bioinformatics Analysis of the Insulin Receptor Family Ectodomain Based on Phylogenetic Information

    PubMed Central

    Rentería, Miguel E.; Gandhi, Neha S.; Vinuesa, Pablo; Helmerhorst, Erik; Mancera, Ricardo L.

    2008-01-01

    The insulin receptor (IR), the insulin-like growth factor 1 receptor (IGF1R) and the insulin receptor-related receptor (IRR) are covalently linked homodimers made up of several structural domains. The molecular mechanism of ligand binding to the ectodomain of these receptors and the resulting activation of their tyrosine kinase domain is still not well understood. We have carried out an amino acid residue conservation analysis in order to reconstruct the phylogeny of the IR Family. We have confirmed the location of ligand binding site 1 of the IGF1R and IR. Importantly, we have also predicted the likely location of the insulin binding site 2 on the surface of the fibronectin type III domains of the IR. An evolutionarily conserved surface on the second leucine-rich domain that may interact with the ligand could not be detected. We suggest a possible mechanical trigger of the activation of the IR that involves a slight ‘twist’ rotation of the last two fibronectin type III domains in order to face the likely location of insulin. Finally, a strong selective pressure was found amongst the IRR orthologous sequences, suggesting that this orphan receptor has a yet unknown physiological role which may be conserved from amphibians to mammals. PMID:18989367
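    As an illustration of what a residue conservation analysis can look like, the following is a minimal Python sketch that scores each column of a multiple sequence alignment by normalized Shannon entropy. This is one common conservation measure and is only an assumption here; the abstract does not state which measure the authors used. The function name `column_conservation` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column from 0 (variable) to 1 (fully conserved).

    Assumption: conservation = 1 - H / log2(20), where H is the Shannon
    entropy of the residues in the column and gaps ('-') are ignored.
    """
    scores = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column: no evidence of conservation
            continue
        total = len(residues)
        entropy = -sum(
            (count / total) * math.log2(count / total)
            for count in Counter(residues).values()
        )
        scores.append(1.0 - entropy / math.log2(20))
    return scores


# Toy alignment (hypothetical sequences, all of equal length).
toy_alignment = ["ACD", "ACE", "AGF", "A-G"]
scores = column_conservation(toy_alignment)
print(scores)
```

    In this toy case the first column is invariant and scores highest, while the third column, with four different residues, scores lowest. Mapping such scores onto a three-dimensional structure is what would allow conserved surface patches to be inspected.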

  2. Comparative void-volume analysis of psychrophilic and mesophilic enzymes: Structural bioinformatics of psychrophilic enzymes reveals sources of core flexibility.

    PubMed

    Paredes, Diana I; Watters, Kyle; Pitman, Derek J; Bystroff, Christopher; Dordick, Jonathan S

    2011-10-20

    Psychrophiles, cold-adapted organisms, have adapted to live at low temperatures by using a variety of mechanisms. Their enzymes are active at cold temperatures by being structurally more flexible than mesophilic enzymes. Although there are some indications of the possible structural mechanisms by which psychrophilic enzymes are catalytically active at cold temperatures, there is no generalized structural property common to all psychrophilic enzymes. We examine twenty homologous enzyme pairs from psychrophiles and mesophiles to investigate flexibility as a key characteristic for cold adaptation. B-factors in protein X-ray structures are one way to measure flexibility. Comparing psychrophilic to mesophilic protein B-factors reveals that psychrophilic enzymes are more flexible in 5-turn and strand secondary structures. Enzyme cavities, identified using CASTp at various probe sizes, indicate that psychrophilic enzymes have larger average cavity sizes at probe radii of 1.4-1.5 Å, sufficient for water molecules. Furthermore, amino acid side chains lining these cavities show an increased frequency of acidic groups in psychrophilic enzymes. These findings suggest that embedded water molecules may play a significant role in cavity flexibility, and therefore, overall protein flexibility. Thus, our results point to the important role enzyme flexibility plays in adaptation to cold environments.
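    The B-factor comparison described in this abstract can be sketched roughly as follows. This is a minimal Python illustration, not the authors' pipeline: it reads the temperature factor from the fixed columns of PDB-format ATOM records and compares the mean C-alpha B-factor of two structures. The helper `atom_line` and the B-factor values are hypothetical and exist only to build toy input. Raw B-factors also depend on resolution and refinement, so a real comparison would likely normalize them first.

```python
from statistics import mean


def ca_bfactors(pdb_lines):
    """Return the B-factors of C-alpha atoms from PDB-format ATOM records."""
    values = []
    for line in pdb_lines:
        # Atom name is in columns 13-16, temperature factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def atom_line(serial, name, res_name, chain, res_seq, b_factor):
    """Build a toy ATOM record (hypothetical helper, coordinates set to 0)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}"
    )


# Toy data: invented B-factors for a psychrophilic/mesophilic pair.
psychro = [atom_line(i, " CA ", "GLY", "A", i, b)
           for i, b in enumerate([30.0, 34.0, 38.0], start=1)]
psychro.append(atom_line(4, " N  ", "GLY", "A", 3, 99.0))  # not CA, ignored
meso = [atom_line(i, " CA ", "GLY", "A", i, b)
        for i, b in enumerate([20.0, 22.0, 24.0], start=1)]

diff = mean(ca_bfactors(psychro)) - mean(ca_bfactors(meso))
print(diff)
```

    A positive difference would point toward higher average flexibility in the first structure. Grouping the values by secondary-structure type, as the study does, would require an additional assignment step that is not shown here.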

  3. ballaxy: web services for structural bioinformatics.

    PubMed

    Hildebrandt, Anna Katharina; Stöckel, Daniel; Fischer, Nina M; de la Garza, Luis; Krüger, Jens; Nickels, Stefan; Röttig, Marc; Schärfe, Charlotta; Schumann, Marcel; Thiel, Philipp; Lenhof, Hans-Peter; Kohlbacher, Oliver; Hildebrandt, Andreas

    2015-01-01

    Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large number of molecular data structures and algorithms implemented in BALL allows for elegant and sophisticated development of new approaches in the field. We hence connected the versatile BALL library and its visualization and editing front end BALLView with the Galaxy workflow framework. The result, which we call ballaxy, enables the user to simply and intuitively create sophisticated pipelines for applications in structure-based computational biology, integrated into a standard tool for molecular modelling. ballaxy consists of three parts: some minor modifications to the Galaxy system, a collection of tools and an integration into the BALL framework and the BALLView application for molecular modelling. Modifications to Galaxy will be submitted to the Galaxy project, and the BALL and BALLView integrations will be integrated in the next major BALL release. After acceptance of the modifications into the Galaxy project, we will publish all ballaxy tools via the Galaxy toolshed. In the meantime, all three components are available from http://www.ball-project.org/ballaxy. Also, docker images for ballaxy are available at https://registry.hub.docker.com/u/anhi/ballaxy/dockerfile/. ballaxy is licensed under the terms of the GPL. © The Author 2014. Published by Oxford University Press. All rights reserved.

  4. The structural bioinformatics library: modeling in biomolecular science and beyond.

    PubMed

    Cazals, Frédéric; Dreyfus, Tom

    2017-04-01

    Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library (SBL, http://sbl.inria.fr), a generic C++/Python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performance. The SBL involves four software components (1-4 hereafter). For end-users, the SBL provides ready to use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, and to model macromolecular assemblies. These applications can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C++ toolbox with modular design, involving core (2) algorithms, (3) biophysical models and (4) modules, the latter being especially suited to develop novel applications. The SBL comes with a thorough documentation consisting of user and reference manuals, and a bugzilla platform to handle community feedback. The SBL is available from http://sbl.inria.fr. Contact: Frederic.Cazals@inria.fr. Supplementary data are available at Bioinformatics online.

  5. WIWS: a protein structure bioinformatics Web service collection

    PubMed Central

    Hekkelman, M. L.; te Beek, T. A. H.; Pettifer, S. R.; Thorne, D.; Attwood, T. K.; Vriend, G.

    2010-01-01

    The WHAT IF molecular-modelling and drug design program is widely distributed in the world of protein structure bioinformatics. Although originally designed as an interactive application, its highly modular design and inbuilt control language have recently enabled its deployment as a collection of programmatically accessible web services. We report here a collection of WHAT IF-based protein structure bioinformatics web services: these relate to structure quality, the use of symmetry in crystal structures, structure correction and optimization, adding hydrogens and optimizing hydrogen bonds and a series of geometric calculations. The freely accessible web services are based on the industry standard WS-I profile and the EMBRACE technical guidelines, and are available via both REST and SOAP paradigms. The web services run on a dedicated computational cluster; their function and availability are monitored daily. PMID:20501602

  6. Teaching Structural Bioinformatics at the Undergraduate Level

    ERIC Educational Resources Information Center

    Centeno, Nuria B.; Villa-Freixa, Jordi; Oliva, Baldomero

    2003-01-01

    Understanding the basic principles of structural biology is becoming a major subject of study in most undergraduate level programs in biology. In the genomic and proteomic age, it is becoming indispensable for biology students to master concepts related to the sequence and structure of proteins in order to develop skills that may be useful in a…

  8. CSB: a Python framework for structural bioinformatics.

    PubMed

    Kalev, Ivan; Mechelke, Martin; Kopec, Klaus O; Holder, Thomas; Carstens, Simeon; Habeck, Michael

    2012-11-15

    Computational Structural Biology Toolbox (CSB) is a cross-platform Python class library for reading, storing and analyzing biomolecular structures with rich support for statistical analyses. CSB is designed for reusability and extensibility and comes with a clean, well-documented API following good object-oriented engineering practice. Stable release packages are available for download from the Python Package Index (PyPI) as well as from the project's website http://csb.codeplex.com. Contact: ivan.kalev@gmail.com or michael.habeck@tuebingen.mpg.de

  9. Biskit--a software platform for structural bioinformatics.

    PubMed

    Grünberg, Raik; Nilges, Michael; Leckner, Johan

    2007-03-15

    Biskit is a modular, object-oriented Python library that provides intuitive classes for many typical tasks of structural bioinformatics research. It facilitates the manipulation and analysis of macromolecular structures, protein complexes and molecular dynamics trajectories. At the same time, Biskit offers a software platform for the rapid integration of external programs and new algorithms into complex structural bioinformatics workflows. Calculations are thus often delegated to established programs like Xplor, Amber, Hex, Prosa, Hmmer and Modeller; interfaces to further software can be easily added. Moreover, Biskit simplifies the parallelization of time-consuming calculations via PVM (Parallel Virtual Machine). The latest snapshot of Biskit, documentation and examples are freely available under the GNU General Public License at http://biskit.sf.net (alternate url http://biskit.pasteur.fr).

  10. A multi-species comparative structural bioinformatics analysis of inherited mutations in α-D-Mannosidase reveals strong genotype-phenotype correlation

    PubMed Central

    2009-01-01

    Background: Lysosomal α-mannosidase is an enzyme that acts to degrade N-linked oligosaccharides and hence plays an important role in mannose metabolism in humans and other mammalian species, especially livestock. Mutations in the gene (MAN2B1) encoding lysosomal α-D-mannosidase cause improper coding, resulting in dysfunctional or non-functional protein, causing the disease α-mannosidosis. Mapping disease mutations to the structure of the protein can help in understanding the functional consequences of these mutations and thus indirectly, the finer aspects of the pathology and clinical manifestations of the disease, including phenotypic severity as a function of the genotype. Results: A comprehensive homology modeling study of all the wild-type and inherited mutations of lysosomal α-mannosidase in four different species, human, cow, cat and guinea pig, reveals a significant correlation between the severity of the genotype and the phenotype in α-mannosidosis. We used the X-ray crystallographic structure of bovine lysosomal α-mannosidase as template, containing only two disulphide bonds and some ligands, to build structural models of wild-type structures with four disulfide linkages and all bound ligands. These wild-type models were then used as templates for disease mutations. All the truncations and substitutions involving the residues in and around the active site and those that destabilize the fold led to severe genotypes resulting in lethal phenotypes, whereas the mutations lying away from the active site were milder in both their genotypic and phenotypic expression. Conclusion: Based on the co-location of mutations from different organisms and their proximity to the enzyme active site, we have extrapolated observed mutations from one species to homologous positions in other organisms, as a predictive approach for detecting likely α-mannosidosis. Besides predicting new disease mutations, this approach also provides a way for detecting mutation hotspots in the

  11. Observation selection bias in contact prediction and its implications for structural bioinformatics

    PubMed Central

    Orlando, G.; Raimondi, D.; Vranken, W. F.

    2016-01-01

    Next Generation Sequencing is dramatically increasing the number of known protein sequences, with related experimentally determined protein structures lagging behind. Structural bioinformatics is attempting to close this gap by developing approaches that predict structure-level characteristics for uncharacterized protein sequences, with most of the developed methods relying heavily on evolutionary information collected from homologous sequences. Here we show that there is a substantial observational selection bias in this approach: the predictions are validated on proteins with known structures from the PDB, but for exactly those proteins significantly more homologs are available than for less studied sequences randomly extracted from UniProt. Structural bioinformatics methods that were developed this way are thus likely to have overestimated performance; we demonstrate this for two contact prediction methods, where performance drops by up to 60% when taking into account a more realistic amount of evolutionary information. We provide a bias-free dataset for the validation of contact prediction methods called NOUMENON. PMID:27857150

  12. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation performed on the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation; the results were compared to a reference annotation.

  13. DNA mimic proteins: functions, structures, and bioinformatic analysis.

    PubMed

    Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J

    2014-05-13

    DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.

  14. FLAGdb(++): A Bioinformatic Environment to Study and Compare Plant Genomes.

    PubMed

    Tamby, Jean Philippe; Brunaud, Véronique

    2017-01-01

    Today, the growing knowledge and data accumulation on plant genomes do not by themselves make gene function inference a simple task. Because data of different types are coming from various sources, we need to integrate and analyze them to help biologists in this task. We created FLAGdb(++) ( http://tools.ips2.u-psud.fr/FLAGdb ) to take up this challenge for a selection of plant genomes. In order to enrich gene function predictions, structural and functional annotations of the genomes are explored to generate meta-data and to compare them. Since data are numerous and complex, we focused on accessibility and visualization with an original and user-friendly interface. In this chapter we present the main tools of FLAGdb(++) and a use-case to explore a gene family: structural and functional properties of this family and a search for orthologous genes in the other plant genomes.

  15. Unraveling microalgal molecular interactions using evolutionary and structural bioinformatics.

    PubMed

    Vlachakis, Dimitrios; Pavlopoulou, Athanasia; Kazazi, Dorothea; Kossida, Sophia

    2013-10-10

    Microalgae are unicellular microorganisms indispensable for environmental stability and life on earth, because they produce approximately half of the atmospheric oxygen while simultaneously feeding on the harmful greenhouse gas carbon dioxide. Using gene fusion analysis, a series of five fusion/fission events was identified, which provided the basis for critical insights into their evolutionary history. Moreover, the three-dimensional structures of both the fused and the component proteins were predicted, allowing us to envisage putative protein-protein interactions that are invaluable for the efficient usage, handling and exploitation of microalgae. Collectively, our proposed approach to the five fusion/fission alga protein events contributes towards the expansion of the microalgae knowledgebase, bridging protein evolution of the ancient microalgal species and the rapidly evolving, modern bioinformatics field. © 2013 Elsevier B.V. All rights reserved.

  16. Achievements and challenges in structural bioinformatics and computational biophysics

    PubMed Central

    Samish, Ilan; Bourne, Philip E.; Najmanovich, Rafael J.

    2015-01-01

    Motivation: The field of structural bioinformatics and computational biophysics has undergone a revolution in the last 10 years. These developments are captured annually through the 3DSIG meeting, upon which this article reflects. Results: An increase in the accessible data, computational resources and methodology has resulted in an increase in the size and resolution of studied systems and the complexity of the questions amenable to research. Concomitantly, the parameterization and efficiency of the methods have markedly improved along with their cross-validation with other computational and experimental results. Conclusion: The field exhibits an ever-increasing integration with biochemistry, biophysics and other disciplines. In this article, we discuss recent achievements along with current challenges within the field. Contact: Rafael.Najmanovich@USherbrooke.ca PMID:25488929

  17. Achievements and challenges in structural bioinformatics and computational biophysics.

    PubMed

    Samish, Ilan; Bourne, Philip E; Najmanovich, Rafael J

    2015-01-01

    The field of structural bioinformatics and computational biophysics has undergone a revolution in the last 10 years. These developments are captured annually through the 3DSIG meeting, upon which this article reflects. An increase in the accessible data, computational resources and methodology has resulted in an increase in the size and resolution of studied systems and the complexity of the questions amenable to research. Concomitantly, the parameterization and efficiency of the methods have markedly improved along with their cross-validation with other computational and experimental results. The field exhibits an ever-increasing integration with biochemistry, biophysics and other disciplines. In this article, we discuss recent achievements along with current challenges within the field. © The Author 2014. Published by Oxford University Press.

  18. Computer Programming and Biomolecular Structure Studies: A Step beyond Internet Bioinformatics

    ERIC Educational Resources Information Center

    Likic, Vladimir A.

    2006-01-01

    This article describes the experience of teaching structural bioinformatics to third year undergraduate students in a subject titled "Biomolecular Structure and Bioinformatics." Students were introduced to computer programming and used this knowledge in a practical application as an alternative to the well established Internet bioinformatics…

  20. Bioinformatics analyses of Shigella CRISPR structure and spacer classification.

    PubMed

    Wang, Pengfei; Zhang, Bing; Duan, Guangcai; Wang, Yingfang; Hong, Lijuan; Wang, Linlin; Guo, Xiangjiao; Xi, Yuanlin; Yang, Haiyan

    2016-03-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) are inheritable genetic elements of a variety of archaea and bacteria and indicative of the bacterial ecological adaptation, conferring acquired immunity against invading foreign nucleic acids. Shigella is an important pathogen for anthroponosis. This study aimed to analyze the features of Shigella CRISPR structure and classify the spacers through a bioinformatics approach. Among 107 Shigella strains, 434 CRISPR structure loci were identified, with two to seven loci in different strains. CRISPR-Q1, CRISPR-Q4 and CRISPR-Q5 were widely distributed in Shigella strains. Comparison of the first and last repeats of CRISPR1, CRISPR2 and CRISPR3 revealed several base variants and different stem-loop structures. A total of 259 cas genes were found among these 107 Shigella strains. The cas gene deletions were discovered in 88 strains. However, one strain did not contain any cas gene. Intact clusters of cas genes were found in 19 strains. From comprehensive analysis of sequence signature and BLAST and CRISPRTarget score, the 708 spacers were classified into three subtypes: Type I, Type II and Type III. Of them, Type I spacers were those linked with one gene segment, Type II spacers were linked with two or more different gene segments, and Type III spacers were undefined. This study examined the diversity of the CRISPR/cas system in Shigella strains and demonstrated the main features of CRISPR structure and spacer classification, which provides critical information for elucidation of the mechanisms of spacer formation and exploration of the role the spacers play in the function of the CRISPR/cas system.

  1. Recent progress on structural bioinformatics research of cytochrome P450 and its impact on drug discovery.

    PubMed

    Zhang, Tao; Wei, Dongqing

    2015-01-01

    Cytochrome P450 is predominantly responsible for human drug metabolism, which is of critical importance for drug discovery and development. Structural bioinformatics focuses on analysis and prediction of the three-dimensional structure of biological macromolecules and elucidation of structure-function relationships as well as identification of important binding interactions. Structural bioinformatics has advanced rapidly over the last decade. With more information available for CYP structures, the methods of structural bioinformatics may be used in the CYP field. In this review, we present three previous studies on CYP using the methods of structural bioinformatics, including the investigation of reasons for the decrease in enzymatic activity of CYP1A2 caused by a peripheral mutation, the construction of a pharmacophore model specific to the active site of CYP1A2 and the prediction of the functional consequences of single residue mutations in CYP. By illustrating these studies we attempt to show the potential role of structural bioinformatics in CYP research and to help readers better understand the importance of structural bioinformatics in drug design.

  2. Computer programming and biomolecular structure studies: A step beyond internet bioinformatics.

    PubMed

    Likić, Vladimir A

    2006-01-01

    This article describes the experience of teaching structural bioinformatics to third year undergraduate students in a subject titled Biomolecular Structure and Bioinformatics. Students were introduced to computer programming and used this knowledge in a practical application as an alternative to the well established Internet bioinformatics approach that relies on access to the Internet and biological databases. This was an ambitious approach considering that the students mostly had a biological background. There were also time constraints of eight lectures in total and two accompanying practical sessions. The main challenge was that students had to be introduced to computer programming from a beginner level and in a short time provided with enough knowledge to independently solve a simple bioinformatics problem. This was accomplished with a problem directly relevant to the rest of the subject, concerned with the structure-function relationships and experimental techniques for the determination of macromolecular structure.

  3. On dimension reduction of clustering results in structural bioinformatics.

    PubMed

    Iván, Gábor; Grolmusz, Vince

    2014-12-01

    OPTICS is a density-based clustering algorithm that performs well in a wide variety of applications. For a set of input objects, the algorithm creates a reachability plot that can either be used to produce cluster membership assignments, or interpreted itself as an expressive two-dimensional representation of the clustering structure of the input set, even if the input set is embedded in higher dimensions. The focus of this work is a visualization method that can be applied for comparing two independent hierarchical clusterings by assigning colors to all entries of the input database. We give two applications related to macromolecular structural properties: the first is a sequence-based clustering of the SwissProt database that is evaluated using NCBI taxonomy identifiers, and the second application involves clustering locations of specific atoms in the serine protease enzyme family, and the clusters are evaluated using SCOP structural classifications. Copyright © 2014 Elsevier B.V. All rights reserved.
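    To make the reachability plot mentioned in this abstract more concrete, here is a minimal, brute-force Python sketch of the OPTICS ordering step. It is an illustration under stated assumptions, not the authors' implementation: the function name `optics_reachability` is hypothetical, distances are Euclidean, each point counts toward its own `min_pts` neighborhood, and the O(n²) neighbor search is only suitable for small inputs.

```python
import heapq
import math


def optics_reachability(points, eps, min_pts):
    """Return (ordering, reachability distances in that ordering)."""
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, nbrs):
        # Undefined (None) when the point has fewer than min_pts neighbors.
        if len(nbrs) < min_pts:
            return None
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def expand(i, seeds):
        processed[i] = True
        order.append(i)
        nbrs = neighbors(i)
        core = core_distance(i, nbrs)
        if core is None:
            return
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, math.dist(points[i], points[j]))
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        seeds = []  # priority queue; stale entries are skipped when popped
        expand(start, seeds)
        while seeds:
            _, j = heapq.heappop(seeds)
            if not processed[j]:
                expand(j, seeds)
    return order, [reach[i] for i in order]


# Two well-separated toy groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
order, reach = optics_reachability(points, eps=math.inf, min_pts=2)
print(order, reach)
```

    Plotting `reach` against position in `order` gives the reachability plot: valleys correspond to dense groups, and the single tall bar in this toy example marks the jump from the first group to the second.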

  4. Structural and evolutionary bioinformatics of the SPOUT superfamily of methyltransferases

    PubMed Central

    Tkaczuk, Karolina L; Dunin-Horkawicz, Stanislaw; Purta, Elzbieta; Bujnicki, Janusz M

    2007-01-01

    Background: SPOUT methyltransferases (MTases) are a large class of S-adenosyl-L-methionine-dependent enzymes that exhibit an unusual alpha/beta fold with a very deep topological knot. In 2001, when no crystal structures were available for any of these proteins, Anantharaman, Koonin, and Aravind identified homology between SpoU and TrmD MTases and defined the SPOUT superfamily. Since then, multiple crystal structures of knotted MTases have been solved and numerous new homologous sequences appeared in the databases. However, no comprehensive comparative analysis of these proteins has been carried out to classify them based on structural and evolutionary criteria and to guide functional predictions. Results: We carried out extensive searches of databases of protein structures and sequences to collect all members of previously identified SPOUT MTases, and to identify previously unknown homologs. Based on sequence clustering, characterization of domain architecture, structure predictions and sequence/structure comparisons, we re-defined families within the SPOUT superfamily and predicted putative active sites and biochemical functions for the so far uncharacterized members. We have also delineated the common core of SPOUT MTases and inferred a multiple sequence alignment for the conserved knot region, from which we calculated the phylogenetic tree of the superfamily. We have also studied phylogenetic distribution of different families, and used this information to infer the evolutionary history of the SPOUT superfamily. Conclusion: We present the first phylogenetic tree of the SPOUT superfamily since it was defined, together with a new scheme for its classification, and discussion about conservation of sequence and structure in different families, and their functional implications. We identified four protein families as new members of the SPOUT superfamily. Three of these families are functionally uncharacterized (COG1772, COG1901, and COG4080), and one (COG1756

  5. XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data.

    PubMed

    L'Yi, Sehi; Ko, Bongkyung; Shin, DongHwa; Cho, Young-Joon; Lee, Jaeyong; Kim, Bohyoung; Seo, Jinwook

    2015-01-01

    Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results.
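    The comparison task this abstract describes is visual and interactive, but the underlying question of how far two clustering results agree can also be expressed numerically. The sketch below, in Python, computes the Rand index between two label assignments. This measure is an assumption chosen for illustration and is not stated to be part of XCluSim; the function name `rand_index` and the toy labelings are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agreements += same_a == same_b
        pairs += 1
    return agreements / pairs


# Toy results from three hypothetical clustering runs over four items.
run_1 = [0, 0, 1, 1]
run_2 = [1, 1, 0, 0]  # same partition as run_1 with the labels swapped
run_3 = [0, 1, 1, 1]

identical = rand_index(run_1, run_2)
partial = rand_index(run_1, run_3)
print(identical, partial)
```

    Because the index only looks at whether pairs are grouped together, it is insensitive to how the clusters are named, which is why `run_1` and `run_2` count as identical.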

  6. XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data

    PubMed Central

    2015-01-01

    Background Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893

  7. The role of structural bioinformatics resources in the era of integrative structural biology

    PubMed Central

    Gutmanas, Aleksandras; Oldfield, Thomas J.; Patwardhan, Ardan; Sen, Sanchayita; Velankar, Sameer; Kleywegt, Gerard J.

    2013-01-01

    The history and the current state of the PDB and EMDB archives is briefly described, as well as some of the challenges that they face. It seems natural that the role of structural biology archives will change from being a pure repository of historic data into becoming an indispensable resource for the wider biomedical community. As part of this transformation, it will be necessary to validate the biomacromolecular structure data and ensure the highest possible quality for the archive holdings, to combine structural data from different spatial scales into a unified resource and to integrate structural data with functional, genetic and taxonomic data as well as other information available in bioinformatics resources. Some recent developments and plans to address these challenges at PDBe are presented. PMID:23633580

  8. Bioinformatics and variability in drug response: a protein structural perspective

    PubMed Central

    Lahti, Jennifer L.; Tang, Grace W.; Capriotti, Emidio; Liu, Tianyun; Altman, Russ B.

    2012-01-01

    Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of patients experience adverse effects more severe than anticipated. The unacceptable risk–benefit profile for many drugs mandates a paradigm shift towards personalized medicine. However, prior to adoption of patient-specific approaches, it is useful to understand the molecular details underlying variable drug response among diverse patient populations. Over the past decade, progress in structural genomics led to an explosion of available three-dimensional structures of drug target proteins while efforts in pharmacogenetics offered insights into polymorphisms correlated with differential therapeutic outcomes. Together these advances provide the opportunity to examine how altered protein structures arising from genetic differences affect protein–drug interactions and, ultimately, drug response. In this review, we first summarize structural characteristics of protein targets and common mechanisms of drug interactions. Next, we describe the impact of coding mutations on protein structures and drug response. Finally, we highlight tools for analysing protein structures and protein–drug interactions and discuss their application for understanding altered drug responses associated with protein structural variants. PMID:22552919

  9. E-MSD: the European Bioinformatics Institute Macromolecular Structure Database

    PubMed Central

    Boutselakis, H.; Dimitropoulos, D.; Fillon, J.; Golovin, A.; Henrick, K.; Hussain, A.; Ionides, J.; John, M.; Keller, P. A.; Krissinel, E.; McNeil, P.; Naim, A.; Newman, R.; Oldfield, T.; Pineda, J.; Rachedi, A.; Copeland, J.; Sitnov, A.; Sobhany, S.; Suarez-Uruena, A.; Swaminathan, J.; Tagari, M.; Tate, J.; Tromm, S.; Velankar, S.; Vranken, W.

    2003-01-01

    The E-MSD macromolecular structure relational database (http://www.ebi.ac.uk/msd) is designed to be a single access point for protein and nucleic acid structures and related information. The database is derived from Protein Data Bank (PDB) entries. Relational database technologies are used in a comprehensive cleaning procedure to ensure data uniformity across the whole archive. The search database contains an extensive set of derived properties, goodness-of-fit indicators, and links to other EBI databases including InterPro, GO, and SWISS-PROT, together with links to SCOP, CATH, PFAM and PROSITE. A generic search interface is available, coupled with a fast secondary structure domain search tool. PMID:12520052

  10. Integrating structure, bioinformatics and enzymology to discover function: BioH, a new carboxylesterase from E. coli.

    SciTech Connect

    Sanishvili, R.; Yakunin, A. F.; Laskowski, R. A.; Skarina, T.; Evdokimova, E.; Doherty-Kirby, A.; Lajoie, G. A.; Thornton, J. M.; Arrowsmith, C. H.; Savchenko, A.; Joachimiak, A.; Edwards, A. M.; Univ. of Toronto; Clinical Genomics Centre European Bioinformatics Inst.; Univ. of Western Ontario

    2003-07-11

    Structural proteomics projects are generating three-dimensional structures of novel, uncharacterized proteins at an increasing rate. However, structure alone is often insufficient to deduce the specific biochemical function of a protein. Here we determined the function for a protein using a strategy that integrates structural and bioinformatics data with parallel experimental screening for enzymatic activity. BioH is involved in biotin biosynthesis in Escherichia coli and had no previously known biochemical function. The crystal structure of BioH was determined at 1.7 Å resolution. An automated procedure was used to compare the structure of BioH with structural templates from a variety of different enzyme active sites. This screen identified a catalytic triad (Ser82, His235, and Asp207) with a configuration similar to that of the catalytic triad of hydrolases. Analysis of BioH with a panel of hydrolase assays revealed a carboxylesterase activity with a preference for short acyl chain substrates. The combined use of structural bioinformatics with experimental screens for detecting enzyme activity could greatly enhance the rate at which function is determined from structure.
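
    The idea of comparing a structure with an active-site template can be sketched as a check on inter-residue distances. The abstract does not describe the automated procedure in detail, so the following is only a minimal illustration: the coordinates, template distances and tolerance are hypothetical values, not data from the BioH structure.

```python
from math import dist


def triad_distances(ser, his, asp):
    """Distances (Ser-His, His-Asp, Ser-Asp) between three 3D points."""
    return (dist(ser, his), dist(his, asp), dist(ser, asp))


def matches_template(candidate, template, tol=1.0):
    """True if every candidate distance is within tol of the template."""
    return all(abs(c - t) <= tol for c, t in zip(candidate, template))


# Hypothetical side-chain atom positions (in angstroms) for a candidate triad.
candidate = triad_distances((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0))
# Hypothetical template distances for a hydrolase-like triad.
template = (3.2, 3.8, 5.5)
print(matches_template(candidate, template, tol=1.0))
```

    Distances are invariant under rotation and translation, so no superposition step is needed for this kind of coarse screen.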

  11. STRUCTURELAB: a heterogeneous bioinformatics system for RNA structure analysis.

    PubMed

    Shapiro, B A; Kasprzak, W

    1996-08-01

    STRUCTURELAB is a computational system that has been developed to permit the use of a broad array of approaches for the analysis of the structure of RNA. The goal of the development is to provide a large set of tools that can be well integrated with experimental biology to aid in the process of the determination of the underlying structure of RNA sequences. The approach taken views the structure determination problem as one of dealing with a database of many computationally generated structures and provides the capability to analyze this data set from different perspectives. Many algorithms are integrated into one system that also utilizes a heterogeneous computing approach permitting the use of several computer architectures to help solve the posed problems. These different computational platforms make it relatively easy to incorporate currently existing programs as well as newly developed algorithms and to best match these algorithms to the appropriate hardware. The system has been written in Common Lisp running on SUN or SGI Unix workstations, and it utilizes a network of participating machines defined in reconfigurable tables. A window-based interface makes this heterogeneous environment as transparent to the user as possible.

  12. [Research thoughts on structural components of Chinese medicine combined with bioinformatics].

    PubMed

    Wang, Cheng-cheng; Feng, Liang; Liu, Dan; Cui, Li; Tan, Xiao-bin; Jia, Xiao-bin

    2015-11-01

    Traditional Chinese medicine (TCM) is a complex system characterized by its integrity and distinctive features. Structural-component TCM treats a medicine as a well-organized whole, reflecting the integrated effect of its multiple components, and offers a new view of the material basis of TCM. Conventional research strategies are currently insufficient to deal with the relationship between material basis and efficacy, or with multi-component, multi-target, multi-stage mechanisms. The post-genomic era gave rise to bioinformatics, which draws on systems biology, omics at different levels, and the corresponding mathematical and computational techniques. It is increasingly becoming a powerful tool for understanding complex systems and the fundamental laws of life. The research ideas, methods, and data-mining knowledge of bioinformatics, combined with the theory of structural components of Chinese medicine, bring a new opportunity for developing structural components of Chinese medicine, systematically exploring the essence of TCM, and promoting the modernization of TCM.

  13. Assimilating Text-Mining & Bio-Informatics Tools to Analyze Cellulase structures

    NASA Astrophysics Data System (ADS)

    Satyasree, K. P. N. V., Dr; Lalitha Kumari, B., Dr; Jyotsna Devi, K. S. N. V.; Choudri, S. M. Roy; Pratap Joshi, K.

    2017-08-01

    Text mining is one of the most promising ways of automatically extracting information from the vast biological literature. To exploit its potential, the knowledge encoded in the text should be converted to a semantic representation, such as entities and relations, that machines can analyze. Large-scale practical systems for this purpose are rare, but text mining can still help generate or validate predictions. Cellulases, the enzymes that degrade cellulose, have abundant applications in various industries. The cellulase-producing bacterium Bacillus subtilis and fungus Pseudomonas putida were isolated from topsoil of Guntur Dt., A.P., India. Absolute cultures were conserved on potato dextrose agar medium for molecular studies. In this paper, we present how well text-mining concepts can be used to analyze cellulase-producing bacteria and fungi; their comparative structures are also studied with the aid of well-established, high-quality standard bioinformatic tools such as Bioedit, Swissport, Protparam and EMBOSSwin, with which complete data on cellulases, such as the structure and constituents of the enzyme, have been obtained.

  14. Comparative bioinformatics analyses and profiling of lysosome-related organelle proteomes

    NASA Astrophysics Data System (ADS)

    Hu, Zhang-Zhi; Valencia, Julio C.; Huang, Hongzhan; Chi, An; Shabanowitz, Jeffrey; Hearing, Vincent J.; Appella, Ettore; Wu, Cathy

    2007-01-01

    Complete and accurate profiling of cellular organelle proteomes, while challenging, is important for the understanding of detailed cellular processes at the organelle level. Mass spectrometry technologies coupled with bioinformatics analysis provide an effective approach for protein identification and functional interpretation of organelle proteomes. In this study, we have compiled human organelle reference datasets from large-scale proteomic studies and protein databases for seven lysosome-related organelles (LROs), as well as the endoplasmic reticulum and mitochondria, for comparative organelle proteome analysis. Heterogeneous sources of human organelle proteins and rodent homologs are mapped to human UniProtKB protein entries based on ID and/or peptide mappings, followed by functional annotation and categorization using the iProXpress proteomic expression analysis system. Cataloging organelle proteomes allows close examination of both shared and unique proteins among various LROs and reveals their functional relevance. The proteomic comparisons show that LROs are a closely related family of organelles. The shared proteins indicate the dynamic and hybrid nature of LROs, while the unique transmembrane proteins may represent additional candidate marker proteins for LROs. This comparative analysis, therefore, provides a basis for hypothesis formulation and experimental validation of organelle proteins and their functional roles.

  15. Structural biology and bioinformatics in drug design: opportunities and challenges for target identification and lead discovery

    PubMed Central

    Blundell, Tom L; Sibanda, Bancinyane L; Montalvão, Rinaldo Wander; Brewerton, Suzanne; Chelliah, Vijayalakshmi; Worth, Catherine L; Harmer, Nicholas J; Davies, Owen; Burke, David

    2006-01-01

    Impressive progress in genome sequencing, protein expression and high-throughput crystallography and NMR has radically transformed the opportunities to use protein three-dimensional structures to accelerate drug discovery, but the quantity and complexity of the data have ensured a central place for informatics. Structural biology and bioinformatics have assisted in lead optimization and target identification where they have well established roles; they can now contribute to lead discovery, exploiting high-throughput methods of structure determination that provide powerful approaches to screening of fragment binding. PMID:16524830

  16. Comparative metagenomic analysis of human gut microbiome composition using two different bioinformatic pipelines.

    PubMed

    D'Argenio, Valeria; Casaburi, Giorgio; Precone, Vincenza; Salvatore, Francesco

    2014-01-01

    Technological advances in next-generation sequencing-based approaches have greatly impacted the analysis of microbial community composition. In particular, 16S rRNA-based methods have been widely used to analyze the whole set of bacteria present in a target environment. As a consequence, several specific bioinformatic pipelines have been developed to manage these data. MetaGenome Rapid Annotation using Subsystem Technology (MG-RAST) and Quantitative Insights Into Microbial Ecology (QIIME) are two freely available tools for metagenomic analyses that have been used in a wide range of studies. Here, we report the comparative analysis of the same dataset with both QIIME and MG-RAST in order to evaluate their accuracy in taxonomic assignment and in diversity analysis. We found that taxonomic assignment was more accurate with QIIME which, at the family level, assigned a significantly higher number of reads. Thus, QIIME generated a more accurate BIOM file, which in turn improved the diversity analysis output. Finally, although informatics skills are needed to install QIIME, it offers a wide range of metrics that are useful for downstream applications and, no less important, it is not dependent on server times.
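
    To make "diversity analysis" concrete, the sketch below computes the Shannon index from per-taxon read counts and the fraction of reads that received an assignment. The abstract does not say which metrics were compared, so the Shannon index is used only as one common choice; the counts are invented and neither function reproduces QIIME or MG-RAST code.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def assigned_fraction(assigned_reads, total_reads):
    """Share of reads that a pipeline assigned at a given taxonomic level."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return assigned_reads / total_reads


# Hypothetical family-level read counts from two pipelines on one sample.
pipeline_a = [500, 300, 200]
pipeline_b = [500, 300]
print(shannon_index(pipeline_a), shannon_index(pipeline_b))
print(assigned_fraction(sum(pipeline_b), 1000))
```

    Reads left unassigned by one pipeline change both the assigned fraction and the diversity estimate, which is why the two outputs are linked in the comparison.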

  17. Comparative Metagenomic Analysis of Human Gut Microbiome Composition Using Two Different Bioinformatic Pipelines

    PubMed Central

    D'Argenio, Valeria; Precone, Vincenza

    2014-01-01

    Technological advances in next-generation sequencing-based approaches have greatly impacted the analysis of microbial community composition. In particular, 16S rRNA-based methods have been widely used to analyze the whole set of bacteria present in a target environment. As a consequence, several specific bioinformatic pipelines have been developed to manage these data. MetaGenome Rapid Annotation using Subsystem Technology (MG-RAST) and Quantitative Insights Into Microbial Ecology (QIIME) are two freely available tools for metagenomic analyses that have been used in a wide range of studies. Here, we report the comparative analysis of the same dataset with both QIIME and MG-RAST in order to evaluate their accuracy in taxonomic assignment and in diversity analysis. We found that taxonomic assignment was more accurate with QIIME which, at the family level, assigned a significantly higher number of reads. Thus, QIIME generated a more accurate BIOM file, which in turn improved the diversity analysis output. Finally, although informatics skills are needed to install QIIME, it offers a wide range of metrics that are useful for downstream applications and, no less important, it is not dependent on server times. PMID:24719854

  18. Introductory Bioinformatics Exercises Utilizing Hemoglobin and Chymotrypsin to Reinforce the Protein Sequence-Structure-Function Relationship

    ERIC Educational Resources Information Center

    Inlow, Jennifer K.; Miller, Paige; Pittman, Bethany

    2007-01-01

    We describe two bioinformatics exercises intended for use in a computer laboratory setting in an upper-level undergraduate biochemistry course. To introduce students to bioinformatics, the exercises incorporate several commonly used bioinformatics tools, including BLAST, that are freely available online. The exercises build upon the students'…

  20. BioGPS descriptors for rational engineering of enzyme promiscuity and structure based bioinformatic analysis.

    PubMed

    Ferrario, Valerio; Siragusa, Lydia; Ebert, Cynthia; Baroni, Massimo; Foscato, Marco; Cruciani, Gabriele; Gardossi, Lucia

    2014-01-01

    A new bioinformatic methodology was developed founded on the Unsupervised Pattern Cognition Analysis of GRID-based BioGPS descriptors (Global Positioning System in Biological Space). The procedure relies entirely on three-dimensional structure analysis of enzymes and does not stem from sequence or structure alignment. The BioGPS descriptors account for chemical, geometrical and physical-chemical features of enzymes and are able to describe comprehensively the active site of enzymes in terms of "pre-organized environment" able to stabilize the transition state of a given reaction. The efficiency of this new bioinformatic strategy was demonstrated by the consistent clustering of four different Ser hydrolases classes, which are characterized by the same active site organization but able to catalyze different reactions. The method was validated by considering, as a case study, the engineering of amidase activity into the scaffold of a lipase. The BioGPS tool predicted correctly the properties of lipase variants, as demonstrated by the projection of mutants inside the BioGPS "roadmap".

  1. Bioinformatic Analysis of the Contribution of Primer Sequences to Aptamer Structures

    PubMed Central

    Ellington, Andrew D.

    2009-01-01

    Aptamers are nucleic acid molecules selected in vitro to bind a particular ligand. While numerous experimental studies have examined the sequences, structures, and functions of individual aptamers, considerably fewer studies have applied bioinformatics approaches to try to infer more general principles from these individual studies. We have used a large Aptamer Database to parse the contributions of both random and constant regions to the secondary structures of more than 2000 aptamers. We find that the constant, primer-binding regions do not, in general, contribute significantly to aptamer structures. These results suggest that (a) binding function is not contributed to nor constrained by constant regions; (b) in consequence, the landscape of functional binding sequences is sparse but robust, favoring scenarios for short, functional nucleic acid sequences near origins; and (c) many pool designs for the selection of aptamers are likely to prove robust. PMID:18594898

  2. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    PubMed Central

    2002-01-01

    Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134

  3. Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

    PubMed Central

    Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive, and even small genomes such as those of prokaryotes typically contain many thousands of protein sequences, which prohibits their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, and is particularly amenable to small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285
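
    Task-level parallelism over many sequences with uneven cost can be sketched with Python's standard library. This does not use the SAGA-Pilot or eThread APIs: `thread_sequence` is a hypothetical placeholder for one threading job, and scheduling the longest sequences first is an assumed heuristic for coping with workload variation.

```python
from concurrent.futures import ThreadPoolExecutor


def thread_sequence(seq_id, sequence):
    """Hypothetical stand-in for one threading job; returns (id, length)."""
    return seq_id, len(sequence)


def run_pipeline(sequences, max_workers=4):
    """Run one job per sequence in a worker pool and collect results by id."""
    # Submit the longest sequences first so expensive jobs start early.
    ordered = sorted(sequences.items(), key=lambda kv: len(kv[1]), reverse=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(thread_sequence, sid, seq) for sid, seq in ordered]
        return dict(f.result() for f in futures)


# Hypothetical protein sequences keyed by identifier.
print(run_pipeline({"p1": "MKV", "p2": "MKVLAA", "p3": "M"}))
```

    A real deployment would replace the thread pool with remote workers and the placeholder with an actual modeling call; only the fan-out and collection pattern is shown here.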

  4. An Introductory Bioinformatics Exercise to Reinforce Gene Structure and Expression and Analyze the Relationship between Gene and Protein Sequences

    ERIC Educational Resources Information Center

    Almeida, Craig A.; Tardiff, Daniel F.; De Luca, Jane P.

    2004-01-01

    We have developed an introductory bioinformatics exercise for sophomore biology and biochemistry students that reinforces the understanding of the structure of a gene and the principles and events involved in its expression. In addition, the activity illustrates the severe effect mutations in a gene sequence can have on the protein product.…

  6. Plant pectin acetylesterase structure and function: new insights from bioinformatic analysis.

    PubMed

    Philippe, Florian; Pelloux, Jérôme; Rayon, Catherine

    2017-06-08

    Pectins are plant cell wall polysaccharides that can be acetylated on C2 and/or C3 of galacturonic acid residues. The degree of acetylation of pectin can be modulated by pectin acetylesterase (EC 3.1.1.6, PAE). The function and structure of plant PAEs remain poorly understood and the role of the fine-tuning of pectin acetylation on cell wall properties has not yet been elucidated. In the present study, a bioinformatic approach was used on 72 plant PAEs from 16 species among 611 plant PAEs available in plant genomic databases. An overview of plant PAE proteins, particularly Arabidopsis thaliana PAEs, based on phylogeny analysis, protein motif identification and modeled 3D structure is presented. A phylogenetic tree analysis using protein sequences clustered the plant PAEs into five clades. AtPAEs clustered in four clades in the plant kingdom PAE tree while they formed three clades when a phylogenetic tree was performed only on Arabidopsis proteins, due to isoform AtPAE9. Primitive plants that display a smaller number of PAEs clustered into two clades, while in higher plants, the presence of multiple members of PAE genes indicated a diversification of AtPAEs. 3D homology modeling of AtPAE8 from clade 2 with a human Notum protein showed an α/β hydrolase structure with the hallmark Ser-His-Asp of the active site. A 3D model of AtPAE4 from clade 1 and AtPAE10 from clade 3 showed a similar shape suggesting that the diversification of AtPAEs is unlikely to arise from the shape of the protein. Primary structure prediction analysis of AtPAEs showed a specific motif characteristic of each clade and identified one major group of AtPAEs with a signal peptide and one group without a signal peptide. A multiple sequence alignment of the putative plant PAEs revealed consensus sequences with important putative catalytic residues: Ser, Asp, His and a pectin binding site. Data mining of gene expression profiles of AtPAE revealed that genes from clade 2 including AtPAE7, AtPAE8 and

  7. Determination of Lipid-Protein Interactions in Lung Surfactants Using Computer Simulations and Structural Bioinformatics.

    NASA Astrophysics Data System (ADS)

    Kaznessis, Yiannis

    2001-06-01

    Proteins are the primary components of the networks that conduct the flows of mass, energy and information in living organisms. The discovery of the principles of protein structure and function allows the development of design rules for biological activities. The microscopic nature of the operating mechanisms of protein activity, and the vast complexity of the networks of interaction call for the employment of powerful computational methodologies that can decipher the physicochemical and evolutionary principles underlying protein structure and function. An example will be presented that reflects the strength of computational approaches. Atomistic molecular dynamics simulations and structural bioinformatics tools are employed to investigate the interactions between the first 25 N-terminal residues of surfactant protein B (SP-B 1-25) and the lipid components of the lung surfactant (LS). An understanding of the molecular level interactions between the LS components is essential for the establishment of design rules for the development of synthetic LS and the treatment of the neonatal respiratory distress syndrome, which results from deficiency or inactivation of LS.

  8. Predicting the impact of deleterious single point mutations in SMAD gene family using structural bioinformatics approach.

    PubMed

    George Priya Doss, C; Nagasundaram, N; Tanwar, Himani

    2012-06-01

    Functional alteration in SMAD proteins leads to dysregulation of their mechanism, which raises the risk of diseases such as fibrosis, cancer and juvenile polyposis. Studying single nucleotide polymorphisms (SNPs) in SMAD genes helps in understanding the malfunction of these proteins. In this study, we focused on identifying deleterious nsSNPs in SMAD genes at both the structural and functional level using publicly available bioinformatics tools: SIFT, PolyPhen, SNPs&GO, I-Mutant 3.0, MUpro and PANTHER. Structure analysis was carried out with the major mutations that occurred in the native proteins coded by SMAD genes and their amino acid positions (R358W, K306S, R310G, S433R and R361C). SRide was used to check the stability of the native and mutant modelled proteins. In addition, we used MAPPER to identify SNPs present in transcription factor binding sites. These findings demonstrate that in silico approaches can be used efficiently to identify potential candidate SNPs in large-scale analysis.

  9. Bioinformatic characterization of type-specific sequence and structural features in auxiliary activity family 9 proteins.

    PubMed

    Moses, Vuyani; Hatherley, Rowan; Tastan Bishop, Özlem

    2016-01-01

    Due to the impending depletion of fossil fuels, it has become important to identify alternative energy sources. The biofuel industry has proven to be a promising alternative. However, owing to the complex nature of plant biomass, and hence of its degradation, biofuel production remains a challenge. The copper-dependent Auxiliary Activity family 9 (AA9) proteins have been found to act synergistically with other cellulose-degrading enzymes resulting in an increased rate of cellulose breakdown. AA9 proteins are lytic polysaccharide monooxygenase (LPMO) enzymes, otherwise known as polysaccharide monooxygenases (PMOs). They are further classified as Type 1, 2 or 3 PMOs, depending on the different cleavage products formed. As AA9 proteins are known to exhibit low sequence conservation, the analysis of unique features of AA9 domains of these enzymes should provide insights for the better understanding of how different AA9 PMO types function. Bioinformatics approaches were used to identify features specific to the catalytic AA9 domains of each type of AA9 PMO. Sequence analysis showed the N terminus to be highly variable with type-specific inserts evident in this region. Phylogenetic analysis was performed to cluster AA9 domains based on their types. Motif analysis enabled the identification of sub-groups within each AA9 PMO type with the majority of these motifs occurring within the highly variable N terminus of AA9 domains. AA9 domain structures were manually docked to crystalline cellulose and used to analyze both the type-specific inserts and motifs at a structural level. The results indicated that these regions influence the AA9 domain active site topology and may contribute to the regioselectivity displayed by different AA9 PMO types. Physicochemical property analysis was performed and detected significant differences in aromaticity, isoelectric point and instability index between certain AA9 PMO types. In this study, a type-specific characterisation of AA9 domains was

  10. A new bioinformatic approach to detect common 3D sites in protein structures.

    PubMed

    Jambon, Martin; Imberty, Anne; Deléage, Gilbert; Geourjon, Christophe

    2003-08-01

    An innovative bioinformatic method has been designed and implemented to detect similar three-dimensional (3D) sites in proteins. This approach allows the comparison of protein structures or substructures and detects local spatial similarities: this method is completely independent from the amino acid sequence and from the backbone structure. In contrast to already existing tools, the basis for this method is a representation of the protein structure by a set of stereochemical groups that are defined independently from the notion of amino acid. An efficient heuristic for finding similarities that uses graphs of triangles of chemical groups to represent the protein structures has been developed. The implementation of this heuristic constitutes a software named SuMo (Surfing the Molecules), which allows the dynamic definition of chemical groups, the selection of sites in the proteins, and the management and screening of databases. To show the relevance of this approach, we focused on two extreme examples illustrating convergent and divergent evolution. In two unrelated serine proteases, SuMo detects one common site, which corresponds to the catalytic triad. In the legume lectins family composed of >100 structures that share similar sequences and folds but may have lost their ability to bind a carbohydrate molecule, SuMo discriminates between functional and non-functional lectins with a selectivity of 96%. The time needed for searching a given site in a protein structure is typically 0.1 s on a PIII 800MHz/Linux computer; thus, in further studies, SuMo will be used to screen the PDB. Copyright 2003 Wiley-Liss, Inc.
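
    The idea of describing a site by triangles of chemical groups, independently of sequence and backbone, can be illustrated with a minimal sketch. This is not the SuMo implementation: the group labels, the distance cutoff, the 1 Å binning and the function name triangle_signatures are all assumptions made for illustration, and the real heuristic works on graphs of triangles rather than on a plain set of signatures.

```python
from itertools import combinations
import math

def triangle_signatures(groups, max_edge=8.0, bin_width=1.0):
    """Describe a site by its triangles of chemical groups.

    groups: list of (group_type, (x, y, z)) tuples (hypothetical input format).
    Each triangle is reduced to (sorted group types, sorted binned side lengths),
    so the description does not depend on residue order or on the backbone.
    """
    signatures = set()
    for tri in combinations(groups, 3):
        sides = [math.dist(p[1], q[1]) for p, q in combinations(tri, 2)]
        if max(sides) > max_edge:
            continue  # skip triangles that are too large to be a local site
        types = tuple(sorted(g[0] for g in tri))
        bins = tuple(sorted(round(s / bin_width) for s in sides))
        signatures.add((types, bins))
    return signatures

# Toy coordinates (not taken from any PDB entry): the same three groups,
# once at the origin and once translated by (10, 10, 10).
site_a = [("hydroxyl", (0.0, 0.0, 0.0)),
          ("imidazole", (3.0, 0.0, 0.0)),
          ("carboxylate", (0.0, 4.0, 0.0))]
site_b = [(name, (x + 10.0, y + 10.0, z + 10.0)) for name, (x, y, z) in site_a]

shared = triangle_signatures(site_a) & triangle_signatures(site_b)
print(shared)
```

    Because only inter-group distances and group types enter the signature, the translated copy yields the same triangle as the original. Hard binning can split near-identical distances across two bins, so a more careful comparison would use a tolerance instead.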

  11. Structural, bioinformatic, and in vivo analyses of two Treponema pallidum lipoproteins reveal a unique TRAP transporter

    PubMed Central

    Deka, Ranjit K.; Brautigam, Chad A.; Goldberg, Martin; Schuck, Peter; Tomchick, Diana R.; Norgard, Michael V.

    2012-01-01

    Treponema pallidum, the bacterial agent of syphilis, is predicted to encode one tripartite ATP-independent periplasmic transporter (TRAP-T). TRAP-Ts typically employ a periplasmic substrate-binding protein (SBP) to deliver the cognate ligand to the transmembrane symporter. Herein, we demonstrate that the genes encoding the putative TRAP-T components from T. pallidum, tp0957 (the SBP) and tp0958 (the symporter) are in an operon with an uncharacterized third gene, tp0956. We determined the crystal structure of recombinant Tp0956; the protein is trimeric and perforated by a pore. Part of Tp0956 forms an assembly similar to those of “tetratricopeptide repeat” (TPR) motifs. The crystal structure of recombinant Tp0957 was also determined; like the SBPs of other TRAP-Ts, there are two lobes separated by a cleft. In these other SBPs, the cleft binds a negatively charged ligand. However, the cleft of Tp0957 has a strikingly hydrophobic chemical composition, indicating that its ligand may be substantially different and likely hydrophobic. Analytical ultracentrifugation of the recombinant versions of Tp0956 and Tp0957 established that these proteins associate avidly. This unprecedented interaction was confirmed for the native molecules using in vivo cross-linking experiments. Finally, bioinformatic analyses suggested that this transporter exemplifies a new subfamily of TPR-protein associated TRAP transporters (TPATs) that require the action of a TPR-containing accessory protein for the periplasmic transport of a potentially hydrophobic ligand(s). PMID:22306465

  12. Structural, Bioinformatic, and In Vivo Analyses of Two Treponema pallidum Lipoproteins Reveal a Unique TRAP Transporter

    SciTech Connect

    Deka, Ranjit K.; Brautigam, Chad A.; Goldberg, Martin; Schuck, Peter; Tomchick, Diana R.; Norgard, Michael V.

    2012-05-25

    Treponema pallidum, the bacterial agent of syphilis, is predicted to encode one tripartite ATP-independent periplasmic transporter (TRAP-T). TRAP-Ts typically employ a periplasmic substrate-binding protein (SBP) to deliver the cognate ligand to the transmembrane symporter. Herein, we demonstrate that the genes encoding the putative TRAP-T components from T. pallidum, tp0957 (the SBP), and tp0958 (the symporter), are in an operon with an uncharacterized third gene, tp0956. We determined the crystal structure of recombinant Tp0956; the protein is trimeric and perforated by a pore. Part of Tp0956 forms an assembly similar to those of 'tetratricopeptide repeat' (TPR) motifs. The crystal structure of recombinant Tp0957 was also determined; like the SBPs of other TRAP-Ts, there are two lobes separated by a cleft. In these other SBPs, the cleft binds a negatively charged ligand. However, the cleft of Tp0957 has a strikingly hydrophobic chemical composition, indicating that its ligand may be substantially different and likely hydrophobic. Analytical ultracentrifugation of the recombinant versions of Tp0956 and Tp0957 established that these proteins associate avidly. This unprecedented interaction was confirmed for the native molecules using in vivo cross-linking experiments. Finally, bioinformatic analyses suggested that this transporter exemplifies a new subfamily of TPATs (TPR-protein-associated TRAP-Ts) that require the action of a TPR-containing accessory protein for the periplasmic transport of a potentially hydrophobic ligand(s).

  13. Bioinformatics Study of Structural Patterns in Plant MicroRNA Precursors.

    PubMed

    Miskiewicz, J; Tomczyk, K; Mickiewicz, A; Sarzynska, J; Szachniuk, M

    2017-01-01

    According to the RNA world theory, RNAs which stored genetic information and catalyzed chemical reactions contributed to the formation of current living organisms. In recent years, researchers have studied the diversity of this molecule, focusing among other things on small non-coding regulatory RNAs. Among them, of particular interest is the evolutionarily ancient, 19-24 nt microRNA (miRNA) molecule. It has already been recognized as a regulator of gene expression in eukaryotes. In plants, miRNA plays a key role in the response to stress conditions and it participates in the process of growth and development. MicroRNAs originate from primary transcripts (pri-miRNA) encoded in the nuclear genome. They are processed from single-stranded stem-loop RNA precursors containing hairpin structures. While the mechanism of mature miRNA production in animals is better understood, its biogenesis in plants remains less clear. Herein, we present the results of bioinformatics analysis aimed at discovering how plant microRNAs are recognized within their precursors (pre-miRNAs). The study has been focused on sequential and structural motif identification in the neighbourhood of microRNA.

  14. Bioinformatics Study of Structural Patterns in Plant MicroRNA Precursors

    PubMed Central

    Tomczyk, K.; Mickiewicz, A.; Sarzynska, J.

    2017-01-01

    According to the RNA world theory, RNAs which stored genetic information and catalyzed chemical reactions contributed to the formation of current living organisms. In recent years, researchers have studied the diversity of this molecule, focusing among other things on small non-coding regulatory RNAs. Among them, of particular interest is the evolutionarily ancient, 19–24 nt microRNA (miRNA) molecule. It has already been recognized as a regulator of gene expression in eukaryotes. In plants, miRNA plays a key role in the response to stress conditions and it participates in the process of growth and development. MicroRNAs originate from primary transcripts (pri-miRNA) encoded in the nuclear genome. They are processed from single-stranded stem-loop RNA precursors containing hairpin structures. While the mechanism of mature miRNA production in animals is better understood, its biogenesis in plants remains less clear. Herein, we present the results of bioinformatics analysis aimed at discovering how plant microRNAs are recognized within their precursors (pre-miRNAs). The study has been focused on sequential and structural motif identification in the neighbourhood of microRNA. PMID:28280737

  15. Comparative bioinformatics, temporal and spatial expression analyses of Ixodes scapularis organic anion transporting polypeptides

    PubMed Central

    Radulović, Željko; Porter, Lindsay M.; Kim, Tae K.; Mulenga, Albert

    2015-01-01

    Organic anion-transporting polypeptides (Oatps) are an integral part of the detoxification mechanism in vertebrates and invertebrates. These cell surface proteins are involved in mediating the sodium-independent uptake and/or distribution of a broad array of organic amphipathic compounds and xenobiotic drugs. This study describes bioinformatics and biological characterization of 9 Oatp sequences in the Ixodes scapularis genome. These sequences have been annotated on the basis of 12 transmembrane domains, consensus motif D-X-RW-(I,V)-GAWW-X-G-(F,L)-L, and 11 conserved cysteine amino acid residues in the large extracellular loop 5 that characterize the Oatp superfamily. Ixodes scapularis Oatps may regulate non-redundant cross-tick species conserved functions in that they did not cluster as a monolithic group on the phylogeny tree and that they have orthologs in other ticks. Phylogeny clustering patterns also suggest that some tick Oatp sequences transport substrates that are similar to those of body louse, mosquito, eye worm, and filarial worm Oatps. Semi-quantitative RT-PCR analysis demonstrated that all 9 I. scapularis Oatp sequences were expressed during tick feeding. Ixodes scapularis Oatp genes potentially regulate functions during early and/or late-stage tick feeding as revealed by normalized mRNA profiles. Normalized transcript abundance indicates that I. scapularis Oatp genes are strongly expressed in unfed ticks during the first 24 h of feeding and/or at the end of the tick feeding process. Except for 2 I. scapularis Oatps, which were expressed in the salivary glands and ovaries, all other genes were expressed in all tested organs, suggesting the significance of I. scapularis Oatps in maintaining tick homeostasis. Different I. scapularis Oatp mRNA expression patterns were detected and discussed with reference to different physiological states of unfed and feeding ticks. PMID:24582512
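
    The consensus motif quoted above, D-X-RW-(I,V)-GAWW-X-G-(F,L)-L, translates directly into a regular expression. The sketch below shows one way to scan a protein sequence for it; the function name find_oatp_motif and the toy sequence are hypothetical and do not come from the study.

```python
import re

# D-X-RW-(I,V)-GAWW-X-G-(F,L)-L, with X read as "any residue"
OATP_MOTIF = re.compile(r"D[A-Z]RW[IV]GAWW[A-Z]G[FL]L")

def find_oatp_motif(seq):
    """Return (0-based start, matched text) for each motif occurrence."""
    return [(m.start(), m.group()) for m in OATP_MOTIF.finditer(seq.upper())]

# Toy sequence built to contain the motif; it is not a real tick Oatp.
toy_seq = "MKTLDPRWIGAWWLGFLAAA"
print(find_oatp_motif(toy_seq))
```

    A motif hit of this kind is only one of the three annotation criteria named in the abstract; transmembrane domain counts and the conserved cysteines would need separate checks.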

  16. Edge Bioinformatics

    SciTech Connect

    Lo, Chien-Chi

    2015-08-03

    Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed based on the following use cases and is specific to Illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen or co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance

  17. Structural Bioinformatics-Based Prediction of Exceptional Selectivity of p38 MAP Kinase Inhibitor PH-797804

    SciTech Connect

    Xing, Li; Shieh, Huey S.; Selness, Shaun R.; Devraj, Rajesh V.; Walker, John K.; Devadas, Balekudru; Hope, Heidi R.; Compton, Robert P.; Schindler, John F.; Hirsch, Jeffrey L.; Benson, Alan G.; Kurumbail, Ravi G.; Stegeman, Roderick A.; Williams, Jennifer M.; Broadus, Richard M.; Walden, Zara; Monahan, Joseph B.; Pfizer

    2009-07-24

    PH-797804 is a diarylpyridinone inhibitor of p38α mitogen-activated protein (MAP) kinase derived from a racemic mixture as the more potent atropisomer (aS), first proposed by molecular modeling and subsequently confirmed by experiments. On the basis of structural comparison with a different biaryl pyrazole template and supported by dozens of high-resolution crystal structures of p38α inhibitor complexes, PH-797804 is predicted to possess a high level of specificity across the broad human kinase genome. We used a structural bioinformatics approach to identify two selectivity elements encoded by the TXXXG sequence motif on the p38α kinase hinge: (i) Thr106 that serves as the gatekeeper to the buried hydrophobic pocket occupied by 2,4-difluorophenyl of PH-797804 and (ii) the bidentate hydrogen bonds formed by the pyridinone moiety with the kinase hinge requiring an induced 180° rotation of the Met109-Gly110 peptide bond. The peptide flip occurs in p38α kinase due to the critical glycine residue marked by its conformational flexibility. Kinome-wide sequence mining revealed rare presentation of the selectivity motif. Corroboratively, PH-797804 exhibited exceptionally high specificity against MAP kinases and the related kinases. No cross-reactivity was observed in large panels of kinase screens (selectivity ratio of >500-fold). In cellular assays, PH-797804 demonstrated superior potency and selectivity consistent with the biochemical measurements. PH-797804 has met safety criteria in human phase I studies and is under clinical development for several inflammatory conditions. Understanding the rationale for selectivity at the molecular level helps elucidate the biological function and design of specific p38α kinase inhibitors.
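
    Sequence mining for a TXXXG pattern can be sketched with a short pattern search. The snippet below is an illustration only: the function name find_txxxg and the toy fragments are hypothetical, and it scans whole strings, whereas the abstract ties the motif specifically to the kinase hinge.

```python
import re

# T, any three residues, then G; the lookahead also reports overlapping hits
TXXXG = re.compile(r"(?=(T[A-Z]{3}G))")

def find_txxxg(seq):
    """Return (0-based start, 5-residue match) for each T-X-X-X-G occurrence."""
    return [(m.start(), m.group(1)) for m in TXXXG.finditer(seq.upper())]

# Toy hinge-like fragments, invented for illustration
fragments = {
    "with_motif": "AATHLMGAA",
    "without_motif": "AATHLMAAA",
}
for name, frag in fragments.items():
    print(name, find_txxxg(frag))
```

    On full-length sequences a five-residue pattern this loose will match at many unrelated positions, so a practical screen would first align the kinase domains and test only the columns corresponding to the hinge.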

  18. The MPI bioinformatics Toolkit as an integrative platform for advanced protein sequence and structure analysis

    PubMed Central

    Alva, Vikram; Nam, Seung-Zin; Söding, Johannes; Lupas, Andrei N.

    2016-01-01

    The MPI Bioinformatics Toolkit (http://toolkit.tuebingen.mpg.de) is an open, interactive web service for comprehensive and collaborative protein bioinformatic analysis. It offers a wide array of interconnected, state-of-the-art bioinformatics tools to experts and non-experts alike, developed both externally (e.g. BLAST+, HMMER3, MUSCLE) and internally (e.g. HHpred, HHblits, PCOILS). While a beta version of the Toolkit was released 10 years ago, the current production-level release has been available since 2008 and has serviced more than 1.6 million external user queries. The usage of the Toolkit has continued to increase linearly over the years, reaching more than 400 000 queries in 2015. In fact, through the breadth of its tools and their tight interconnection, the Toolkit has become an excellent platform for experimental scientists as well as a useful resource for teaching bioinformatic inquiry to students in the life sciences. In this article, we report on the evolution of the Toolkit over the last ten years, focusing on the expansion of the tool repertoire (e.g. CS-BLAST, HHblits) and on infrastructural work needed to remain operative in a changing web environment. PMID:27131380

  19. Analysis of RNAseq datasets from a comparative infectious disease zebrafish model using GeneTiles bioinformatics.

    PubMed

    Veneman, Wouter J; de Sonneville, Jan; van der Kolk, Kees-Jan; Ordas, Anita; Al-Ars, Zaid; Meijer, Annemarie H; Spaink, Herman P

    2015-03-01

    We present an RNA deep sequencing (RNAseq) analysis of a comparison of the transcriptome responses to infection of zebrafish larvae with Staphylococcus epidermidis and Mycobacterium marinum bacteria. We show how our developed GeneTiles software can improve RNAseq analysis approaches by more confidently identifying a large set of markers upon infection with these bacteria. For analysis of RNAseq data currently, software programs such as Bowtie2 and Samtools are indispensable. However, these programs that are designed for a LINUX environment require some dedicated programming skills and have no options for visualisation of the resulting mapped sequence reads. Especially with large data sets, this makes the analysis time consuming and difficult for non-expert users. We have applied the GeneTiles software to the analysis of previously published and newly obtained RNAseq datasets of our zebrafish infection model, and we have shown the applicability of this approach also to published RNAseq datasets of other organisms by comparing our data with a published mammalian infection study. In addition, we have implemented the DEXSeq module in the GeneTiles software to identify genes, such as glucagon A, that are differentially spliced under infection conditions. In the analysis of our RNAseq data, this has led to the possibility to improve the size of data sets that could be efficiently compared without using problem-dedicated programs, leading to a quick identification of marker sets. Therefore, this approach will also be highly useful for transcriptome analyses of other organisms for which well-characterised genomes are available.

  20. Comparative and bioinformatics analyses of pathogenic bacterial secretomes identified by mass spectrometry in Burkholderia species.

    PubMed

    Nguyen, Thao Thi; Chon, Tae-Soo; Kim, Jaehan; Seo, Young-Su; Heo, Muyoung

    2017-07-01

    Secreted proteins (secretomes) play crucial roles during bacterial pathogenesis in both plant and human hosts. The identification and characterization of secretomes in the two plant pathogens Burkholderia glumae BGR1 and B. gladioli BSR3, which cause diseases in rice such as seedling blight, panicle blight, and grain rot, are important steps to not only understand the disease-causing mechanisms but also find remedies for the diseases. Here, we identified two datasets of secretomes in B. glumae BGR1 and B. gladioli BSR3, which consist of 118 and 111 proteins, respectively, using a mass spectrometry approach and literature curation. Next, we characterized the functional properties, potential secretion pathways and sequence information properties of the secretomes of the two plant pathogens in a comparative analysis by various computational approaches. The ratio of potential non-classically secreted proteins (NCSPs) to classically secreted proteins (CSPs) in B. glumae BGR1 was greater than that in B. gladioli BSR3. For CSPs, the putative hydrophobic regions (PHRs), which are essential for the secretion process of CSPs, were screened in detail at their N-terminal sequences using a hidden Markov model (HMM)-based method. In total, 31 pairs of homologous proteins in the two bacterial secretomes were identified based on global alignment (identity ≥ 70%). Our results may facilitate the understanding of the species-specific features of secretomes in two plant pathogenic Burkholderia species.
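
    The homolog criterion above (global alignment, identity ≥ 70%) can be illustrated with a small Needleman-Wunsch sketch. The abstract does not state which alignment tool, scoring scheme or identity definition was used, so the unit match/mismatch/gap scores, the choice of alignment length as the denominator and the function name global_identity are all assumptions.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity of a Needleman-Wunsch global alignment of a and b.

    Identity is counted as identical aligned pairs divided by the
    alignment length (gap columns included).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical pairs and columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 0.0

# Toy peptides, invented for illustration
print(global_identity("ACDE", "ACDF") >= 70.0)
```

    Real protein comparisons would normally use a substitution matrix and affine gap penalties, which can change both the alignment and the resulting identity value.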

  1. Bioinformatics investigation of therapeutic mechanisms of Xuesaitong capsule treating ischemic cerebrovascular rat model with comparative transcriptome analysis

    PubMed Central

    Liao, Jiangquan; Wei, Benjun; Chen, Hengwen; Liu, Yongmei; Wang, Jie

    2016-01-01

    Background: Xuesaitong soft capsule (XST), which consists of Panax notoginseng saponin (PNS), has been used to treat ischemic cerebrovascular diseases in China. The therapeutic mechanism of XST has not yet been elucidated from the perspective of genomics and bioinformatics. Methods: A transcriptome analysis was performed to review series concerning middle cerebral artery occlusion (MCAO) rat model and XST intervention after MCAO from Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were compared between blank group and model group, model group and XST group. Functional enrichment and pathway analysis were performed. Protein-Protein interaction network was constructed. The overlapping genes from two DEGs sets were screened out and profound analysis was performed. Results: Two series including 22 samples were obtained. 870 DEGs were identified between blank group and model group, and 1189 DEGs were identified between model group and XST group. GO terms and KEGG pathways of MCAO and XST intervention were significantly enriched. PPI networks were constructed to demonstrate the gene-gene interactions. The overlapping genes from two DEGs sets were highlighted. ANTXR2, FHL3, PRCP, TYROBP, TAF9B, FGFR2, BCL11B, RB1CC1 and MBNL2 were the pivotal genes and possible action sites of XST therapeutic mechanisms. Conclusion: MCAO is a pathological process with multiple. PMID:27347353

  2. Detailed mapping of RNA secondary structures in core and NS5B-encoding region sequences of hepatitis C virus by RNase cleavage and novel bioinformatic prediction methods.

    PubMed

    Tuplin, A; Evans, D J; Simmonds, P

    2004-10-01

    There is accumulating evidence from bioinformatic studies that hepatitis C virus (HCV) possesses extensive RNA secondary structure in the core and NS5B-encoding regions of the genome. Recent functional studies have defined one such stem-loop structure in the NS5B region as an essential cis-acting replication element (CRE). A program was developed (STRUCTUR_DIST) that analyses multiple RNA-folding patterns predicted by mfold to determine the evolutionary conservation of predicted stem-loop structures and, by a new method, to analyse frequencies of covariant sites in predicted RNA folding between HCV genotypes. These novel bioinformatic methods have been combined with enzymic mapping of RNA transcripts from the core and NS5B regions to precisely delineate the RNA structures that are present in these genomic regions. Together, these methods predict the existence of multiple, often juxtaposed stem-loops that are found in all HCV genotypes throughout both regions, as well as several strikingly conserved single-stranded regions, one of which coincides with a region of the genome to which ribosomal access is required for translation initiation. Despite the existence of marked sequence conservation between genotypes in the HCV CRE and single-stranded regions, there was no evidence for comparable suppression of variability at either synonymous or non-synonymous sites in the other predicted stem-loop structures. The configuration and genetic variability of many of these other NS5B and core structures is perhaps more consistent with their involvement in genome-scale ordered RNA structure, a structural configuration of the genomes of many positive-stranded RNA viruses that is associated with host persistence.

  3. Identification and Comparative Analysis of H2O2-Scavenging Enzymes (Ascorbate Peroxidase and Glutathione Peroxidase) in Selected Plants Employing Bioinformatics Approaches

    PubMed Central

    Ozyigit, Ibrahim I.; Filiz, Ertugrul; Vatansever, Recep; Kurtoglu, Kuaybe Y.; Koc, Ibrahim; Öztürk, Münir X.; Anjum, Naser A.

    2016-01-01

    Among major reactive oxygen species (ROS), hydrogen peroxide (H2O2) exhibits dual roles in plant metabolism. Low levels of H2O2 modulate many biological/physiological processes in plants, whereas high levels can cause damage to cell structures, with severe consequences. Thus, the steady-state level of cellular H2O2 must be tightly regulated. Glutathione peroxidases (GPX) and ascorbate peroxidase (APX) are two major ROS-scavenging enzymes which catalyze the reduction of H2O2 in order to prevent potential H2O2-derived cellular damage. Employing bioinformatics approaches, this study presents a comparative evaluation of both GPX and APX in 18 different plant species, and provides valuable insights into the nature and complex regulation of these enzymes. Herein, (a) potential GPX and APX genes/proteins from 18 different plant species were identified, (b) their exon/intron organization was analyzed, (c) detailed information about their physicochemical properties was provided, (d) conserved motif signatures of GPX and APX were identified, (e) their phylogenetic trees and 3D models were constructed, (f) protein-protein interaction networks were generated, and finally (g) GPX and APX gene expression profiles were analyzed. The study outcomes shed light on GPX and APX as major H2O2-scavenging enzymes at the structural and functional levels, and could be used in future studies in this direction. PMID:27047498

  4. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  5. Bioinformatics based structural characterization of glucose dehydrogenase (gdh) gene and growth promoting activity of Leclercia sp. QAU-66.

    PubMed

    Naveed, Muhammad; Ahmed, Iftikhar; Khalid, Nauman; Mumtaz, Abdul Samad

    2014-01-01

    Glucose dehydrogenase (GDH; EC 1.1.5.2) is a member of the quinoprotein group, which uses the redox cofactor pyrroloquinoline quinone, calcium ions and glucose as substrate for its activity. In the present study, Leclercia sp. QAU-66, isolated from the rhizosphere of Vigna mungo, was characterized for phosphate solubilization and the role of GDH in plant growth promotion of Phaseolus vulgaris. The strain QAU-66 was able to solubilize phosphorus and significantly (p ≤ 0.05) promoted the shoot and root lengths of Phaseolus vulgaris. The structural determination of the GDH protein was carried out using bioinformatics tools such as Pfam, InterProScan, I-TASSER and COFACTOR. These tools predicted the structure-based functional homology of pyrroloquinoline quinone domains in GDH. GDH of Leclercia sp. QAU-66 is one of the main factors involved in plant growth promotion and provides a solid background for further research in plant growth promoting activities.

  6. Bioinformatics: indispensable, yet hidden in plain sight?

    PubMed

    Bartlett, Andrew; Penders, Bart; Lewis, Jamie

    2017-06-21

    Bioinformatics has multitudinous identities, organisational alignments and disciplinary links. This variety allows bioinformaticians and bioinformatic work to contribute to much (if not most) of life science research in profound ways. The multitude of bioinformatic work also translates into a multitude of credit-distribution arrangements, apparently dismissing that work. We report on the epistemic and social arrangements that characterise the relationship between bioinformatics and life science. We describe, in sociological terms, the character, power and future of bioinformatic work. The character of bioinformatic work is such that its cultural, institutional and technical structures allow for it to be black-boxed easily. The result is that bioinformatic expertise and contributions travel easily and quickly, yet remain largely uncredited. The power of bioinformatic work is shaped by its dependency on life science work, which combined with the black-boxed character of bioinformatic expertise further contributes to situating bioinformatics on the periphery of the life sciences. Finally, the imagined futures of bioinformatic work suggest that bioinformatics will become ever more indispensable without necessarily becoming more visible, forcing bioinformaticians into difficult professional and career choices. Bioinformatic expertise and labour is epistemically central but often institutionally peripheral. In part, this is a result of the ways in which the character, power distribution and potential futures of bioinformatics are constituted. However, alternative paths can be imagined.

  7. Bioinformatic and functional analysis of RNA secondary structure elements among different genera of human and animal caliciviruses

    PubMed Central

    Simmonds, Peter; Karakasiliotis, Ioannis; Bailey, Dalan; Chaudhry, Yasmin; Evans, David J.; Goodfellow, Ian G.

    2008-01-01

    The mechanism and role of RNA structure elements in the replication and translation of Caliciviridae remains poorly understood. Several algorithmically independent methods were used to predict secondary structures within the Norovirus, Sapovirus, Vesivirus and Lagovirus genera. All showed profound suppression of synonymous site variability (SSSV) at genomic 5′ ends and the start of the sub-genomic (sg) transcript, consistent with evolutionary constraints from underlying RNA structure. A newly developed thermodynamic scanning method predicted RNA folding mapping precisely to regions of SSSV and at the genomic 3′ end. These regions contained several evolutionarily conserved RNA secondary structures, of variable size and positions. However, all caliciviruses contained 3′ terminal hairpins, and stem–loops in the anti-genomic strand invariably six bases upstream of the sg transcript, indicating putative roles as sg promoters. Using the murine norovirus (MNV) reverse-genetics system, disruption of 5′ end stem–loops produced ∼15- to 20-fold infectivity reductions, while disruption of the RNA structure in the sg promoter region and at the 3′ end entirely destroyed replication ability. Restoration of infectivity by repair mutations in the sg promoter region confirmed a functional role for the RNA secondary structure, not the sequence. This study provides comprehensive bioinformatic resources for future functional studies of MNV and other caliciviruses. PMID:18319285

  8. RNA Bioinformatics for Precision Medicine.

    PubMed

    Chen, Jiajia; Shen, Bairong

    2016-01-01

    The high-throughput transcriptomic data generated by deep sequencing technologies urgently require bioinformatics methods for proper data visualization, analysis, storage, and interpretation. The involvement of noncoding RNAs in human diseases highlights their potential as biomarkers and therapeutic targets to facilitate the precision medicine. In this chapter, we give a brief overview of the bioinformatics tools to analyze different aspects of RNAs, in particular ncRNAs. We first describe the emerging bioinformatics methods for RNA identification, structure modeling, functional annotation, and network inference. This is followed by an introduction of potential usefulness of ncRNAs as diagnostic, prognostic biomarkers and therapeutic strategies.

  9. Elongation Factor-Tu (EF-Tu) proteins structural stability and bioinformatics in ancestral gene reconstruction

    NASA Astrophysics Data System (ADS)

    Dehipawala, Sunil; Nguyen, A.; Tremberger, G.; Cheung, E.; Schneider, P.; Lieberman, D.; Holden, T.; Cheung, T.

    2013-09-01

    A paleo-experimental evolution report on elongation factor EF-Tu structural stability results has provided an opportunity to rewind the tape of life using the ancestral protein sequence reconstruction modeling approach; consistent with the book of life dogma in current biology and being an important component in the astrobiology community. Fractal dimension via the Higuchi fractal method and Shannon entropy of the DNA sequence classification could be used in a diagram that serves as a simple summary. Results from biomedical gene research provide examples on the diagram methodology. Comparisons between biomedical genes such as EEF2 (elongation factor 2 human, mouse, etc), WDR85 in epigenetics, HAR1 in human specificity, DLG1 in cognitive skill, and HLA-C in mosquito bite immunology with EF Tu DNA sequences have accounted for the reported circular dichroism thermo-stability data systematically; the results also infer a relatively less volatility geologic time period from 2 to 3 Gyr from adaptation viewpoint. Comparison to Thermotoga maritima MSB8 and Psychrobacter shows that Thermus thermophilus HB8 EF-Tu calibration sequence could be an outlier, consistent with free energy calculation by NUPACK. Diagram methodology allows computer simulation studies and HAR1 shows about 0.5% probability from chimp to human in terms of diagram location, and SNP simulation results such as amoebic meningoencephalitis NAF1 suggest correlation. Extensions to the studies of the translation and transcription elongation factor sequences in Megavirus Chiliensis, Megavirus Lba and Pandoravirus show that the studied Pandoravirus sequence could be an outlier with the highest fractal dimension and lowest entropy, as compared to chicken as a deviant in the DNMT3A DNA methylation gene sequences from zebrafish to human and to the less than one percent probability in computer simulation using the HAR1 0.5% probability as reference. The diagram methodology would be useful in ancestral gene
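
    One of the two quantities named in this record, the Shannon entropy of a DNA sequence, can be computed from nucleotide frequencies as in the sketch below. How the authors encoded the sequences is not stated here, so the per-nucleotide definition and the function name shannon_entropy are assumptions; the Higuchi fractal dimension is not covered.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy, in bits per symbol, of the symbol frequencies in seq."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty sequence carries no information
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Toy sequences, invented for illustration
for dna in ("ACGT", "AACC", "AAAA"):
    print(dna, shannon_entropy(dna))
```

    With four equally frequent nucleotides the value reaches its maximum of 2 bits, and it falls to 0 for a homopolymer, which is the range any entropy axis for DNA built this way would span.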

  10. Biochemical characterization of the selenoproteome in Gallus gallus via bioinformatics analysis: structure-function relationships and interactions of binding molecules.

    PubMed

    Zhu, Shi-Yong; Li, Xue-Nan; Sun, Xiao-Chen; Lin, Jia; Li, Wei; Zhang, Cong; Li, Jin-Long

    2017-02-22

    Knowledge about mammalian selenoproteins is increasing. However, the selenoproteome of birds remains considerably less understood, especially concerning its biochemical characterization, structure-function relationships and the interactions of binding molecules. In this work, the SECIS elements, subcellular localization, protein domains and interactions of binding molecules of the selenoproteome in Gallus gallus were analyzed using bioinformatics tools. We carried out comprehensive analyses of the structure-function relationships and interactions of the binding molecules of selenoproteins to provide a biochemical characterization of the selenoproteome in Gallus gallus. Our data provided a wealth of information on the biochemical functions of bird selenoproteins. Members of the selenoproteome were found to be involved in various biological processes in chickens, such as antioxidant activity, maintenance of the redox balance, Se transport, and interactions with metals. Six membrane-bound selenoproteins (SelI, SelK, SelS, SelT, DIO1 and DIO3) played important roles in maintaining membrane integrity. Chicken selenoproteins were classified according to their ligand binding sites as zinc-containing matrix metalloselenoproteins (Sep15, MsrB1, SelW and SelM), POP-containing selenoproteins (GPx1-4), FAD-interacting selenoproteins (TrxR1-3), secretory transport selenoproteins (GPx3 and SelPa) and other selenoproteins. The results of our study provided new evidence for the unknown biological functions of the selenoproteome in birds. Future research is required to confirm the novel biochemical functions of bird selenoproteins.

  11. Minimal Functional Sites in Metalloproteins and Their Usage in Structural Bioinformatics

    PubMed Central

    Rosato, Antonio; Valasatava, Yana; Andreini, Claudia

    2016-01-01

    Metal ions play a functional role in numerous biochemical processes and cellular pathways. Indeed, about 40% of all enzymes of known 3D structure require a metal ion to be able to perform catalysis. The interactions of the metals with the macromolecular framework determine their chemical properties and reactivity. The relevant interactions involve both the coordination sphere of the metal ion and the more distant interactions of the so-called second sphere, i.e., the non-bonded interactions between the macromolecule and the residues coordinating the metal (metal ligands). The metal ligands and the residues in their close spatial proximity define what we call a minimal functional site (MFS). MFSs can be automatically extracted from the 3D structures of metal-binding biological macromolecules deposited in the Protein Data Bank (PDB). They are 3D templates that describe the local environment around a metal ion or metal cofactor and do not depend on the overall macromolecular structure. MFSs provide a different view on metal-binding proteins and nucleic acids, completely focused on the metal. Here we present different protocols and tools based upon the concept of MFS to obtain deeper insight into the structural and functional properties of metal-binding macromolecules. We also show that structure conservation of MFSs in metalloproteins relates to local sequence similarity more strongly than to overall protein similarity. PMID:27153067
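    The definition of a minimal functional site given above (the metal ligands plus the residues in their close spatial proximity) can be illustrated with a small distance-based sketch. The code below is not the authors' extraction tool: the `Atom` type, the function name and both distance cutoffs (3.0 Å for coordination, 5.0 Å for the surrounding shell) are hypothetical values chosen only to make the idea concrete.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    residue: str  # hypothetical residue label, e.g. "HIS94"
    x: float
    y: float
    z: float


def _distance(a: Atom, b: Atom) -> float:
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def minimal_functional_site(metal, atoms, ligand_cutoff=3.0, shell_cutoff=5.0):
    """Return (ligand residues, neighbouring residues) around one metal.

    Both cutoffs are in angstroms and are assumptions for illustration.
    """
    # Residues with at least one atom close enough to coordinate the metal.
    ligands = {a.residue for a in atoms if _distance(a, metal) <= ligand_cutoff}
    ligand_atoms = [a for a in atoms if a.residue in ligands]
    # Residues with at least one atom near any atom of a ligand residue.
    neighbours = {
        a.residue
        for a in atoms
        if a.residue not in ligands
        and any(_distance(a, l) <= shell_cutoff for l in ligand_atoms)
    }
    return ligands, neighbours


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from any PDB entry.
    zinc = Atom("ZN", 0.0, 0.0, 0.0)
    atoms = [
        Atom("HIS94", 2.0, 0.0, 0.0),
        Atom("THR199", 6.0, 0.0, 0.0),
        Atom("LEU300", 20.0, 0.0, 0.0),
    ]
    print(minimal_functional_site(zinc, atoms))
```

    Because the result depends only on the atoms near the metal, the same site description is obtained regardless of the overall fold, which matches the abstract's point that such sites do not depend on the overall macromolecular structure.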

  12. DOE EPSCoR Initiative in Structural and computational Biology/Bioinformatics

    SciTech Connect

    Wallace, Susan S.

    2008-02-21

    The overall goal of the DOE EPSCoR Initiative in Structural and Computational Biology was to enhance the competitiveness of Vermont research in these scientific areas. To develop self-sustaining infrastructure, we increased the critical mass of faculty, developed shared resources that made junior researchers more competitive for federal research grants, implemented programs to train graduate and undergraduate students who participated in these research areas and provided seed money for research projects. During the time period funded by this DOE initiative: (1) four new faculty were recruited to the University of Vermont using DOE resources, three in Computational Biology and one in Structural Biology; (2) technical support was provided for the Computational and Structural Biology facilities; (3) twenty-two graduate students were directly funded by fellowships; (4) fifteen undergraduate students were supported during the summer; and (5) twenty-eight pilot projects were supported. Taken together, these dollars resulted in a plethora of published papers, many in high profile journals in the fields, and directly impacted competitive extramural funding based on structural or computational biology, resulting in 49 million dollars awarded in grants (Appendix I), a 600% return on investment by DOE, the State and University.

  13. In the Spotlight: Bioinformatics

    PubMed Central

    Wang, May Dongmei

    2016-01-01

    During 2012, next generation sequencing (NGS) attracted great attention in the biomedical research community, especially for personalized medicine. In addition, third generation sequencing became available. Therefore, state-of-the-art sequencing technology and analysis are reviewed in this Bioinformatics spotlight on 2012. Next-generation sequencing (NGS) is a high-throughput nucleic acid sequencing technology with wide dynamic range and single base resolution. The full promise of NGS depends on the optimization of NGS platforms, sequence alignment and assembly algorithms, data analytics, novel algorithms for integrating NGS data with existing genomic, proteomic, or metabolomic data, and quantitative assessment of NGS technology in comparison to more established technologies such as microarrays. NGS technology has been predicted to become a cornerstone of personalized medicine. It is argued that NGS is a promising field for motivated young researchers who are looking for opportunities in bioinformatics. PMID:23192635

  14. Structural and bioinformatic characterization of an Acinetobacter baumannii type II carrier protein

    SciTech Connect

    Allen, C. Leigh; Gulick, Andrew M.

    2014-06-01

    Microorganisms produce a variety of natural products via secondary metabolic biosynthetic pathways. Two of these types of synthetic systems, the nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), use large modular enzymes containing multiple catalytic domains in a single protein. These multidomain enzymes use an integrated carrier protein domain to transport the growing, covalently bound natural product to the neighboring catalytic domains for each step in the synthesis. Interestingly, some PKS and NRPS clusters contain free-standing domains that interact intermolecularly with other proteins. Being expressed outside the architecture of a multi-domain protein, these so-called type II proteins present challenges for understanding the precise role they play. Additional structures of individual and multi-domain components of the NRPS enzymes will therefore provide a better understanding of the features that govern the domain interactions in these interesting enzyme systems. The high-resolution crystal structure of a free-standing carrier protein from Acinetobacter baumannii that belongs to a larger NRPS-containing operon, encoded by the ABBFA-003406–ABBFA-003399 genes of A. baumannii strain AB307-0294, that has been implicated in A. baumannii motility, quorum sensing and biofilm formation, is presented here. Comparison with the closest structural homologs of other carrier proteins identifies the requirements for a conserved glycine residue and additional important sequence and structural requirements within the regions that interact with partner proteins.

  15. Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces.

    PubMed

    Ezra Tsur, Elishai

    2017-01-01

    Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistence and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Center for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistence agent and Apache Derby as the database manager. Syntactic analysis was based on the J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at: http://nbel-lab.com.
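    To make the idea of an entity-based data model with object persistence more concrete, here is a minimal sketch in Python. It deliberately swaps in different technology: the framework described above uses Java, EclipseLink and Apache Derby, whereas this sketch uses a dataclass and the standard-library `sqlite3` module. The `Aneurysm` entity, its fields and the `AneurysmStore` class are hypothetical and do not come from the framework.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass
class Aneurysm:
    # Hypothetical entity; the field names are illustrative only.
    case_id: str
    location: str
    diameter_mm: float


class AneurysmStore:
    """Persists Aneurysm entities in a SQLite table (in memory by default)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS aneurysm ("
            "case_id TEXT PRIMARY KEY, location TEXT, diameter_mm REAL)"
        )

    def save(self, entity: Aneurysm) -> None:
        # The connection context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO aneurysm VALUES (?, ?, ?)",
                astuple(entity),
            )

    def load(self, case_id: str) -> Optional[Aneurysm]:
        row = self.conn.execute(
            "SELECT case_id, location, diameter_mm FROM aneurysm "
            "WHERE case_id = ?",
            (case_id,),
        ).fetchone()
        return Aneurysm(*row) if row else None


if __name__ == "__main__":
    store = AneurysmStore()
    store.save(Aneurysm("A1", "basilar artery", 7.5))
    print(store.load("A1"))
```

    The design point this is meant to show is the separation the abstract describes: the entity class encapsulates the data, while a separate persistence layer decides how it is stored, so that new entity types or data sources can be added without changing the code that uses them.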

  16. A new bioinformatics approach to natural protein collections: permutation structure contrasts of viral and cellular systems.

    PubMed

    Graham, Daniel J

    2013-04-01

    Biological cells and viruses operate by different replication and symmetry paradigms. Cells are able to replicate independently and express little spatial symmetry; viruses require cells for replication while manifesting high symmetry. The author inquired whether different paradigms were reflected in the permutations of amino acid sequences. The hypothesis was that the permutation structure level and symmetry within viral protein collections exceed that of living cells. The rationale was that one symmetry aspect generally accompanies and promotes others in a system. The inquiry was readily answered given abundant sequence archives for proteins. The analysis of collections from diverse viral and cellular sources lends strong support. Additional insights into protein primary structure, the design of collections, and the role of information are provided as well.

  17. Cross-React: a new structural bioinformatics method for predicting allergen cross-reactivity.

    PubMed

    Negi, Surendra S; Braun, Werner

    2017-04-01

    Cross-reactivity between allergenic proteins is important for understanding how the immune system recognizes different antigen proteins. Allergen proteins are known to cross-react if their sequence comparison shows a high sequence identity, which also implies that the proteins have a similar 3D fold. In such cases, linear sequence alignment methods are frequently used to predict cross-reactivity between allergenic proteins. However, the prediction of cross-reactivity between distantly related allergens continues to be a challenging task. To overcome this problem, we developed a new structure-based computational method, Cross-React, to predict cross-reactivity between allergenic proteins available in the Structural Database of Allergens (SDAP). Our method is based on the hypothesis that we can find surface patches on 3D structures of potential allergens with amino acid compositions similar to an epitope in a known allergen. We applied the Cross-React method to a diverse set of seven allergens, and successfully identified several cross-reactive allergens with high to moderate sequence identity which have also been experimentally shown to cross-react. Based on these findings, we suggest that Cross-React can be used as a predictive tool to assess protein allergenicity and cross-reactivity. Cross-React is available at: http://curie.utmb.edu/Cross-React.html. Contact: ssnegi@utmb.edu.
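    The hypothesis stated above is that a surface patch whose amino acid composition resembles that of a known epitope may indicate cross-reactivity. A minimal sketch of such a comparison follows. It assumes that composition is a 20-element frequency vector and that similarity is scored with a Pearson correlation; both choices, and all function names, are assumptions for illustration and are not taken from the Cross-React implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(residues: str) -> list[float]:
    """Fraction of each of the 20 standard amino acids in a residue string."""
    counts = Counter(residues.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [counts[a] / total for a in AMINO_ACIDS]


def pearson(u: list[float], v: list[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if undefined)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return cov / (norm_u * norm_v)


def patch_similarity(patch: str, epitope: str) -> float:
    """Assumed score: correlation between the two composition vectors."""
    return pearson(composition(patch), composition(epitope))


if __name__ == "__main__":
    # Hypothetical residue strings, not real epitopes.
    print(patch_similarity("KDELKDEL", "LEDKLEDK"))
    print(patch_similarity("KKKKDDDD", "WWWWFFFF"))
```

    Because the score ignores residue order, two patches with the same residues arranged differently score as identical, which is what allows a composition-based comparison to work on surface patches that are not contiguous in sequence.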

  18. Crowdsourcing for bioinformatics

    PubMed Central

    Good, Benjamin M.; Su, Andrew I.

    2013-01-01

    Motivation: Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Results: Here, we provide a framework for understanding and applying several different types of crowdsourcing. The framework considers two broad classes: systems for solving large-volume ‘microtasks’ and systems for solving high-difficulty ‘megatasks’. Within these classes, we discuss system types, including volunteer labor, games with a purpose, microtask markets and open innovation contests. We illustrate each system type with successful examples in bioinformatics and conclude with a guide for matching problems to crowdsourcing solutions that highlights the positives and negatives of different approaches. Contact: bgood@scripps.edu PMID:23782614

  19. Bioinformatics analysis of the structural and evolutionary characteristics for toll-like receptor 15

    PubMed Central

    Wang, Jinlan; Chang, Fen

    2016-01-01

    Toll-like receptors (TLRs) play an important role in the innate immune system. TLR15 is reported to have a unique role in defense against pathogens, but its structural and evolutionary characteristics are still poorly understood. In this study, we identified 57 complete TLR15 genes from avian and reptilian genomes. TLR15 clustered into an individual clade and was closely related to family 1 on the phylogenetic tree. Unlike the TLRs in family 1, which have broken asparagine ladders in the middle, the TLR15 ectodomain had an intact asparagine ladder that is critical for maintaining the overall shape of the ectodomain. The conservation analysis found that the TLR15 ectodomain had a highly evolutionarily conserved region on the convex surface of the LRR11 module, which is probably involved in the TLR15 activation process. Furthermore, the protein–protein docking analysis indicated that TLR15 TIR domains have the potential to form homodimers; the predicted interaction interface of the TIR dimer was formed mainly by residues from the BB-loops and αC-helices. Although TLR15 mainly underwent purifying selection, we detected 27 sites under positive selection for TLR15, 24 of which are located on its ectodomain. Our observations suggest structural features of TLR15 that may be relevant to its function, but these require further experimental validation. PMID:27257554

  20. Bioinformatics of prokaryotic RNAs.

    PubMed

    Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F

    2014-01-01

    The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to characterize their non-coding components as well are heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art in RNA bioinformatics, focusing on applications to prokaryotes.

  1. Bioinformatics of prokaryotic RNAs

    PubMed Central

    Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F

    2014-01-01

    The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to characterize their non-coding components as well are heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art in RNA bioinformatics, focusing on applications to prokaryotes. PMID:24755880

  2. [Bioinformatics: a key role in oncology].

    PubMed

    Olivier, Timothée; Chappuis, Pierre; Tsantoulis, Petros

    2016-05-18

    Bioinformatics is essential in clinical oncology and research. Combining biology, computer science and mathematics, bioinformatics aims to derive useful information from clinical and biological data, often poorly structured, at a large scale. Bioinformatics approaches have reclassified certain cancers based on their molecular and biological presentation, improving treatment selection. Many molecular signatures have been developed and, after validation, some are now usable in clinical practice. Other applications could facilitate daily practice, reduce the risk of error and increase the precision of medical decision-making. Bioinformatics must evolve in accordance with ethical considerations and requires multidisciplinary collaboration. Its application depends on a sound technical foundation that meets strict quality requirements.

  3. Comparative structural bioinformatics analysis of Bacillus amyloliquefaciens chemotaxis proteins within Bacillus subtilis group.

    PubMed

    Yssel, Anna; Reva, Oleg; Tastan Bishop, Ozlem

    2011-12-01

    Chemotaxis is a process in which bacteria sense their chemical environment and move towards more favorable conditions. Since plant colonization by bacteria is a multifaceted process which requires a response to the complex chemical environment, a finely tuned and sensitive chemotaxis system is needed. Members of the Bacillus subtilis group including Bacillus amyloliquefaciens are industrially important, for example, as bio-pesticides. The group exhibits plant growth-promoting characteristics, with different specificity towards certain host plants. Therefore, we hypothesize that while the principal molecular mechanisms of bacterial chemotaxis may be conserved, the bacterial chemotaxis system may need an evolutionary tweaking to adapt it to specific requirements, particularly in the process of evolution of free-living soil organisms, towards plant colonization behaviour. To date, almost nothing is known about what parts of the chemotaxis proteins are subjected to positive amino acid substitutions, involved in adjusting the chemotaxis system of bacteria during speciation. In this novel study, positively selected and purified sites of chemotaxis proteins were calculated, and these residues were mapped onto homology models that were built for the chemotaxis proteins, in an attempt to understand the spatial evolution of the chemotaxis proteins. Various positively selected amino acids were identified in semi-conserved regions of the proteins away from the known active sites.

  4. Bioinformatics-Aided Venomics

    PubMed Central

    Kaas, Quentin; Craik, David J.

    2015-01-01

    Venomics is a modern approach that combines transcriptomics and proteomics to explore the toxin content of venoms. This review will give an overview of computational approaches that have been created to classify and consolidate venomics data, as well as algorithms that have helped discovery and analysis of toxin nucleic acid and protein sequences, toxin three-dimensional structures and toxin functions. Bioinformatics is used to tackle specific challenges associated with the identification and annotations of toxins. Recognizing toxin transcript sequences among second generation sequencing data cannot rely only on basic sequence similarity because toxins are highly divergent. Mass spectrometry sequencing of mature toxins is challenging because toxins can display a large number of post-translational modifications. Identifying the mature toxin region in toxin precursor sequences requires the prediction of the cleavage sites of proprotein convertases, most of which are unknown or not well characterized. Tracing the evolutionary relationships between toxins should consider specific mechanisms of rapid evolution as well as interactions between predatory animals and prey. Rapidly determining the activity of toxins is the main bottleneck in venomics discovery, but some recent bioinformatics and molecular modeling approaches give hope that accurate predictions of toxin specificity could be made in the near future. PMID:26110505

  5. Comparative transcriptional pathway bioinformatic analysis of dietary restriction, Sir2, p53 and resveratrol life span extension in Drosophila.

    PubMed

    Antosh, Michael; Whitaker, Rachel; Kroll, Adam; Hosier, Suzanne; Chang, Chengyi; Bauer, Johannes; Cooper, Leon; Neretti, Nicola; Helfand, Stephen L

    2011-03-15

    A multiple comparison approach using whole genome transcriptional arrays was used to identify genes and pathways involved in calorie restriction/dietary restriction (DR) life span extension in Drosophila. A gene-centric analysis comparing the changes in common between DR and two DR-related molecular genetic life span extending manipulations, Sir2 and p53, led to a molecular confirmation of Sir2 and p53's similarity with DR and the identification of a small set of commonly regulated genes. One of the identified upregulated genes, takeout, known to be involved in feeding and starvation behavior, and to have sequence homology with Juvenile Hormone (JH) binding protein, was shown to directly extend life span when specifically overexpressed. Here we show that a pathway-centric approach can be used to identify shared physiological pathways between DR and Sir2, p53 and resveratrol life span extending interventions. The set of physiological pathways in common among these life span extending interventions provides an initial step toward defining molecular genetic and physiological changes important in life span extension. The large overlap in shared pathways between DR, Sir2, p53 and resveratrol provides strong molecular evidence supporting the genetic studies linking these specific life span extending interventions.

  6. A Bioinformatics Approach for Integrated Transcriptomic and Proteomic Comparative Analyses of Model and Non-sequenced Anopheline Vectors of Human Malaria Parasites

    PubMed Central

    Mohien, Ceereena Ubaida; Colquhoun, David R.; Mathias, Derrick K.; Gibbons, John G.; Armistead, Jennifer S.; Rodriguez, Maria C.; Rodriguez, Mario Henry; Edwards, Nathan J.; Hartler, Jürgen; Thallinger, Gerhard G.; Graham, David R.; Martinez-Barnetche, Jesus; Rokas, Antonis; Dinglasan, Rhoel R.

    2013-01-01

    Malaria morbidity and mortality caused by both Plasmodium falciparum and Plasmodium vivax extend well beyond the African continent, and although P. vivax causes between 80 and 300 million severe cases each year, vivax transmission remains poorly understood. Plasmodium parasites are transmitted by Anopheles mosquitoes, and the critical site of interaction between parasite and host is at the mosquito's luminal midgut brush border. Although the genome of the “model” African P. falciparum vector, Anopheles gambiae, has been sequenced, evolutionary divergence limits its utility as a reference across anophelines, especially non-sequenced P. vivax vectors such as Anopheles albimanus. Clearly, technologies and platforms that bridge this substantial scientific gap are required in order to provide public health scientists with key transcriptomic and proteomic information that could spur the development of novel interventions to combat this disease. To our knowledge, no approaches have been published that address this issue. To bolster our understanding of P. vivax–An. albimanus midgut interactions, we developed an integrated bioinformatic-hybrid RNA-Seq-LC-MS/MS approach involving An. albimanus transcriptome (15,764 contigs) and luminal midgut subproteome (9,445 proteins) assembly, which, when used with our custom Diptera protein database (685,078 sequences), facilitated a comparative proteomic analysis of the midgut brush borders of two important malaria vectors, An. gambiae and An. albimanus. PMID:23082028

  7. A bioinformatics approach for integrated transcriptomic and proteomic comparative analyses of model and non-sequenced anopheline vectors of human malaria parasites.

    PubMed

    Ubaida Mohien, Ceereena; Colquhoun, David R; Mathias, Derrick K; Gibbons, John G; Armistead, Jennifer S; Rodriguez, Maria C; Rodriguez, Mario Henry; Edwards, Nathan J; Hartler, Jürgen; Thallinger, Gerhard G; Graham, David R; Martinez-Barnetche, Jesus; Rokas, Antonis; Dinglasan, Rhoel R

    2013-01-01

    Malaria morbidity and mortality caused by both Plasmodium falciparum and Plasmodium vivax extend well beyond the African continent, and although P. vivax causes between 80 and 300 million severe cases each year, vivax transmission remains poorly understood. Plasmodium parasites are transmitted by Anopheles mosquitoes, and the critical site of interaction between parasite and host is at the mosquito's luminal midgut brush border. Although the genome of the "model" African P. falciparum vector, Anopheles gambiae, has been sequenced, evolutionary divergence limits its utility as a reference across anophelines, especially non-sequenced P. vivax vectors such as Anopheles albimanus. Clearly, technologies and platforms that bridge this substantial scientific gap are required in order to provide public health scientists with key transcriptomic and proteomic information that could spur the development of novel interventions to combat this disease. To our knowledge, no approaches have been published that address this issue. To bolster our understanding of P. vivax-An. albimanus midgut interactions, we developed an integrated bioinformatic-hybrid RNA-Seq-LC-MS/MS approach involving An. albimanus transcriptome (15,764 contigs) and luminal midgut subproteome (9,445 proteins) assembly, which, when used with our custom Diptera protein database (685,078 sequences), facilitated a comparative proteomic analysis of the midgut brush borders of two important malaria vectors, An. gambiae and An. albimanus.

  8. Bioinformatic and Comparative Localization of Rab Proteins Reveals Functional Insights into the Uncharacterized GTPases Ypt10p and Ypt11p

    PubMed Central

    Buvelot Frei, Stéphanie; Rahl, Peter B.; Nussbaum, Maria; Briggs, Benjamin J.; Calero, Monica; Janeczko, Stephanie; Regan, Andrew D.; Chen, Catherine Z.; Barral, Yves; Whittaker, Gary R.; Collins, Ruth N.

    2006-01-01

    A striking characteristic of a Rab protein is its steady-state localization to the cytosolic surface of a particular subcellular membrane. In this study, we have undertaken a combined bioinformatic and experimental approach to examine the evolutionary conservation of Rab protein localization. A comprehensive primary sequence classification shows that 10 out of the 11 Rab proteins identified in the yeast (Saccharomyces cerevisiae) genome can be grouped within a major subclass, each comprising multiple Rab orthologs from diverse species. We compared the locations of individual yeast Rab proteins with their localizations following ectopic expression in mammalian cells. Our results suggest that green fluorescent protein-tagged Rab proteins maintain localizations across large evolutionary distances and that the major known player in the Rab localization pathway, mammalian Rab-GDI, is able to function in yeast. These findings enable us to provide insight into novel gene functions and classify the uncharacterized Rab proteins Ypt10p (YBR264C) as being involved in endocytic function and Ypt11p (YNL304W) as being localized to the endoplasmic reticulum, where we demonstrate it is required for organelle inheritance. PMID:16980630

  9. Bioinformatic and comparative localization of Rab proteins reveals functional insights into the uncharacterized GTPases Ypt10p and Ypt11p.

    PubMed

    Buvelot Frei, Stéphanie; Rahl, Peter B; Nussbaum, Maria; Briggs, Benjamin J; Calero, Monica; Janeczko, Stephanie; Regan, Andrew D; Chen, Catherine Z; Barral, Yves; Whittaker, Gary R; Collins, Ruth N

    2006-10-01

    A striking characteristic of a Rab protein is its steady-state localization to the cytosolic surface of a particular subcellular membrane. In this study, we have undertaken a combined bioinformatic and experimental approach to examine the evolutionary conservation of Rab protein localization. A comprehensive primary sequence classification shows that 10 out of the 11 Rab proteins identified in the yeast (Saccharomyces cerevisiae) genome can be grouped within a major subclass, each comprising multiple Rab orthologs from diverse species. We compared the locations of individual yeast Rab proteins with their localizations following ectopic expression in mammalian cells. Our results suggest that green fluorescent protein-tagged Rab proteins maintain localizations across large evolutionary distances and that the major known player in the Rab localization pathway, mammalian Rab-GDI, is able to function in yeast. These findings enable us to provide insight into novel gene functions and classify the uncharacterized Rab proteins Ypt10p (YBR264C) as being involved in endocytic function and Ypt11p (YNL304W) as being localized to the endoplasmic reticulum, where we demonstrate it is required for organelle inheritance.

  10. Probing Medin Monomer Structure and its Amyloid Nucleation Using 13C-Direct Detection NMR in Combination with Structural Bioinformatics

    PubMed Central

    Davies, Hannah A.; Rigden, Daniel J.; Phelan, Marie M.; Madine, Jillian

    2017-01-01

    Aortic medial amyloid is the most prevalent amyloid found to date, but remarkably little is known about it. It is characterised by aberrant deposition of a 5.4 kDa protein called medin within the medial layer of large arteries. Here we employ a combined approach of ab initio protein modelling and 13C-direct detection NMR to generate a model for soluble monomeric medin comprising a stable core of three β-strands and shorter more labile strands at the termini. Molecular dynamics simulations suggested that detachment of the short, C-terminal β-strand from the soluble fold exposes key amyloidogenic regions as a potential site of nucleation enabling dimerisation and subsequent fibril formation. This mechanism resembles models proposed for several other amyloidogenic proteins suggesting that despite variations in sequence and protomer structure these proteins may share a common pathway for amyloid nucleation and subsequent protofibril and fibril formation. PMID:28327552

  11. Bioinformatics and Moonlighting Proteins

    PubMed Central

    Hernández, Sergio; Franco, Luís; Calvo, Alejandra; Ferragut, Gabriela; Hermoso, Antoni; Amela, Isaac; Gómez, Antonio; Querol, Enrique; Cedano, Juan

    2015-01-01

    Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Usually, moonlighting proteins are experimentally revealed by serendipity. For this reason, it would be helpful if bioinformatics could predict this multifunctionality, especially because of the large amounts of sequences from genome projects. In the present work, we analyze and describe several approaches that use sequences, structures, interactomics, and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are (a) remote homology searches using Psi-Blast, (b) detection of functional motifs and domains, (c) analysis of data from protein–protein interaction databases (PPIs), (d) matching the query protein sequence to 3D databases (e.g., with algorithms such as PISITE), and (e) mutation correlation analysis between amino acids with algorithms such as MISTIC. Programs designed to identify functional motifs/domains detect mainly the canonical function but usually fail to detect the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs) has the best performance. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can only be used in very specific situations – it requires the existence of multialigned family protein sequences – but can suggest how the evolutionary process of second function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, has been used as a benchmark for all of the analyses. PMID:26157797

  12. Comparing function and structure between entire proteomes

    PubMed Central

    Liu, Jinfeng; Rost, Burkhard

    2001-01-01

    More than 30 organisms have been sequenced entirely. Here, we applied a variety of simple bioinformatics tools to analyze 29 proteomes for representatives from all three kingdoms: eukaryotes, prokaryotes, and archaebacteria. We confirmed that eukaryotes have relatively more long proteins than prokaryotes and archaea, and that the overall amino acid composition is similar among the three. We predicted that ∼15%–30% of all proteins contained transmembrane helices. We could not find a correlation between the content of membrane proteins and the complexity of the organism. In particular, we did not find significantly higher percentages of helical membrane proteins in eukaryotes than in prokaryotes or archaea. However, we found more proteins with seven transmembrane helices in eukaryotes and more with six and 12 transmembrane helices in prokaryotes. We found twice as many coiled-coil proteins in eukaryotes (10%) as in prokaryotes and archaea (4%–5%), and we predicted ∼15%–25% of all proteins to be secreted by most eukaryotes and prokaryotes. Every tenth protein had no known homolog in current databases, and 30%–40% of the proteins fell into structural families with >100 members. A classification by cellular function verified that eukaryotes have a higher proportion of proteins for communication with the environment. Finally, we found at least one homolog of experimentally known structure for ∼20%–45% of all proteins; the regions with structural homology covered 20%–30% of all residues. These numbers may or may not suggest that there are 1200–2600 folds in the universe of protein structures. All predictions are available at http://cubic.bioc.columbia.edu/genomes. PMID:11567088

  13. Computational intelligence techniques in bioinformatics.

    PubMed

    Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I

    2013-12-01

    Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machine, deep belief network, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function prediction and its structure. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-05

    Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible, the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or fewer monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database.

  15. Bioinformatics education in India.

    PubMed

    Kulkarni-Kale, Urmila; Sawant, Sangeeta; Chavan, Vishwas

    2010-11-01

    An account of bioinformatics education in India is presented along with future prospects. Establishment of BTIS network by Department of Biotechnology (DBT), Government of India in the 1980s had been a systematic effort in the development of bioinformatics infrastructure in India to provide services to scientific community. Advances in the field of bioinformatics underpinned the need for well-trained professionals with skills in information technology and biotechnology. As a result, programmes for capacity building in terms of human resource development were initiated. Educational programmes gradually evolved from the organisation of short-term workshops to the institution of formal diploma/degree programmes. A case study of the Master's degree course offered at the Bioinformatics Centre, University of Pune is discussed. Currently, many universities and institutes are offering bioinformatics courses at different levels with variations in the course contents and degree of detailing. BioInformatics National Certification (BINC) examination initiated in 2005 by DBT provides a common yardstick to assess the knowledge and skill sets of students passing out of various institutions. The potential for broadening the scope of bioinformatics to transform it into a data intensive discovery discipline is discussed. This necessitates introduction of amendments in the existing curricula to accommodate the upcoming developments.

  16. High-resolution modeling of antibody structures by a combination of bioinformatics, expert knowledge, and molecular simulations.

    PubMed

    Shirai, Hiroki; Ikeda, Kazuyoshi; Yamashita, Kazuo; Tsuchiya, Yuko; Sarmiento, Jamica; Liang, Shide; Morokata, Tatsuaki; Mizuguchi, Kenji; Higo, Junichi; Standley, Daron M; Nakamura, Haruki

    2014-08-01

    In the second antibody modeling assessment, we used a semiautomated template-based structure modeling approach for 11 blinded antibody variable region (Fv) targets. The structural modeling method involved several steps, including template selection for framework and canonical structures of complementarity-determining regions (CDRs), homology modeling, energy minimization, and expert inspection. The submitted models for Fv modeling in Stage 1 had the lowest average backbone root mean square deviation (RMSD) (1.06 Å). Comparison to crystal structures showed the most accurate Fv models were generated for 4 out of 11 targets. We found that the successful modeling in Stage 1 was mainly due to expert-guided template selection for CDRs, especially for CDR-H3, based on our previously proposed empirical method (H3-rules) and the use of position-specific scoring matrix-based scoring. Loop refinement using fragment assembly and multicanonical molecular dynamics (McMD) was applied to CDR-H3 loop modeling in Stage 2. Fragment assembly and McMD produced putative structural ensembles with low free energy values that were scored based on the OSCAR all-atom force field and conformation density in principal component analysis space, respectively, as well as the degree of consensus between the two sampling methods. The quality of 8 out of 10 targets improved as compared with Stage 1. For 4 out of 10 Stage-2 targets, our method generated top-scoring models with RMSD values of less than 1 Å. In this article, we discuss the strengths and weaknesses of our approach as well as possible directions for improvement to generate better predictions in the future.
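
    The abstract above reports model quality as backbone RMSD in Å. As a rough illustration of that metric only, the sketch below computes the RMSD between two equal-length coordinate lists. It assumes the two structures are already superimposed and that atoms are paired in the same order; the function name `backbone_rmsd` and the toy coordinates are hypothetical and are not taken from the study.

```python
import math


def backbone_rmsd(model, reference):
    """Root mean square deviation between paired (x, y, z) coordinates.

    Assumes both lists are already superimposed and list atoms in the
    same order; no optimal alignment is performed here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        # Squared distance between the paired atoms.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


# Toy coordinates (hypothetical): every atom is displaced by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(backbone_rmsd(model, reference))  # 1.0
```

    In practice the two structures would first be superimposed (for example with a least-squares fit) before the deviation is measured; that step is deliberately left out of this sketch.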

  17. Bioinformatics and Cancer

    Cancer.gov

    Researchers take on challenges and opportunities to mine "Big Data" for answers to complex biological questions. Learn how bioinformatics uses advanced computing, mathematics, and technological platforms to store, manage, analyze, and understand data.

  18. Deep learning in bioinformatics.

    PubMed

    Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh

    2016-07-29

    In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.

  19. Comprehensive bioinformatics analysis of Mycoplasma pneumoniae genomes to investigate underlying population structure and type-specific determinants.

    PubMed

    Diaz, Maureen H; Desai, Heta P; Morrison, Shatavia S; Benitez, Alvaro J; Wolff, Bernard J; Caravas, Jason; Read, Timothy D; Dean, Deborah; Winchell, Jonas M

    2017-01-01

    Mycoplasma pneumoniae is a significant cause of respiratory illness worldwide. Despite a minimal and highly conserved genome, genetic diversity within the species may impact disease. We performed whole genome sequencing (WGS) analysis of 107 M. pneumoniae isolates, including 67 newly sequenced using the Pacific BioSciences RS II and/or Illumina MiSeq sequencing platforms. Comparative genomic analysis of 107 genomes revealed >3,000 single nucleotide polymorphisms (SNPs) in total, including 520 type-specific SNPs. Population structure analysis supported the existence of six distinct subgroups, three within each type. We developed a predictive model to classify an isolate based on whole genome SNPs called against the reference genome into the identified subtypes, obviating the need for genome assembly. This study is the most comprehensive WGS analysis for M. pneumoniae to date, underscoring the power of combining complementary sequencing technologies to overcome difficult-to-sequence regions and highlighting potential differential genomic signatures in M. pneumoniae.

  20. Global computing for bioinformatics.

    PubMed

    Loewe, Laurence

    2002-12-01

    Global computing, the collaboration of idle PCs via the Internet in a SETI@home style, emerges as a new way of massive parallel multiprocessing with potentially enormous CPU power. Its relations to the broader, fast-moving field of Grid computing are discussed without attempting a review of the latter. This review (i) includes a short table of milestones in global computing history, (ii) lists opportunities global computing offers for bioinformatics, (iii) describes the structure of problems well suited for such an approach, (iv) analyses the anatomy of successful projects and (v) points to existing software frameworks. Finally, an evaluation of the various costs shows that global computing indeed has merit, if the problem to be solved is already coded appropriately and a suitable global computing framework can be found. Then, either significant amounts of computing power can be recruited from the general public, or--if employed in an enterprise-wide Intranet for security reasons--idle desktop PCs can substitute for an expensive dedicated cluster.

  1. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying biological sequences (DNA, RNA, and proteins) at the level of linear structure. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or more sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data-mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
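
    The two tasks named in this abstract, detecting repeated segments within one sequence and spotting common segments between sequences, can be sketched in a few lines. The example below is a minimal illustration under simplifying assumptions: repeats are exact substrings of a fixed length k, and the common segment is the longest exact common substring found by dynamic programming. The function names and toy sequences are hypothetical; the chapter itself is not confirmed to use these particular methods.

```python
from collections import defaultdict


def repeated_segments(seq, k):
    """Return every length-k substring that occurs more than once,
    mapped to the list of its start positions (intra-sequence repeats)."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def longest_common_segment(a, b):
    """Return the longest exact substring shared by a and b
    (inter-sequence similarity), using O(len(a) * len(b)) time."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-1], b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


print(repeated_segments("ACGTACGA", 3))              # {'ACG': [0, 4]}
print(longest_common_segment("GATTACA", "TTTACAG"))  # TTACA
```

    Index structures such as suffix trees or suffix arrays handle the same tasks far more efficiently on genome-scale input; the quadratic table here is chosen only for readability.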

  3. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    PubMed Central

    Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  4. Comparative Protein Structure Modeling Using MODELLER

    PubMed Central

    Webb, Benjamin; Sali, Andrej

    2016-01-01

    Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. PMID:27322406

  5. GALT protein database, a bioinformatics resource for the management and analysis of structural features of a galactosemia-related protein and its mutants.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2009-06-01

    We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and the analysis to each mutant was made with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.

  6. GlycoMinestruct: a new bioinformatics tool for highly accurate mapping of the human N-linked and O-linked glycoproteomes by incorporating structural features

    PubMed Central

    Li, Fuyi; Li, Chen; Revote, Jerico; Zhang, Yang; Webb, Geoffrey I.; Li, Jian; Song, Jiangning; Lithgow, Trevor

    2016-01-01

    Glycosylation plays an important role in cell-cell adhesion, ligand-binding and subcellular recognition. Current approaches for predicting protein glycosylation are primarily based on sequence-derived features, while little work has been done to systematically assess the importance of structural features to glycosylation prediction. Here, we propose a novel bioinformatics method called GlycoMinestruct(http://glycomine.erc.monash.edu/Lab/GlycoMine_Struct/) for improved prediction of human N- and O-linked glycosylation sites by combining sequence and structural features in an integrated computational framework with a two-step feature-selection strategy. Experiments indicated that GlycoMinestruct outperformed NGlycPred, the only predictor that incorporated both sequence and structure features, achieving AUC values of 0.941 and 0.922 for N- and O-linked glycosylation, respectively, on an independent test dataset. We applied GlycoMinestruct to screen the human structural proteome and obtained high-confidence predictions for N- and O-linked glycosylation sites. GlycoMinestruct can be used as a powerful tool to expedite the discovery of glycosylation events and substrates to facilitate hypothesis-driven experimental studies. PMID:27708373

  7. Chemistry in Bioinformatics

    PubMed Central

    Murray-Rust, Peter; Mitchell, John BO; Rzepa, Henry S

    2005-01-01

    Chemical information is now seen as critical for most areas of life sciences. But unlike Bioinformatics, where data is openly available and freely re-usable, most chemical information is closed and cannot be re-distributed without permission. This has led to a failure to adopt modern informatics and software techniques and therefore paucity of chemistry in bioinformatics. New technology, however, offers the hope of making chemical data (compounds and properties) free during the authoring process. We argue that the technology is already available; we require a collective agreement to enhance publication protocols. PMID:15941476

  8. An Online Bioinformatics Curriculum

    PubMed Central

    Searls, David B.

    2012-01-01

    Online learning initiatives over the past decade have become increasingly comprehensive in their selection of courses and sophisticated in their presentation, culminating in the recent announcement of a number of consortium and startup activities that promise to make a university education on the internet, free of charge, a real possibility. At this pivotal moment it is appropriate to explore the potential for obtaining comprehensive bioinformatics training with currently existing free video resources. This article presents such a bioinformatics curriculum in the form of a virtual course catalog, together with editorial commentary, and an assessment of strengths, weaknesses, and likely future directions for open online learning in this field. PMID:23028269

  9. Glossary of bioinformatics terms.

    PubMed

    2007-06-01

    This collection of terms and definitions commonly encountered in the bioinformatics literature will be updated periodically as Current Protocols in Bioinformatics grows. In addition, an extensive glossary of genetic terms can be found on the Web site of the National Human Genome Research Institute (http://www.genome.gov/glossary.cfm). The entries in that online glossary provide a brief written definition of the term; the user can also listen to an informative explanation of the term using RealAudio or the Windows Media Player.

  10. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion: BioWarehouse embodies significant progress on the database integration problem for…
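
    The application example in this abstract (the fraction of enzyme activities with no known sequence) is the kind of question a single SQL query can answer once the source databases share one schema. The sketch below illustrates the idea only. It uses an in-memory SQLite database as a stand-in for MySQL or Oracle, and the table names `enzyme_activity` and `protein_sequence`, their columns, and the toy rows are all hypothetical; they are not the actual BioWarehouse schema or data.

```python
import sqlite3


def fraction_without_sequence(conn):
    """Fraction of rows in enzyme_activity (hypothetical table) that have
    no matching row in protein_sequence (hypothetical table)."""
    total, orphan = conn.execute(
        """
        SELECT COUNT(*),
               SUM(CASE WHEN s.ec_number IS NULL THEN 1 ELSE 0 END)
        FROM enzyme_activity AS e
        LEFT JOIN (SELECT DISTINCT ec_number FROM protein_sequence) AS s
               ON s.ec_number = e.ec_number
        """
    ).fetchone()
    return orphan / total if total else 0.0


# Toy warehouse with placeholder identifiers (not real EC numbers).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE protein_sequence (accession TEXT, ec_number TEXT)")
conn.executemany(
    "INSERT INTO enzyme_activity VALUES (?)",
    [("toy.1",), ("toy.2",), ("toy.3",), ("toy.4",)],
)
conn.executemany(
    "INSERT INTO protein_sequence VALUES (?, ?)",
    [("seqA", "toy.1"), ("seqB", "toy.1"), ("seqC", "toy.2"), ("seqD", "toy.3")],
)
print(fraction_without_sequence(conn))  # 0.25
```

    The `SELECT DISTINCT` subquery keeps an activity with several sequences from being counted more than once, so the denominator stays the number of enzyme activities.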

  11. Bioinformatics and School Biology

    ERIC Educational Resources Information Center

    Dalpech, Roger

    2006-01-01

    The rapidly changing field of bioinformatics is fuelling the need for suitably trained personnel with skills in relevant biological "sub-disciplines" such as proteomics, transcriptomics and metabolomics, etc. But because of the complexity--and sheer weight of data--associated with these new areas of biology, many school teachers feel…

  12. Teaching bioinformatics to engineers.

    PubMed

    Mihalas, George I; Tudor, Anca; Paralescu, Sorin; Andor, Minodora; Stoicu-Tivadar, Lacramioara

    2014-01-01

    The paper refers to our methodology and experience in establishing the content of the course in bioinformatics introduced to the school of "Information Systems in Healthcare" (SIIS), master level. The syllabi of both lectures and laboratory works are presented and discussed.

  13. Comparative protein structure modeling using MODELLER.

    PubMed

    Eswar, Narayanan; Webb, Ben; Marti-Renom, Marc A; Madhusudhan, M S; Eramian, David; Shen, Min-Yi; Pieper, Ursula; Sali, Andrej

    2007-11-01

    Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. (c) 2007 by John Wiley & Sons, Inc.

  14. Systematic, map-scale, comparative structural geology

    SciTech Connect

    Groshong, R.H. Jr.

    1985-01-01

    Interpretation by analogy is the basis of comparative structural geology. A systematic approach to analog selection aids in efficiency and in understanding. The basic interpretive unit for analog selection is the structural family: a map-scale assemblage of genetically related structural forms produced by deformation with approximately constant boundary conditions. A family is specified by the dominant component of its displacement field and by structural levels involved. The differential vertical displacement category includes intrusive and impact structures. The three important basement types are isotropic crystalline, quasisedimentary and metamorphosing. A family is either thin skinned or involves cover plus one of the three basement types. These parameters are arranged into a matrix to produce 20 pigeon holes. Some structures do not fall exactly into one pigeon hole. Other structures link two families; for example, gravity glide links thin-skinned extension and contraction. This system is analogous to end-member rock classifications. Not every example is an end member, but the concept of end members greatly speeds up comparative analysis and clarifies the choice of analogies. Future research will lead to better definition of the key characteristics of certain families, the relationships between families, and the possible existence of additional families.

  15. An Inquiry into Protein Structure and Genetic Disease: Introducing Undergraduates to Bioinformatics in a Large Introductory Course

    ERIC Educational Resources Information Center

    Bednarski, April E.; Elgin, Sarah C. R.; Pakrasi, Himadri B.

    2005-01-01

    This inquiry-based lab is designed around genetic diseases with a focus on protein structure and function. To allow students to work on their own investigatory projects, 10 projects on 10 different proteins were developed. Students are grouped in sections of 20 and work in pairs on each of the projects. To begin their investigation, students are…

  17. Bioinformatics Analysis Reveals Abundant Short Alpha-Helices as a Common Structural Feature of Oomycete RxLR Effector Proteins

    PubMed Central

    Ye, Wenwu; Wang, Yang; Wang, Yuanchao

    2015-01-01

    RxLR effectors represent one of the largest and most diverse effector families in oomycete plant pathogens. These effectors have attracted enormous attention since they can be delivered inside the plant cell and manipulate host immunity. With the exceptions of a signal peptide and the following RxLR-dEER and C-terminal W/Y/L motifs identified from the sequences themselves, nearly no functional domains have been found. Recently, protein structures of several RxLRs were revealed to comprise alpha-helical bundle repeats. However, approximately half of all RxLRs lack obvious W/Y/L motifs, which are associated with helical structures. In this study, secondary structure prediction of the putative RxLR proteins was performed. We found that the C-terminus of the majority of these RxLR proteins, irrespective of the presence of W/Y/L motifs, contains abundant short alpha-helices. Since a large-scale experimental determination of protein structures has been difficult to date, results of the current study extend our understanding of the secondary structure of oomycete RxLR effectors from individual members to the entire family. Moreover, we identified fewer alpha-helix-rich proteins from secretomes of several oomycete and fungal organisms in which RxLRs have not been identified, providing additional evidence that these organisms are unlikely to harbor RxLR-like proteins. Therefore, these results provide additional information that will aid further studies on the evolution and functional mechanisms of RxLR effectors. PMID:26252511

  18. Towards a career in bioinformatics.

    PubMed

    Ranganathan, Shoba

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and a student symposium, the Singapore Symposium on Computational Biology (SYMBIO), were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in an undergraduate bioinformatics curriculum provides information on how to impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010.

  19. Towards a career in bioinformatics

    PubMed Central

    2009-01-01

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and a student symposium, the Singapore Symposium on Computational Biology (SYMBIO), were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in an undergraduate bioinformatics curriculum provides information on how to impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508

  20. Bioinformatics Reveal Five Lineages of Oleosins and the Mechanism of Lineage Evolution Related to Structure/Function from Green Algae to Seed Plants

    PubMed Central

    Huang, Ming-Der; Huang, Anthony H.C.

    2015-01-01

    Plant cells contain subcellular lipid droplets with a triacylglycerol matrix enclosed by a layer of phospholipids and the small structural protein oleosin. Oleosins possess a conserved central hydrophobic hairpin of approximately 72 residues penetrating into the lipid droplet matrix and amphipathic amino- and carboxyl (C)-terminal peptides lying on the phospholipid surface. Bioinformatics of 1,000 oleosins of green algae and all plants emphasizing biological implications reveal five oleosin lineages: primitive (in green algae, mosses, and ferns), universal (U; all land plants), and three in specific organs or phylogenetic groups, termed seed low-molecular-weight (SL; seed plants), seed high-molecular-weight (SH; angiosperms), and tapetum (T; Brassicaceae) oleosins. Transition from one lineage to the next is depicted from lineage intermediates at junctions of phylogeny and organ distributions. Within a species, each lineage, except the T oleosin lineage, has one to four genes per haploid genome, only approximately two of which are active. Primitive oleosins already possess all the general characteristics of oleosins. U oleosins have C-terminal sequences as highly conserved as the hairpin sequences; thus, U oleosins including their C-terminal peptide exert indispensable, unknown functions. SL and SH oleosin transcripts in seeds are in an approximately 1:1 ratio, which suggests the occurrence of SL-SH oleosin dimers/multimers. T oleosins in Brassicaceae are encoded by rapidly evolved multitandem genes for alkane storage and transfer. Overall, oleosins have evolved to retain conserved hairpin structures but diversified for unique structures and functions in specific cells and plant families. Also, our studies reveal oleosin in avocado (Persea americana) mesocarp and no acyltransferase/lipase motifs in most oleosins. PMID:26232488

  1. Bioinformatics Reveal Five Lineages of Oleosins and the Mechanism of Lineage Evolution Related to Structure/Function from Green Algae to Seed Plants.

    PubMed

    Huang, Ming-Der; Huang, Anthony H C

    2015-09-01

    Plant cells contain subcellular lipid droplets with a triacylglycerol matrix enclosed by a layer of phospholipids and the small structural protein oleosin. Oleosins possess a conserved central hydrophobic hairpin of approximately 72 residues penetrating into the lipid droplet matrix and amphipathic amino- and carboxyl (C)-terminal peptides lying on the phospholipid surface. Bioinformatics of 1,000 oleosins of green algae and all plants emphasizing biological implications reveal five oleosin lineages: primitive (in green algae, mosses, and ferns), universal (U; all land plants), and three in specific organs or phylogenetic groups, termed seed low-molecular-weight (SL; seed plants), seed high-molecular-weight (SH; angiosperms), and tapetum (T; Brassicaceae) oleosins. Transition from one lineage to the next is depicted from lineage intermediates at junctions of phylogeny and organ distributions. Within a species, each lineage, except the T oleosin lineage, has one to four genes per haploid genome, only approximately two of which are active. Primitive oleosins already possess all the general characteristics of oleosins. U oleosins have C-terminal sequences as highly conserved as the hairpin sequences; thus, U oleosins including their C-terminal peptide exert indispensable, unknown functions. SL and SH oleosin transcripts in seeds are in an approximately 1:1 ratio, which suggests the occurrence of SL-SH oleosin dimers/multimers. T oleosins in Brassicaceae are encoded by rapidly evolved multitandem genes for alkane storage and transfer. Overall, oleosins have evolved to retain conserved hairpin structures but diversified for unique structures and functions in specific cells and plant families. Also, our studies reveal oleosin in avocado (Persea americana) mesocarp and no acyltransferase/lipase motifs in most oleosins. © 2015 American Society of Plant Biologists. All Rights Reserved.

  2. Bioinformatics for Exploration

    NASA Technical Reports Server (NTRS)

    Johnson, Kathy A.

    2006-01-01

    For the purpose of this paper, bioinformatics is defined as the application of computer technology to the management of biological information. It can be thought of as the science of developing computer databases and algorithms to facilitate and expedite biological research. This is a crosscutting capability that supports nearly all human health areas ranging from computational modeling, to pharmacodynamics research projects, to decision support systems within autonomous medical care. Bioinformatics serves to increase the efficiency and effectiveness of the life sciences research program. It provides data, information, and knowledge capture which further supports management of the bioastronautics research roadmap - identifying gaps that still remain and enabling the determination of which risks have been addressed.

  3. Distributed computing in bioinformatics.

    PubMed

    Jain, Eric

    2002-01-01

    This paper provides an overview of methods and current applications of distributed computing in bioinformatics. Distributed computing is a strategy of dividing a large workload among multiple computers to reduce processing time, or to make use of resources such as programs and databases that are not available on all computers. Participating computers may be connected either through a local high-speed network or through the Internet.

  4. Bioinformatics Training Network (BTN): a community resource for bioinformatics trainers.

    PubMed

    Schneider, Maria V; Walter, Peter; Blatter, Marie-Claude; Watson, James; Brazas, Michelle D; Rother, Kristian; Budd, Aidan; Via, Allegra; van Gelder, Celia W G; Jacob, Joachim; Fernandes, Pedro; Nyrönen, Tommi H; De Las Rivas, Javier; Blicher, Thomas; Jimenez, Rafael C; Loveland, Jane; McDowall, Jennifer; Jones, Phil; Vaughan, Brendan W; Lopez, Rodrigo; Attwood, Teresa K; Brooksbank, Catherine

    2012-05-01

    Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of 'high-throughput biology', the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many Institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. Conversely, there is much relevant teaching material and training expertise available worldwide that, were it properly organized, could be exploited by anyone who needs to provide training or needs to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs, and compare it with similar initiatives and collections.

  5. Highlighting computations in bioscience and bioinformatics: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB07).

    PubMed

    Lu, Guoqing; Ni, Jun

    2008-05-28

    The Second Symposium on Computations in Bioinformatics and Bioscience (SCBB07) was held in Iowa City, Iowa, USA, on August 13-15, 2007. This annual event attracted dozens of bioinformatics professionals and students, who are interested in solving emerging computational problems in bioscience, from China, Japan, Taiwan and the United States. The Scientific Committee of the symposium selected 18 peer-reviewed papers for publication in this supplemental issue of BMC Bioinformatics. These papers cover a broad spectrum of topics in computational biology and bioinformatics, including DNA, protein and genome sequence analysis, gene expression and microarray analysis, computational proteomics and protein structure classification, systems biology and machine learning.

  6. Phylogenetic trees in bioinformatics

    SciTech Connect

    Burr, Tom L

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length, and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
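
The "huge number of possible trees" mentioned in this abstract can be made concrete with a small sketch. It relies on the standard counting result for strictly bifurcating, labeled trees: (2n-5)!! unrooted topologies and (2n-3)!! rooted topologies for n OTUs. The function names are illustrative and are not taken from the paper.

```python
def odd_double_factorial(k: int) -> int:
    """Return k * (k - 2) * ... * 3 * 1; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_otus: int) -> int:
    """Number of unrooted bifurcating topologies on n labeled OTUs: (2n-5)!!."""
    if n_otus < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 OTUs")
    return odd_double_factorial(2 * n_otus - 5)


def count_rooted_trees(n_otus: int) -> int:
    """Number of rooted bifurcating topologies on n labeled OTUs: (2n-3)!!."""
    if n_otus < 2:
        raise ValueError("a rooted bifurcating tree needs at least 2 OTUs")
    return odd_double_factorial(2 * n_otus - 3)


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

With only 10 OTUs there are already 2,027,025 unrooted topologies, which is why exhaustive search is replaced by heuristics for all but the smallest data sets.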

  7. EST-SSR markers from five sequenced cDNA libraries of common bean (Phaseolus vulgaris L.) comparing three bioinformatic algorithms.

    PubMed

    Blair, Matthew W; Hurtado, Natalia

    2013-07-01

    Expressed sequence tags (ESTs) are a rich source of SSR sequences, but the proportion of long Class I microsatellites with many repeats vs. short Class II microsatellites with few repeats is an important factor to consider. Class I microsatellites, with more than 20 bp of repeats, tend to make better markers with higher polymorphism. The goal of this study was to determine the frequency of Class I and Class II microsatellites in a collection of over 21,000 ESTs from a single study of five different tissues of common bean: two types of leaves, nodules, pods and roots. For this objective, we used three different bioinformatics pipelines: Automated Microsatellite Marker Development (AMMD), Batchprimer3 and SSRLocator. In addition, we determined the frequency of single or multiple SSRs in the assembled ESTs, the frequency of perfect and compound repeats and whether Class I microsatellites were mainly di-nucleotide or tri-nucleotide motifs with each of the search engines. Primers were designed for a total of 175 microsatellites, concentrating on Class I microsatellites identified with SSRLocator. A few other microsatellites from the other search engines, AMMD and Batchprimer3, were included so as to have a representative set of Class II markers for comparison's sake. The comparison of 95 Class I vs. 80 Class II markers confirmed that the Class I markers were more polymorphic and therefore more useful. © 2013 John Wiley & Sons Ltd.
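
As a rough illustration of the Class I / Class II distinction described in this abstract, the following sketch scans a sequence for perfect di- and tri-nucleotide repeats with a regular expression and labels each hit by its length. It is not a reimplementation of AMMD, Batchprimer3 or SSRLocator. The 12 bp lower bound for reporting a repeat and the inclusive 20 bp boundary for Class I are assumptions made here for the example, and the function and field names are hypothetical.

```python
import re
from typing import List, NamedTuple


class SSR(NamedTuple):
    start: int      # 0-based start position in the sequence
    motif: str      # repeated unit, e.g. "AT" or "GAA"
    repeats: int    # number of tandem copies of the motif
    length_bp: int  # total length of the repeat tract
    ssr_class: str  # "I" (long) or "II" (short)


def find_perfect_ssrs(seq: str, min_bp: int = 12, class1_min_bp: int = 20) -> List[SSR]:
    """Find perfect di-/tri-nucleotide repeats and classify them by length.

    min_bp and class1_min_bp are illustrative thresholds (assumptions),
    not values taken from any of the cited pipelines.
    """
    seq = seq.upper()
    hits = []
    for unit in (2, 3):
        min_repeats = -(-min_bp // unit)  # ceiling division
        # One captured motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAA...
            length = match.end() - match.start()
            ssr_class = "I" if length >= class1_min_bp else "II"
            hits.append(SSR(match.start(), motif, length // unit, length, ssr_class))
    return sorted(hits, key=lambda h: h.start)


if __name__ == "__main__":
    example = "GGC" + "AT" * 12 + "CCGT" + "GAA" * 4 + "TTC"
    for hit in find_perfect_ssrs(example):
        print(hit)
```

In the example sequence, the 24 bp AT tract is reported as Class I and the 12 bp GAA tract as Class II. Compound and imperfect repeats, which the study also counts, are outside the scope of this sketch.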

  8. Bioinformatics pipeline for functional identification and characterization of proteins

    NASA Astrophysics Data System (ADS)

    Skarzyńska, Agnieszka; Pawełkowicz, Magdalena; Krzywkowski, Tomasz; Świerkula, Katarzyna; Pląder, Wojciech; Przybecki, Zbigniew

    2015-09-01

    New sequencing methods, collectively called Next Generation Sequencing, make it possible to obtain vast amounts of data in a short time. These data require structural and functional annotation. Functional identification and characterization of predicted proteins can be done by in silico approaches, thanks to the numerous computational tools now available. However, predictions of protein function need to be confirmed, either by comparing the results of different programs or experimentally. Here we present a bioinformatics pipeline for structural and functional annotation of proteins.

  9. A Guide to Bioinformatics for Immunologists

    PubMed Central

    Whelan, Fiona J.; Yap, Nicholas V. L.; Surette, Michael G.; Golding, G. Brian; Bowdish, Dawn M. E.

    2013-01-01

    Bioinformatics includes a suite of methods that are cheap and approachable, many of which are easily accessible without any sort of specialized bioinformatic training. Yet, despite this, bioinformatic tools are under-utilized by immunologists. Herein, we review a representative set of publicly available, easy-to-use bioinformatic tools using our own research on an under-annotated human gene, SCARA3, as an example. SCARA3 shares an evolutionary relationship with the class A scavenger receptors, but preliminary research showed that it was divergent enough that its function remained unclear. In our quest for more information about this gene – did it share gene sequence similarities to other scavenger receptors? Did it contain conserved protein domains? Where was it expressed in the human body? – we discovered the power and informative potential of publicly available bioinformatic tools designed with the novice in mind, which allowed us to hypothesize on the regulation, structure, and function of this protein. We argue that these tools are largely applicable to many facets of immunology research. PMID:24363654

  10. A survey of scholarly literature describing the field of bioinformatics education and bioinformatics educational research.

    PubMed

    Magana, Alejandra J; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students' attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. © 2014 A. J. Magana et al. CBE—Life Sciences Education © 2014 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

  11. A Bioinformatics Reference Model: Towards a Framework for Developing and Organising Bioinformatic Resources

    NASA Astrophysics Data System (ADS)

    Hiew, Hong Liang; Bellgard, Matthew

    2007-11-01

    Life Science research faces the constant challenge of how to effectively handle an ever-growing body of bioinformatics software and online resources. The users and developers of bioinformatics resources have a diverse set of competing demands on how these resources need to be developed and organised. Unfortunately, there does not exist an adequate community-wide framework to integrate such competing demands. The problems that arise from this include unstructured standards development, the emergence of tools that do not meet specific needs of researchers, and often times a communications gap between those who use the tools and those who supply them. This paper presents an overview of the different functions and needs of bioinformatics stakeholders to determine what may be required in a community-wide framework. A Bioinformatics Reference Model is proposed as a basis for such a framework. The reference model outlines the functional relationship between research usage and technical aspects of bioinformatics resources. It separates important functions into multiple structured layers, clarifies how they relate to each other, and highlights the gaps that need to be addressed for progress towards a diverse, manageable, and sustainable body of resources. The relevance of this reference model to the bioscience research community, and its implications in progress for organising our bioinformatics resources, are discussed.

  12. Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars

    PubMed Central

    2011-01-01

    Background: The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The pfold model couples a stochastic context-free grammar to phylogenetic analysis for a high accuracy in predictions, but the time complexity of the algorithm and underflow errors have prevented its use for long alignments. Here we present PPfold, a multithreaded version of pfold, which is capable of predicting the structure of large RNA alignments accurately on practical timescales. Results: We have distributed both the phylogenetic calculations and the inside-outside algorithm in PPfold, resulting in a significant reduction of runtime on multicore machines. We have addressed the floating-point underflow problems of pfold by implementing an extended-exponent datatype, enabling PPfold to be used for large-scale RNA structure predictions. We have also improved the user interface and portability: alongside standalone executable and Java source code of the program, PPfold is also available as a free plugin to the CLC Workbenches. We have evaluated the accuracy of PPfold using BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine and identifying several known structural elements in the prediction. Conclusions: PPfold is the first parallelized comparative RNA structure prediction algorithm to date. Based on the pfold model, PPfold is capable of fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts. The techniques used in the parallelization of this algorithm may be of general applicability to other bioinformatics algorithms. PMID:21501497
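
The floating-point underflow problem mentioned in this abstract, and the general idea of an extended-exponent number, can be sketched as follows. This is not PPfold's implementation (which is in Java and is not described in detail here); it is a minimal illustration, with hypothetical names, of storing a non-negative value as a normalized mantissa plus an unbounded integer exponent so that long products of small probabilities do not collapse to zero.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ExtFloat:
    """Non-negative number stored as mantissa * 2**exponent.

    The mantissa is kept in [0.5, 1) (or is exactly 0.0), and the exponent is
    an arbitrary-precision Python int, so it cannot underflow.
    """
    mantissa: float
    exponent: int

    @staticmethod
    def from_float(x: float) -> "ExtFloat":
        m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1, or m == 0
        return ExtFloat(m, e)

    def __mul__(self, other: "ExtFloat") -> "ExtFloat":
        # The product of two mantissas lies in [0.25, 1), so it is safe to
        # compute directly and then renormalize.
        m, e = math.frexp(self.mantissa * other.mantissa)
        return ExtFloat(m, self.exponent + other.exponent + e)

    def log10(self) -> float:
        """Base-10 logarithm of the value; undefined (raises) for zero."""
        return math.log10(self.mantissa) + self.exponent * math.log10(2.0)


def ext_product(values: Iterable[float]) -> ExtFloat:
    """Multiply ordinary floats together without underflow."""
    result = ExtFloat.from_float(1.0)
    for v in values:
        result = result * ExtFloat.from_float(v)
    return result


if __name__ == "__main__":
    probs = [1e-5] * 100
    plain = 1.0
    for p in probs:
        plain *= p
    print("plain float product:", plain)  # underflows to 0.0
    print("log10 of extended product:", ext_product(probs).log10())
```

A common alternative is to work entirely in log space. An explicit mantissa and exponent keeps multiplication cheap and also allows addition, which algorithms such as inside-outside need, although addition is omitted from this sketch.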

  13. Improvement of Student Understanding of How Kinetic Data Facilitates the Determination of Amino Acid Catalytic Function through an Alkaline Phosphatase Structure/Mechanism Bioinformatics Exercise

    ERIC Educational Resources Information Center

    Grunwald, Sandra K.; Krueger, Katherine J.

    2008-01-01

    Laboratory exercises, which utilize alkaline phosphatase as a model enzyme, have been developed and used extensively in undergraduate biochemistry courses to illustrate enzyme steady-state kinetics. A bioinformatics laboratory exercise for the biochemistry laboratory, which complements the traditional alkaline phosphatase kinetics exercise, was…

  15. Pattern recognition in bioinformatics.

    PubMed

    de Ridder, Dick; de Ridder, Jeroen; Reinders, Marcel J T

    2013-09-01

    Pattern recognition is concerned with the development of systems that learn to solve a given problem using a set of example instances, each represented by a number of features. These problems include clustering, the grouping of similar instances; classification, the task of assigning a discrete label to a given instance; and dimensionality reduction, combining or selecting features to arrive at a more useful representation. The use of statistical pattern recognition algorithms in bioinformatics is pervasive. Classification and clustering are often applied to high-throughput measurement data arising from microarray, mass spectrometry and next-generation sequencing experiments for selecting markers, predicting phenotype and grouping objects or genes. Less explicitly, classification is at the core of a wide range of tools such as predictors of genes, protein function, functional or genetic interactions, etc., and used extensively in systems biology. A course on pattern recognition (or machine learning) should therefore be at the core of any bioinformatics education program. In this review, we discuss the main elements of a pattern recognition course, based on material developed for courses taught at the BSc, MSc and PhD levels to an audience of bioinformaticians, computer scientists and life scientists. We pay attention to common problems and pitfalls encountered in applications and in interpretation of the results obtained.

  16. Fold assessment for comparative protein structure modeling.

    PubMed

    Melo, Francisco; Sali, Andrej

    2007-11-01

    Accurate and automated assessment of both geometrical errors and incompleteness of comparative protein structure models is necessary for an adequate use of the models. Here, we describe a composite score for discriminating between models with the correct and incorrect fold. To find an accurate composite score, we designed and applied a genetic algorithm method that searched for a most informative subset of 21 input model features as well as their optimized nonlinear transformation into the composite score. The 21 input features included various statistical potential scores, stereochemistry quality descriptors, sequence alignment scores, geometrical descriptors, and measures of protein packing. The optimized composite score was found to depend on (1) a statistical potential z-score for residue accessibilities and distances, (2) model compactness, and (3) percentage sequence identity of the alignment used to build the model. The accuracy of the composite score was compared with the accuracy of assessment by single and combined features as well as by other commonly used assessment methods. The testing set was representative of models produced by automated comparative modeling on a genomic scale. The composite score performed better than any other tested score in terms of the maximum correct classification rate (i.e., 3.3% false positives and 2.5% false negatives) as well as the sensitivity and specificity across the whole range of thresholds. The composite score was implemented in our program MODELLER-8 and was used to assess models in the MODBASE database that contains comparative models for domains in approximately 1.3 million protein sequences.

  17. Fold assessment for comparative protein structure modeling

    PubMed Central

    Melo, Francisco; Sali, Andrej

    2007-01-01

    Accurate and automated assessment of both geometrical errors and incompleteness of comparative protein structure models is necessary for an adequate use of the models. Here, we describe a composite score for discriminating between models with the correct and incorrect fold. To find an accurate composite score, we designed and applied a genetic algorithm method that searched for a most informative subset of 21 input model features as well as their optimized nonlinear transformation into the composite score. The 21 input features included various statistical potential scores, stereochemistry quality descriptors, sequence alignment scores, geometrical descriptors, and measures of protein packing. The optimized composite score was found to depend on (1) a statistical potential z-score for residue accessibilities and distances, (2) model compactness, and (3) percentage sequence identity of the alignment used to build the model. The accuracy of the composite score was compared with the accuracy of assessment by single and combined features as well as by other commonly used assessment methods. The testing set was representative of models produced by automated comparative modeling on a genomic scale. The composite score performed better than any other tested score in terms of the maximum correct classification rate (i.e., 3.3% false positives and 2.5% false negatives) as well as the sensitivity and specificity across the whole range of thresholds. The composite score was implemented in our program MODELLER-8 and was used to assess models in the MODBASE database that contains comparative models for domains in approximately 1.3 million protein sequences. PMID:17905832

  18. [Comparative hierarchic structure of the genetic language].

    PubMed

    Ratner, V A

    1993-05-01

    Genetic texts and the genetic language are built on a hierarchical principle and contain no fewer than six levels of coding sequences, separated by marks of punctuation, separation and indication: codons, cistrons, scriptons, replicons, linkage groups, and genomes. Each level has all the attributes of a language. This hierarchical system expresses some general properties and regularities. Given the rules of the genetic language, the variability of genetic texts is generated by block-modular combinatorics at each level. Between levels there are intermediate sublevels and module types that can be combined. The genetic language is compared with two independent linguistic systems: human natural languages and artificial programming languages. The genetic language is natural in origin, but in purpose it is a typical technical language of the functioning genetic regulatory system. All three linguistic systems show evident similarity in their organizing principles and hierarchical structures, which argues for similar principles of origin and evolution.

  19. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background: Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings: Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions: LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  20. LXtoo: an integrated live Linux distribution for the bioinformatics community.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu

    2012-07-19

    Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.

  1. Bioinformatics meets parasitology.

    PubMed

    Cantacessi, C; Campbell, B E; Jex, A R; Young, N D; Hall, R S; Ranganathan, S; Gasser, R B

    2012-05-01

    The advent and integration of high-throughput '-omics' technologies (e.g. genomics, transcriptomics, proteomics, metabolomics, glycomics and lipidomics) are revolutionizing the way biology is done, allowing the systems biology of organisms to be explored. These technologies are now providing unique opportunities for global, molecular investigations of parasites. For example, studies of a transcriptome (all transcripts in an organism, tissue or cell) have become instrumental in providing insights into aspects of gene expression, regulation and function in a parasite, which is a major step to understanding its biology. The purpose of this article was to review recent applications of next-generation sequencing technologies and bioinformatic tools to large-scale investigations of the transcriptomes of parasitic nematodes of socio-economic significance (particularly key species of the order Strongylida) and to indicate the prospects and implications of these explorations for developing novel methods of parasite intervention.

  2. Virtual Bioinformatics Distance Learning Suite

    ERIC Educational Resources Information Center

    Tolvanen, Martti; Vihinen, Mauno

    2004-01-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…

  4. Chapter 16: text mining for translational bioinformatics.

    PubMed

    Cohen, K Bretonnel; Hunter, Lawrence E

    2013-04-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

  5. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    ERIC Educational Resources Information Center

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  7. Microbial bioinformatics 2020.

    PubMed

    Pallen, Mark J

    2016-09-01

    Microbial bioinformatics in 2020 will remain a vibrant, creative discipline, adding value to the ever-growing flood of new sequence data, while embracing novel technologies and fresh approaches. Databases and search strategies will struggle to cope and manual curation will not be sustainable during the scale-up to the million-microbial-genome era. Microbial taxonomy will have to adapt to a situation in which most microorganisms are discovered and characterised through the analysis of sequences. Genome sequencing will become a routine approach in clinical and research laboratories, with fresh demands for interpretable user-friendly outputs. The "internet of things" will penetrate healthcare systems, so that even a piece of hospital plumbing might have its own IP address that can be integrated with pathogen genome sequences. Microbiome mania will continue, but the tide will turn from molecular barcoding towards metagenomics. Crowd-sourced analyses will collide with cloud computing, but eternal vigilance will be the price of preventing the misinterpretation and overselling of microbial sequence data. Output from hand-held sequencers will be analysed on mobile devices. Open-source training materials will address the need for the development of a skilled labour force. As we boldly go into the third decade of the twenty-first century, microbial sequence space will remain the final frontier! © 2016 The Author. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  8. Comparing Factor Structures of Adolescent Psychopathology

    ERIC Educational Resources Information Center

    Verona, Edelyn; Javdani, Shabnam; Sprague, Jenessa

    2011-01-01

    Research on the structure of adolescent psychopathology can provide information on broad factors that underlie different forms of maladjustment in youths. Multiple studies from the literature on adult populations suggest that 2 factors, Internalizing and Externalizing, meaningfully comprise the factor structure of adult psychopathology (e.g.,…

  10. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

    PubMed

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians.

  11. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software

    PubMed Central

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054

  12. Bioinformatic pipelines in Python with Leaf.

    PubMed

    Napolitano, Francesco; Mariani-Costantini, Renato; Tagliaferri, Roberto

    2013-06-21

    An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user's Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools.
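
    The idea of layering a pipeline structure over ordinary Python functions, with results persisted between steps, can be sketched as follows. This does not use Leaf's formal language or API, which the abstract does not show; `run_pipeline`, the node names and the toy functions are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the dependency handling.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical pipeline description: node name -> (function, upstream node names).
Pipeline = Dict[str, Tuple[Callable[..., object], Tuple[str, ...]]]


def run_pipeline(pipeline: Pipeline,
                 cache: Optional[Dict[str, object]] = None) -> Dict[str, object]:
    """Run every node after its dependencies, skipping nodes already cached."""
    results: Dict[str, object] = {} if cache is None else cache
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in results:
            continue  # reuse a persisted result instead of recomputing
        func, deps = pipeline[name]
        results[name] = func(*(results[d] for d in deps))
    return results


# Toy analysis steps (made up for illustration).
def load_reads() -> List[str]:
    return ["ACGT", "ACGG", "TTGA"]


def gc_content(reads: List[str]) -> List[float]:
    return [(r.count("G") + r.count("C")) / len(r) for r in reads]


def mean_gc(values: List[float]) -> float:
    return sum(values) / len(values)


pipeline: Pipeline = {
    "reads": (load_reads, ()),
    "gc": (gc_content, ("reads",)),
    "mean_gc": (mean_gc, ("gc",)),
}

results = run_pipeline(pipeline)
# Supplying a cached upstream result skips that node and recomputes the rest.
rerun = run_pipeline(pipeline, cache={"reads": ["GGCC"]})
```

    Declaring dependencies as data keeps the analysis functions unchanged, which is one way to add structure with little overhead for the programmer.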

  13. Bioinformatic pipelines in Python with Leaf

    PubMed Central

    2013-01-01

    Background: An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results: Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions: Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315

  14. Implementing a web-based introductory bioinformatics course for non-bioinformaticians that incorporates practical exercises.

    PubMed

    Vincent, Antony T; Bourbonnais, Yves; Brouard, Jean-Simon; Deveau, Hélène; Droit, Arnaud; Gagné, Stéphane M; Guertin, Michel; Lemieux, Claude; Rathier, Louis; Charette, Steve J; Lagüe, Patrick

    2017-09-13

    A recent scientific discipline, bioinformatics, defined as using informatics for the study of biological problems, is now a requirement for the study of biological sciences. Bioinformatics has become such a powerful and popular discipline that several academic institutions have created programs in this field, allowing students to become specialized. However, biology students who are not involved in a bioinformatics program also need a solid toolbox of bioinformatics software and skills. Therefore, we have developed a completely online bioinformatics course for non-bioinformaticians, entitled "BIF-1901 Introduction à la bio-informatique et à ses outils (Introduction to bioinformatics and bioinformatics tools)," given by the Department of Biochemistry, Microbiology, and Bioinformatics of Université Laval (Quebec City, Canada). This course requires neither a bioinformatics background nor specific skills in informatics. The underlying main goal was to produce a completely online up-to-date bioinformatics course, including practical exercises, with an intuitive pedagogical framework. The course, BIF-1901, was conceived to cover the three fundamental aspects of bioinformatics: (1) informatics, (2) biological sequence analysis, and (3) structural bioinformatics. This article discusses the content of the modules, the evaluations, the pedagogical framework, and the challenges inherent to a multidisciplinary, fully online course. © 2017 The International Union of Biochemistry and Molecular Biology.

  15. The Bioinformatics Analysis of Comparative Genomics of Mycobacterium tuberculosis Complex (MTBC) Provides Insight into Dissimilarities between Intraspecific Groups Differing in Host Association, Virulence, and Epitope Diversity

    PubMed Central

    Jia, Xinmiao; Yang, Li; Dong, Mengxing; Chen, Suting; Lv, Lingna; Cao, Dandan; Fu, Jing; Yang, Tingting; Zhang, Ju; Zhang, Xiangli; Shang, Yuanyuan; Wang, Guirong; Sheng, Yongjie; Huang, Hairong; Chen, Fei

    2017-01-01

    Tuberculosis now exceeds HIV as the top infectious disease cause of mortality, and is caused by the Mycobacterium tuberculosis complex (MTBC). MTBC strains have highly conserved genome sequences (similarity >99%) but dramatically different phenotypes. To analyze the relationship between genotype and phenotype, we conducted the comparative genomic analysis on 12 MTBC strains representing different lineages (i.e., Mycobacterium bovis; M. bovis BCG; M. microti; M. africanum; M. tuberculosis H37Rv; M. tuberculosis H37Ra, and six M. tuberculosis clinical isolates). The analysis focused on the three aspects of pathogenicity: host association, virulence, and epitope variations. Host association analysis indicated that eight mce3 genes, two enoyl-CoA hydratases, and five PE/PPE family genes were present only in human isolates; these may have roles in host-pathogen interactions. There were 15 SNPs found on virulence factors (including five SNPs in three ESX secretion proteins) only in the Beijing strains, which might be related to their more virulent phenotype. A comparison between the virulent H37Rv and non-virulent H37Ra strains revealed three SNPs that were likely associated with the virulence attenuation of H37Ra: S219L (PhoP), A219E (MazG) and a newly identified I228M (EspK). Additionally, a comparison of animal-associated MTBC strains showed that the deletion of the first four genes (i.e., pe35, ppe68, esxB, esxA), rather than all eight genes of RD1, might play a central role in the virulence attenuation of animal isolates. Finally, by comparing epitopes among MTBC strains, we found that four epitopes were lost only in the Beijing strains; this may render them better capable of evading the human immune system, leading to enhanced virulence. Overall, our comparative genomic analysis of MTBC strains reveals the relationship between the highly conserved genotypes and the diverse phenotypes of MTBC, provides insight into pathogenic mechanisms, and facilitates the

  16. An intelligent system for comparing protein structures

    SciTech Connect

    Benatan, E.

    1994-12-31

    An approach to protein structure comparison is presented which uses techniques of artificial intelligence (AI) to generate a mapping between two protein structures. The approach proceeds by first identifying the seed of a possible mapping, and then searching for ways to extend the seed by incorporating corresponding elements from the two proteins. Correspondence is judged using heuristic functions which assess the similarity of the structural environments of the elements. The search can be guided by separately encoded knowledge. A prototype has been implemented which is able to rapidly create mappings with a high degree of accuracy in test cases.
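
    The seed-then-extend strategy described above can be sketched in a deliberately simplified form. This is not the author's system: the element representation, the `similarity` heuristic and the gap-free greedy extension are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    """A secondary-structure element (labels are assumed for illustration)."""
    kind: str    # 'H' for helix, 'E' for strand
    length: int  # number of residues


def similarity(a: Element, b: Element) -> float:
    """Toy heuristic: 0 for different types, otherwise higher for closer lengths."""
    if a.kind != b.kind:
        return 0.0
    return 1.0 / (1.0 + abs(a.length - b.length))


def seed_and_extend(p: List[Element], q: List[Element],
                    min_score: float = 0.3) -> List[Tuple[int, int]]:
    """Pick the best-scoring pair as the seed, then grow the mapping in both
    directions while neighbouring pairs stay above min_score (no gaps)."""
    if not p or not q:
        return []
    candidates = [(i, j) for i in range(len(p)) for j in range(len(q))]
    seed = max(candidates, key=lambda ij: similarity(p[ij[0]], q[ij[1]]))
    if similarity(p[seed[0]], q[seed[1]]) < min_score:
        return []
    mapping = [seed]
    i, j = seed
    while i + 1 < len(p) and j + 1 < len(q) and similarity(p[i + 1], q[j + 1]) >= min_score:
        i, j = i + 1, j + 1
        mapping.append((i, j))
    i, j = seed
    while i > 0 and j > 0 and similarity(p[i - 1], q[j - 1]) >= min_score:
        i, j = i - 1, j - 1
        mapping.insert(0, (i, j))
    return mapping


# Two made-up proteins described as ordered lists of elements.
protein_a = [Element("H", 10), Element("E", 5), Element("H", 12), Element("E", 6)]
protein_b = [Element("E", 5), Element("H", 12), Element("E", 7), Element("H", 3)]
mapping = seed_and_extend(protein_a, protein_b)
```

    A fuller search would score the structural environment of each element and explore alternative extensions rather than committing greedily to one.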

  17. [Application of bioinformatics in researches of industrial biocatalysis].

    PubMed

    Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao

    2004-05-01

    Industrial biocatalysis is attracting much attention as a way to rebuild or replace traditional production processes for chemicals and drugs. A key focus in industrial biocatalysis is the biocatalyst, which is usually a microbial enzyme. In response to the genomic revolution, new bioinformatics technologies have recently played, and will continue to play, an increasingly significant role in industrial biocatalysis research. One key application of bioinformatics in biocatalysis is the discovery and identification of new biocatalysts through advanced DNA and protein sequence searches, comparisons and analyses of Internet databases using different algorithms and software. Unknown genes of microbial enzymes can also be readily obtained by designing primers on the basis of bioinformatics analyses. The other key application is the modification and improvement of existing industrial biocatalysts. Here, bioinformatics is of great importance in both rational design and directed evolution of microbial enzymes. When the tertiary structure of an enzyme has been successfully predicted with bioinformatics tools, subsequent experiments such as site-directed mutagenesis, fusion protein construction, DNA family shuffling and saturation mutagenesis are usually highly efficient. Overall, bioinformatics will be an essential tool for both biologists and biological engineers in future industrial biocatalysis research, because it guides and accelerates the discovery and/or improvement of novel biocatalysts.

  18. Integration of bioinformatics to biodegradation

    PubMed Central

    2014-01-01

    Bioinformatics and biodegradation are two primary scientific fields in applied microbiology and biotechnology. The present review describes development of various bioinformatics tools that may be applied in the field of biodegradation. Several databases, including the University of Minnesota Biocatalysis/Biodegradation database (UM-BBD), a database of biodegradative oxygenases (OxDBase), Biodegradation Network-Molecular Biology Database (Bionemo), MetaCyc, and BioCyc have been developed to enable access to information related to biochemistry and genetics of microbial degradation. In addition, several bioinformatics tools for predicting toxicity and biodegradation of chemicals have been developed. Furthermore, the whole genomes of several potential degrading bacteria have been sequenced and annotated using bioinformatics tools. PMID:24808763

  19. Comparative proteomic and bioinformatic analysis of the effects of a high-grain diet on the hepatic metabolism in lactating dairy goats.

    PubMed

    Jiang, Xueyuan; Zeng, Tao; Zhang, Shukun; Zhang, Yuanshu

    2013-01-01

    To gain insight into the impact of high-grain diets on liver metabolism in ruminants, we employed a comparative proteomic approach to investigate the proteome-wide effects of diet in lactating dairy goats by conducting a proteomic analysis of the liver extracts of 10 lactating goats fed either a control diet or a high-grain diet. More than 500 protein spots were detected per condition by two-dimensional electrophoresis (2-DE). In total, 52 differentially expressed spots (≥2.0-fold changed) were excised and analyzed using MALDI TOF/TOF. Fifty-one protein spots were successfully identified. Of these, 29 proteins were upregulated, while 22 were downregulated in the high-grain fed vs. control animals. Differential expression of proteins including alpha enolase, elongation factor 2, calreticulin, cytochrome b5, apolipoprotein A-I, and catalase was verified by mRNA analysis and/or Western blotting. Database searches combined with Gene Ontology (GO) analysis and KEGG pathway analysis revealed that the high-grain diet resulted in altered expression of proteins related to amino acid metabolism. These results suggest new candidate proteins that may contribute to a better understanding of the signaling pathways and mechanisms that mediate liver adaptation to high-grain diet.
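
    The ≥2.0-fold selection criterion mentioned above can be sketched as a simple ratio filter. The spot names and intensity values below are made up, and `classify_spots` is a hypothetical helper; the study's own spot quantification and statistics are not described in the abstract.

```python
from typing import Dict, List, Tuple


def classify_spots(intensities: Dict[str, Tuple[float, float]],
                   min_fold: float = 2.0) -> Tuple[List[str], List[str]]:
    """Split spots into (upregulated, downregulated) by fold change.

    intensities maps spot name -> (control intensity, treated intensity).
    A spot is 'up' when treated/control >= min_fold and 'down' when the
    ratio is <= 1/min_fold; spots with non-positive values are skipped.
    """
    up: List[str] = []
    down: List[str] = []
    for spot, (control, treated) in intensities.items():
        if control <= 0 or treated <= 0:
            continue
        ratio = treated / control
        if ratio >= min_fold:
            up.append(spot)
        elif ratio <= 1.0 / min_fold:
            down.append(spot)
    return up, down


# Made-up normalized spot intensities: (control diet, high-grain diet).
toy_spots = {
    "spot_a": (100.0, 250.0),  # 2.5-fold increase
    "spot_b": (200.0, 90.0),   # 0.45-fold, i.e. more than 2-fold decrease
    "spot_c": (150.0, 180.0),  # 1.2-fold, below the cut-off
    "spot_d": (80.0, 160.0),   # exactly 2.0-fold increase
}

up, down = classify_spots(toy_spots)
```

    Using a symmetric cut-off (ratio ≥ 2 or ≤ 0.5) treats increases and decreases of the same magnitude alike.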

  20. The roots of bioinformatics in theoretical biology.

    PubMed

    Hogeweg, Paulien

    2011-03-01

    From the late 1980s onward, the term "bioinformatics" mostly has been used to refer to computational methods for comparative analysis of genome data. However, the term was originally more widely defined as the study of informatic processes in biotic systems. In this essay, I will trace this early history (from a personal point of view) and I will argue that the original meaning of the term is re-emerging.

  1. A review of bioinformatic pipeline frameworks

    PubMed Central

    2017-01-01

    High-throughput bioinformatic analyses increasingly rely on pipeline frameworks to process sequence and metadata. Modern implementations of these frameworks differ on three key dimensions: using an implicit or explicit syntax, using a configuration, convention or class-based design paradigm and offering a command line or workbench interface. Here I survey and compare the design philosophies of several current pipeline frameworks. I provide practical recommendations based on analysis requirements and the user base. PMID:27013646

  2. Genome Exploitation and Bioinformatics Tools

    NASA Astrophysics Data System (ADS)

    de Jong, Anne; van Heel, Auke J.; Kuipers, Oscar P.

    Bioinformatic tools can greatly improve the efficiency of bacteriocin screening efforts by limiting the number of strains. Different classes of bacteriocins can be detected in genomes by looking at different features. Finding small bacteriocins can be especially challenging due to low homology and because small open reading frames (ORFs) are often omitted from annotations. In this chapter, several bioinformatic tools/strategies to identify bacteriocins in genomes are discussed.

  3. Bioinformatics strategies for the analysis of lipids.

    PubMed

    Wheelock, Craig E; Goto, Susumu; Yetukuri, Laxman; D'Alexandri, Fabio Luiz; Klukas, Christian; Schreiber, Falk; Oresic, Matej

    2009-01-01

    Owing to their importance in cellular physiology and pathology as well as to recent technological advances, the study of lipids has reemerged as a major research target. However, the structural diversity of lipids presents a number of analytical and informatics challenges. The field of lipidomics is a new postgenome discipline that aims to develop comprehensive methods for lipid analysis, necessitating concomitant developments in bioinformatics. The evolving research paradigm requires that new bioinformatics approaches accommodate genomic as well as high-level perspectives, integrating genome, protein, chemical and network information. The incorporation of lipidomics information into these data structures will provide mechanistic understanding of lipid functions and interactions in the context of cellular and organismal physiology. Accordingly, it is vital that specific bioinformatics methods be developed to analyze the wealth of lipid data being acquired. Herein, we present an overview of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and application of its tools to the analysis of lipid data. We also describe a series of software tools and databases (KGML-ED, VANTED, MZmine, and LipidDB) that can be used for the processing of lipidomics data and biochemical pathway reconstruction, an important next step in the development of the lipidomics field.

  4. Reactance, Restoration, and Cognitive Structure: Comparative Statics

    ERIC Educational Resources Information Center

    Bessarabova, Elena; Fink, Edward L.; Turner, Monique

    2013-01-01

    This study (N = 143) examined the effects of freedom threat on cognitive structures, using recycling as its topic. The results of a 2(Freedom Threat: low vs. high) x 2(Postscript: restoration vs. filler) plus 1(Control) experiment indicated that, relative to the control condition, high freedom threat created a boomerang effect for the targeted…

  6. Uncertainty of Comparative Judgments and Multidimensional Structure

    ERIC Educational Resources Information Center

    Sjoberg, Lennart

    1975-01-01

    An analysis of preferences with respect to silhouette drawings of nude females is presented. Systematic intransitivities were discovered. The dispersions of differences (comparatal dispersions) were shown to reflect the multidimensional structure of the stimuli, a finding expected on the basis of prior work. (Author)

  8. Bioinformatics of cardiovascular miRNA biology.

    PubMed

    Kunz, Meik; Xiao, Ke; Liang, Chunguang; Viereck, Janika; Pachel, Christina; Frantz, Stefan; Thum, Thomas; Dandekar, Thomas

    2015-12-01

    MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs and are highly conserved among species. Moreover, miRNAs regulate gene expression of a large number of genes associated with important biological functions and signaling pathways. Recently, several miRNAs have been found to be associated with cardiovascular diseases. Thus, investigating the complex regulatory effect of miRNAs may lead to a better understanding of their functional role in the heart. To achieve this, bioinformatics approaches have to be coupled with validation and screening experiments to understand the complex interactions of miRNAs with the genome. This will boost the subsequent development of diagnostic markers and our understanding of the physiological and therapeutic role of miRNAs in cardiac remodeling. In this review, we focus on and explain different bioinformatics strategies and algorithms for the identification and analysis of miRNAs and their regulatory elements to better understand cardiac miRNA biology. Starting with the biogenesis of miRNAs, we present approaches such as LocARNA and miRBase for combining sequence and structure analysis including phylogenetic comparisons as well as detailed analysis of RNA folding patterns, functional target prediction, signaling pathway as well as functional analysis. We also show how far bioinformatics helps to tackle the unprecedented level of complexity and systemic effects by miRNA, underlining the strong therapeutic potential of miRNA and miRNA target structures in cardiovascular disease. In addition, we discuss drawbacks and limitations of bioinformatics algorithms and the necessity of experimental approaches for miRNA target identification. This article is part of a Special Issue entitled 'Non-coding RNAs'.
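
    The functional target prediction mentioned above often starts from seed complementarity. The following is a minimal sketch of that idea only, assuming the common convention that miRNA nucleotides 2-8 form the seed; the function names are hypothetical and this is not the method of LocARNA, miRBase, or any tool discussed in the review.

```python
# Illustrative sketch: find 7-nucleotide seed-match sites in a 3'UTR.
# Assumption (not from the abstract): the seed is miRNA nucleotides 2-8.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_sites(mirna, utr):
    """Return 0-based start positions in `utr` that pair with the miRNA seed."""
    seed = mirna[1:8]                 # nucleotides 2-8, counted from the 5' end
    site = reverse_complement(seed)   # what the target must contain, read 5'->3'
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]
```

    A seed match is only a candidate site; as the abstract stresses, such predictions still need experimental validation.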

  9. Bioconductor: open software development for computational biology and bioinformatics

    PubMed Central

    Gentleman, Robert C; Carey, Vincent J; Bates, Douglas M; Bolstad, Ben; Dettling, Marcel; Dudoit, Sandrine; Ellis, Byron; Gautier, Laurent; Ge, Yongchao; Gentry, Jeff; Hornik, Kurt; Hothorn, Torsten; Huber, Wolfgang; Iacus, Stefano; Irizarry, Rafael; Leisch, Friedrich; Li, Cheng; Maechler, Martin; Rossini, Anthony J; Sawitzki, Gunther; Smith, Colin; Smyth, Gordon; Tierney, Luke; Yang, Jean YH; Zhang, Jianhua

    2004-01-01

    The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples. PMID:15461798

  10. Comparative BioInformatics and Computational Toxicology

    EPA Science Inventory

    Reflecting the numerous changes in the field since the publication of the previous edition, this third edition of Developmental Toxicology focuses on the mechanisms of developmental toxicity and incorporates current technologies for testing in the risk assessment process.

  12. A new family of bacterial DNA repair proteins annotated by the integration of non-homology, distant homology and structural bioinformatic methods.

    PubMed

    Mello, Luciane V; Rigden, Daniel J

    2012-11-02

    Different bioinformatics methods illuminate different aspects of protein function, from specific catalytic activities to broad functional categories. Here, a triple-pronged approach to predict function for a domain of unknown function, DUF2086, is applied. Distant homology to characterised enzymes and conservation of key residues suggest an oxygenase function. Modelling indicates that the substrate is most likely a nucleic acid. Finally, genomic context analysis linking DUF2086 to DNA repair, leads to a predicted activity of oxidative demethylation of damaged bases in DNA. The newly assigned activity is sporadically present in phyla not containing near relatives of the similarly active repair protein AlkB. Copyright © 2012 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  13. Survey of Natural Language Processing Techniques in Bioinformatics

    PubMed Central

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers. PMID:26525745

  14. Bioinformatics Education in Pathology Training: Current Scope and Future Direction.

    PubMed

    Clay, Michael R; Fisher, Kevin E

    2017-01-01

    Training anatomic and clinical pathology residents in the principles of bioinformatics is a challenging endeavor. Most residents receive little to no formal exposure to bioinformatics during medical education, and most of the pathology training is spent interpreting histopathology slides using light microscopy or focused on laboratory regulation, management, and interpretation of discrete laboratory data. At a minimum, residents should be familiar with data structure, data pipelines, data manipulation, and data regulations within clinical laboratories. Fellowship-level training should incorporate advanced principles unique to each subspecialty. Barriers to bioinformatics education include the clinical apprenticeship training model, ill-defined educational milestones, inadequate faculty expertise, and limited exposure during medical training. Online educational resources, case-based learning, and incorporation into molecular genomics education could serve as effective educational strategies. Overall, pathology bioinformatics training can be incorporated into pathology resident curricula, provided there is motivation to incorporate, institutional support, educational resources, and adequate faculty expertise.

  15. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  16. Compare, Contrast, Comprehend: Using Compare-Contrast Text Structures with ELLs in K-3 Classrooms

    ERIC Educational Resources Information Center

    Dreher, Mariam Jean; Gray, Jennifer Letcher

    2009-01-01

    In this article, we describe how to help primary-grade English language learners use compare-contrast text structures. Specifically, we explain (a) how to teach students to identify the compare-contrast text structure, and to use this structure to support their comprehension, (b) how to use compare-contrast texts to activate and extend students'…

  17. Adapting bioinformatics curricula for big data

    PubMed Central

    Greene, Anna C.; Giffin, Kristine A.; Greene, Casey S.

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. PMID:25829469

  18. Taking Bioinformatics to Systems Medicine.

    PubMed

    van Kampen, Antoine H C; Moerland, Perry D

    2016-01-01

    Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.
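
    The network step described above (build a gene network from omics data, then decompose it into modules) can be sketched in a few lines. This is a simplified illustration under stated assumptions: edges are drawn where the absolute Pearson correlation of two expression profiles meets a hypothetical threshold, and "modules" are taken to be connected components. The chapter itself may use different construction and decomposition methods.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def build_network(expression, threshold=0.9):
    """Link genes whose profiles have |r| >= threshold (threshold is an assumption)."""
    edges = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges[g1].add(g2)
            edges[g2].add(g1)
    return edges


def modules(edges):
    """Decompose the network into connected components via depth-first search."""
    seen, result = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        result.append(sorted(component))
    return result
```

    Connected components are the crudest possible notion of a module; they are used here only because they need no external library.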

  19. Generalized Centroid Estimators in Bioinformatics

    PubMed Central

    Hamada, Michiaki; Kiryu, Hisanori; Iwasaki, Wataru; Asai, Kiyoshi

    2011-01-01

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit commonly used accuracy measures (e.g. sensitivity, PPV, MCC and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only gives a useful framework for designing MEA-based estimators but is also highly extendable and sheds new light on many problems in bioinformatics. PMID:21365017
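
    To make the accuracy measures named above concrete, the sketch below computes sensitivity, PPV, F-score and MCC from confusion-matrix counts, and shows one simple MEA-style decision rule on a binary space. The rule assumes independent positions and a gain of gamma per true positive plus 1 per true negative, so predicting 1 pays off exactly when gamma * p > 1 - p. The function names and the parameter `gamma` are illustrative assumptions, not the paper's notation or code.

```python
import math


def accuracy_measures(tp, tn, fp, fn):
    """Standard accuracy measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f_score = (2 * ppv * sensitivity / (ppv + sensitivity)
               if ppv + sensitivity else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "f_score": f_score, "mcc": mcc}


def mea_predict(marginals, gamma=1.0):
    """Predict 1 wherever the expected gain of doing so is higher.

    With gain = gamma * TP + TN, position i contributes gamma * p_i if
    predicted 1 and (1 - p_i) if predicted 0, so the rule is
    p_i > 1 / (gamma + 1).
    """
    threshold = 1.0 / (gamma + 1.0)
    return [1 if p > threshold else 0 for p in marginals]
```

    Raising `gamma` lowers the threshold, trading PPV for sensitivity; this is the kind of estimator-to-measure fit the abstract refers to, in its simplest form.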

  20. Bioinformatics Tools for the Discovery of New Nonribosomal Peptides.

    PubMed

    Leclère, Valérie; Weber, Tilmann; Jacques, Philippe; Pupin, Maude

    2016-01-01

    This chapter guides the use of bioinformatics tools relevant to the discovery of new nonribosomal peptides (NRPs) produced by microorganisms. The strategy described can be applied to draft or fully assembled genome sequences. It relies on the identification of the synthetase genes and the deciphering of the domain architecture of the nonribosomal peptide synthetases (NRPSs). In the next step, candidate peptides synthesized by these NRPSs are predicted in silico, considering the specificity of incorporated monomers together with their isomerism. To assess their novelty, the two-dimensional structure of the peptides can be compared with the structural patterns of all known NRPs. The presented workflow leads to an efficient and rapid screening of genomic data generated by high throughput technologies. The exploration of such sequenced genomes may lead to the discovery of new drugs (e.g., antibiotics against multi-resistant pathogens or anti-tumor agents).

  1. Crustacean neuropeptides: structures, functions and comparative aspects.

    PubMed

    Keller, R

    1992-05-15

    In this article, an attempt is made to review the presently known, completely identified crustacean neuropeptides with regard to structure, function and distribution. Probably the most important progress has been made in the elucidation of a novel family of large peptides from the X-organ-sinus gland system which includes crustacean hyperglycemic hormone (CHH), putative molt-inhibiting hormone (MIH) and vitellogenesis (= gonad)-inhibiting hormone (VIH). These peptides have so far only been found in crustaceans. Renewed interest in the neurohemal pericardial organs has led to the identification of a number of cardioactive/myotropic neuropeptides, some of them unique to crustaceans. Important contributions have been made by immunocytochemical mapping of peptidergic neurons in the nervous system, which has provided evidence for a multiple role of several neuropeptides as neurohormones on the one hand and as local transmitters or modulators on the other. This has been corroborated by physiological studies. The long-known chromatophore-regulating hormones, red pigment concentrating hormone (RPCH) and pigment-dispersing hormone (PDH), have been placed in a broader perspective by the demonstration of an additional role as local neuromodulators. The scope of crustacean neuropeptide research has thus been broadened considerably during the last years.

  2. Bioinformatics Approach in Plant Genomic Research

    PubMed Central

    Ong, Quang; Nguyen, Phuc; Thao, Nguyen Phuong; Le, Ly

    2016-01-01

    Advances in genomics technology have led to dramatic changes in plant biology research. Plant biologists now have easy access to enormous amounts of genomic data for studying high-density genetic variation in plants at the molecular level. Therefore, fully understanding and skillfully using bioinformatics tools to manage and analyze these data is essential in current plant genome research. Many plant genome databases have been established and have continued to expand recently. Meanwhile, analytical methods based on bioinformatics are also well developed in many aspects of plant genomic research, including comparative genomic analysis, phylogenomics and evolutionary analysis, and genome-wide association studies. However, the constant upgrading of computational infrastructure, such as high-capacity data storage and high-performance analysis software, is the real challenge for plant genome research. This review focuses on the challenges and opportunities that knowledge and skills in bioinformatics can bring to plant scientists in the present plant genomics era, as well as on the critical future need for effective tools to facilitate the translation of knowledge from new sequencing data into enhanced plant productivity. PMID:27499685

  3. The growing need for microservices in bioinformatics

    PubMed Central

    Williams, Christopher L.; Sica, Jeffrey C.; Killen, Robert T.; Balis, Ulysses G. J.

    2016-01-01

    Objective: Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Context: Bioinformatics relies on nimble IT framework which can adapt to changing requirements. Aims: To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics Conclusions: Use of the microservices framework is an effective

  4. The growing need for microservices in bioinformatics.

    PubMed

    Williams, Christopher L; Sica, Jeffrey C; Killen, Robert T; Balis, Ulysses G J

    2016-01-01

    Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Bioinformatics relies on nimble IT framework which can adapt to changing requirements. To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Use of the microservices framework is an effective methodology for the fabrication and
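
    As a rough illustration of the "limited functional scope" idea in this record, the sketch below exposes one small analysis step (GC content of a sequence) as its own HTTP service using only the Python standard library. The endpoint name `/gc`, the port, and the choice of GC content are all hypothetical; the authors do not describe a specific implementation here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide string (0.0 for empty input)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


class GCHandler(BaseHTTPRequestHandler):
    """One service, one job: answer GET /gc?seq=ACGT with a JSON result."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/gc":
            self.send_error(404, "unknown endpoint")
            return
        seq = parse_qs(url.query).get("seq", [""])[0]
        body = json.dumps({"length": len(seq), "gc": gc_content(seq)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the service until interrupted (blocking call)."""
    HTTPServer(("127.0.0.1", port), GCHandler).serve_forever()
```

    Calling `serve()` starts the service; a pipeline would then combine several such narrowly scoped services, each of which can be replaced or redeployed without touching the others.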

  5. Bioinformatics: perspectives for the future.

    PubMed

    Costa, Luciano da Fontoura

    2004-12-30

    I give here a very personal perspective of Bioinformatics and its future, starting by discussing the origin of the term (and area) of bioinformatics and proceeding by trying to foresee the development of related issues, including pattern recognition/data mining, the need to reintegrate biology, the potential of complex networks as a powerful and flexible framework for bioinformatics and the interplay between bio- and neuroinformatics. Human resource formation and market perspective are also addressed. Given the complexity and vastness of these issues and concepts, as well as the limited size of a scientific article and finite patience of the reader, these perspectives are surely incomplete and biased. However, it is expected that some of the questions and trends that are identified will motivate discussions during the IcoBiCoBi round table (with the same name as this article) and perhaps provide a more ample perspective among the participants of that conference and the readers of this text.

  6. Bioinformatics/biostatistics: microarray analysis.

    PubMed

    Eichler, Gabriel S

    2012-01-01

    The quantity and complexity of the molecular-level data generated in both research and clinical settings require the use of sophisticated, powerful computational interpretation techniques. It is for this reason that bioinformatic analysis of complex molecular profiling data has become a fundamental technology in the development of personalized medicine. This chapter provides a high-level overview of the field of bioinformatics and outlines several classic bioinformatic approaches. The highlighted approaches can be applied to nearly any sort of high-dimensional genomic, proteomic, or metabolomic experiment. Technologies reviewed in this chapter include traditional clustering analysis, the Gene Expression Dynamics Inspector (GEDI), GoMiner, Gene Set Enrichment Analysis (GSEA), and the Learner of Functional Enrichment (LeFE).
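
    A common building block behind functional-enrichment analysis is a one-sided hypergeometric test: given a gene list, is a functional category over-represented compared with chance? The sketch below shows only that generic test. It is not a reimplementation of GoMiner, GSEA (which uses a rank-based running-sum statistic), or LeFE, and the function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(population, category, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, of which `category` belong to the gene set."""
    upper = min(category, selected)
    total = comb(population, selected)
    tail = sum(
        comb(category, k) * comb(population - category, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

    For example, `enrichment_p_value(10, 4, 3, 2)` asks how likely it is to see at least 2 category genes among 3 selected when 4 of 10 genes belong to the category. In practice such p-values are computed for many categories and need multiple-testing correction.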

  7. Generations of interdisciplinarity in bioinformatics

    PubMed Central

    Bartlett, Andrew; Lewis, Jamie; Williams, Matthew L.

    2016-01-01

    Bioinformatics, a specialism propelled into relevance by the Human Genome Project and the subsequent -omic turn in the life science, is an interdisciplinary field of research. Qualitative work on the disciplinary identities of bioinformaticians has revealed the tensions involved in work in this “borderland.” As part of our ongoing work on the emergence of bioinformatics, between 2010 and 2011, we conducted a survey of United Kingdom-based academic bioinformaticians. Building on insights drawn from our fieldwork over the past decade, we present results from this survey relevant to a discussion of disciplinary generation and stabilization. Not only is there evidence of an attitudinal divide between the different disciplinary cultures that make up bioinformatics, but there are distinctions between the forerunners, founders and the followers; as inter/disciplines mature, they face challenges that are both inter-disciplinary and inter-generational in nature. PMID:27453689

  8. Generations of interdisciplinarity in bioinformatics.

    PubMed

    Bartlett, Andrew; Lewis, Jamie; Williams, Matthew L

    2016-04-02

    Bioinformatics, a specialism propelled into relevance by the Human Genome Project and the subsequent -omic turn in the life science, is an interdisciplinary field of research. Qualitative work on the disciplinary identities of bioinformaticians has revealed the tensions involved in work in this "borderland." As part of our ongoing work on the emergence of bioinformatics, between 2010 and 2011, we conducted a survey of United Kingdom-based academic bioinformaticians. Building on insights drawn from our fieldwork over the past decade, we present results from this survey relevant to a discussion of disciplinary generation and stabilization. Not only is there evidence of an attitudinal divide between the different disciplinary cultures that make up bioinformatics, but there are distinctions between the forerunners, founders and the followers; as inter/disciplines mature, they face challenges that are both inter-disciplinary and inter-generational in nature.

  9. Bioinformatics in Germany: toward a national-level infrastructure.

    PubMed

    Tauch, Andreas; Al-Dilaimi, Arwa

    2017-04-18

    The German Network for Bioinformatics Infrastructure (de.NBI) is a national initiative funded by the German Federal Ministry of Education and Research (BMBF). The mission of de.NBI is (i) to provide high-quality bioinformatics services to users in basic and applied life sciences research from academia, industry and biomedicine; (ii) to offer bioinformatics training to users in Germany and Europe through a wide range of workshops and courses; and (iii) to foster the cooperation of the German bioinformatics community with international network structures such as the European life-sciences Infrastructure for biological Information (ELIXIR). The network was launched by the BMBF in March 2015 and now includes 40 service projects operated by 30 project partners that are organized in eight service centers. The de.NBI staff develops further and maintains almost 100 bioinformatics services for the human, plant and microbial research fields and provides comprehensive training courses to support users with different expertise levels in bioinformatics. In the future, de.NBI will expand its activities to the European level, as the de.NBI consortium was assigned by the BMBF to establish and run the German node of ELIXIR. © The Author 2017. Published by Oxford University Press.

  10. Promoting synergistic research and education in genomics and bioinformatics

    PubMed Central

    2008-01-01

    Bioinformatics and Genomics are closely related disciplines that hold great promise for the advancement of research and development in complex biomedical systems, as well as public health, drug design, comparative genomics, personalized medicine and so on. Research and development in these two important areas are impacting science and technology. High throughput sequencing and molecular imaging technologies marked the beginning of a new era for modern translational medicine and personalized healthcare. The impact of having the human sequence and personalized digital images in hand has also created tremendous demand for developing powerful supercomputing, statistical learning and artificial intelligence approaches to handle the massive bioinformatics and personalized healthcare data, which will obviously have a profound effect on how biomedical research will be conducted toward the improvement of human health and prolonging of human life in the future. The International Society of Intelligent Biological Medicine (http://www.isibm.org) and its official journals, the International Journal of Functional Informatics and Personalized Medicine (http://www.inderscience.com/ijfipm) and the International Journal of Computational Biology and Drug Design (http://www.inderscience.com/ijcbdd) in collaboration with International Conference on Bioinformatics and Computational Biology (Biocomp), touch tomorrow's bioinformatics and personalized medicine through today's efforts in promoting the research, education and awareness of the upcoming integrated inter/multidisciplinary field. The 2007 international conference on Bioinformatics and Computational Biology (BIOCOMP07) was held in Las Vegas, the United States of America on June 25-28, 2007. The conference attracted over 400 papers, covering broad research areas in the genomics, biomedicine and bioinformatics. The Biocomp 2007 provides a common platform for the cross fertilization of ideas, and to help shape knowledge and

  11. Bioinformatics in microbial biotechnology--a mini review.

    PubMed

    Bansal, Arvind K

    2005-06-28

    The revolutionary growth in the computation speed and memory storage capability has fueled a new era in the analysis of biological data. Hundreds of microbial genomes and many eukaryotic genomes including a cleaner draft of human genome have been sequenced raising the expectation of better control of microorganisms. The goals are as lofty as the development of rational drugs and antimicrobial agents, development of new enhanced bacterial strains for bioremediation and pollution control, development of better and easy to administer vaccines, the development of protein biomarkers for various bacterial diseases, and better understanding of host-bacteria interaction to prevent bacterial infections. In the last decade the development of many new bioinformatics techniques and integrated databases has facilitated the realization of these goals. Current research in bioinformatics can be classified into: (i) genomics--sequencing and comparative study of genomes to identify gene and genome functionality, (ii) proteomics--identification and characterization of protein related properties and reconstruction of metabolic and regulatory pathways, (iii) cell visualization and simulation to study and model cell behavior, and (iv) application to the development of drugs and anti-microbial agents. In this article, we will focus on the techniques and their limitations in genomics and proteomics. Bioinformatics research can be classified under three major approaches: (1) analysis based upon the available experimental wet-lab data, (2) the use of mathematical modeling to derive new information, and (3) an integrated approach that integrates search techniques with mathematical modeling. The major impact of bioinformatics research has been to automate the genome sequencing, automated development of integrated genomics and proteomics databases, automated genome comparisons to identify the genome function, automated derivation of metabolic pathways, gene expression analysis to derive

  12. Promoting synergistic research and education in genomics and bioinformatics.

    PubMed

    Yang, Jack Y; Yang, Mary Qu; Zhu, Mengxia Michelle; Arabnia, Hamid R; Deng, Youping

    2008-01-01

    Bioinformatics and Genomics are closely related disciplines that hold great promise for the advancement of research and development in complex biomedical systems, as well as public health, drug design, comparative genomics, personalized medicine and so on. Research and development in these two important areas are impacting science and technology. High-throughput sequencing and molecular imaging technologies marked the beginning of a new era for modern translational medicine and personalized healthcare. The impact of having the human sequence and personalized digital images in hand has also created a tremendous demand for powerful supercomputing, statistical learning and artificial intelligence approaches to handle the massive bioinformatics and personalized healthcare data, which will have a profound effect on how biomedical research will be conducted toward the improvement of human health and the prolonging of human life in the future. The International Society of Intelligent Biological Medicine (http://www.isibm.org) and its official journals, the International Journal of Functional Informatics and Personalized Medicine (http://www.inderscience.com/ijfipm) and the International Journal of Computational Biology and Drug Design (http://www.inderscience.com/ijcbdd), in collaboration with the International Conference on Bioinformatics and Computational Biology (Biocomp), touch tomorrow's bioinformatics and personalized medicine through today's efforts in promoting the research, education and awareness of the upcoming integrated inter/multidisciplinary field. The 2007 International Conference on Bioinformatics and Computational Biology (BIOCOMP07) was held in Las Vegas, the United States of America, on June 25-28, 2007. The conference attracted over 400 papers, covering broad research areas in genomics, biomedicine and bioinformatics. Biocomp 2007 provides a common platform for the cross-fertilization of ideas, and to help shape knowledge and

  13. The use of antioptimization to compare alternative structural models

    NASA Technical Reports Server (NTRS)

    Gangadharan, S. N.; Nikolaidis, E.; Lee, K.; Haftka, R. T.

    1993-01-01

    Structural models are usually tested by comparing their response with that of a reference structure (an actual structure or a more refined model) to a limited number of arbitrary loads. This test is not always reliable because the loads are arbitrary. An antioptimization-based method is proposed to test structural models. This method compares a structural model with a reference model or an actual structure under the worst loading case that maximizes the error in the model. Specifically, the method identifies the loading case that maximizes the difference between the responses of two models of the same structure using optimization. This method can be used to design experiments in order to validate a structural model. It can also be applied to identify damage in a structure by determining the load that maximizes the difference in the behavior of the damaged and the intact structure. The proposed method is illustrated by applying it to a plate and an automotive structure.
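
    The worst-case comparison described above can be sketched for a toy case. The following is a minimal illustration, not the authors' method: it assumes both models are linear (displacement u = K^-1 f), restricts loads to unit norm, and uses two hypothetical 2x2 stiffness matrices. Under those assumptions the load that maximizes the difference in response is the dominant right singular vector of K_ref^-1 - K_model^-1, found here by power iteration.

```python
import math

def inv2(k):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = k
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def response_error(k_ref, k_model, load):
    """Norm of the difference between the two models' displacements."""
    u_ref = matvec(inv2(k_ref), load)
    u_mod = matvec(inv2(k_model), load)
    return norm([a - b for a, b in zip(u_ref, u_mod)])

def worst_case_load(k_ref, k_model, iterations=200):
    """Unit-norm load that maximizes the response difference (linear models)."""
    c_ref, c_mod = inv2(k_ref), inv2(k_model)
    d = [[c_ref[i][j] - c_mod[i][j] for j in range(2)] for i in range(2)]
    d_t = [[d[j][i] for j in range(2)] for i in range(2)]
    # Arbitrary starting load; it must not be orthogonal to the worst case.
    load = [1.0, 0.3]
    start_norm = norm(load)
    load = [x / start_norm for x in load]
    for _ in range(iterations):
        g = matvec(d_t, matvec(d, load))  # power iteration on D^T D
        n = norm(g)
        if n == 0.0:
            break
        load = [x / n for x in g]
    return load, norm(matvec(d, load))

# Hypothetical stiffness matrices: the model overestimates one spring.
K_REF = [[3.0, -1.0], [-1.0, 2.0]]
K_MODEL = [[3.0, -1.0], [-1.0, 2.5]]

worst_load, worst_error = worst_case_load(K_REF, K_MODEL)
print("worst-case load:", worst_load)
print("worst-case error:", worst_error)
print("error under load [1, 0]:", response_error(K_REF, K_MODEL, [1.0, 0.0]))
```

    In this toy case an arbitrary load such as [1, 0] understates the model error, which is the weakness of arbitrary test loads that the abstract points out.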

  14. Clinical Bioinformatics: challenges and opportunities

    PubMed Central

    2012-01-01

    Background Network Tools and Applications in Biology (NETTAB) Workshops are a series of meetings focused on the most promising and innovative ICT tools and on their usefulness in Bioinformatics. The NETTAB 2011 workshop, held in Pavia, Italy, in October 2011, was aimed at presenting some of the most relevant methods, tools and infrastructures that are now available for Clinical Bioinformatics (CBI), the research field that deals with clinical applications of bioinformatics. Methods In this editorial, the viewpoints and opinions of three world CBI leaders, who were invited to participate in a panel discussion at the NETTAB workshop on the next challenges and future opportunities of this field, are reported. These include the development of data warehouses and ICT infrastructures for data sharing, the definition of standards for sharing phenotypic data and the implementation of novel tools for efficient search computing solutions. Results Some of the most important design features of a CBI-ICT infrastructure are presented, including data warehousing, modularity and flexibility, open-source development, semantic interoperability, and integrated search and retrieval of -omics information. Conclusions Clinical Bioinformatics goals are ambitious. Many factors, including the availability of high-throughput "-omics" technologies and equipment, the widespread availability of clinical data warehouses and the noteworthy increase in data storage and computational power of the most recent ICT systems, justify research and efforts in this domain, which promises to be a crucial leveraging factor for biomedical research. PMID:23095472

  15. Visualising "Junk" DNA through Bioinformatics

    ERIC Educational Resources Information Center

    Elwess, Nancy L.; Latourelle, Sandra M.; Cauthorn, Olivia

    2005-01-01

    One of the hottest areas of science today is the field in which biology, information technology, and computer science are merged into a single discipline called bioinformatics. This field enables the discovery and analysis of biological data, including nucleotide and amino acid sequences that are easily accessed through the use of computers. As…

  16. Reproducible Bioinformatics Research for Biologists

    USDA-ARS's Scientific Manuscript database

    This book chapter describes the current Big Data problem in Bioinformatics and the resulting issues with performing reproducible computational research. The core of the chapter provides guidelines and summaries of current tools/techniques that a noncomputational researcher would need to learn to pe...

  17. Bioinformatics and the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  20. Teaching Bioinformatics in Concert

    PubMed Central

    Goodman, Anya L.; Dekhtyar, Alex

    2014-01-01

    Can biology students without programming skills solve problems that require computational solutions? They can if they learn to cooperate effectively with computer science students. The goal of the in-concert teaching approach is to introduce biology students to computational thinking by engaging them in collaborative projects structured around the software development process. Our approach emphasizes development of interdisciplinary communication and collaboration skills for both life science and computer science students. PMID:25411792

  1. Teaching bioinformatics in concert.

    PubMed

    Goodman, Anya L; Dekhtyar, Alex

    2014-11-01

    Can biology students without programming skills solve problems that require computational solutions? They can if they learn to cooperate effectively with computer science students. The goal of the in-concert teaching approach is to introduce biology students to computational thinking by engaging them in collaborative projects structured around the software development process. Our approach emphasizes development of interdisciplinary communication and collaboration skills for both life science and computer science students.

  2. Bioinformatics and molecular modeling in glycobiology

    PubMed Central

    Schloissnig, Siegfried

    2010-01-01

    The field of glycobiology is concerned with the study of the structure, properties, and biological functions of the family of biomolecules called carbohydrates. Bioinformatics for glycobiology is a particularly challenging field, because carbohydrates exhibit a high structural diversity and their chains are often branched. Significant improvements in experimental analytical methods over recent years have led to a tremendous increase in the amount of carbohydrate structure data generated. Consequently, the availability of databases and tools to store, retrieve and analyze these data in an efficient way is of fundamental importance to progress in glycobiology. In this review, the various graphical representations and sequence formats of carbohydrates are introduced, and an overview of newly developed databases, the latest developments in sequence alignment and data mining, and tools to support experimental glycan analysis are presented. Finally, the field of structural glycoinformatics and molecular modeling of carbohydrates, glycoproteins, and protein–carbohydrate interaction are reviewed. PMID:20364395

  3. Pise: software for building bioinformatics webs.

    PubMed

    Gilbert, Don

    2002-12-01

    Pise is interface construction software for bioinformatics applications that run by command-line operations. It creates common, easy-to-use interfaces to these applications for the Web, or other uses. It is adaptable to new bioinformatics tools, and offers program chaining, Unix system batch and other controls, making it an attractive method for building and using your own bioinformatics web services.

  4. A comprehensive comparison of comparative RNA structure prediction approaches

    PubMed Central

    Gardner, Paul P; Giegerich, Robert

    2004-01-01

    Background An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet, independent benchmarking of these algorithms is rarely performed as is now common practice for protein-folding, gene-finding and multiple-sequence-alignment algorithms. Results Here we evaluate a number of RNA folding algorithms using reliable RNA data-sets and compare their relative performance. Conclusions We conclude that comparative data can enhance structure prediction but structure-prediction-algorithms vary widely in terms of both sensitivity and selectivity across different lengths and homologies. Furthermore, we outline some directions for future research. PMID:15458580
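
    As a rough illustration of the two measures named in the conclusions, the sketch below scores a predicted set of base pairs against a reference set. It assumes the common definitions (sensitivity = TP / (TP + FN), selectivity = TP / (TP + FP)); the abstract does not state the exact formulas, and the paper may use a refined variant. The base-pair positions are made up for the example.

```python
def pair_metrics(reference_pairs, predicted_pairs):
    """Return (sensitivity, selectivity) for predicted base pairs.

    Pairs are (i, j) position tuples; order within a pair is ignored.
    """
    ref = {tuple(sorted(p)) for p in reference_pairs}
    pred = {tuple(sorted(p)) for p in predicted_pairs}
    true_pos = len(ref & pred)    # pairs found in both sets
    false_neg = len(ref - pred)   # reference pairs that were missed
    false_pos = len(pred - ref)   # predicted pairs not in the reference
    sensitivity = true_pos / (true_pos + false_neg) if ref else 0.0
    selectivity = true_pos / (true_pos + false_pos) if pred else 0.0
    return sensitivity, selectivity

# Hypothetical hairpin: four reference pairs, three predicted pairs.
reference = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted = [(1, 20), (19, 2), (5, 16)]

sens, sel = pair_metrics(reference, predicted)
print(f"sensitivity={sens:.3f} selectivity={sel:.3f}")
```

    Reporting both numbers matters because a method can raise one at the expense of the other, for example by predicting very few pairs.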

  5. Ontologies for Bioinformatics

    PubMed Central

    Schuurman, Nadine; Leszczynski, Agnieszka

    2008-01-01

    The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing between divergent storage and semantic protocols) is to allow scientists from around the world to share and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability. PMID:19812775

  6. Bioinformatics in the information age

    SciTech Connect

    Spengler, Sylvia J.

    2000-02-01

    There is a well-known story about the blind man examining the elephant: the part of the elephant examined determines his perception of the whole beast. Perhaps bioinformatics--the shotgun marriage between biology and mathematics, computer science, and engineering--is like an elephant that occupies a large chair in the scientific living room. Given the demand for and shortage of researchers with the computer skills to handle large volumes of biological data, where exactly does the bioinformatics elephant sit? There are probably many biologists who feel that a major product of this bioinformatics elephant is large piles of waste material. If you have tried to plow through Web sites and software packages in search of a specific tool for analyzing and collating large amounts of research data, you may well feel the same way. But there has been progress with major initiatives to develop more computing power, educate biologists about computers, increase funding, and set standards. For our purposes, bioinformatics is not simply a biologically inclined rehash of information theory (1) nor is it a hodgepodge of computer science techniques for building, updating, and accessing biological data. Rather bioinformatics incorporates both of these capabilities into a broad interdisciplinary science that involves both conceptual and practical tools for the understanding, generation, processing, and propagation of biological information. As such, bioinformatics is the sine qua non of 21st-century biology. Analyzing gene expression using cDNA microarrays immobilized on slides or other solid supports (gene chips) is set to revolutionize biology and medicine and, in so doing, generate vast quantities of data that have to be accurately interpreted (Fig. 1). As discussed at a meeting a few months ago (Microarray Algorithms and Statistical Analysis: Methods and Standards; Tahoe City, California; 9-12 November 1999), experiments with cDNA arrays must be subjected to quality control

  7. Tools for comparative protein structure modeling and analysis.

    PubMed

    Eswar, Narayanan; John, Bino; Mirkovic, Nebojsa; Fiser, Andras; Ilyin, Valentin A; Pieper, Ursula; Stuart, Ashley C; Marti-Renom, Marc A; Madhusudhan, M S; Yerkovich, Bozidar; Sali, Andrej

    2003-07-01

    The following resources for comparative protein structure modeling and analysis are described (http://salilab.org): MODELLER, a program for comparative modeling by satisfaction of spatial restraints; MODWEB, a web server for automated comparative modeling that relies on PSI-BLAST, IMPALA and MODELLER; MODLOOP, a web server for automated loop modeling that relies on MODELLER; MOULDER, a CPU intensive protocol of MODWEB for building comparative models based on distant known structures; MODBASE, a comprehensive database of annotated comparative models for all sequences detectably related to a known structure; MODVIEW, a Netscape plugin for Linux that integrates viewing of multiple sequences and structures; and SNPWEB, a web server for structure-based prediction of the functional impact of a single amino acid substitution.

  8. Virtual ligand screening against comparative protein structure models.

    PubMed

    Fan, Hao; Irwin, John J; Sali, Andrej

    2012-01-01

    Virtual ligand screening uses computation to discover new ligands of a protein by screening one or more of its structural models against a database of potential ligands. Comparative protein structure modeling extends the applicability of virtual screening beyond the atomic structures determined by X-ray crystallography or NMR spectroscopy. Here, we describe an integrated modeling and docking protocol, combining comparative modeling by MODELLER and virtual ligand screening by DOCK.

  9. Comparison of Online and Onsite Bioinformatics Instruction for a Fully Online Bioinformatics Master’s Program

    PubMed Central

    Obom, Kristina M.; Cummings, Patrick J.

    2007-01-01

    The completely online Master of Science in Bioinformatics program differs from the onsite program only in the mode of content delivery. Analysis of student satisfaction indicates no statistically significant difference between most online and onsite student responses; however, online and onsite students do differ significantly in their responses to a few questions on the course evaluation queries. Analysis of student exam performance using three assessments indicates that there was no significant difference in grades earned by students in online and onsite courses. These results suggest that our model for online bioinformatics education provides students with a rigorous course of study that is comparable to onsite course instruction and possibly provides a more rigorous course load and more opportunities for participation. PMID:23653816

  10. Comparison of Online and Onsite Bioinformatics Instruction for a Fully Online Bioinformatics Master's Program.

    PubMed

    Obom, Kristina M; Cummings, Patrick J

    2007-01-01

    The completely online Master of Science in Bioinformatics program differs from the onsite program only in the mode of content delivery. Analysis of student satisfaction indicates no statistically significant difference between most online and onsite student responses; however, online and onsite students do differ significantly in their responses to a few questions on the course evaluation queries. Analysis of student exam performance using three assessments indicates that there was no significant difference in grades earned by students in online and onsite courses. These results suggest that our model for online bioinformatics education provides students with a rigorous course of study that is comparable to onsite course instruction and possibly provides a more rigorous course load and more opportunities for participation.

  11. A Bioinformatics Facility for NASA

    NASA Technical Reports Server (NTRS)

    Schweighofer, Karl; Pohorille, Andrew

    2006-01-01

    Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.

  13. Omics technologies, data and bioinformatics principles.

    PubMed

    Schneider, Maria V; Orchard, Sandra

    2011-01-01

    We provide an overview of the state of the art in Omics technologies, the types of Omics data and the bioinformatics resources relevant and related to Omics. We also illustrate the bioinformatics challenges of dealing with high-throughput data. This overview touches on several fundamental aspects of Omics and bioinformatics: data standardisation, data sharing, storing Omics data appropriately and exploring Omics data in bioinformatics. Though the principles and concepts presented hold across the various technological fields, we concentrate on three main Omics fields, namely genomics, transcriptomics and proteomics. Finally, we address the integration of Omics data, and provide several useful links for bioinformatics and Omics.

  14. Diagnostic biases in translational bioinformatics.

    PubMed

    Han, Henry

    2015-08-01

    With the surge of translational medicine and computational omics research, complex disease diagnosis relies more and more on massive omics data-driven molecular signature detection. However, how to detect and prevent possible diagnostic biases in translational bioinformatics remains an unsolved problem despite its importance in the coming era of personalized medicine. In this study, we comprehensively investigate the diagnostic bias problem by analyzing benchmark gene array, protein array, RNA-Seq and miRNA-Seq data under the framework of support vector machines for different model selection methods. We further categorize the diagnostic biases into different types by conducting rigorous kernel matrix analysis and provide effective machine learning methods to conquer the diagnostic biases. We have found that the diagnostic biases happen for data with different distributions and for SVMs with different kernels. Moreover, we identify three types of diagnostic bias in total: overfitting bias, label skewness bias, and underfitting bias in SVM diagnostics, and present the corresponding reasons through rigorous analysis. Compared with the overfitting and underfitting biases, the label skewness bias is more challenging to detect and conquer because its deceptive accuracy makes it easy to mistake for a normal diagnostic case. To tackle this problem, we propose derivative component analysis based support vector machines (DCA-SVM) to conquer the label skewness bias by achieving rivaling clinical diagnostic results. Our studies demonstrate that the diagnostic biases are mainly caused by three major factors, i.e. kernel selection, the signal amplification mechanism in high-throughput profiling, and the training data label distribution. Moreover, the proposed DCA-SVM diagnosis provides a
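
    The "deceptive accuracy" of the label skewness bias can be shown with a small worked example. This sketch is not the DCA-SVM method from the study; it only assumes a hypothetical cohort of 95 controls and 5 cases and a degenerate classifier that assigns every sample to the majority class, then compares accuracy with sensitivity and specificity.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn

def diagnostic_summary(y_true, y_pred):
    """Accuracy, sensitivity and specificity as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Hypothetical skewed cohort: 95 controls (0) and 5 cases (1).
labels = [0] * 95 + [1] * 5
# Degenerate classifier: every sample is called a control.
predictions = [0] * len(labels)

summary = diagnostic_summary(labels, predictions)
print(summary)
```

    Accuracy alone looks strong here even though no case is detected, which is why class-wise measures are worth checking whenever the training labels are skewed.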

  15. Protein bioinformatics applied to virology.

    PubMed

    Mohabatkar, Hassan; Keyhanfar, Mehrnaz; Behbahani, Mandana

    2012-09-01

    Scientists have united in a common search to sequence, store and analyze genes and proteins. In this regard, rapidly evolving bioinformatics methods are providing valuable information on these newly discovered molecules. Understanding what has been done and what we can do in silico is essential in designing new experiments. The imbalance between sequence-known proteins and attribute-known proteins has called for the development of computational methods or high-throughput automated tools for rapidly and reliably predicting or identifying various characteristics of uncharacterized proteins. Taking into consideration the role of viruses in causing diseases and their use in biotechnology, the present review describes the application of protein bioinformatics in virology. To this end, a number of important features of viral proteins, such as epitope prediction, protein docking, subcellular localization, viral protease cleavage sites and computer-based comparison of their aspects, are discussed. This paper also describes several tools developed principally for viral bioinformatics. Predicting viral protein features and following the advances in this field can aid basic understanding of the relationship between a virus and its host.

  16. Bioinformatic identification of plant peptides.

    PubMed

    Lease, Kevin A; Walker, John C

    2010-01-01

    Plant peptides play a number of important roles in defence, development and many other aspects of plant physiology. Identifying additional peptide sequences provides the starting point to investigate their function using molecular, genetic or biochemical techniques. Due to their small size, identifying peptide sequences may not succeed using the default bioinformatic approaches that work well for average-sized proteins. There are two general scenarios related to bioinformatic identification of peptides to be discussed in this paper. In the first scenario, one already has the sequence of a plant peptide and is trying to find more plant peptides with some sequence similarity to the starting peptide. To do this, the Basic Local Alignment Search Tool (BLAST) is employed, with the parameters adjusted to be more favourable for identifying potential peptide matches. A second scenario involves trying to identify plant peptides without using sequence similarity searches to known plant peptides. In this approach, features such as protein size and the presence of a cleavable amino-terminal signal peptide are used to screen annotated proteins. A variation of this method can be used to screen for unannotated peptides from genomic sequences. Bioinformatic resources related to Arabidopsis thaliana will be used to illustrate these approaches.
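
    The second scenario, screening annotated proteins by size and by the presence of a signal peptide, can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the record type, the identifiers, the sequences and the 150-residue cutoff are all hypothetical, and the signal-peptide flag is assumed to come from an external prediction tool rather than being computed here.

```python
from dataclasses import dataclass

@dataclass
class ProteinRecord:
    identifier: str
    sequence: str
    # Assumed to be supplied by an external signal-peptide predictor.
    has_signal_peptide: bool

def screen_candidate_peptides(records, max_length=150):
    """Keep small proteins that carry a predicted N-terminal signal peptide.

    max_length is an illustrative cutoff, not a value from the paper.
    """
    return [
        record
        for record in records
        if len(record.sequence) <= max_length and record.has_signal_peptide
    ]

# Made-up annotated proteins for demonstration only.
annotated = [
    ProteinRecord("protA", "M" + "A" * 79, True),    # small, secreted
    ProteinRecord("protB", "M" + "L" * 499, True),   # secreted but too large
    ProteinRecord("protC", "M" + "G" * 59, False),   # small, no signal peptide
]

candidates = screen_candidate_peptides(annotated)
print([record.identifier for record in candidates])
```

    Both criteria are deliberately permissive in a first pass; candidates would still need the molecular, genetic or biochemical follow-up mentioned in the abstract.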

  17. Exploring Cystic Fibrosis Using Bioinformatics Tools: A Module Designed for the Freshman Biology Course

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2011-01-01

    We incorporated a bioinformatics component into the freshman biology course that allows students to explore cystic fibrosis (CF), a common genetic disorder, using bioinformatics tools and skills. Students learn about CF through searching genetic databases, analyzing genetic sequences, and observing the three-dimensional structures of proteins…

  19. Robust enzyme design: bioinformatic tools for improved protein stability.

    PubMed

    Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas

    2015-03-01

    The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation.

  20. DSSTOX STRUCTURE-SEARCHABLE PUBLIC TOXICITY DATABASE NETWORK: CURRENT PROGRESS AND NEW INITIATIVES TO IMPROVE CHEMO-BIOINFORMATICS CAPABILITIES

    EPA Science Inventory

    The EPA DSSTox website (http://www.epa.gov/nheerl/dsstox) publishes standardized, structure-annotated toxicity databases, covering a broad range of toxicity disciplines. Each DSSTox database features documentation written in collaboration with the source authors and toxicity expe...

  2. Comparative modeling of InP solar cell structures

    NASA Technical Reports Server (NTRS)

    Jain, R. K.; Weinberg, I.; Flood, D. J.

    1991-01-01

    The comparative modeling of p(+)n and n(+)p indium phosphide solar cell structures is studied using a numerical program PC-1D. The optimal design study has predicted that the p(+)n structure offers improved cell efficiencies as compared to n(+)p structure, due to higher open-circuit voltage. The various cell material and process parameters to achieve the maximum cell efficiencies are reported. The effect of some of the cell parameters on InP cell I-V characteristics was studied. The available radiation resistance data on n(+)p and p(+)p InP solar cells are also critically discussed.

  3. A Statistical Test for Comparing Nonnested Covariance Structure Models.

    ERIC Educational Resources Information Center

    Levy, Roy; Hancock, Gregory R.

    While statistical procedures are well known for comparing hierarchically related (nested) covariance structure models, statistical tests for comparing nonhierarchically related (nonnested) models have proven more elusive. While isolated attempts have been made, none exists within the commonly used maximum likelihood estimation framework, thereby…

  4. Fuento: functional enrichment for bioinformatics.

    PubMed

    Weichselbaum, David; Zagrovic, Bojan; Polyansky, Anton A

    2017-08-15

    The currently available functional enrichment software focuses mostly on gene expression analysis, and server- and graphical-user-interface-based tools with a specific scope dominate the field. Here we present an efficient, user-friendly, multifunctional command-line-based functional enrichment tool (fu-en-to), tailored for the bioinformatics researcher. Source code and binaries are freely available for download at github.com/DavidWeichselbaum/fuento; the tool is implemented in C++ and supported on Linux and OS X. Contact: newant@gmail.com or bojan.zagrovic@univie.ac.at.

  5. Bioinformatics in Africa: The Rise of Ghana?

    PubMed Central

    Karikari, Thomas K.

    2015-01-01

    Until recently, bioinformatics, an important discipline in the biological sciences, was largely limited to countries with advanced scientific resources. Nonetheless, several developing countries have lately been making progress in bioinformatics training and applications. In Africa, leading countries in the discipline include South Africa, Nigeria, and Kenya. However, one country that is less known when it comes to bioinformatics is Ghana. Here, I provide a first description of the development of bioinformatics activities in Ghana and how these activities contribute to the overall development of the discipline in Africa. Over the past decade, scientists in Ghana have been involved in publications incorporating bioinformatics analyses, aimed at addressing research questions in biomedical science and agriculture. Scarce research funding and inadequate training opportunities are some of the challenges that need to be addressed for Ghanaian scientists to continue developing their expertise in bioinformatics. PMID:26378921

  6. Bioinformatic Analysis of Gene Expression for Melanoma Treatment

    PubMed Central

    Kawakami, Akinori; Fisher, David E.

    2016-01-01

    Bioinformatic analysis of genome-wide gene expression allows us to characterize cells, including melanomas. Gene expression profiles have been generated in various stages of melanomas and analyzed by researchers in unique ways. Lauss et al. compared their melanoma subtypes with those of The Cancer Genome Atlas Network and found consistency between the two studies. PMID:27884291

  7. Bioinformatics for personal genome interpretation.

    PubMed

    Capriotti, Emidio; Nehrt, Nathan L; Kann, Maricel G; Bromberg, Yana

    2012-07-01

    An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field: the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.

  8. Emerging strengths in Asia Pacific bioinformatics

    PubMed Central

    Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee

    2008-01-01

    The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20–23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts. PMID:19091008

  9. Rapid Bioinformatic Identification of Thermostabilizing Mutations

    PubMed Central

    Sauer, David B.; Karpowich, Nathan K.; Song, Jin Mei; Wang, Da-Neng

    2015-01-01

    Ex vivo stability is a valuable protein characteristic but is laborious to improve experimentally. In addition to biopharmaceutical and industrial applications, stable protein is important for biochemical and structural studies. Taking advantage of the large number of available genomic sequences and growth temperature data, we present two bioinformatic methods to identify a limited set of amino acids or positions that likely underlie thermostability. Because these methods allow thousands of homologs to be examined in silico, they have the advantage of providing both speed and statistical power. Using these methods, we introduced, via mutation, amino acids from thermoadapted homologs into an exemplar mesophilic membrane protein, and demonstrated significantly increased thermostability while preserving protein activity. PMID:26445442

  10. The European Bioinformatics Institute's data resources

    PubMed Central

    Brooksbank, Catherine; Camon, Evelyn; Harris, Midori A.; Magrane, Michele; Martin, Maria Jesus; Mulder, Nicola; O'Donovan, Claire; Parkinson, Helen; Tuli, Mary Ann; Apweiler, Rolf; Birney, Ewan; Brazma, Alvis; Henrick, Kim; Lopez, Rodrigo; Stoesser, Guenter; Stoehr, Peter; Cameron, Graham

    2003-01-01

    As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics Institute (EBI) hosts six core databases, which store information on DNA sequences (EMBL-Bank), protein sequences (SWISS-PROT and TrEMBL), protein structure (MSD), whole genomes (Ensembl) and gene expression (ArrayExpress). But just as a cell would be useless if it couldn't transcribe DNA or translate RNA, our resources would be compromised if each existed in isolation. We have therefore developed a range of tools that not only facilitate the deposition and retrieval of biological information, but also allow users to carry out searches that reflect the interconnectedness of biological information. The EBI's databases and tools are all available on our website at www.ebi.ac.uk. PMID:12519944

  11. Bioinformatics by Example: From Sequence to Target

    NASA Astrophysics Data System (ADS)

    Kossida, Sophia; Tahri, Nadia; Daizadeh, Iraj

    2002-12-01

    With the completion of the human genome, and the imminent completion of other large-scale sequencing and structure-determination projects, computer-assisted bioscience is set to become the new paradigm for conducting basic and applied research. The presence of these additional bioinformatics tools stirs great anxiety for experimental researchers (as well as for pedagogues), since they now need a wider and deeper knowledge of differing disciplines (biology, chemistry, physics, mathematics, and computer science). This review targets those individuals who are interested in using computational methods in their teaching or research. By analyzing a real-life, pharmaceutical, multicomponent, target-based example, the reader will experience this fascinating new discipline.

  12. Development of computations in bioscience and bioinformatics and its application: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB06).

    PubMed

    Deng, Youping; Ni, Jun; Zhang, Chaoyang

    2006-12-12

    The first Symposium of Computations in Bioinformatics and Bioscience (SCBB06) was held in Hangzhou, China on June 21-22, 2006. Twenty-six peer-reviewed papers were selected for publication in this special issue of BMC Bioinformatics. These papers cover a broad range of topics including bioinformatics theories, algorithms, applications and tool development. The main technical topics include gene expression analysis, sequence analysis, genome analysis, phylogenetic analysis, gene function prediction, molecular interaction and systems biology, genetics and population studies, immune strategy, protein structure prediction and proteomics.

  13. Comparative testing of nondestructive examination techniques for concrete structures

    NASA Astrophysics Data System (ADS)

    Clayton, Dwight A.; Smith, Cyrus M.

    2014-03-01

    A multitude of concrete-based structures are typically part of a light water reactor (LWR) plant to provide foundation, support, shielding, and containment functions. Concrete has been used in the construction of nuclear power plants (NPPs) because of three primary properties: its low cost, its structural strength, and its ability to shield radiation. Examples of concrete structures important to the safety of LWR plants include the containment building, spent fuel pool, and cooling towers. Comparative testing of the various NDE concrete measurement techniques requires concrete samples with known material properties, voids, internal microstructure flaws, and reinforcement locations. These samples can be artificially created under laboratory conditions where the various properties can be controlled. Outside NPPs, few applications involve critical concrete structures that are as thick and heavily reinforced. Few industries other than the nuclear power or wider power plant industry are therefore interested in performing NDE on thick and reinforced concrete structures. This leads to a lack of readily available samples of thick and heavily reinforced concrete for performing NDE evaluations, research, and training. The industry that typically performs the most NDE on concrete structures is the bridge and roadway industry. While bridge and roadway structures are thinner and less reinforced, they have a good base of NDE research to support their field NDE programs to detect, identify, and repair concrete failures. This paper will summarize the initial comparative testing of two concrete samples with an emphasis on how these techniques could perform on NPP concrete structures.

  14. Bioinformatics study of the mangrove actin genes

    NASA Astrophysics Data System (ADS)

    Basyuni, M.; Wasilah, M.; Sumardi

    2017-01-01

    This study describes bioinformatics methods used to analyze eight actin genes from mangrove plants deposited in DDBJ/EMBL/GenBank and to predict their structure, composition, subcellular localization, similarity, and phylogeny. The physical and chemical properties of the eight mangrove actins varied among the genes. The proportions of secondary-structure elements followed the order α-helix > random coil > extended chain structure for BgActl, KcActl, RsActl, and A. corniculatum Act. In contrast, the remaining actin genes followed the order random coil > extended chain structure > α-helix. The secondary structure prediction thus provided the necessary structural information. The predicted values for chloroplast, mitochondrial, and signal peptide targeting were too small, indicating that the mangrove actin genes carry no chloroplast or mitochondrial transit peptide and no secretory-pathway signal peptide. These results suggest the importance of understanding the diversity and functional properties of the different amino acids in mangrove actin genes. To clarify the relationships among the mangrove actin genes, a phylogenetic tree was constructed. Three groups of mangrove actin genes were formed: the first group contains B. gymnorrhiza BgAct and R. stylosa RsActl; the second cluster, the largest group, consists of five actin genes; and the last branch consists of one gene, B. sexangula Act. The present study therefore supports previous results showing that plant actin genes form distinct clusters in the tree.

  15. Bioinformatic Identification of Rare Codon Clusters (RCCs) in HBV Genome and Evaluation of RCCs in Proteins Structure of Hepatitis B Virus.

    PubMed

    Mortazavi, Mojtaba; Zarenezhad, Mohammad; Gholamzadeh, Saeid; Alavian, Seyed Moayed; Ghorbani, Mohammad; Dehghani, Reza; Malekpour, Abdorrasoul; Meshkibaf, Mohammadhasan; Fakhrzad, Ali

    2016-10-01

    Hepatitis B virus (HBV) causes an infectious disease and has nine genotypes (A - I) and a 'putative' genotype J. The aim of this study was to identify the rare codon clusters (RCCs) in the HBV genome and to evaluate these RCCs in the structure of the HBV proteins. To detect protein family (Pfam) accession numbers in HBV proteins, the UniProt database and the Pfam search tool were used. Pfam is a comprehensive and accurate collection of protein domains and families. It contains annotation of each family in the form of textual descriptions, links to other resources and literature references. Genome projects have used Pfam extensively for large-scale functional annotation of genomic data; the Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). The Pfam search tools identify the Pfam families of proteins. These Pfam IDs were analyzed in the Sherlocc program, and the locations of RCCs in the HBV genome and proteins were detected and reported as translated EMBL nucleotide sequence data library (TrEMBL) entries. TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of European Molecular Biology Laboratory (EMBL) nucleotide sequence entries not yet integrated in SWISS-PROT. Furthermore, the structures of the proteins in the TrEMBL entries were studied in the PDB database, and the 3D structures of the HBV proteins and the locations of RCCs were visualized and studied using Swiss PDB Viewer software®. The Pfam search tool found nine protein families in three frames. Analysis of the Pfam families in the Sherlocc program identified no RCCs in the external core antigen (PF08290) and truncated HBeAg gene (PF08290) of HBV. By contrast, RCCs were identified in the gene of hepatitis core antigen (PF00906 and the residues 224 - 234 and 251 - 255), large envelope protein S (PF00695 and the residues 53-56 and 70 - 84), X protein (PF00739 and

  16. Bioinformatic Identification of Rare Codon Clusters (RCCs) in HBV Genome and Evaluation of RCCs in Proteins Structure of Hepatitis B Virus

    PubMed Central

    Mortazavi, Mojtaba; Zarenezhad, Mohammad; Gholamzadeh, Saeid; Alavian, Seyed Moayed; Ghorbani, Mohammad; Dehghani, Reza; Malekpour, Abdorrasoul; Meshkibaf, Mohammadhasan; Fakhrzad, Ali

    2016-01-01

    Background: Hepatitis B virus (HBV) causes an infectious disease and has nine genotypes (A - I) and a ‘putative’ genotype J. Objectives: The aim of this study was to identify the rare codon clusters (RCCs) in the HBV genome and to evaluate these RCCs in the structure of the HBV proteins. Methods: To detect protein family (Pfam) accession numbers in HBV proteins, the UniProt database and the Pfam search tool were used. Pfam is a comprehensive and accurate collection of protein domains and families. It contains annotation of each family in the form of textual descriptions, links to other resources and literature references. Genome projects have used Pfam extensively for large-scale functional annotation of genomic data; the Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). The Pfam search tools identify the Pfam families of proteins. These Pfam IDs were analyzed in the Sherlocc program, and the locations of RCCs in the HBV genome and proteins were detected and reported as translated EMBL nucleotide sequence data library (TrEMBL) entries. TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of European Molecular Biology Laboratory (EMBL) nucleotide sequence entries not yet integrated in SWISS-PROT. Furthermore, the structures of the proteins in the TrEMBL entries were studied in the PDB database, and the 3D structures of the HBV proteins and the locations of RCCs were visualized and studied using Swiss PDB Viewer software®. Results: The Pfam search tool found nine protein families in three frames. Analysis of the Pfam families in the Sherlocc program identified no RCCs in the external core antigen (PF08290) and truncated HBeAg gene (PF08290) of HBV. By contrast, RCCs were identified in the gene of hepatitis core antigen (PF00906 and the residues 224 - 234 and 251 - 255), large envelope protein S (PF00695 and the residues

  17. Structural and bioinformatic analysis of the kiwifruit allergen Act d 11, a member of the family of ripening-related proteins.

    PubMed

    Chruszcz, Maksymilian; Ciardiello, Maria Antonietta; Osinski, Tomasz; Majorek, Karolina A; Giangrieco, Ivana; Font, Jose; Breiteneder, Heimo; Thalassinos, Konstantinos; Minor, Wladek

    2013-12-01

    The allergen Act d 11, also known as kirola, is a 17 kDa protein expressed in large amounts in ripe green and yellow-fleshed kiwifruit. Ten percent of all kiwifruit-allergic individuals produce IgE specific for the protein. Using X-ray crystallography, we determined the first three-dimensional structures of Act d 11, produced from both recombinant expression in Escherichia coli and from the natural source (kiwifruit). While Act d 11 is immunologically correlated with the birch pollen allergen Bet v 1 and other members of the pathogenesis-related protein family 10 (PR-10), it has low sequence similarity to PR-10 proteins. By sequence Act d 11 appears instead to belong to the major latex/ripening-related (MLP/RRP) family, but analysis of the crystal structures shows that Act d 11 has a fold very similar to that of Bet v 1 and other PR-10 related allergens regardless of the low sequence identity. The structures of both the natural and recombinant protein include an unidentified ligand, which is relatively small (about 250 Da by mass spectrometry experiments) and most likely contains an aromatic ring. The ligand-binding cavity in Act d 11 is also significantly smaller than those in PR-10 proteins. The binding of the ligand, which we were not able to unambiguously identify, results in conformational changes in the protein that may have physiological and immunological implications. Interestingly, the residue corresponding to Glu45 in Bet v 1 (Glu46), which is important for IgE binding to the birch pollen allergen, is conserved in Act d 11, even though it is not in other allergens with significantly higher sequence identity to Bet v 1. We suggest that the so-called Gly-rich loop (or P-loop), which is conserved in all PR-10 allergens, may be responsible for IgE cross-reactivity between Bet v 1 and Act d 11. Copyright © 2013 Elsevier Ltd. All rights reserved.

  18. MODBASE, a database of annotated comparative protein structure models.

    PubMed

    Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C; Ilyin, Valentin A; Sali, Andrej

    2002-01-01

    MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10^-4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server.

  19. Genomics and Bioinformatics Resources for Crop Improvement

    PubMed Central

    Mochida, Keiichi; Shinozaki, Kazuo

    2010-01-01

    Recent remarkable innovations in platforms for omics-based research and application development provide crucial resources to promote research in model and applied plant species. A combinatorial approach using multiple omics platforms and integration of their outcomes is now an effective strategy for clarifying molecular systems integral to improving plant productivity. Furthermore, promotion of comparative genomics among model and applied plants allows us to grasp the biological properties of each species and to accelerate gene discovery and functional analyses of genes. Bioinformatics platforms and their associated databases are also essential for the effective design of approaches making the best use of genomic resources, including resource integration. We review recent advances in research platforms and resources in plant omics together with related databases and advances in technology. PMID:20208064

  20. Combining multiple decisions: applications to bioinformatics

    NASA Astrophysics Data System (ADS)

    Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.

    2008-01-01

    Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.
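
    The general ECOC idea summarized in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the code matrix, the class names, and the per-classifier weights are all hypothetical, and the decoding rule is plain (optionally weighted) Hamming distance, standing in for the tuned weighting and the probabilistic bit-inversion model that the article reviews.

```python
from typing import Optional, Sequence

# Hypothetical code matrix: one codeword per class, one entry per binary
# classifier. Any two codewords differ in at least 3 positions, so a single
# wrong binary decision can still be corrected.
CODE_MATRIX = {
    "class_a": (+1, +1, +1, -1, -1),
    "class_b": (+1, -1, -1, +1, -1),
    "class_c": (-1, +1, -1, -1, +1),
    "class_d": (-1, -1, +1, +1, +1),
}


def decode(outputs: Sequence[int], weights: Optional[Sequence[float]] = None) -> str:
    """Return the class whose codeword is closest to the binary outputs.

    `weights` (an assumption of this sketch) lets more trusted classifiers
    count for more; with no weights this is ordinary Hamming decoding.
    """
    if weights is None:
        weights = [1.0] * len(outputs)

    def distance(codeword: Sequence[int]) -> float:
        # Sum the weights of the classifiers that disagree with the codeword.
        return sum(w for bit, out, w in zip(codeword, outputs, weights) if bit != out)

    return min(CODE_MATRIX, key=lambda label: distance(CODE_MATRIX[label]))


if __name__ == "__main__":
    # The third classifier is wrong, yet the decision is still class_a.
    print(decode((+1, +1, -1, -1, -1)))
```

    Giving one classifier a larger weight changes which codeword counts as closest, which is the intuition behind tuning the weights from observed data.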

  1. Evolving Strategies for the Incorporation of Bioinformatics Within the Undergraduate Cell Biology Curriculum

    PubMed Central

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489

  2. Atlas - a data warehouse for integrative bioinformatics.

    PubMed

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

    2005-02-21

    We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data

  3. Structures of School Systems Worldwide: A Comparative Study

    ERIC Educational Resources Information Center

    Popov, Nikolay

    2012-01-01

    In the past 20 years I have been examining the structures of school systems worldwide. This ongoing research has been enriched by the findings obtained from the lecture course on Comparative Education I have been delivering to students in the Bachelor and Master's Education Programs at Sofia University, Bulgaria. This paper presents some results…

  4. Non-structural carbohydrates in woody plants compared among laboratories

    Treesearch

    Audrey G. Quentin; Elizabeth A. Pinkard; Michael G. Ryan; David T. Tissue; L. Scott Baggett; Henry D. Adams; Pascale Maillard; Jacqueline Marchand; Simon M. Landhausser; Andre Lacointe; Yves Gibon; William R. L. Anderegg; Shinichi Asao; Owen K. Atkin; Marc Bonhomme; Caroline Claye; Pak S. Chow; Anne Clement-Vidal; Noel W. Davies; L. Turin Dickman; Rita Dumbur; David S. Ellsworth; Kristen Falk; Lucía Galiano; Jose M. Grunzweig; Henrik Hartmann; Gunter Hoch; Sharon Hood; Joanna E. Jones; Takayoshi Koike; Iris Kuhlmann; Francisco Lloret; Melchor Maestro; Shawn D. Mansfield; Jordi Martinez-Vilalta; Mickael Maucourt; Nathan G. McDowell; Annick Moing; Bertrand Muller; Sergio G. Nebauer; Ulo Niinemets; Sara Palacio; Frida Piper; Eran Raveh; Andreas Richter; Gaelle Rolland; Teresa Rosas; Brigitte Saint Joanis; Anna Sala; Renee A. Smith; Frank Sterck; Joseph R. Stinziano; Mari Tobias; Faride Unda; Makoto Watanabe; Danielle A. Way; Lasantha K. Weerasinghe; Birgit Wild; Erin Wiley; David R. Woodruff

    2016-01-01

    Non-structural carbohydrates (NSC) in plant tissue are frequently quantified to make inferences about plant responses to environmental conditions. Laboratories publishing estimates of NSC of woody plants use many different methods to evaluate NSC. We asked whether NSC estimates in the recent literature could be quantitatively compared among studies. We also...

  5. Biology in 'silico': The Bioinformatics Revolution.

    ERIC Educational Resources Information Center

    Bloom, Mark

    2001-01-01

    Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and considers it the genetics Swiss Army Knife, which has many different uses, for use in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…

  7. Fuzzy Logic in Medicine and Bioinformatics

    PubMed Central

    Torres, Angela; Nieto, Juan J.

    2006-01-01

    The purpose of this paper is to present a general view of the current applications of fuzzy logic in medicine and bioinformatics. We particularly review the medical literature using fuzzy logic. We then recall the geometrical interpretation of fuzzy sets as points in a fuzzy hypercube and present two concrete illustrations in medicine (drug addictions) and in bioinformatics (comparison of genomes). PMID:16883057
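
    The hypercube picture mentioned here can be made concrete with a short Python sketch. It assumes only the standard construction: a fuzzy set over n elements is a vector of membership degrees in [0, 1]^n, and crisp sets are the cube's vertices. The function names are illustrative, and the fuzziness ratio (distance to the nearest vertex over distance to the farthest) is one commonly cited measure, not necessarily the one the paper uses.

```python
from typing import Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute membership differences between two fuzzy sets."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_vertex(a: Sequence[float]) -> Tuple[float, ...]:
    """Closest crisp set: round each membership degree to 0 or 1."""
    return tuple(1.0 if x >= 0.5 else 0.0 for x in a)


def farthest_vertex(a: Sequence[float]) -> Tuple[float, ...]:
    """Crisp set on the opposite corner from the nearest vertex."""
    return tuple(0.0 if x >= 0.5 else 1.0 for x in a)


def fuzziness(a: Sequence[float]) -> float:
    """Ratio in [0, 1]: 0 at a vertex (crisp set), 1 at the cube's midpoint."""
    return l1_distance(a, nearest_vertex(a)) / l1_distance(a, farthest_vertex(a))


if __name__ == "__main__":
    patient_a = (0.2, 0.9, 0.5)  # hypothetical membership degrees
    patient_b = (0.5, 0.4, 0.5)
    print(l1_distance(patient_a, patient_b), fuzziness(patient_a))
```

    Two fuzzy sets can then be compared by the distance between their points, which is the kind of comparison the abstract refers to for genomes.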

  8. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…

  9. Rapid Development of Bioinformatics Education in China

    ERIC Educational Resources Information Center

    Zhong, Yang; Zhang, Xiaoyan; Ma, Jian; Zhang, Liang

    2003-01-01

    As the Human Genome Project experiences remarkable success and a flood of biological data is produced, bioinformatics becomes a very "hot" cross-disciplinary field, yet experienced bioinformaticians are urgently needed worldwide. This paper summarises the rapid development of bioinformatics education in China, especially related…

  10. Online Bioinformatics Tutorials | Office of Cancer Genomics

    Cancer.gov

    Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.

  12. A Mathematical Optimization Problem in Bioinformatics

    ERIC Educational Resources Information Center

    Heyer, Laurie J.

    2008-01-01

    This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…
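
    The dynamic-programming formulation mentioned above can be sketched as a global (Needleman-Wunsch style) alignment. This is a minimal illustration rather than the article's own exercise: the function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (optimal score, aligned a, aligned b) for a global alignment.

    The scoring parameters are illustrative assumptions, not values
    taken from the article.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

    Calling `needleman_wunsch("ACGT", "AGT")` fills a 5 x 4 table and returns the best score together with one optimal pair of gapped strings. When several alignments tie, the traceback order above decides which one is reported.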

  15. Comparing High-latitude Ionospheric and Thermospheric Lagrangian Coherent Structures

    NASA Astrophysics Data System (ADS)

    Wang, N.; Ramirez, U.; Flores, F.; Okic, D.; Datta-Barua, S.

    2015-12-01

    Lagrangian Coherent Structures (LCSs) are invisible boundaries in time-varying flow fields that may be subject to mixing and turbulence. The LCS is defined by the local maxima of the finite-time Lyapunov exponent (FTLE), a scalar field quantifying the degree of stretching of fluid elements over the flow domain. Although the thermosphere is dominated by neutral wind processes and the ionosphere is governed by plasma electrodynamics, we can compare the LCS in the two modeled flow fields to yield insight into transport and interaction processes in the high-latitude ionosphere-thermosphere (IT) system. To obtain the thermospheric LCS, we use the Horizontal Wind Model 2014 (HWM14) [1] at a single altitude to generate the two-dimensional velocity field. The FTLE computation is applied to study the flow field of the neutral wind, and to visualize the forward-time Lagrangian Coherent Structures in the flow domain. The time-varying structures indicate a possible thermospheric LCS ridge in the auroral oval area. The results of a two-day run during a geomagnetically quiet period show that the structures are diurnally quasi-periodic, suggesting that solar radiation influences the neutral wind flow field. To find the LCS in the high-latitude ionospheric drifts, the Weimer 2001 [2] polar electric potential model and the International Geomagnetic Reference Field 11 [3] are used to compute the ExB drift flow field in the ionosphere. As with the neutral winds, the Lagrangian Coherent Structures are obtained by applying the FTLE computation. The relationship between the thermospheric and ionospheric LCS is analyzed by comparing overlapping FTLE maps. Both a publicly available FTLE solver [4] and a custom-built FTLE computation are used and compared for validation [5]. Comparing the modeled IT LCSs on a quiet day with the modeled IT LCSs on a storm day indicates important factors affecting the structure and time evolution of the LCS.
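
    The FTLE computation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' solver: the `velocity` function is a hypothetical steady saddle flow (u = x, v = -y) standing in for HWM14 winds or ExB drifts, and the function names, grid, and RK4 step count are invented for the example. The sketch advects a grid of tracers, differentiates the resulting flow map, and takes the largest eigenvalue of the Cauchy-Green tensor.

```python
import numpy as np

def velocity(x, y, t):
    """Hypothetical test flow (assumption): steady linear saddle u = x, v = -y."""
    return x, -y

def advect(x0, y0, t0, duration, n_steps=200):
    """Advect tracers from t0 to t0 + duration with fixed-step RK4."""
    x, y = x0.copy(), y0.copy()
    dt = duration / n_steps
    t = t0
    for _ in range(n_steps):
        k1x, k1y = velocity(x, y, t)
        k2x, k2y = velocity(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, t + 0.5 * dt)
        k3x, k3y = velocity(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, t + 0.5 * dt)
        k4x, k4y = velocity(x + dt * k3x, y + dt * k3y, t + dt)
        x = x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        y = y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        t += dt
    return x, y

def ftle_field(xs, ys, t0, duration):
    """FTLE on the grid xs x ys: ln(sqrt(lambda_max(C))) / |duration|."""
    X0, Y0 = np.meshgrid(xs, ys, indexing="ij")
    X1, Y1 = advect(X0, Y0, t0, duration)
    # Finite-difference gradient of the flow map with respect to start position
    dxdx, dxdy = np.gradient(X1, xs, ys)
    dydx, dydy = np.gradient(Y1, xs, ys)
    F = np.stack(
        [np.stack([dxdx, dxdy], axis=-1), np.stack([dydx, dydy], axis=-1)],
        axis=-2,
    )
    # Right Cauchy-Green deformation tensor C = F^T F at every grid point
    C = np.swapaxes(F, -1, -2) @ F
    lam_max = np.linalg.eigvalsh(C)[..., -1]  # eigenvalues come back ascending
    return np.log(np.sqrt(lam_max)) / abs(duration)

xs = np.linspace(-1.0, 1.0, 21)
ys = np.linspace(-1.0, 1.0, 21)
ftle = ftle_field(xs, ys, t0=0.0, duration=2.0)
```

    For this linear test flow the stretching rate is uniform, so the FTLE field is flat and shows no ridges. LCS ridges would only appear once `velocity` is replaced by a spatially varying, time-dependent field such as a modeled wind or drift.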

  16. Mathematics and evolutionary biology make bioinformatics education comprehensible.

    PubMed

    Jungck, John R; Weisstein, Anton E

    2013-09-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes, the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software, the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a 'two-culture' problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses.
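
    As a small illustration of the tree enumeration topic mentioned above, the sketch below counts fully resolved (binary) tree topologies for n labeled taxa using the standard double-factorial results: (2n - 5)!! unrooted trees for n >= 3 and (2n - 3)!! rooted trees for n >= 2. The function names are assumptions made for this example, and the article's own presentation may differ.

```python
def double_factorial(k):
    """Return k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def count_unrooted_trees(n_taxa):
    """Number of unrooted binary tree topologies on n labeled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted binary tree")
    return double_factorial(2 * n_taxa - 5)

def count_rooted_trees(n_taxa):
    """Number of rooted binary tree topologies on n labeled taxa."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted binary tree")
    return double_factorial(2 * n_taxa - 3)

# Tabulate how quickly the search space grows
for n in (4, 5, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

    The counts grow super-exponentially with the number of taxa, which is why exhaustive tree search becomes impractical beyond small data sets and heuristic tree construction methods are used instead.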

  17. Mathematics and evolutionary biology make bioinformatics education comprehensible

    PubMed Central

    Weisstein, Anton E.

    2013-01-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes—the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software—the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a ‘two-culture’ problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses. PMID:23821621

  18. Technical phosphoproteomic and bioinformatic tools useful in cancer research

    PubMed Central

    2011-01-01

    Reversible protein phosphorylation is one of the most important forms of cellular regulation. Thus, phosphoproteomic analysis of protein phosphorylation in cells is a powerful tool to evaluate cell functional status. The importance of protein kinase-regulated signal transduction pathways in human cancer has led to the development of drugs that inhibit protein kinases at the apex or intermediary levels of these pathways. Phosphoproteomic analysis of these signalling pathways will provide important insights into their operation and connectivity and so facilitate identification of the best targets for cancer therapies. Enrichment of phosphorylated proteins or peptides from tissue or bodily fluid samples is required. The application of technologies such as phosphoenrichment and mass spectrometry (MS), coupled to bioinformatics tools, is crucial for identifying and quantifying protein phosphorylation sites and thereby advancing clinical research. A combination of different phosphopeptide enrichments, quantitative techniques and bioinformatic tools is necessary to obtain good phospho-regulation data and good structural analysis in protein studies. The current and most useful proteomics and bioinformatics techniques will be explained with research examples. Our aim in this article is to support cancer research by detailing proteomics and bioinformatic tools. PMID:21967744

  19. The 2016 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC (http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science. PMID:27781083

  20. Bioinformatics clouds for big data manipulation

    PubMed Central

    2012-01-01

    Abstract: As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers: This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475

  1. The 2016 Bioinformatics Open Source Conference (BOSC).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC (http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science.

  2. Comparative immunogenicity and structural analysis of epitopes of different bacterial L-asparaginases.

    PubMed

    Pokrovsky, Vadim S; Kazanov, Marat D; Dyakov, Ilya N; Pokrovskaya, Marina V; Aleksandrova, Svetlana S

    2016-02-11

    E. coli type II L-asparaginase is widely used for treatment of acute lymphoblastic leukemia. However, serious side effects such as allergic or hypersensitivity reactions are common with L-asparaginase treatment. Methods for minimizing the immune response to L-asparaginase treatment in humans include bioengineering a less immunogenic version of the enzyme or using homologous enzymes of different origin. To rationalize these approaches we compared the immunogenicity of L-asparaginases from five bacterial organisms and performed sequence-structure analysis of the presumable epitope regions. The IgG and IgM immune response in C57BL/6 mice after immunization with Wolinella succinogenes type II (WsA), Yersinia pseudotuberculosis type II (YpA), Erwinia carotovora type II (EwA), Rhodospirillum rubrum type I (RrA) and Escherichia coli type II (EcA) L-asparaginases was evaluated using a standard ELISA method. A comparative bioinformatics analysis of the structure and sequence of the presumable epitope regions of the bacterial L-asparaginases was performed. We showed different immunogenic properties of the five studied L-asparaginases and confirmed the possibility of replacing EcA with L-asparaginase of different origin as a second-line treatment. The studied L-asparaginases might be placed in the following order based on immunogenicity level: YpA > RrA, WsA ≥ EwA > EcA. The most significant cross-immunogenicity was shown between EcA and YpA. We propose that the long N-terminus of the YpA enzyme, enriched with charged amino acids and tryptophan, could be a reason for the higher immunogenicity of YpA in comparison with the other considered enzymes. Although the recognized structural and sequence differences in putative epitope regions among the five considered L-asparaginases do not fully explain the experimental observations of the immunogenicity of the enzymes, the performed analysis sets the foundation for further research in this direction. The performed studies showed different immunogenic

  3. Accuracy of functional surfaces on comparatively modeled protein structures

    PubMed Central

    Zhao, Jieling; Dundas, Joe; Kachalo, Sema; Ouyang, Zheng; Liang, Jie

    2012-01-01

    Identification and characterization of protein functional surfaces are important for predicting protein function, understanding enzyme mechanism, and docking small compounds to proteins. Because protein sequence information accumulates far faster than structural information, constructing accurate models of protein functional surfaces and identifying their key elements is becoming increasingly important. A promising approach is to build comparative models from sequences using known structural templates such as those obtained from structural genome projects. Here we assess how well this approach works in modeling binding surfaces. By systematically building three-dimensional comparative models of proteins using Modeller, we determine how well functional surfaces can be accurately reproduced. We use an alpha shape based pocket algorithm to compute all pockets on the modeled structures, and conduct a large-scale computation of similarity measurements (pocket RMSD and fraction of functional atoms captured) for 26,590 modeled enzyme protein structures. Overall, we find that when the sequence fragment of the binding surfaces has more than 45% identity to that of the template protein, the modeled surfaces have on average an RMSD of 0.5 Å, and contain 48% or more of the binding surface atoms, with nearly all of the important atoms in the signatures of binding pockets captured. PMID:21541664
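
    The two similarity measurements named above can be illustrated with a short sketch. It assumes that pocket atoms in the model and the reference structure have already been matched one-to-one, and it uses Kabsch superposition as one common way to compute an RMSD. The abstract does not say how the authors superimpose pockets, so the function names, the atom labels, and the choice of Kabsch alignment are assumptions.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets after optimal superposition.

    Assumes row i of P corresponds to row i of Q (hypothetical pre-matched atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both point sets
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation (Kabsch)
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))

def fraction_captured(reference_atoms, model_atoms):
    """Fraction of reference pocket atoms that also appear in the model pocket."""
    reference = set(reference_atoms)
    if not reference:
        raise ValueError("reference pocket has no atoms")
    return len(reference & set(model_atoms)) / len(reference)
```

    A model pocket that is an exact rotated and translated copy of the reference gives an RMSD of essentially zero, and a model pocket containing two of four reference atoms gives a captured fraction of 0.5.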

  4. Comparing Structural Perspectives on Medical Informatics: EMBASE vs. MEDLINE

    PubMed Central

    Morris, Theodore Allan

    2003-01-01

    Previous bibliometric analyses of Medical Informatics’ internal structure used MEDLINE records as the unit of study. EMBASE, a product of Excerpta Medica, carries a wider international scope and offers complementary retrieval results to MEDLINE. Since much medical informatics critical thinking originated abroad and migrated to North America, this difference in coverage may also indicate a different perspective of “what constitutes medical informatics.” Using traditional bibliometric and multivariate data analysis techniques, the present work examines EMBASE indexing records for the same 1995–1999 time frame as earlier MEDLINE studies to identify and compare structural features of the field. PMID:14728448

  5. Nucleic acid structure analysis: Local, mathematically rigorous, comparable

    SciTech Connect

    Babcock, M.S.

    1993-01-01

    A more sophisticated mathematical treatment for analyzing nucleic acid coordinate data is presented. The methodology is both rigorous and comparable for parameterizing nucleic acids in terms of the local structural morphology of complementary and neighboring base pairs. Chapter 1 clearly defines the problems of nucleic acid structure parameterization by examining the consequences of the EMBO workshop guidelines published in 1989. Chapter 2 defines mathematics to rigorously and comparably calculate all of the parameters for nucleic acid structure from a local viewpoint. The mathematics satisfies all EMBO guidelines for local structural parameters. One of the main features making this program flexible is that any base pair relationship can be rigorously analyzed. This is because the meaning of zero for the complementary base parameters is clearly definable for any base pairing relationship. Chapter 3 analyses and explains why certain pairwise parameter correlations were observed between rotational and translational parameters. It was observed that the method of calculating the rotational parameters greatly affected the calculated translational parameters. As a result of our analysis, we determined the optimum location about which rotations should be performed in order to reduce and/or eliminate the correlations which are artifacts of the mathematics employed and do not reflect true structural properties of nucleic acids. Chapter 4 presents an analysis of the available nucleic acid X-ray crystallographic structural data, showing that the experimental base pairs do not generally have the ideal Watson-Crick structure. By utilizing a hybrid between helical and Cartesian parameterization methods, the relative distribution of the complementary base parameters was examined as a function of the nearest neighboring base pairs. The final chapter includes a review article explaining each of the available methods in plain English as well as giving the mathematics.

  6. [Comparative analysis of spatial organization of myoglobins. II. Secondary structure].

    PubMed

    Korobov, V N; Nazarenko, V I; Radomskiĭ, N F; Starodub, N F

    1992-01-01

    Probability distribution curves for alpha-helical sites and bends in the polypeptide chains of myoglobins from semiaquatic mammals (beaver, nutria, muskrat, otter) were analyzed and compared with those of horse and sperm whale myoglobins, whose tertiary structures are known from X-ray diffraction analysis. The analysis revealed that the secondary structure sites and chain bends coincide in these muscle respiratory hemoproteins. Despite a considerable number of amino acid substitutions, the alpha-helicity and β-bend profiles of the compared proteins are practically identical. This indicates the "resistance" of the probability curves to amino acid substitutions and the retention of myoglobin tertiary structure in evolutionarily remote animal species.

  7. Computational Biology and Bioinformatics in Nigeria

    PubMed Central

    Fatumo, Segun A.; Adoga, Moses P.; Ojo, Opeolu O.; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-01-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries. PMID:24763310

  8. Computational biology and bioinformatics in Nigeria.

    PubMed

    Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-04-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  9. When cloud computing meets bioinformatics: a review.

    PubMed

    Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong

    2013-10-01

    In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.

  10. Translational Bioinformatics and Clinical Research (Biomedical) Informatics.

    PubMed

    Sirintrapun, S Joseph; Zehir, Ahmet; Syed, Aijazuddin; Gao, JianJiong; Schultz, Nikolaus; Cheng, Donavan T

    2016-03-01

    Translational bioinformatics and clinical research (biomedical) informatics are the primary domains related to informatics activities that support translational research. Translational bioinformatics focuses on computational techniques in genetics, molecular biology, and systems biology. Clinical research (biomedical) informatics involves the use of informatics in discovery and management of new knowledge relating to health and disease. This article details 3 projects that are hybrid applications of translational bioinformatics and clinical research (biomedical) informatics: The Cancer Genome Atlas, the cBioPortal for Cancer Genomics, and the Memorial Sloan Kettering Cancer Center clinical variants and results database, all designed to facilitate insights into cancer biology and clinical/therapeutic correlations.

  11. Comparing molecules and solids across structural and alchemical space.

    PubMed

    De, Sandip; Bartók, Albert P; Csányi, Gábor; Ceriotti, Michele

    2016-05-18

    Evaluating the (dis)similarity of crystalline, disordered and molecular compounds is a critical step in the development of algorithms to navigate automatically the configuration space of complex materials. For instance, a structural similarity metric is crucial for classifying structures, searching chemical space for better compounds and materials, and driving the next generation of machine-learning techniques for predicting the stability and properties of molecules and materials. In the last few years several strategies have been designed to compare atomic coordination environments. In particular, the smooth overlap of atomic positions (SOAPs) has emerged as an elegant framework to obtain translation, rotation and permutation-invariant descriptors of groups of atoms, underlying the development of various classes of machine-learned inter-atomic potentials. Here we discuss how one can combine such local descriptors using a regularized entropy match (REMatch) approach to describe the similarity of both whole molecular and bulk periodic structures, introducing powerful metrics that enable the navigation of alchemical and structural complexities within a unified framework. Furthermore, using this kernel and a ridge regression method we can predict atomization energies for a database of small organic molecules with a mean absolute error below 1 kcal mol⁻¹, reaching an important milestone in the application of machine-learning techniques for the evaluation of molecular properties.

  12. Comparative study of medium damped and detuned linear accelerator structures

    SciTech Connect

    Jean-Francois Ostiguy et al.

    2001-08-22

    Long-range wakefields are a serious concern for a future linear collider based on room temperature accelerating structures. They can be suppressed by detuning, by local damping, or by some combination of both strategies. Detuning relies on precisely phasing the contributions of the dipole modes excited by the passage of a single bunch. This is accomplished by controlling individual mode frequencies, a process which dictates individual cell dimensional tolerances. Each mode must be excited with the correct strength; this in turn determines cell-to-cell alignment tolerances. In contrast, in a locally damped structure, the modes are attenuated at the cell level. Clearly, mode frequencies and relative excitation become less critical in that context; mechanical fabrication tolerances can be relaxed. While local damping is ideal from the standpoint of long-range wakefield suppression, this comes at the cost of reducing the shunt impedance and possibly unacceptable localized heating. Recently, the Medium Damped Structure (MDS), a compromise between detuning and local damping, has generated some interest. In this paper, we compare a hypothetical MDS to the NLC Rounded Damped Detuned Structure (RDDS) and investigate possible advantages from the standpoint of fabrication tolerances and their relation to beam stability and emittance preservation.

  13. A comparative structural study of wet and dried ettringite

    SciTech Connect

    Renaudin, G.; Filinchuk, Y.; Neubauer, J.; Goetz-Neunhoeffer, F.

    2010-03-15

    Two different techniques were used to compare structural characteristics of 'wet' ettringite (stored in the synthesis mother liquid) and 'dried' ettringite (dried to 35% relative humidity over saturated CaCl₂ solution). Lattice parameters and the water content in the channel region of the structure (site occupancy factor of the water molecule not bonded to cations) as well as microstructure parameters (size and strain) were determined from a Rietveld refinement on synchrotron powder diffraction data. Local environment of sulphate anions and of the hydrogen bonding network was characterized by Raman spectroscopy. Both techniques led to the same conclusion: the 'wet' ettringite sample immersed in the mother solution from the synthesis presents similar structural features as ettringite dried to 35% relative humidity. An increase of the a lattice parameter combined with a decrease of the c lattice parameter occurs on drying. The amount of structural water, the point symmetry of sulphate and the hydrogen bond network are unchanged when passing from the wet to the dried ettringite powder. Ettringite does not form a high-hydrate polymorph in equilibrium with alkaline solution, in contrast to the AFm phases that lose water molecules on drying. According to these results we conclude that ettringite precipitated in aqueous solution at the early hydration stages is of the same chemical composition as ettringite present in the hardening concrete.

  14. Comparative population structure of cavity-nesting sea ducks

    USGS Publications Warehouse

    Pearce, John M.; Eadie, John M.; Savard, Jean-Pierre L.; Christensen, Thomas K.; Berdeen, James; Taylor, Eric J.; Boyd, Sean; Einarsson, Árni

    2014-01-01

    A growing collection of mtDNA genetic information from waterfowl species across North America suggests that larger-bodied cavity-nesting species exhibit greater levels of population differentiation than smaller-bodied congeners. Although little is known about nest-cavity availability for these species, one hypothesis to explain differences in population structure is reduced dispersal tendency of larger-bodied cavity-nesting species due to limited abundance of large cavities. To investigate this hypothesis, we examined population structure of three cavity-nesting waterfowl species distributed across much of North America: Barrow's Goldeneye (Bucephala islandica), Common Goldeneye (B. clangula), and Bufflehead (B. albeola). We compared patterns of population structure using both variation in mtDNA control-region sequences and band-recovery data for the same species and geographic regions. Results were highly congruent between data types, showing structured population patterns for Barrow's and Common Goldeneye but not for Bufflehead. Consistent with our prediction, the smallest cavity-nesting species, the Bufflehead, exhibited the lowest level of population differentiation due to increased dispersal and gene flow. Results provide evidence for discrete Old and New World populations of Common Goldeneye and for differentiation of regional groups of both goldeneye species in Alaska, the Pacific Northwest, and the eastern coast of North America. Results presented here will aid management objectives that require an understanding of population delineation and migratory connectivity between breeding and wintering areas. Comparative studies such as this one highlight factors that may drive patterns of genetic diversity and population trends.

  15. Bioinformatics and the Undergraduate Curriculum Essay

    PubMed Central

    Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of bioinformatics as a new discipline has challenged many colleges and universities to keep current with their curricula, often in the face of static or dwindling resources. On the plus side, many bioinformatics modules and related databases and software programs are free and accessible online, and interdisciplinary partnerships between existing faculty members and their support staff have proved advantageous in such efforts. We present examples of strategies and methods that have been successfully used to incorporate bioinformatics content into undergraduate curricula. PMID:20810947

  16. Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens

    PubMed Central

    Thomas, Reuben; Phuong, Jimmy; McHale, Cliona M.; Zhang, Luoping

    2012-01-01

    We have applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens could be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens that were associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using data on gene/protein targets available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we analyzed for enrichment of all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling pathway, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters comprising 1 to 3 chemicals that did not correlate with known mechanism of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, based on the pathway data, were unable to distinguish the 29 leukemogens from 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of distinguishing a random leukemogen/non-leukemogen pair from each other. PMID:22851955
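
    The reported "76% chance of distinguishing a random leukemogen/non-leukemogen pair" reads like an area under the ROC curve, which equals the fraction of positive/negative pairs that a classifier ranks correctly. The abstract does not say how the figure was computed, so the following is only a minimal sketch under that assumption; the function name and the example scores are hypothetical.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive item
    receives the higher score; ties count as half a correct pair."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (e.g. a random-forest vote fraction).
leukemogen_scores = [0.9, 0.8, 0.4]
non_leukemogen_scores = [0.5, 0.3]
auc = pairwise_auc(leukemogen_scores, non_leukemogen_scores)
```

    With these made-up scores, five of the six possible pairs are ordered correctly, so the pairwise estimate is 5/6.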

  17. PATRIC, the bacterial bioinformatics database and analysis resource

    PubMed Central

    Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J.C.; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W.

    2014-01-01

    The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue. PMID:24225323

  18. Bioinformatics in Italy: BITS2011, the Eighth Annual Meeting of the Italian Society of Bioinformatics

    PubMed Central

    2012-01-01

    The BITS2011 meeting, held in Pisa on June 20-22, 2011, brought together more than 120 Italian researchers working in the field of Bioinformatics, as well as students in Bioinformatics, Computational Biology, Biology, Computer Sciences, and Engineering, representing a landscape of Italian bioinformatics research. This preface provides a brief overview of the meeting and introduces the peer-reviewed manuscripts that were accepted for publication in this Supplement. PMID:22536954

  19. No-boundary thinking in bioinformatics research

    PubMed Central

    2013-01-01

    Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT). PMID:24192339

  20. Bioinformatics for analysis of poxvirus genomes.

    PubMed

    Da Silva, Melissa; Upton, Chris

    2012-01-01

    In recent years, there have been numerous unprecedented technological advances in the field of molecular biology; these include DNA sequencing, mass spectrometry of proteins, and microarray analysis of mRNA transcripts. Perhaps, however, it is the area of genomics, which has now generated the complete genome sequences of more than 100 poxviruses, that has had the greatest impact on the average virology researcher because the DNA sequence data is in constant use in many different ways by almost all molecular virologists. As this data resource grows, so does the importance of the availability of databases and software tools to enable the bench virologist to work with and make use of this (valuable/expensive) DNA sequence information. Thus, providing researchers with intuitive software to first select and reformat genomics data from large databases, second, to compare/analyze genomics data, and third, to view and interpret large and complex sets of results has become pivotal in enabling progress to be made in modern virology. This chapter is directed at the bench virologist and describes the software required for a number of common bioinformatics techniques that are useful for comparing and analyzing poxvirus genomes. In a number of examples, we also highlight the Viral Orthologous Clusters database system and integrated tools that we developed for the management and analysis of complete viral genomes.

  1. Identification and Analysis of Occludin Phosphosites: A Combined Mass Spectrometry and Bioinformatics Approach

    SciTech Connect

    Sundstrom, J.; Tash, B; Murakami, T; Flanagan, J; Bewley, M; Stanley, B; Gonsar, K; Antonetti, D

    2009-01-01

    The molecular function of occludin, an integral membrane component of tight junctions, remains unclear. VEGF-induced phosphorylation sites were mapped on occludin by combining MS data analysis with bioinformatics. In vivo phosphorylation of Ser490 was validated and protein interaction studies combined with crystal structure analysis suggest that Ser490 phosphorylation attenuates the interaction between occludin and ZO-1. This study demonstrates that combining MS data and bioinformatics can successfully identify novel phosphorylation sites from limiting samples.

  2. Planning bioinformatics workflows using an expert system.

    PubMed

    Chen, Xiaoling; Chang, Jeffrey T

    2017-04-15

    Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system composed of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability: https://github.com/jefftc/changlab. Contact: jeffrey.t.chang@uth.tmc.edu.
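
    The idea of backward chaining over a knowledge base of tool capabilities can be illustrated with a toy planner. This is not BETSY's code or rule format: the rule table, tool names and data-type names below are hypothetical, and the sketch assumes exactly one rule per output type, which a real knowledge base would not.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical knowledge base: output data type -> (tool, required inputs).
RULES: Dict[str, Tuple[str, List[str]]] = {
    "aligned_reads": ("align", ["fastq", "reference_genome"]),
    "gene_counts": ("count_reads", ["aligned_reads", "gene_annotation"]),
    "diff_expression": ("test_differential_expression",
                        ["gene_counts", "sample_groups"]),
}


def plan(goal: str, available: Set[str],
         rules: Dict[str, Tuple[str, List[str]]] = RULES,
         active: FrozenSet[str] = frozenset()) -> Optional[List[str]]:
    """Work backwards from the goal to the data on hand.

    Returns an ordered list of tools to run, or None if the goal
    cannot be derived from the available data.
    """
    if goal in available:
        return []                      # nothing to do, data already exists
    if goal not in rules or goal in active:
        return None                    # no rule, or a circular dependency
    tool, inputs = rules[goal]
    steps: List[str] = []
    for item in inputs:
        sub_steps = plan(item, available, rules, active | {goal})
        if sub_steps is None:
            return None
        for step in sub_steps:
            if step not in steps:      # avoid scheduling a tool twice
                steps.append(step)
    steps.append(tool)
    return steps


workflow = plan("diff_expression",
                {"fastq", "reference_genome", "gene_annotation", "sample_groups"})
```

    Starting from the requested result rather than from the raw data means only the steps that are actually needed end up in the workflow.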

  3. Use or abuse of bioinformatic tools: a response to Samach.

    PubMed

    Muñoz-Fambuena, Natalia; Mesejo, Carlos; González-Mas, María C; Primo-Millo, Eduardo; Agustí, Manuel; Iglesias, Domingo J

    2013-03-01

    In a recent paper, we described for the first time the effects of fruit on the expression of putative homologues of genes involved in flowering pathways. It was our aim to provide insight into the molecular mechanisms underlying alternate bearing in citrus. However, a bioinformatics-based critique of our and other related papers has been given by Samach in the preceding Viewpoint article in this issue of Annals of Botany. The use of certain bioinformatic tools in a context of structural rather than functional genomics can cast doubts about the veracity of a large amount of data published in recent years. In this response, the contentions raised by Samach are analysed, and rebuttals of his criticisms are presented.

  4. Data Mining for Grammatical Inference with Bioinformatics Criteria

    NASA Astrophysics Data System (ADS)

    López, Vivian F.; Aguilar, Ramiro; Alonso, Luis; Moreno, María N.; Corchado, Juan M.

    In this paper we describe both theoretical and practical results of a novel data mining process that combines hybrid techniques of association analysis and classical sequentiation algorithms of genomics to generate grammatical structures of a specific language. We used a compiler generator system to develop a practical application within the area of grammarware, where concepts of language analysis are applied to other disciplines, such as bioinformatics. The tool allows the complexity of the obtained grammar to be measured automatically from textual data. A technique for incremental discovery of sequential patterns is presented to obtain simplified production rules, which are then compacted using bioinformatics criteria to make up a grammar.

  5. Legal issues for chem-bioinformatics models.

    PubMed

    Duardo-Sanchez, Aliuska; Gonzalez-Diaz, Humberto

    2013-01-01

    Chem-Bioinformatic models connect the chemical structure of drugs and/or targets (protein, gene, RNA, microorganism, tissue, disease...) with the biological activity of the drug against the target. On the other hand, a systematic judicial framework is needed to provide appropriate and relevant guidance for addressing various computing techniques as applied to scientific research at the frontiers of the biosciences. This article reviews both the use of model predictions for regulatory purposes and how to protect (in legal terms) the models of molecular systems per se and the software used to seek them. First we review: i) models as a tool for regulatory purposes, ii) organizations involved in the validation of models, iii) regulatory guidelines and documents for models, iv) models for human health and environmental endpoints, and v) difficulties in validating models, and other issues. Next, we focus on the legal protection of models and software, including a short summary of topics and methods for legal protection of computer software. We close the review with a section on taxes on software use.

  6. Bioinformatics of the TULIP domain superfamily.

    PubMed

    Kopec, Klaus O; Alva, Vikram; Lupas, Andrei N

    2011-08-01

    Proteins of the BPI (bactericidal/permeability-increasing protein)-like family contain either one or two tandem copies of a fold that usually provides a tubular cavity for the binding of lipids. Bioinformatic analyses show that, in addition to its known members, which include BPI, LBP [LPS (lipopolysaccharide)-binding protein], CETP (cholesteryl ester-transfer protein), PLTP (phospholipid-transfer protein) and PLUNC (palate, lung and nasal epithelium clone) protein, this family also includes other, more divergent groups containing hypothetical proteins from fungi, nematodes and deep-branching unicellular eukaryotes. More distantly, BPI-like proteins are related to a family of arthropod proteins that includes hormone-binding proteins (Takeout-like; previously described to adopt a BPI-like fold), allergens and several groups of uncharacterized proteins. At even greater evolutionary distance, BPI-like proteins are homologous with the SMP (synaptotagmin-like, mitochondrial and lipid-binding protein) domains, which are found in proteins associated with eukaryotic membrane processes. In particular, SMP domain-containing proteins of yeast form the ERMES [ER (endoplasmic reticulum)-mitochondria encounter structure], required for efficient phospholipid exchange between these organelles. This suggests that SMP domains themselves bind lipids and mediate their exchange between heterologous membranes. The most distant group of homologues we detected consists of uncharacterized animal proteins annotated as TM (transmembrane) 24. We propose to group these families together into one superfamily that we term the TULIP (tubular lipid-binding) domain superfamily.

  7. Bioinformatic Approaches to Metabolic Pathways Analysis

    PubMed Central

    Maudsley, Stuart; Chadwick, Wayne; Wang, Liyun; Zhou, Yu; Martin, Bronwen; Park, Sung-Soo

    2015-01-01

    The growth and development in the last decade of accurate and reliable mass data collection techniques has greatly enhanced our comprehension of cell signaling networks and pathways. At the same time however, these technological advances have also increased the difficulty of satisfactorily analyzing and interpreting these ever-expanding datasets. At the present time, multiple diverse scientific communities including molecular biological, genetic, proteomic, bioinformatic, and cell biological, are converging upon a common endpoint, that is, the measurement, interpretation, and potential prediction of signal transduction cascade activity from mass datasets. Our ever increasing appreciation of the complexity of cellular or receptor signaling output and the structural coordination of intracellular signaling cascades has to some extent necessitated the generation of a new branch of informatics that more closely associates functional signaling effects to biological actions and even whole-animal phenotypes. The ability to untangle and hopefully generate theoretical models of signal transduction information flow from transmembrane receptor systems to physiological and pharmacological actions may be one of the greatest advances in cell signaling science. In this overview, we shall attempt to assist the navigation into this new field of cell signaling and highlight several methodologies and technologies to appreciate this exciting new age of signal transduction. PMID:21870222

  8. Bioinformatic approaches to metabolic pathways analysis.

    PubMed

    Maudsley, Stuart; Chadwick, Wayne; Wang, Liyun; Zhou, Yu; Martin, Bronwen; Park, Sung-Soo

    2011-01-01

    The growth and development in the last decade of accurate and reliable mass data collection techniques has greatly enhanced our comprehension of cell signaling networks and pathways. At the same time however, these technological advances have also increased the difficulty of satisfactorily analyzing and interpreting these ever-expanding datasets. At the present time, multiple diverse scientific communities including molecular biological, genetic, proteomic, bioinformatic, and cell biological, are converging upon a common endpoint, that is, the measurement, interpretation, and potential prediction of signal transduction cascade activity from mass datasets. Our ever increasing appreciation of the complexity of cellular or receptor signaling output and the structural coordination of intracellular signaling cascades has to some extent necessitated the generation of a new branch of informatics that more closely associates functional signaling effects to biological actions and even whole-animal phenotypes. The ability to untangle and hopefully generate theoretical models of signal transduction information flow from transmembrane receptor systems to physiological and pharmacological actions may be one of the greatest advances in cell signaling science. In this overview, we shall attempt to assist the navigation into this new field of cell signaling and highlight several methodologies and technologies to appreciate this exciting new age of signal transduction.

  9. Comparative Structure of Saturn's Rings from Cassini Radio Occultation Observations

    NASA Astrophysics Data System (ADS)

    Marouf, Essam A.; French, R. G.; Rappaport, N. J.; McGhee, C. A.; Wong, K.; Thomson, F. S.; Anabtawi, A.

    2007-10-01

    Radio occultations of Saturn's rings during the Cassini prime mission fall into three main groups, depending on the ring opening angle B. The first is a set of eight diametric occultations completed early in the mission (March-September/2005) when |B| was relatively large (19.5 to 23.5°). They permitted multiple-longitude profiling of relatively optically thick ring features, revealing detailed structure of enigmatic Ring B. The second is to be completed late in the mission when the rings are relatively closed (|B| < 10°). They will provide enhanced sensitivity to tenuous ring material, hence complementary information about small optical depth structure. Bridging the two groups is a third composed of two specially designed occultations recently completed (May-June/2007). They capture the intermediate range |B| ≈ 15°. Because the rings were still reasonably open, much of the structure was profiled. The different occultation geometry from the diametric group provided enhanced sensitivity to bending waves and other inclined features. We comparatively consider variability (or lack thereof) of observed ring structure with B and longitude. The variability when present can be true (dynamically forced features) or apparent (azimuthal asymmetry due to preferentially aligned gravitational wakes). The multiple-longitude coverage provides rich characterization of the true variability, including remarkable variations in the morphology of gap-embedded ringlets in Ring C, clear variations in the width of gaps in the Cassini Division, wavelike features in Ring C (the "Rosen Waves"), classical satellite wake profiles due to Pan, in addition to many density and few bending waves. For the apparent asymmetry, observed optical depth variations with B, viewing geometry, and wavelength constrain physical properties of the ring microstructure (particle sizes, particle-cluster sizes and orientation, spatial cluster density, vertical ring profile and physical thickness, ...). Complementary

  10. Comparing and distinguishing the structure of biological branching.

    PubMed

    Lamberton, Timothy O; Lefevre, James; Short, Kieran M; Smyth, Ian M; Hamilton, Nicholas A

    2015-01-21

    Bifurcating developmental branching morphogenesis gives rise to complex organs such as the lung and the ureteric tree of the kidney. However, few quantitative methods or tools exist to compare and distinguish, at a structural level, the critical features of these important biological systems. Here we develop novel graph alignment techniques to quantify the structural differences of rooted bifurcating trees and demonstrate their application in the analysis of developing kidneys from normal and mutant mice. We have developed two graph-based metrics: graph discordance, which measures how well the graphs representing the branching structures of distinct trees can be aligned or overlaid; and graph inclusion, which measures the degree of containment of a tree graph within another. To demonstrate the application of these approaches we first benchmark the discordance metric on a data set of 32 normal and 28 Tgfβ(+/-) mutant mouse ureteric trees. We find that the discordance metric better distinguishes control and mutant mouse kidneys than alternative metrics based on graph size and fingerprints (the distribution of tip depths). Using this metric we then show that the structure of the mutant trees follows the same pattern as the normal kidneys, but undergoes a major delay in elaboration at later stages. Analysis of both controls and mutants using the inclusion metric gives strong support to the hypothesis that ureteric tree growth is stereotypic. Additionally, we present a new generalised multi-tree alignment algorithm that minimises the sum of pairwise graph discordance and which can be used to generate maximum consensus trees that represent the archetype for fixed developmental stages. These tools represent an advance in the analysis and quantification of branching patterns and will be invaluable in gaining a deeper understanding of the mechanisms that drive development. All code is being made available with documentation and example data with this publication.
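
    The "fingerprint" baseline mentioned above, the distribution of tip depths, is simple enough to sketch. The code below is not the authors' published code and does not implement their discordance or inclusion metrics; the nested-tuple tree encoding and the L1 distance between fingerprints are assumptions made for illustration.

```python
from collections import Counter

# Assumed encoding: a tip is the empty tuple (), and an internal node is
# a tuple holding its child subtrees, e.g. (left, right) for a bifurcation.


def tip_depths(tree, depth=0):
    """Return the depth of every tip, with the root at depth 0."""
    if not tree:
        return [depth]
    depths = []
    for child in tree:
        depths.extend(tip_depths(child, depth + 1))
    return depths


def fingerprint(tree):
    """Tip-depth distribution: maps depth -> number of tips at that depth."""
    return Counter(tip_depths(tree))


def fingerprint_distance(tree_a, tree_b):
    """L1 distance between two fingerprints (an illustrative choice)."""
    fa, fb = fingerprint(tree_a), fingerprint(tree_b)
    return sum(abs(fa[d] - fb[d]) for d in set(fa) | set(fb))


early = ((), ())          # a single bifurcation: two tips at depth 1
later = (((), ()), ())    # one of the tips has branched again
```

    A fingerprint discards the topology that produced the tips, which may be one reason an alignment-based metric separates the two groups more cleanly.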

  11. Protein structure prediction provides comparable performance to crystallographic structures in docking-based virtual screening.

    PubMed

    Du, Hongying; Brender, Jeffrey R; Zhang, Jian; Zhang, Yang

    2015-01-01

    Structure based virtual screening has largely been limited to protein targets for which either an experimental structure is available or a strongly homologous template exists so that a high-resolution model can be constructed. The performance of state of the art protein structure predictions in virtual screening in systems where only weakly homologous templates are available is largely untested. Using the challenging DUD database of structural decoys, we show here that even using templates with only weak sequence homology (<30% sequence identity) structural models can be constructed by I-TASSER which achieve comparable enrichment rates to using the experimental bound crystal structure in the majority of the cases studied. For 65% of the targets, the I-TASSER models, which are constructed essentially in the apo conformations, reached 70% of the virtual screening performance of using the holo-crystal structures. A correlation was observed between the success of I-TASSER in modeling the global fold and local structures in the binding pockets of the proteins versus the relative success in virtual screening. The virtual screening performance can be further improved by the recognition of chemical features of the ligand compounds. These results suggest that the combination of structure-based docking and advanced protein structure modeling methods should be a valuable approach to the large-scale drug screening and discovery studies, especially for the proteins lacking crystallographic structures.
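
    Enrichment in virtual screening is commonly summarized by an enrichment factor: the hit rate among the top-ranked compounds divided by the hit rate in the whole library. The abstract does not state which enrichment measure was used, so this is a generic sketch; the function name, the lower-score-is-better convention and the example data are assumptions.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    Assumes lower docking scores are better, as with energy-like scores.
    """
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and is_active must be non-empty and aligned")
    total_actives = sum(bool(a) for a in is_active)
    if total_actives == 0:
        raise ValueError("the library contains no actives")
    n_top = max(1, round(n * top_fraction))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    actives_in_top = sum(bool(a) for _, a in ranked[:n_top])
    return (actives_in_top / n_top) / (total_actives / n)


# Hypothetical library of ten compounds, two of them known actives.
scores = [-9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
labels = [True, True, False, False, False, False, False, False, False, False]
ef_top20 = enrichment_factor(scores, labels, top_fraction=0.2)
```

    Computing the same quantity for a predicted model and for the crystal structure, then taking the ratio, gives one way to express relative screening performance of the kind quoted above.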

  12. Regulatory bioinformatics for food and drug safety.

    PubMed

    Healy, Marion J; Tong, Weida; Ostroff, Stephen; Eichler, Hans-Georg; Patak, Alex; Neuspiel, Margaret; Deluyker, Hubert; Slikker, William

    2016-10-01

    "Regulatory Bioinformatics" strives to develop and implement a standardized and transparent bioinformatic framework to support the implementation of existing and emerging technologies in regulatory decision-making. It has great potential to improve public health through the development and use of clinically important medical products and tools to manage the safety of the food supply. However, the application of regulatory bioinformatics also poses new challenges and requires new knowledge and skill sets. In the latest Global Coalition on Regulatory Science Research (GCRSR) governed conference, Global Summit on Regulatory Science (GSRS2015), regulatory bioinformatics principles were presented with respect to global trends, initiatives and case studies. The discussion revealed that datasets, analytical tools, skills and expertise are rapidly developing, in many cases via large international collaborative consortia. It also revealed that significant research is still required to realize the potential applications of regulatory bioinformatics. While there is significant excitement in the possibilities offered by precision medicine to enhance treatments of serious and/or complex diseases, there is a clear need for further development of mechanisms to securely store, curate and share data, integrate databases, and standardized quality control and data analysis procedures. A greater understanding of the biological significance of the data is also required to fully exploit vast datasets that are becoming available. The application of bioinformatics in the microbiological risk analysis paradigm is delivering clear benefits both for the investigation of food borne pathogens and for decision making on clinically important treatments. It is recognized that regulatory bioinformatics will have many beneficial applications by ensuring high quality data, validated tools and standardized processes, which will help inform the regulatory science community of the requirements

  13. Bioinformatics: Cheap and robust method to explore biomaterial from Indonesia biodiversity

    NASA Astrophysics Data System (ADS)

    Widodo

    2015-02-01

    Indonesia has a huge amount of biodiversity, which may contain many biomaterials for pharmaceutical application. The potential of these resources should be explored to discover new drugs for human welfare. However, bioactive screening using conventional methods is very expensive and time-consuming. Therefore, we developed a bioinformatics-based methodology for screening the potential of natural resources. The method is based on the fact that organisms in the same taxon will have similar genes, metabolism and secondary metabolite products. We then employ bioinformatics to explore the potential of biomaterials from Indonesian biodiversity by comparing species with well-known taxa containing the active compound, using published papers or chemical databases. We then analyze the drug-likeness, bioactivity and target proteins of the active compound based on its molecular structure. The interactions of the target protein with other proteins in the cell were examined to determine the mechanism of action of the active compounds at the cellular level, as well as to predict their side effects and toxicity. Using this method, we succeeded in screening anti-cancer, immunomodulatory and anti-inflammatory compounds from Indonesian biodiversity. For example, we found an anticancer compound from a marine invertebrate by employing the method. The anti-cancer activity was explored based on compounds isolated from the marine invertebrate, as reported in published articles and databases; we then identified the protein target, followed by molecular pathway analysis. The data suggested that the active compound of the invertebrate is able to kill cancer cells. Further, we collected and extracted the active compound from the invertebrate, and then examined its activity on cancer cells (MCF7). The MTT result showed that the methanol extract of the marine invertebrate was highly potent in killing MCF7 cells. Therefore, we concluded that bioinformatics is a cheap and robust way to explore bioactives from Indonesian biodiversity as a source of drugs and another
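
    The abstract does not say which drug-likeness criterion was applied. One widely used filter is Lipinski's rule of five, sketched below on precomputed molecular descriptors; the class and function names and the example descriptor values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (values supplied by the caller)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Common reading of the rule: tolerate at most one violation."""
    return lipinski_violations(d) <= max_violations


# Hypothetical compounds, for illustration only.
small_compound = Descriptors(mol_weight=350.0, logp=2.1, h_donors=2, h_acceptors=5)
large_compound = Descriptors(mol_weight=720.0, logp=6.3, h_donors=7, h_acceptors=12)
```

    A filter of this kind only narrows the candidate list; target prediction and interaction analysis, as described in the abstract, would follow as separate steps.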

  14. Comparative analysis of titanium oxide nanotubes ordered structure formation

    NASA Astrophysics Data System (ADS)

    Shchegoleva, S. A.; Titov, P. L.; Kondrikov, N. B.

    2017-09-01

    This paper presents a comparative analysis of the formation of highly ordered titanium oxide nanotube arrays obtained by anodizing in non-aqueous (No. 1) and semi-aqueous (No. 2) electrolytes. Analysis of the nanotube formation current of both samples reveals a fine structure in the current: a series of small steps alternating with sharp surges. The attractor obtained from the current realization of sample No. 1 points to the existence of a mode of partial Levy flights that is quasistochastic in phase yet strictly periodic. The attractor of sample No. 2 is smoother. The current realizations of both samples point to a trigger mode of system behavior characteristic of an auto-oscillating process.
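
    The abstract does not describe how the attractors were obtained from the measured current. A standard way to reconstruct an attractor from a single time series is time-delay embedding, sketched here as an assumption; the function name, embedding dimension, lag and sample values are all illustrative.

```python
def delay_embed(series, dim=2, lag=1):
    """Map a scalar time series to points
    (x[i], x[i + lag], ..., x[i + (dim - 1) * lag])."""
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    n_points = len(series) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("series is too short for this dim and lag")
    return [
        tuple(series[i + k * lag] for k in range(dim))
        for i in range(n_points)
    ]


# Hypothetical current samples (arbitrary units).
current = [1, 2, 3, 4, 5]
points = delay_embed(current, dim=2, lag=2)
```

    Plotting the resulting points against each other gives the phase portrait; a strictly periodic signal traces a closed curve, while a noisier one fills out a cloud.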

  15. Data structure of search & compare (S&C) reservation protocol

    NASA Astrophysics Data System (ADS)

    Markovič, Miroslav; Dubovan, Jozef; Dado, Milan; Benedikovič, Daniel; Litvík, Ján.

    2012-01-01

    At present, the most widely used technology in core networks is wavelength-division multiplexing (WDM), which saves a great deal of optical fiber bandwidth. In each node, however, all optical signals must be converted into the electrical domain, processed, and converted back into the optical domain. As a result, the data spend a long time in the node, which decreases the total available bandwidth of the optical network. One consequence is that WDM nodes are built as hybrid switching and control systems. With out-of-band signaling it is simpler to separate the control header from the data. For effective control and transmission of data over optical networks, reservation protocols are needed in WDM/OBS [4,5]. Today's networks use many protocols, each with its own advantages and disadvantages. For our investigation we chose the reservation protocol called Search & Compare (S&C) [1], because it uses parallel segment-based and parallel link reservation. The data structure will be designed with respect to the wavelength of the transmission channels, the length of the optical burst, the source and group addresses in the segment, the number of nodes and the total time needed for switching. The protocol structure will contain all of the control messages necessary for reserving a path along all segments. The design of the protocol follows the ITU-T recommendations [2,3].
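
    The fields listed in the abstract (wavelength, burst length, source and group addresses, number of nodes, switching time) suggest what a reservation control message might carry. The layout below is a hypothetical sketch, not the S&C message format: field names, field widths, units and byte order are all assumptions.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# Assumed fixed header, network byte order:
# wavelength (2 bytes), burst length (4), source address (2),
# node count (1), switching time (4), number of group addresses (1).
HEADER_FORMAT = "!HIHBIB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass(frozen=True)
class ReservationRequest:
    wavelength_id: int
    burst_length_us: int
    source_address: int
    group_addresses: Tuple[int, ...]
    node_count: int
    switching_time_us: int

    def pack(self) -> bytes:
        """Serialize the header followed by the 2-byte group addresses."""
        header = struct.pack(
            HEADER_FORMAT, self.wavelength_id, self.burst_length_us,
            self.source_address, self.node_count, self.switching_time_us,
            len(self.group_addresses))
        body = struct.pack(f"!{len(self.group_addresses)}H",
                           *self.group_addresses)
        return header + body

    @classmethod
    def unpack(cls, data: bytes) -> "ReservationRequest":
        """Rebuild a message from the bytes produced by pack()."""
        (wavelength, burst, source, nodes, switching,
         n_groups) = struct.unpack_from(HEADER_FORMAT, data, 0)
        groups = struct.unpack_from(f"!{n_groups}H", data, HEADER_SIZE)
        return cls(wavelength, burst, source, tuple(groups), nodes, switching)


request = ReservationRequest(wavelength_id=3, burst_length_us=1200,
                             source_address=17, group_addresses=(21, 22),
                             node_count=5, switching_time_us=40)
wire = request.pack()
```

    Keeping the header at a fixed size and putting the variable-length address list last lets a node read the wavelength and burst length before it parses the rest of the message.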

  16. Structure, function and evolution of the gas exchangers: comparative perspectives.

    PubMed

    Maina, J N

    2002-10-01

    Over the evolutionary continuum, animals have faced similar fundamental challenges of acquiring molecular oxygen for aerobic metabolism. Under limitations and constraints imposed by factors such as phylogeny, behaviour, body size and environment, they have responded differently in founding optimal respiratory structures. A quintessence of the aphorism that 'necessity is the mother of invention', gas exchangers have been inaugurated through stiff cost-benefit analyses that have evoked transaction of trade-offs and compromises. Cogent structural-functional correlations occur in constructions of gas exchangers: within and between taxa, morphological complexity and respiratory efficiency increase with metabolic capacities and oxygen needs. Highly active, small endotherms have relatively better-refined gas exchangers compared with large, inactive ectotherms. Respiratory structures have developed from the plain cell membrane of the primeval prokaryotic unicells to complex multifunctional ones of the modern Metazoa. Regarding the respiratory medium used to extract oxygen from, animal life has had only two choices--water or air--within the biological range of temperature and pressure the only naturally occurring respirable fluids. In rarer cases, certain animals have adapted to using both media. Gills (evaginated gas exchangers) are the primordial respiratory organs: they are the archetypal water breathing organs. Lungs (invaginated gas exchangers) are the model air breathing organs. Bimodal (transitional) breathers occupy the water-air interface. Presentation and exposure of external (water/air) and internal (haemolymph/blood) respiratory media, features determined by geometric arrangement of the conduits, are important features for gas exchange efficiency: counter-current, cross-current, uniform pool and infinite pool designs have variably developed.

  17. Bioinformatics process management: information flow via a computational journal

    PubMed Central

    Feagan, Lance; Rohrer, Justin; Garrett, Alexander; Amthauer, Heather; Komp, Ed; Johnson, David; Hock, Adam; Clark, Terry; Lushington, Gerald; Minden, Gary; Frost, Victor

    2007-01-01

    This paper presents the Bioinformatics Computational Journal (BCJ), a framework for conducting and managing computational experiments in bioinformatics and computational biology. These experiments often involve series of computations, data searches, filters, and annotations which can benefit from a structured environment. Systems to manage computational experiments exist, ranging from libraries with standard data models to elaborate schemes to chain together input and output between applications. Yet, although such frameworks are available, their use is not widespread–ad hoc scripts are often required to bind applications together. The BCJ explores another solution to this problem through a computer based environment suitable for on-site use, which builds on the traditional laboratory notebook paradigm. It provides an intuitive, extensible paradigm designed for expressive composition of applications. Extensive features facilitate sharing data, computational methods, and entire experiments. By focusing on the bioinformatics and computational biology domain, the scope of the computational framework was narrowed, permitting us to implement a capable set of features for this domain. This report discusses the features determined critical by our system and other projects, along with design issues. We illustrate the use of our implementation of the BCJ on two domain-specific examples. PMID:18053179

  18. HotSwap for bioinformatics: a STRAP tutorial.

    PubMed

    Gille, Christoph; Robinson, Peter N

    2006-02-09

    Bioinformatics applications are now routinely used to analyze large amounts of data. Application development often requires many cycles of optimization, compiling, and testing. Repeatedly loading large datasets can significantly slow down the development process. We have incorporated HotSwap functionality into the protein workbench STRAP, allowing developers to create plugins using the Java HotSwap technique. Users can load multiple protein sequences or structures into the main STRAP user interface, and simultaneously develop plugins using an editor of their choice such as Emacs. Saving changes to the Java file causes STRAP to recompile the plugin and automatically update its user interface without requiring recompilation of STRAP or reloading of protein data. This article presents a tutorial on how to develop HotSwap plugins. STRAP is available at http://strapjava.de and http://www.charite.de/bioinf/strap. HotSwap is a useful and time-saving technique for bioinformatics developers. HotSwap can be used to efficiently develop bioinformatics applications that require loading large amounts of data into memory.

  19. HotSwap for bioinformatics: A STRAP tutorial

    PubMed Central

    Gille, Christoph; Robinson, Peter N

    2006-01-01

    Background: Bioinformatics applications are now routinely used to analyze large amounts of data. Application development often requires many cycles of optimization, compiling, and testing. Repeatedly loading large datasets can significantly slow down the development process. We have incorporated HotSwap functionality into the protein workbench STRAP, allowing developers to create plugins using the Java HotSwap technique. Results: Users can load multiple protein sequences or structures into the main STRAP user interface, and simultaneously develop plugins using an editor of their choice such as Emacs. Saving changes to the Java file causes STRAP to recompile the plugin and automatically update its user interface without requiring recompilation of STRAP or reloading of protein data. This article presents a tutorial on how to develop HotSwap plugins. STRAP is available at http://strapjava.de and http://www.charite.de/bioinf/strap. Conclusion: HotSwap is a useful and time-saving technique for bioinformatics developers. HotSwap can be used to efficiently develop bioinformatics applications that require loading large amounts of data into memory. PMID:16469097

  20. Comparing structural fingerprints using a literature-based similarity benchmark.

    PubMed

    O'Boyle, Noel M; Sayle, Roger A

    2016-01-01

    The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common. Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013. doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark. Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint

  1. Development of Bioinformatics Infrastructure for Genomics Research.

    PubMed

    Mulder, Nicola J; Adebiyi, Ezekiel; Adebiyi, Marion; Adeyemi, Seun; Ahmed, Azza; Ahmed, Rehab; Akanle, Bola; Alibi, Mohamed; Armstrong, Don L; Aron, Shaun; Ashano, Efejiro; Baichoo, Shakuntala; Benkahla, Alia; Brown, David K; Chimusa, Emile R; Fadlelmola, Faisal M; Falola, Dare; Fatumo, Segun; Ghedira, Kais; Ghouila, Amel; Hazelhurst, Scott; Isewon, Itunuoluwa; Jung, Segun; Kassim, Samar Kamal; Kayondo, Jonathan K; Mbiyavanga, Mamana; Meintjes, Ayton; Mohammed, Somia; Mosaku, Abayomi; Moussa, Ahmed; Muhammd, Mustafa; Mungloo-Dilmohamud, Zahra; Nashiru, Oyekanmi; Odia, Trust; Okafor, Adaobi; Oladipo, Olaleye; Osamor, Victor; Oyelade, Jellili; Sadki, Khalid; Salifu, Samson Pandam; Soyemi, Jumoke; Panji, Sumir; Radouani, Fouzia; Souiai, Oussama; Tastan Bishop, Özlem

    2017-06-01

    Although pockets of bioinformatics excellence have developed in Africa, generally, large-scale genomic data analysis has been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet's role has evolved in response to changing needs from the consortium and the African bioinformatics community. H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. Various resources have been developed to address genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types are also being developed and dockerized to enable execution on multiple computing infrastructures. In addition, new tools have been developed for analysis of the uniquely divergent African data and for

  2. Non-structural carbohydrates in woody plants compared among laboratories.

    PubMed

    Quentin, Audrey G; Pinkard, Elizabeth A; Ryan, Michael G; Tissue, David T; Baggett, L Scott; Adams, Henry D; Maillard, Pascale; Marchand, Jacqueline; Landhäusser, Simon M; Lacointe, André; Gibon, Yves; Anderegg, William R L; Asao, Shinichi; Atkin, Owen K; Bonhomme, Marc; Claye, Caroline; Chow, Pak S; Clément-Vidal, Anne; Davies, Noel W; Dickman, L Turin; Dumbur, Rita; Ellsworth, David S; Falk, Kristen; Galiano, Lucía; Grünzweig, José M; Hartmann, Henrik; Hoch, Günter; Hood, Sharon; Jones, Joanna E; Koike, Takayoshi; Kuhlmann, Iris; Lloret, Francisco; Maestro, Melchor; Mansfield, Shawn D; Martínez-Vilalta, Jordi; Maucourt, Mickael; McDowell, Nathan G; Moing, Annick; Muller, Bertrand; Nebauer, Sergio G; Niinemets, Ülo; Palacio, Sara; Piper, Frida; Raveh, Eran; Richter, Andreas; Rolland, Gaëlle; Rosas, Teresa; Saint Joanis, Brigitte; Sala, Anna; Smith, Renee A; Sterck, Frank; Stinziano, Joseph R; Tobias, Mari; Unda, Faride; Watanabe, Makoto; Way, Danielle A; Weerasinghe, Lasantha K; Wild, Birgit; Wiley, Erin; Woodruff, David R

    2015-11-01

    Non-structural carbohydrates (NSC) in plant tissue are frequently quantified to make inferences about plant responses to environmental conditions. Laboratories publishing estimates of NSC of woody plants use many different methods to evaluate NSC. We asked whether NSC estimates in the recent literature could be quantitatively compared among studies. We also asked whether any differences among laboratories were related to the extraction and quantification methods used to determine starch and sugar concentrations. These questions were addressed by sending sub-samples collected from five woody plant tissues, which varied in NSC content and chemical composition, to 29 laboratories. Each laboratory analyzed the samples with their laboratory-specific protocols, based on recent publications, to determine concentrations of soluble sugars, starch and their sum, total NSC. Laboratory estimates differed substantially for all samples. For example, estimates for Eucalyptus globulus leaves (EGL) varied from 23 to 116 (mean = 56) mg g⁻¹ for soluble sugars, 6-533 (mean = 94) mg g⁻¹ for starch and 53-649 (mean = 153) mg g⁻¹ for total NSC. Mixed model analysis of variance showed that much of the variability among laboratories was unrelated to the categories we used for extraction and quantification methods (method category R² = 0.05-0.12 for soluble sugars, 0.10-0.33 for starch and 0.01-0.09 for total NSC). For EGL, the difference between the highest and lowest least squares means for categories in the mixed model analysis was 33 mg g⁻¹ for total NSC, compared with the range of laboratory estimates of 596 mg g⁻¹. Laboratories were reasonably consistent in their ranks of estimates among tissues for starch (r = 0.41-0.91), but less so for total NSC (r = 0.45-0.84) and soluble sugars (r = 0.11-0.83). Our results show that NSC estimates for woody plant tissues cannot be compared among laboratories. The relative changes in NSC between treatments measured within a laboratory

  3. Reverse Translational Bioinformatics: A Bioinformatics Assay Of Age, Gender And Clinical Biomarkers

    PubMed Central

    Fliss, Amit; Ragolsky, Micha; Rubin, Eitan

    2008-01-01

    In bioinformatics, clinical data is rarely used. Here, we propose using bedside data in basic research, via bioinformatics methodologies. To demonstrate the potential of this so-called Reverse Translational Bioinformatics approach, classical bioinformatics tools were applied to blood biomarker information obtained from a large-scale, open-access cross-sectional survey. The results of this analysis include a novel classification of blood biomarkers, critical ages at which basic biological processes may shift in humans, and a possible approach to exploring the gender specificity of these shifts. Changes in normal values were also shown to be non-linear, with most of the non-linearity attributed to the shift from growth to maturity. Together, these findings demonstrate that reverse translational bioinformatics may contribute to basic research. PMID:21347121

  4. Carving a niche: establishing bioinformatics collaborations

    PubMed Central

    Lyon, Jennifer A.; Tennant, Michele R.; Messner, Kevin R.; Osterbur, David L.

    2006-01-01

    Objectives: The paper describes collaborations and partnerships developed between library bioinformatics programs and other bioinformatics-related units at four academic institutions. Methods: A call for information on bioinformatics partnerships was made via email to librarians who have participated in the National Center for Biotechnology Information's Advanced Workshop for Bioinformatics Information Specialists. Librarians from Harvard University, the University of Florida, the University of Minnesota, and Vanderbilt University responded and expressed willingness to contribute information on their institutions, programs, services, and collaborating partners. Similarities and differences in programs and collaborations were identified. Results: The four librarians have developed partnerships with other units on their campuses that can be categorized into the following areas: knowledge management, instruction, and electronic resource support. All primarily support freely accessible electronic resources, while other campus units deal with fee-based ones. These demarcations are apparent in resource provision as well as in subsequent support and instruction. Conclusions and Recommendations: Through environmental scanning and networking with colleagues, librarians who provide bioinformatics support can develop fruitful collaborations. Visibility is key to building collaborations, as is broad-based thinking in terms of potential partners. PMID:16888668

  5. [Post-translational modification (PTM) bioinformatics in China: progresses and perspectives].

    PubMed

    Zexian, Liu; Yudong, Cai; Xuejiang, Guo; Ao, Li; Tingting, Li; Jianding, Qiu; Jian, Ren; Shaoping, Shi; Jiangning, Song; Minghui, Wang; Lu, Xie; Yu, Xue; Ziding, Zhang; Xingming, Zhao

    2015-07-01

    Post-translational modifications (PTMs) are essential for regulating conformational changes, activities and functions of proteins, and are involved in almost all cellular pathways and processes. Identification of protein PTMs is the basis for understanding cellular and molecular mechanisms. In contrast with labor-intensive and time-consuming experiments, PTM prediction using various bioinformatics approaches can provide accurate, convenient, and efficient strategies and generate valuable information for further experimental consideration. In this review, we summarize the current progress made by Chinese bioinformaticians in the field of PTM bioinformatics, including the design and improvement of computational algorithms for predicting PTM substrates and sites, design and maintenance of online and offline tools, establishment of PTM-related databases and resources, and bioinformatics analysis of PTM proteomics data. By comparing similar studies in China and other countries, we demonstrate both advantages and limitations of current PTM bioinformatics as well as perspectives for future studies in China.

  6. Study on the Response Coefficient of Setback Structures Compared to Regular Moment Frame Structures

    SciTech Connect

    Mirghaderi, S. Rasoul; Khafaf, Bardia; Epackachi, Siamak

    2008-07-08

    In the design practice of many countries, seismic analysis and proportioning of structures are usually based on linear elastic analysis, with seismic forces reduced by the response coefficient, R. Setback structures are one of the most popular building shapes. In setback structures, the shape and proportions of the building have a major effect on the distribution of earthquake forces as they work their way through the building. Geometric configuration also has a profound effect on the structural-dynamic response of a building. Therefore, when a building has irregular features, such as asymmetry in height or vertical discontinuity, the traditional assumptions used in developing seismic criteria for regular buildings may not be applicable. The inelastic seismic behavior of these types of structures seems to be quite different from that of regular steel moment-resisting structures, in which the overall ductility is localized at beam ends. In order to investigate the seismic behavior and estimate the Response Coefficient of those structures, nonlinear static (pushover) analyses are used for three categories of setback structures, namely low-rise, medium-rise and high-rise buildings with different setbacks in their height. The Response Coefficients are calculated and compared with those of regular moment frame structures.

  7. A comparison of common programming languages used in bioinformatics.

    PubMed

    Fourment, Mathieu; Gillings, Michael R

    2008-02-05

    The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.
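
    For readers unfamiliar with the first of the three benchmarked methods, the Sellers algorithm finds the substring of a text that is closest to a pattern in edit distance. The following is a minimal Python sketch, assuming unit costs for substitutions, insertions and deletions; it is not the benchmark's source code, and the function name is illustrative.

```python
def sellers(pattern, text):
    """Approximate matching: best edit distance of `pattern` against any
    substring of `text`, plus the 1-based end positions achieving it."""
    m = len(pattern)
    # Column for the empty text prefix: matching i pattern characters
    # against nothing costs i deletions.
    prev = list(range(m + 1))
    best, ends = prev[m], [0]
    for j, tc in enumerate(text, start=1):
        # Row 0 is always 0: a match may start at any text position.
        cur = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in text
                           cur[i - 1] + 1))     # pattern character skipped
        if cur[m] < best:
            best, ends = cur[m], [j]
        elif cur[m] == best:
            ends.append(j)
        prev = cur
    return best, ends
```

    Only two columns of the dynamic-programming table are kept, so memory grows with the pattern length rather than with the text length.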

  8. Structure, function and evolution of the gas exchangers: comparative perspectives

    PubMed Central

    Maina, JN

    2002-01-01

    Over the evolutionary continuum, animals have faced similar fundamental challenges of acquiring molecular oxygen for aerobic metabolism. Under limitations and constraints imposed by factors such as phylogeny, behaviour, body size and environment, they have responded differently in founding optimal respiratory structures. A quintessence of the aphorism that ‘necessity is the mother of invention’, gas exchangers have been inaugurated through stiff cost–benefit analyses that have evoked transaction of trade-offs and compromises. Cogent structural–functional correlations occur in constructions of gas exchangers: within and between taxa, morphological complexity and respiratory efficiency increase with metabolic capacities and oxygen needs. Highly active, small endotherms have relatively better-refined gas exchangers compared with large, inactive ectotherms. Respiratory structures have developed from the plain cell membrane of the primeval prokaryotic unicells to complex multifunctional ones of the modern Metazoa. Regarding the respiratory medium used to extract oxygen from, animal life has had only two choices – water or air – within the biological range of temperature and pressure the only naturally occurring respirable fluids. In rarer cases, certain animals have adapted to using both media. Gills (evaginated gas exchangers) are the primordial respiratory organs: they are the archetypal water breathing organs. Lungs (invaginated gas exchangers) are the model air breathing organs. Bimodal (transitional) breathers occupy the water–air interface. Presentation and exposure of external (water/air) and internal (haemolymph/blood) respiratory media, features determined by geometric arrangement of the conduits, are important features for gas exchange efficiency: counter-current, cross-current, uniform pool and infinite pool designs have variably developed. PMID:12430953

  9. Automated programming for bioinformatics algorithm deployment.

    PubMed

    Alterovitz, Gil; Jiwaji, Adnaan; Ramoni, Marco F

    2008-02-01

    Many bioinformatics solutions suffer from the lack of a usable interface or platform from which results can be analyzed and visualized. Overcoming this hurdle would allow for more widespread dissemination of bioinformatics algorithms within the biological and medical communities. The algorithms should be accessible without extensive technical support or programming knowledge. Here, we propose a dynamic wizard platform that provides users with a Graphical User Interface (GUI) for most Java bioinformatics library toolkits. The application interface is generated in real time based on the original source code. This platform lets developers focus on designing algorithms and biologists/physicians on testing hypotheses and analyzing results. The open source code can be downloaded from: http://bcl.med.harvard.edu/proteomics/proj/APBA/.

  10. Translational Bioinformatics and Clinical Research (Biomedical) Informatics.

    PubMed

    Sirintrapun, S Joseph; Zehir, Ahmet; Syed, Aijazuddin; Gao, JianJiong; Schultz, Nikolaus; Cheng, Donavan T

    2015-06-01

    Translational bioinformatics and clinical research (biomedical) informatics are the primary domains related to informatics activities that support translational research. Translational bioinformatics focuses on computational techniques in genetics, molecular biology, and systems biology. Clinical research (biomedical) informatics involves the use of informatics in discovery and management of new knowledge relating to health and disease. This article details 3 projects that are hybrid applications of translational bioinformatics and clinical research (biomedical) informatics: The Cancer Genome Atlas, the cBioPortal for Cancer Genomics, and the Memorial Sloan Kettering Cancer Center clinical variants and results database, all designed to facilitate insights into cancer biology and clinical/therapeutic correlations. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Penalized feature selection and classification in bioinformatics

    PubMed Central

    Huang, Jian

    2008-01-01

    In bioinformatics studies, supervised classification with high-dimensional input variables is frequently encountered. Examples routinely arise in genomic, epigenetic and proteomic studies. Feature selection can be employed along with classifier construction to avoid over-fitting, to generate more reliable classifiers and to provide more insight into the underlying causal relationships. In this article, we provide a review of several recently developed penalized feature selection and classification techniques (which belong to the family of embedded feature selection methods) for bioinformatics studies with high-dimensional input. Classification objective functions, penalty functions and computational algorithms are discussed. Our goal is to make interested researchers aware of these feature selection and classification methods that are applicable to high-dimensional bioinformatics data. PMID:18562478
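
    To illustrate how a penalty can select features while a classifier is being fitted, here is a minimal sketch of L1-penalized (lasso-type) logistic regression. It assumes a logistic loss without intercept and uses plain proximal gradient descent as the solver; the function names, step size and penalty weight are illustrative choices, not the specific methods reviewed in the article.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, step=0.1, iters=500):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient.

    X is a list of feature rows, y a list of 0/1 labels.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            prob = 1.0 / (1.0 + math.exp(-z))
            for j in range(p):
                grad[j] += (prob - yi) * xi[j] / n
        # Gradient step on the smooth loss, then shrinkage for the penalty.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w


# Toy data: feature 0 determines the label, feature 1 is uninformative.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [1, 1, 0, 0]
weights = fit_l1_logistic(X, y)
```

    In this toy example the weight of the uninformative feature stays at exactly zero, which is the sense in which such methods are "embedded": selection happens inside the fitting procedure rather than as a separate filtering step.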

  12. BioZoom: Exploiting Source-Capability Information for Integrated Access to Multiple Bioinformatics Data Sources

    SciTech Connect

    Liu, L; Buttler, D; Critchlow, T J; Han, W; Paques, H; Pu, C; Rocco, D

    2003-01-09

    Modern Bioinformatics data sources are widely used by molecular biologists for homology searching and new drug discovery. User-friendly and yet responsive access is one of the most desirable properties for integrated access to the rapidly growing, heterogeneous, and distributed collection of data sources. The increasing volume and diversity of digital information related to bioinformatics (such as genomes, protein sequences, protein structures, etc.) have led to a growing problem that conventional data management systems do not have, namely finding which information sources out of many candidate choices are the most relevant and most accessible to answer a given user query. We refer to this problem as the query routing problem. In this paper we introduce the notation and issues of query routing, and present a practical solution for designing a scalable query routing system based on multi-level progressive pruning strategies. The key idea is to create and maintain source-capability profiles independently, and to provide algorithms that can dynamically discover relevant information sources for a given query through the smart use of source profiles. Compared to the keyword-based indexing techniques adopted in most of the search engines and software, our approach offers fine-granularity of interest matching, thus it is more powerful and effective for handling queries with complex conditions.
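
    The idea of routing a query through source-capability profiles with progressive pruning can be sketched roughly as follows. This is a simplified illustration, not the BioZoom implementation: the profile fields, the three pruning levels and the source names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical capability profile kept independently per data source."""
    name: str
    topics: frozenset             # kinds of content the source holds
    searchable_fields: frozenset  # query conditions the source can evaluate
    available: bool = True        # whether the source is currently reachable


@dataclass
class Query:
    topic: str
    conditions: dict  # field name -> requested value


def route(query, profiles):
    """Return the names of sources worth contacting, pruning in stages."""
    # Level 1: coarse pruning by content type.
    candidates = [p for p in profiles if query.topic in p.topics]
    # Level 2: capability pruning; every condition must be evaluable.
    needed = set(query.conditions)
    candidates = [p for p in candidates if needed <= p.searchable_fields]
    # Level 3: accessibility pruning.
    return [p.name for p in candidates if p.available]


profiles = [
    SourceProfile("source_a", frozenset({"protein"}),
                  frozenset({"organism", "length"})),
    SourceProfile("source_b", frozenset({"protein", "genome"}),
                  frozenset({"organism"})),
    SourceProfile("source_c", frozenset({"protein"}),
                  frozenset({"organism", "length"}), available=False),
]
query = Query("protein", {"organism": "human", "length": 300})
selected = route(query, profiles)
```

    Here only `source_a` survives all three levels: `source_b` cannot evaluate the `length` condition and `source_c` is unavailable. Cheap checks run first, so the more detailed capability matching is applied to fewer candidates.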

  13. BioZone Exploiting Source-Capability Information for Integrated Access to Multiple Bioinformatics Data Sources

    SciTech Connect

    Liu, L; Buttler, D; Paques, H; Pu, C; Critchlow

    2002-01-28

    Modern Bioinformatics data sources are widely used by molecular biologists for homology searching and new drug discovery. User-friendly and yet responsive access is one of the most desirable properties for integrated access to the rapidly growing, heterogeneous, and distributed collection of data sources. The increasing volume and diversity of digital information related to bioinformatics (such as genomes, protein sequences, protein structures, etc.) have led to a growing problem that conventional data management systems do not have, namely finding which information sources out of many candidate choices are the most relevant and most accessible to answer a given user query. We refer to this problem as the query routing problem. In this paper we introduce the notation and issues of query routing, and present a practical solution for designing a scalable query routing system based on multi-level progressive pruning strategies. The key idea is to create and maintain source-capability profiles independently, and to provide algorithms that can dynamically discover relevant information sources for a given query through the smart use of source profiles. Compared to the keyword-based indexing techniques adopted in most of the search engines and software, our approach offers fine-granularity of interest matching, thus it is more powerful and effective for handling queries with complex conditions.

  14. Bioinformatic Analysis of GJB2 Gene Missense Mutations.

    PubMed

    Yilmaz, Akin

    2015-04-01

    The gap junction beta 2 (GJB2) gene is the most commonly mutated connexin gene in patients with autosomal recessive and dominant hearing loss. According to the Ensembl (release 74) database, 1347 sequence variations are reported in the GJB2 gene and about 13.5% of them are categorized as missense SNPs or nonsynonymous variants. Because of the high incidence of GJB2 mutations in hearing loss patients, revealing the molecular effect of GJB2 mutations on protein structure may also provide a clear view of the molecular etiology of deafness. Hence, the aim of this study is to analyze the structural and functional consequences of all known GJB2 missense variations for the Cx26 protein by applying multiple bioinformatics methods. Two hundred and eleven nonsynonymous variants were collected from Ensembl release 74, the Leiden Open Variation Database (LOVD) and the Human Gene Mutation Database (HGMD). A number of bioinformatic tools were used to predict the effect of GJB2 missense mutations at the sequence, structural, and functional levels. Some of the mutations were found to be located in highly conserved regions and to have structural and functional effects. Moreover, GJB2 mutations were also found to affect the Cx26 protein at the molecular level via loss or gain of disorder, catalytic site, and post-translational modifications, including methylation, glycosylation, and ubiquitination. The findings presented here demonstrate the application of bioinformatic algorithms to predict the effects of mutations causing hearing impairment. I expect that this type of analysis will serve as a starting point for future experimental evaluation of GJB2 gene mutations and that it will also be helpful in evaluating other deafness-related gene mutations.

  15. Water and ammonia on Cu{110}: comparative structure and bonding.

    PubMed

    Jones, Glenn; Jenkins, Stephen J

    2013-04-07

    Water and ammonia are arguably the two most important inorganic molecular species in the modern world, and their interaction with metal surfaces is key to unlocking their further potential in a number of spheres. In this comparative study, conducted on the Cu{110} substrate, we present results from first-principles density functional theory that highlight the similarities and differences between these chemical cousins. We find that ammonia is less likely than water to undergo thermally induced partial dissociation, although we nevertheless identify the most likely product of electron-stimulated or defect-induced dissociation to be a surface amino species. We predict that ammonia, like water, will adopt a bilayer structure at high coverage, but that unlike water the net intermolecular interaction will be repulsive, despite the formation of a weak hydrogen-bonded network. Furthermore, we suggest that coadsorption of water and ammonia can give rise to an intimately mixed overlayer in which ammonia molecules are bound directly to the surface whilst water molecules are attached only via hydrogen bonds from below.

  16. BioShaDock: a community driven bioinformatics shared Docker-based tools registry

    PubMed Central

    Moreews, François; Sallou, Olivier; Ménager, Hervé; Le Bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier

    2015-01-01

    Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user-defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the ELIXIR registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community. PMID:26913191

  17. BioShaDock: a community driven bioinformatics shared Docker-based tools registry.

    PubMed

    Moreews, François; Sallou, Olivier; Ménager, Hervé; Le Bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier

    2015-01-01

    Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user-defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the ELIXIR registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community.

  18. The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics

    PubMed Central

    Greenhill, Simon J.; Blust, Robert; Gray, Russell D.

    2008-01-01

    Phylogenetic methods have revolutionised evolutionary biology and have recently been applied to studies of linguistic and cultural evolution. However, the basic comparative data on the languages of the world required for these analyses is often widely dispersed in hard to obtain sources. Here we outline how our Austronesian Basic Vocabulary Database (ABVD) helps remedy this situation by collating wordlists from over 500 languages into one web-accessible database. We describe the technology underlying the ABVD and discuss the benefits that an evolutionary bioinformatic approach can provide. These include facilitating computational comparative linguistic research, answering questions about human prehistory, enabling syntheses with genetic data, and safe-guarding fragile linguistic information. PMID:19204825

  19. A novel approach to represent and compare RNA secondary structures

    PubMed Central

    Mattei, Eugenio; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

    2014-01-01

    Structural information is crucial in ribonucleic acid (RNA) analysis and functional annotation; nevertheless, how to include such structural data is still a debated problem. Dot-bracket notation is the most common and simple representation for RNA secondary structures, but its simplicity also leads to ambiguity that requires further processing steps to resolve. Here we present BEAR (Brand nEw Alphabet for RNA), a new context-aware structural encoding represented by a string of characters. Each character in BEAR encodes a specific secondary structure element (loop, stem, bulge and internal loop) with a specific length. Furthermore, exploiting this informative and yet simple encoding in multiple alignments of related RNAs, we captured how much structural variation is tolerated in RNA families and converted it into transition rates among secondary structure elements. This allowed us to compute a substitution matrix for secondary structure elements called MBR (Matrix of BEAR-encoded RNA secondary structures), whose ability to align RNA secondary structures we tested. We propose BEAR and the MBR as powerful resources for RNA secondary structure analysis, comparison and classification, motif finding and phylogeny. PMID:24753415
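
    To make the contrast with plain dot-bracket strings concrete, the sketch below labels each position of a dot-bracket string by the element it belongs to and then reports (element, length) runs. It is only an illustration of context-aware labelling: the labels S (paired), H (hairpin loop) and U (other unpaired) are an assumption invented here and are not the BEAR alphabet, which distinguishes more element types.

```python
def pair_table(db):
    """Map each position of a dot-bracket string to its partner (or None)."""
    stack, pairs = [], [None] * len(db)
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def elements(db):
    """Return (label, length) runs: S = paired, H = hairpin loop, U = other unpaired."""
    pairs = pair_table(db)
    size = len(db)
    labels = []
    i = 0
    while i < size:
        if pairs[i] is not None:
            labels.append("S")
            i += 1
            continue
        # Find the end of this unpaired run, which covers positions [i, j).
        j = i
        while j < size and pairs[j] is None:
            j += 1
        # A hairpin loop is closed by a single base pair (i - 1, j).
        is_hairpin = i > 0 and j < size and pairs[i - 1] == j
        labels.extend(("H" if is_hairpin else "U") * (j - i))
        i = j
    # Collapse identical neighbouring labels into (label, length) runs.
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    return [(lab, count) for lab, count in runs]


print(elements("((..((...))..))"))
```

    The same "." character ends up with different labels depending on what encloses it, which is the ambiguity of dot-bracket notation that a context-aware encoding removes.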

  20. Rabifier2: an improved bioinformatic classifier of Rab GTPases.

    PubMed

    Surkont, Jaroslaw; Diekmann, Yoan; Pereira-Leal, José B

    2016-10-22

    The Rab family of small GTPases regulates and provides specificity to the endomembrane trafficking system; each Rab subfamily is associated with specific pathways. Thus, characterization of Rab repertoires provides functional information about organisms and evolution of the eukaryotic cell. Yet, the complex structure of the Rab family limits the application of existing methods for protein classification. Here, we present a major redesign of the Rabifier, a bioinformatic pipeline for detection and classification of Rab GTPases. It is more accurate, significantly faster than the original version and is now open source, both the code and the data, allowing for community participation.

  1. Bioinformatics: A History of Evolution "In Silico"

    ERIC Educational Resources Information Center

    Ondrej, Vladan; Dvorak, Petr

    2012-01-01

    Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…

  3. Bioboxes: standardised containers for interchangeable bioinformatics software.

    PubMed

    Belmann, Peter; Dröge, Johannes; Bremges, Andreas; McHardy, Alice C; Sczyrba, Alexander; Barton, Michael D

    2015-01-01

    Software is now both central and essential to modern biology, yet lack of availability, difficult installations, and complex user interfaces make software hard to obtain and use. Containerisation, as exemplified by the Docker platform, has the potential to solve the problems associated with sharing software. We propose bioboxes: containers with standardised interfaces to make bioinformatics software interchangeable.

  4. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards combined with unfriendly interfaces makes it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a Java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or third-party viewers are built in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.

  5. Medical informatics and bioinformatics: a bibliometric study

    PubMed Central

    Bansard, Jean-Yves; Rebholz-Schuhmann, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Beltrame, Francesco; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Tollis, Ioannis; Van der Lei, Johan; Coatrieux, Jean-Louis

    2007-01-01

    This paper reports on an analysis of the bioinformatics and medical informatics literature with the objective to identify upcoming trends that are shared among both research fields to derive benefits from potential collaborative initiatives for their future. Our results present the main characteristics of the two fields and show that these domains are still relatively separated. PMID:17521073

  6. Privacy Preserving PCA on Distributed Bioinformatics Datasets

    ERIC Educational Resources Information Center

    Li, Xin

    2011-01-01

    In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human…

  7. "Extreme Programming" in a Bioinformatics Class

    ERIC Educational Resources Information Center

    Kelley, Scott; Alger, Christianna; Deutschman, Douglas

    2009-01-01

    The importance of Bioinformatics tools and methodology in modern biological research underscores the need for robust and effective courses at the college level. This paper describes such a course designed on the principles of cooperative learning based on a computer software industry production model called "Extreme Programming" (EP).…

  8. 2010 Translational bioinformatics year in review

    PubMed Central

    Miller, Katharine S

    2011-01-01

    A review of 2010 research in translational bioinformatics provides much to marvel at. We have seen notable advances in personal genomics, pharmacogenetics, and sequencing. At the same time, the infrastructure for the field has burgeoned. While acknowledging that, according to researchers, the members of this field tend to be overly optimistic, the authors predict a bright future. PMID:21672905

  9. Bioinformatics in Undergraduate Education: Practical Examples

    ERIC Educational Resources Information Center

    Boyle, John A.

    2004-01-01

    Bioinformatics has emerged as an important research tool in recent years. The ability to mine large databases for relevant information has become increasingly central to many different aspects of biochemistry and molecular biology. It is important that undergraduates be introduced to the available information and methodologies. We present a…

  10. Implementing bioinformatic workflows within the bioextract server

    USDA-ARS's Scientific Manuscript database

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  13. SPECIES DATABASES AND THE BIOINFORMATICS REVOLUTION.

    EPA Science Inventory

    Biological databases are having a growth spurt. Much of this results from research in genetics and biodiversity, coupled with fast-paced developments in information technology. The revolution in bioinformatics, defined by Sugden and Pennisi (2000) as the "tools and techniques for...

  15. What is bioinformatics? A proposed definition and overview of the field.

    PubMed

    Luscombe, N M; Greenbaum, D; Gerstein, M

    2001-01-01

    The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large scale. Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.

  16. Agile parallel bioinformatics workflow management using Pwrake.

    PubMed

    Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

    2011-09-08

    In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles...
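
    The separation of workflow definition from parameter adjustment described in this abstract can be sketched as follows. This is not Pwrake or a rakefile: it is a Python analogue built on the standard library, and the step names, parameters, and the run_step placeholder are all hypothetical. Independent steps that become ready together are dispatched in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Definition phase: each step maps to the steps it depends on (hypothetical names).
WORKFLOW = {
    "align": [],
    "call_variants": ["align"],
    "call_indels": ["align"],
    "merge": ["call_variants", "call_indels"],
}

# Parameter adjustment phase: tuned separately, without touching WORKFLOW.
PARAMS = {
    "align": {"threads": 4},
    "call_variants": {"min_quality": 30},
}


def run_step(name, params):
    """Placeholder for launching a real tool; here it only formats a description."""
    args = ", ".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"{name}({args})"


def run_workflow(workflow, params, max_workers=2):
    """Run steps in dependency order, executing each batch of ready steps in parallel."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    log = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = list(sorter.get_ready())
            results = pool.map(lambda step: run_step(step, params.get(step, {})), ready)
            for step, result in zip(ready, results):
                log.append(result)
                sorter.done(step)
    return log


for line in run_workflow(WORKFLOW, PARAMS):
    print(line)
```

    Changing a value in PARAMS and rerunning leaves the dependency graph untouched, which mirrors the two-phase iteration the abstract reports.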

  17. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background: In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings: We implemented the Pwrake workflows to process next generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions: Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability...

  18. Navigating the changing learning landscape: perspective from bioinformatics.ca

    PubMed Central

    Ouellette, B. F. Francis

    2013-01-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468

  19. Navigating the changing learning landscape: perspective from bioinformatics.ca.

    PubMed

    Brazas, Michelle D; Ouellette, B F Francis

    2013-09-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs.

  20. Structural realism versus deployment realism: A comparative evaluation.

    PubMed

    Lyons, Timothy D

    2016-10-01

    In this paper I challenge and adjudicate between the two positions that have come to prominence in the scientific realism debate: deployment realism and structural realism. I discuss a set of cases from the history of celestial mechanics, including some of the most important successes in the history of science. To the surprise of the deployment realist, these are novel predictive successes toward which theoretical constituents that are now seen to be patently false were genuinely deployed. Exploring the implications for structural realism, I show that the need to accommodate these cases forces our notion of "structure" toward a dramatic depletion of logical content, threatening to render it explanatorily vacuous: the better structuralism fares against these historical examples, in terms of retention, the worse it fares in content and explanatory strength. I conclude by considering recent restrictions that serve to make "structure" more specific. I show however that these refinements will not suffice: the better structuralism fares in specificity and explanatory strength, the worse it fares against history. In light of these case studies, both deployment realism and structural realism are significantly threatened by the very historical challenge they were introduced to answer.

  1. Component-Based Approach for Educating Students in Bioinformatics

    ERIC Educational Resources Information Center

    Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.

    2009-01-01

    There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…

  3. [Construction and application of bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer].

    PubMed

    Xiang, Fang; Ningqiu, Li; Xiaozhe, Fu; Kaibin, Li; Qiang, Lin; Lihui, Liu; Cunbin, Shi; Shuqin, Wu

    2015-07-01

    As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogens based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules, including genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via BLAST searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on outer membrane protein A of Aeromonas hydrophila, and the changes of system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogens has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platforms for other subjects.

  4. Comparing two tetraalkylammonium ionic liquids. I. Liquid phase structure

    NASA Astrophysics Data System (ADS)

    Lima, Thamires A.; Paschoal, Vitor H.; Faria, Luiz F. O.; Ribeiro, Mauro C. C.; Giles, Carlos

    2016-06-01

    X-ray scattering experiments at room temperature were performed for the ionic liquids n-butyl-trimethylammonium bis(trifluoromethanesulfonyl)imide, [N1114][NTf2], and methyl-tributylammonium bis(trifluoromethanesulfonyl)imide, [N1444][NTf2]. The peak in the diffraction data characteristic of charge ordering in [N1444][NTf2] is shifted to longer distances in comparison to [N1114][NTf2], but the peak characteristic of short-range correlations is shifted in [N1444][NTf2] to shorter distances. Molecular dynamics (MD) simulations were performed for these ionic liquids using force fields available from the literature, although with new sets of partial charges for [N1114]+ and [N1444]+ proposed in this work. The shifting of charge and adjacency peaks to opposite directions in these ionic liquids was found in the static structure factor, S(k), calculated by MD simulations. Despite differences in cation sizes, the MD simulations reveal that anions are allowed as close to [N1444]+ as to [N1114]+ because anions are located in between the angle formed by the butyl chains. The more asymmetric molecular structure of the [N1114]+ cation implies differences in partial structure factors calculated for atoms belonging to polar or non-polar parts of [N1114][NTf2], whereas polar and non-polar structure factors are essentially the same in [N1444][NTf2]. Results of this work shed light on controversies in the literature on the liquid structure of tetraalkylammonium based ionic liquids.

  5. Comparing two tetraalkylammonium ionic liquids. I. Liquid phase structure.

    PubMed

    Lima, Thamires A; Paschoal, Vitor H; Faria, Luiz F O; Ribeiro, Mauro C C; Giles, Carlos

    2016-06-14

    X-ray scattering experiments at room temperature were performed for the ionic liquids n-butyl-trimethylammonium bis(trifluoromethanesulfonyl)imide, [N1114][NTf2], and methyl-tributylammonium bis(trifluoromethanesulfonyl)imide, [N1444][NTf2]. The peak in the diffraction data characteristic of charge ordering in [N1444][NTf2] is shifted to longer distances in comparison to [N1114][NTf2], but the peak characteristic of short-range correlations is shifted in [N1444][NTf2] to shorter distances. Molecular dynamics (MD) simulations were performed for these ionic liquids using force fields available from the literature, although with new sets of partial charges for [N1114](+) and [N1444](+) proposed in this work. The shifting of charge and adjacency peaks to opposite directions in these ionic liquids was found in the static structure factor, S(k), calculated by MD simulations. Despite differences in cation sizes, the MD simulations reveal that anions are allowed as close to [N1444](+) as to [N1114](+) because anions are located in between the angle formed by the butyl chains. The more asymmetric molecular structure of the [N1114](+) cation implies differences in partial structure factors calculated for atoms belonging to polar or non-polar parts of [N1114][NTf2], whereas polar and non-polar structure factors are essentially the same in [N1444][NTf2]. Results of this work shed light on controversies in the literature on the liquid structure of tetraalkylammonium based ionic liquids.

  6. How do disordered regions achieve comparable functions to structured domains?

    PubMed

    Latysheva, Natasha S; Flock, Tilman; Weatheritt, Robert J; Chavali, Sreenivas; Babu, M Madan

    2015-06-01

    The traditional structure to function paradigm conceives of a protein's function as emerging from its structure. In recent years, it has been established that unstructured, intrinsically disordered regions (IDRs) in proteins are equally crucial elements for protein function, regulation and homeostasis. In this review, we provide a brief overview of how IDRs can perform similar functions to structured proteins, focusing especially on the formation of protein complexes and assemblies and the mediation of regulated conformational changes. In addition to highlighting instances of such functional equivalence, we explain how differences in the biological and physicochemical properties of IDRs allow them to expand the functional and regulatory repertoire of proteins. We also discuss studies that provide insights into how mutations within functional regions of IDRs can lead to human diseases.

  7. Determining protein similarity by comparing hydrophobic core structure.

    PubMed

    Gadzała, M; Kalinowska, B; Banach, M; Konieczny, L; Roterman, I

    2017-02-01

    Formal assessment of structural similarity is - next to protein structure prediction - arguably the most important unsolved problem in proteomics. In this paper we propose a similarity criterion based on commonalities between the proteins' hydrophobic cores. The hydrophobic core emerges as a result of conformational changes through which each residue reaches its intended position in the protein body. A quantitative criterion based on this phenomenon has been proposed in the framework of the CASP challenge. The structure of the hydrophobic core - including the placement and scope of any deviations from the idealized model - may indirectly point to areas of importance from the point of view of the protein's biological function. Our analysis focuses on an arbitrarily selected target from the CASP11 challenge. The proposed measure, while compliant with CASP criteria (70-80% correlation), involves certain adjustments which acknowledge the presence of factors other than simple spatial arrangement of solids.

  8. Bioinformatics and systems biology research update from the 15th International Conference on Bioinformatics (InCoB2016).

    PubMed

    Schönbach, Christian; Verma, Chandra; Bond, Peter J; Ranganathan, Shoba

    2016-12-22

    The International Conference on Bioinformatics (InCoB) has been publishing peer-reviewed conference papers in BMC Bioinformatics since 2006. Of the 44 articles accepted for publication in supplement issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics and BMC Systems Biology, 24 articles with a bioinformatics or systems biology focus are reviewed in this editorial. InCoB2017 is scheduled to be held in Shenzhen, China, September 20-22, 2017.

  9. Comparative Analysis on Time Series with Included Structural Break

    NASA Astrophysics Data System (ADS)

    Andreeski, Cvetko J.; Vasant, Pandian

    2009-08-01

    Time series analysis with ARIMA models is a good approach for the identification of time series. However, if the series contains a structural break, a single model cannot describe the whole series. Furthermore, if there are not enough data between two structural breaks, it is impossible to create valid time series models for identification of the series. This paper explores the possibility of identifying the dynamics of the inflation process with a system-theoretic approach, by means of both Box-Jenkins ARIMA methodologies and artificial neural networks.
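
    The point about structural breaks can be illustrated with a minimal sketch. It is not the authors' method: it assumes a single break and a constant-mean model on each side (a stand-in for the per-segment ARIMA fits the abstract refers to), and the names `find_single_break` and `min_segment` are hypothetical. The `min_segment` guard mirrors the remark that too few points between breaks make a valid model impossible.

```python
from statistics import fmean


def sse(values):
    """Sum of squared errors of a constant-mean fit to `values`."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def find_single_break(series, min_segment=3):
    """Return (index, total_sse) for the split with the lowest segment-wise SSE.

    `index` is the first position of the second segment. A ValueError is
    raised when the series cannot hold two segments of `min_segment` points.
    """
    n = len(series)
    if n < 2 * min_segment:
        raise ValueError("not enough data on both sides of a break")
    best_index, best_sse = None, float("inf")
    for k in range(min_segment, n - min_segment + 1):
        total = sse(series[:k]) + sse(series[k:])
        if total < best_sse:
            best_index, best_sse = k, total
    return best_index, best_sse


# Hypothetical series with a level shift starting at position 6.
series = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 5.0, 5.1, 4.9, 5.2, 4.8, 5.0]
break_index, split_sse = find_single_break(series)
single_model_sse = sse(series)  # error of one model fitted to the whole series
```

    Comparing `split_sse` with `single_model_sse` shows how much a single model loses when the break is ignored; in practice each segment would then be modelled separately.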

  10. Comparative static curing versus dynamic curing on tablet coating structures.

    PubMed

    Gendre, Claire; Genty, Muriel; Fayard, Barbara; Tfayli, Ali; Boiret, Mathieu; Lecoq, Olivier; Baron, Michel; Chaminade, Pierre; Péan, Jean Manuel

    2013-09-10

    Curing is generally required to stabilize film coating from aqueous polymer dispersion. This post-coating drying step is traditionally carried out in static conditions, requiring the transfer of solid dosage forms to an oven. However, a curing operation performed directly inside the coating equipment represents an attractive industrial application. Recently, the use of various advanced physico-chemical characterization techniques, i.e., X-ray micro-computed tomography, vibrational spectroscopies (near infrared and Raman) and X-ray microdiffraction, allowed new insights into the film-coating structures of dynamically cured tablets. Dynamic curing end-point was efficiently determined after 4 h. The aim of the present work was to elucidate the influence of curing conditions on film-coating structures. Results demonstrated that 24 h of static curing and 4 h of dynamic curing, both performed at 60°C and ambient relative humidity, led to similar coating layers in terms of drug release properties, porosity, water content, structural rearrangement of polymer chains and crystalline distribution. Furthermore, X-ray microdiffraction measurements pointed out different crystalline coating compositions depending on sample storage time. An aging mechanism might have occurred during storage, resulting in the crystallization and the upward migration of cetyl alcohol, coupled to the downward migration of crystalline sodium lauryl sulfate within the coating layer. Interestingly, this new study clearly provided further knowledge into film-coating structures after a curing step and confirmed that curing operation could be performed in dynamic conditions.

  11. Scholastic Performance and the Structuring of Ambition: A Comparative Study.

    ERIC Educational Resources Information Center

    Schwarzweller, Harry K.

    This conceptual educational framework considers: (1) the idiosyncrasies of national educational structures in Germany, Norway, and the United States in regions which represent a wide range of rural socioeconomic circumstances, and (2) scholastic rank as a determinant of ambition and as a sorting-out mechanism. Social inequalities resulting from…

  12. Structural and Social Psychological Correlates of Prisonization: A Comparative Analysis.

    ERIC Educational Resources Information Center

    Thomas, Charles W.; And Others

    This study considers some aspects of "prisonization," or the process by which inmates adapt to confinement. Specifically, it further examines two ideas suggested by earlier studies. One is the belief that the structural characteristics of many prisons promote rather than inhibit assimilation into an inmate normative system that is opposed to the…

  13. Comparative structural biology of eubacterial and archaeal oligosaccharyltransferases.

    PubMed

    Maita, Nobuo; Nyirenda, James; Igura, Mayumi; Kamishikiryo, Jun; Kohda, Daisuke

    2010-02-12

    Oligosaccharyltransferase (OST) catalyzes the transfer of an oligosaccharide from a lipid donor to an asparagine residue in nascent polypeptide chains. In the bacterium Campylobacter jejuni, a single-subunit membrane protein, PglB, catalyzes N-glycosylation. We report the 2.8 Å resolution crystal structure of the C-terminal globular domain of PglB and its comparison with the previously determined structure from the archaeon Pyrococcus AglB. The two distantly related oligosaccharyltransferases share unexpected structural similarity beyond that expected from the sequence comparison. The common architecture of the putative catalytic sites revealed a new catalytic motif in PglB. Site-directed mutagenesis analyses confirmed the contribution of this motif to the catalytic function. Bacterial PglB and archaeal AglB constitute a protein family of the catalytic subunit of OST along with STT3 from eukaryotes. A structure-aided multiple sequence alignment of the STT3/PglB/AglB protein family revealed three types of OST catalytic centers. This novel classification will provide a useful framework for understanding the enzymatic properties of the OST enzymes from Eukarya, Archaea, and Bacteria.

  14. Comparative Effectiveness of Contextual and Structural Method of Teaching Vocabulary

    ERIC Educational Resources Information Center

    Behlol, Malik; Kaini, Mohammad Munir

    2011-01-01

    The study was conducted to find out the effectiveness of contextual and structural methods of teaching vocabulary in English at secondary level. It was an experimental study in which the pretest-posttest design was used. The population of the study was the students of secondary classes studying in Government secondary schools of Rawalpindi District.…

  15. The MPI Bioinformatics Toolkit for protein sequence analysis

    PubMed Central

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N.

    2006-01-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de. PMID:16845021

  16. The MPI Bioinformatics Toolkit for protein sequence analysis.

    PubMed

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N

    2006-07-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de.

  17. Computational Lipidomics and Lipid Bioinformatics: Filling In the Blanks.

    PubMed

    Pauling, Josch; Klipp, Edda

    2016-12-22

    Lipids are highly diverse metabolites of pronounced importance in health and disease. While metabolomics is a broad field under the omics umbrella that may also relate to lipids, lipidomics is an emerging field which specializes in the identification, quantification and functional interpretation of complex lipidomes. Today, it is possible to identify and distinguish lipids in a high-resolution, high-throughput manner and simultaneously with a lot of structural detail. However, doing so may produce thousands of mass spectra in a single experiment which has created a high demand for specialized computational support to analyze these spectral libraries. The computational biology and bioinformatics community has so far established methodology in genomics, transcriptomics and proteomics but there are many (combinatorial) challenges when it comes to structural diversity of lipids and their identification, quantification and interpretation. This review gives an overview and outlook on lipidomics research and illustrates ongoing computational and bioinformatics efforts. These efforts are important and necessary steps to advance the lipidomics field alongside analytic, biochemistry, biomedical and biology communities and to close the gap in available computational methodology between lipidomics and other omics sub-branches.

  18. Can bioinformatics help in the identification of moonlighting proteins?

    PubMed

    Hernández, Sergio; Calvo, Alejandra; Ferragut, Gabriela; Franco, Luís; Hermoso, Antoni; Amela, Isaac; Gómez, Antonio; Querol, Enrique; Cedano, Juan

    2014-12-01

    Protein multitasking or moonlighting is the capability of certain proteins to execute two or more unique biological functions. This ability to perform moonlighting functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Usually, moonlighting proteins are revealed experimentally by serendipity, and the proteins described probably represent just the tip of the iceberg. It would be helpful if bioinformatics could predict protein multifunctionality, especially because of the large amounts of sequences coming from genome projects. In the present article, we describe several approaches that use sequences, structures, interactomics and current bioinformatics algorithms and programs to try to overcome this problem. The sequence analysis has been performed: (i) by remote homology searches using PSI-BLAST, (ii) by the detection of functional motifs, and (iii) by the co-evolutionary relationship between amino acids. Programs designed to identify functional motifs/domains are basically oriented to detect the main function, but usually fail in the detection of secondary ones. Remote homology searches such as PSI-BLAST seem to be more versatile in this task, and it is a good complement for the information obtained from protein-protein interaction (PPI) databases. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can be used only in very restricted situations, but can suggest how the evolutionary process of the acquisition of the second function took place.
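
    The idea of mutation correlation between amino acid positions can be sketched in a few lines. This is only an illustration under stated assumptions: it uses mutual information between two columns of a multiple sequence alignment, a common co-evolution measure that the abstract does not name, and the toy alignment and function names are hypothetical.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_joint = count / n
        # p_joint / (p_a * p_b), written with raw counts
        mi += p_joint * log2(p_joint * n * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy alignment: positions 1 and 3 change together (K/E vs R/D),
# while positions 0 and 2 are fully conserved.
alignment = ["AKLE", "AKLE", "ARLD", "ARLD"]
mi_covarying = mutual_information(column(alignment, 1), column(alignment, 3))
mi_conserved = mutual_information(column(alignment, 0), column(alignment, 1))
```

    A conserved column carries no mutual information with any other column, which is one reason such analyses apply only to sufficiently variable, well-sampled alignments, in line with the abstract's caveat about restricted applicability.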

  19. Bioinformatic characterization of plant networks

    SciTech Connect

    McDermott, Jason E.; Samudrala, Ram

    2008-06-30

    Cells and organisms are governed by networks of interactions, genetic, physical and metabolic. Large-scale experimental studies of interactions between components of biological systems have been performed for a variety of eukaryotic organisms. However, there is a dearth of such data for plants. Computational methods for prediction of relationships between proteins, primarily based on comparative genomics, provide a useful systems-level view of cellular functioning and can be used to extend information about other eukaryotes to plants. We have predicted networks for Arabidopsis thaliana, Oryza sativa indica and japonica and several plant pathogens using the Bioverse (http://bioverse.compbio.washington.edu) and show that they are similar to experimentally-derived interaction networks. Predicted interaction networks for plants can be used to provide novel functional annotations and predictions about plant phenotypes and aid in rational engineering of biosynthesis pathways.

  20. Virulence factor activity relationships (VFARs): a bioinformatics perspective.

    PubMed

    Waseem, Hassan; Williams, Maggie R; Stedtfeld, Tiffany; Chai, Benli; Stedtfeld, Robert D; Cole, James R; Tiedje, James M; Hashsham, Syed A

    2017-03-06

    Virulence factor activity relationships (VFARs), a concept loosely based on quantitative structure-activity relationships (QSARs) for chemicals, were proposed as a predictive tool for ranking risks due to microorganisms relevant to water safety. A rapid increase in sequencing capabilities and bioinformatics tools has significantly increased the potential for VFAR-based analyses. This review summarizes more than 20 bioinformatics databases and tools, developed over the last decade, along with their virulence and antimicrobial resistance prediction capabilities. With the number of bacterial whole genome sequences exceeding 241 000 and metagenomic analysis projects exceeding 13 000 and the ability to add additional genome sequences for a few hundred dollars, it is evident that further development of VFARs is not limited by the availability of information at least at the genomic level. However, additional information related to co-occurrence, treatment response, modulation of virulence due to environmental and other factors, and economic impact must be gathered and incorporated in a manner that also addresses the associated uncertainties. Of the bioinformatics tools, a majority are either designed exclusively for virulence/resistance determination or equipped with a dedicated module. The remaining tools have the potential to be employed for evaluating virulence. This review focusing broadly on omics technologies and tools supports the notion that these tools are now sufficiently developed to allow the application of VFAR approaches combined with additional engineering and economic analyses to rank and prioritize organisms important to a given niche. Knowledge gaps do exist but can be filled with focused experimental and theoretical analyses that were unimaginable a decade ago. Further developments should consider the integration of the measurement of activity, risk, and uncertainty to improve the current capabilities.

  1. A toolbox for developing bioinformatics software.

    PubMed

    Rother, Kristian; Potrzebowski, Wojciech; Puton, Tomasz; Rother, Magdalena; Wywial, Ewa; Bujnicki, Janusz M

    2012-03-01

    Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availability. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful to plan a project, support the involvement of experts (e.g. experimentalists), and to promote higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and to the training of scientific programmers.

  2. Discovery and Classification of Bioinformatics Web Services

    SciTech Connect

    Rocco, D; Critchlow, T

    2002-09-02

    The transition of the World Wide Web from a paradigm of static Web pages to one of dynamic Web services provides new and exciting opportunities for bioinformatics with respect to data dissemination, transformation, and integration. However, the rapid growth of bioinformatics services, coupled with non-standardized interfaces, diminishes the potential that these Web services offer. To face this challenge, we examine the notion of a Web service class that defines the functionality provided by a collection of interfaces. These descriptions are an integral part of a larger framework that can be used to discover, classify, and wrap Web services automatically. We discuss how this framework can be used in the context of the proliferation of sites offering BLAST sequence alignment services for specialized data sets.

  3. A toolbox for developing bioinformatics software

    PubMed Central

    Potrzebowski, Wojciech; Puton, Tomasz; Rother, Magdalena; Wywial, Ewa; Bujnicki, Janusz M.

    2012-01-01

    Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availability. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful to plan a project, support the involvement of experts (e.g. experimentalists), and to promote higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and to the training of scientific programmers. PMID:21803787

  4. Translational bioinformatics applications in genome medicine

    PubMed Central

    2009-01-01

    Although investigators using methodologies in bioinformatics have always been useful in genomic experimentation in analytic, engineering, and infrastructure support roles, only recently have bioinformaticians been able to have a primary scientific role in asking and answering questions on human health and disease. Here, I argue that this shift in role towards asking questions in medicine is now the next step needed for the field of bioinformatics. I outline four reasons why bioinformaticians are newly enabled to drive the questions in primary medical discovery: public availability of data, intersection of data across experiments, commoditization of methods, and streamlined validation. I also list four recommendations for bioinformaticians wishing to get more involved in translational research. PMID:19566916

  5. Genomics and Bioinformatics of Parkinson's Disease

    PubMed Central

    Scholz, Sonja W.; Mhyre, Tim; Ressom, Habtom; Shah, Salim; Federoff, Howard J.

    2012-01-01

    Within the last two decades, genomics and bioinformatics have profoundly impacted our understanding of the molecular mechanisms of Parkinson's disease (PD). From the description of the first PD gene in 1997 until today, we have witnessed the emergence of new technologies that have revolutionized our concepts to identify genetic mechanisms implicated in human health and disease. Driven by the publication of the human genome sequence and followed by the description of detailed maps for common genetic variability, novel applications to rapidly scrutinize the entire genome in a systematic, cost-effective manner have become a reality. As a consequence, about 30 genetic loci have been unequivocally linked to the pathogenesis of PD highlighting essential molecular pathways underlying this common disorder. Herein we discuss how neurogenomics and bioinformatics are applied to dissect the nature of this complex disease with the overall aim of developing rational therapeutic interventions. PMID:22762024

  6. Machine learning: an indispensable tool in bioinformatics.

    PubMed

    Inza, Iñaki; Calvo, Borja; Armañanzas, Rubén; Bengoetxea, Endika; Larrañaga, Pedro; Lozano, José A

    2010-01-01

    The increase in the number and complexity of biological databases has raised the need for modern and powerful data analysis tools and techniques. In order to fulfill these requirements, the machine learning discipline has become an everyday tool in bio-laboratories. The use of machine learning techniques has been extended to a wide spectrum of bioinformatics applications. It is broadly used to investigate the underlying mechanisms and interactions between biological molecules in many diseases, and it is an essential tool in any biomarker discovery process. In this chapter, we provide a basic taxonomy of machine learning algorithms, and the characteristics of main data preprocessing, supervised classification, and clustering techniques are shown. Feature selection, classifier evaluation, and two supervised classification topics that have a deep impact on current bioinformatics are presented. We make the interested reader aware of a set of popular web resources, open source software tools, and benchmarking data repositories that are frequently used by the machine learning community.

  7. Why Polyphenols have Promiscuous Actions? An Investigation by Chemical Bioinformatics.

    PubMed

    Tang, Guang-Yan

    2016-05-01

    Despite their diverse pharmacological effects, polyphenols are poorly suited for use as drugs, which has traditionally been ascribed to their low bioavailability. However, Baell and co-workers recently proposed that the redox potential of polyphenols also plays an important role in this, because redox reactions bring promiscuous actions on various protein targets and thus produce non-specific pharmacological effects. To investigate whether redox reactivity is a critical factor in polyphenol promiscuity, we performed a chemical bioinformatics analysis on the structure-activity relationships of twenty polyphenols. It was found that the gene expression profiles of human cell lines induced by polyphenols were not correlated with the presence or absence of redox moieties in the polyphenols, but significantly correlated with their molecular structures. Therefore, it is concluded that the promiscuous actions of polyphenols are likely to result from their inherent structural features rather than their redox potential.

  8. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods: This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. Results: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class
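
    The constraint described in this record, selecting a tree by its dimensions alone and without any accuracy measure, can be sketched as follows. This is an assumption-laden illustration rather than the authors' implementation: trees are hypothetical nested tuples, the candidates are assumed to come from progressively stronger pruning, and `max_depth` and `max_leaves` stand in for the "predefined visual boundaries".

```python
def tree_dimensions(tree):
    """Return (depth, leaf_count) of a nested-tuple decision tree.

    A leaf is a class label (str); an internal node is a tuple
    (feature_name, left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return 0, 1
    _, left, right = tree
    left_depth, left_leaves = tree_dimensions(left)
    right_depth, right_leaves = tree_dimensions(right)
    return 1 + max(left_depth, right_depth), left_leaves + right_leaves


def fits(tree, max_depth, max_leaves):
    """True when the tree stays inside the given size limits."""
    depth, leaves = tree_dimensions(tree)
    return depth <= max_depth and leaves <= max_leaves


def pick_tree(candidates, max_depth, max_leaves):
    """Return the first candidate, ordered least to most pruned, that fits.

    No classification performance is consulted; only tree size matters.
    """
    for tree in candidates:
        if fits(tree, max_depth, max_leaves):
            return tree
    return None


# Hypothetical candidates produced by increasingly aggressive pruning.
full = ("f1", ("f2", ("f3", "a", "b"), "a"), ("f4", "b", ("f5", "a", "b")))
pruned = ("f1", ("f2", "a", "b"), "b")
stump = ("f1", "a", "b")
chosen = pick_tree([full, pruned, stump], max_depth=2, max_leaves=4)
```

    Ordering the candidates from least to most pruned means the largest tree that still fits the display limits is kept, which is one plausible reading of tuning "constrained exclusively by the dimensions of the produced decision tree".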

  9. VLSI Microsystem for Rapid Bioinformatic Pattern Recognition

    NASA Technical Reports Server (NTRS)

    Fang, Wai-Chi; Lue, Jaw-Chyng

    2009-01-01

    A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).

  10. [Applied problems of mathematical biology and bioinformatics].

    PubMed

    Lakhno, V D

    2011-01-01

    Mathematical biology and bioinformatics represent a new and rapidly progressing line of investigations which emerged in the course of work on the project "Human genome". The main applied problems of these sciences are drug design, patient-specific medicine and nanobioelectronics. It is shown that progress in the technology of mass sequencing of the human genome has set the stage for starting the national program on patient-specific medicine.

  11. A library-based bioinformatics services program

    PubMed Central

    Yarfitz, Stuart; Ketchell, Debra S.

    2000-01-01

    Support for molecular biology researchers has been limited to traditional library resources and services in most academic health sciences libraries. The University of Washington Health Sciences Libraries have been providing specialized services to this user community since 1995. The library recruited a Ph.D. biologist to assess the molecular biological information needs of researchers and design strategies to enhance library resources and services. A survey of laboratory research groups identified areas of greatest need and led to the development of a three-pronged program: consultation, education, and resource development. Outcomes of this program include bioinformatics consultation services, library-based and graduate level courses, networking of sequence analysis tools, and a biological research Web site. Bioinformatics clients are drawn from diverse departments and include clinical researchers in need of tools that are not readily available outside of basic sciences laboratories. Evaluation and usage statistics indicate that researchers, regardless of departmental affiliation or position, require support to access molecular biology and genetics resources. Centralizing such services in the library is a natural synergy of interests and enhances the provision of traditional library resources. Successful implementation of a library-based bioinformatics program requires both subject-specific and library and information technology expertise. PMID:10658962

  12. Bringing Web 2.0 to bioinformatics

    PubMed Central

    Zhang, Zhang; Cheung, Kei-Hoi

    2009-01-01

    Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies. PMID:18842678

  13. Bringing Web 2.0 to bioinformatics.

    PubMed

    Zhang, Zhang; Cheung, Kei-Hoi; Townsend, Jeffrey P

    2009-01-01

    Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.

  14. Bioinformatics tools for analysing viral genomic data.

    PubMed

    Orton, R J; Gu, Q; Hughes, J; Maabar, M; Modha, S; Vattipally, S B; Wilkie, G S; Davison, A J

    2016-04-01

    The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing.

  15. Bioinformatics on the Cloud Computing Platform Azure

    PubMed Central

    Shanahan, Hugh P.; Owen, Anne M.; Harrison, Andrew P.

    2014-01-01

    We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. PMID:25050811

  16. Translational bioinformatics in psychoneuroimmunology: methods and applications.

    PubMed

    Yan, Qing

    2012-01-01

    Translational bioinformatics plays an indispensable role in transforming psychoneuroimmunology (PNI) into personalized medicine. It provides a powerful method to bridge the gaps between various knowledge domains in PNI and systems biology. Translational bioinformatics methods at various systems levels can facilitate pattern recognition, and expedite and validate the discovery of systemic biomarkers to allow their incorporation into clinical trials and outcome assessments. Analysis of the correlations between genotypes and phenotypes including the behavioral-based profiles will contribute to the transition from the disease-based medicine to human-centered medicine. Translational bioinformatics would also enable the establishment of predictive models for patient responses to diseases, vaccines, and drugs. In PNI research, the development of systems biology models such as those of the neurons would play a critical role. Methods based on data integration, data mining, and knowledge representation are essential elements in building health information systems such as electronic health records and computerized decision support systems. Data integration of genes, pathophysiology, and behaviors are needed for a broad range of PNI studies. Knowledge discovery approaches such as network-based systems biology methods are valuable in studying the cross-talks among pathways in various brain regions involved in disorders such as Alzheimer's disease.

  17. Quantum Bio-Informatics IV

    NASA Astrophysics Data System (ADS)

    Accardi, Luigi; Freudenberg, Wolfgang; Ohya, Masanori

    2011-01-01

    Use of cryptographic ideas to interpret biological phenomena (and vice versa) / M. Regoli -- Discrete approximation to operators in white noise analysis / Si Si -- Bogoliubov type equations via infinite-dimensional equations for measures / V. V. Kozlov and O. G. Smolyanov -- Analysis of several categorical data using measure of proportional reduction in variation / K. Yamamoto ... [et al.] -- The electron reservoir hypothesis for two-dimensional electron systems / K. Yamada ... [et al.] -- On the correspondence between Newtonian and functional mechanics / E. V. Piskovskiy and I. V. Volovich -- Quantile-quantile plots: An approach for the inter-species comparison of promoter architecture in eukaryotes / K. Feldmeier ... [et al.] -- Entropy type complexities in quantum dynamical processes / N. Watanabe -- A fair sampling test for Ekert protocol / G. Adenier, A. Yu. Khrennikov and N. Watanabe -- Brownian dynamics simulation of macromolecule diffusion in a protocell / T. Ando and J. Skolnick -- Signaling network of environmental sensing and adaptation in plants: Key roles of calcium ion / K. Kuchitsu and T. Kurusu -- NetzCope: A tool for displaying and analyzing complex networks / M. J. Barber, L. Streit and O. Strogan -- Study of HIV-1 evolution by coding theory and entropic chaos degree / K. Sato -- The prediction of botulinum toxin structure based on in silico and in vitro analysis / T. Suzuki and S. Miyazaki -- On the mechanism of D-wave high T[symbol] superconductivity by the interplay of Jahn-Teller physics and Mott physics / H. Ushio, S. Matsuno and H. Kamimura.

  18. Broader incorporation of bioinformatics in education: opportunities and challenges.

    PubMed

    Cummings, Michael P; Temple, Glena G

    2010-11-01

    The major opportunities for broader incorporation of bioinformatics in education can be placed into three general categories: general applicability of bioinformatics in life science and related curricula; inherent fit of bioinformatics for promoting student learning in most biology programs; and the general experience and associated comfort students have with computers and technology. Conversely, the major challenges for broader incorporation of bioinformatics in education can be placed into three general categories: required infrastructure and logistics; instructor knowledge of bioinformatics and continuing education; and the breadth of bioinformatics, and the diversity of students and educational objectives. Broader incorporation of bioinformatics at all education levels requires overcoming the challenges to using transformative computer-requiring learning activities, assisting faculty in collecting assessment data on mastery of student learning outcomes, as well as creating more faculty development opportunities that span diverse skill levels, with an emphasis placed on providing resource materials that are kept up-to-date as the field and tools change.

  19. Comparative Evaluation of Different Optimization Algorithms for Structural Design Applications

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.

    1996-01-01

    Non-linear programming algorithms play an important role in structural design optimization. Fortunately, several algorithms with computer codes are available. At NASA Lewis Research Center, a project was initiated to assess the performance of eight different optimizers through the development of a computer code, CometBoards. This paper summarizes the conclusions of that research. CometBoards was employed to solve sets of small, medium and large structural problems, using the eight different optimizers on a Cray-YMP8E/8128 computer. The reliability and efficiency of the optimizers were determined from their performance on these problems. For small problems, the performance of most of the optimizers could be considered adequate. For large problems, however, three optimizers (two sequential quadratic programming routines, DNCONG of IMSL and SQP of IDESIGN, along with the Sequential Unconstrained Minimization Technique, SUMT) outperformed the others. At optimum, most optimizers captured an identical number of active displacement and frequency constraints, but the number of active stress constraints differed among the optimizers. This discrepancy can be attributed to singularity conditions in the optimization, and alleviating it can improve the efficiency of the optimizers.

  20. Bioinformatics for transporter pharmacogenomics and systems biology: data integration and modeling with UML.

    PubMed

    Yan, Qing

    2010-01-01

    Bioinformatics is the rational study at an abstract level that can influence the way we understand biomedical facts and the way we apply the biomedical knowledge. Bioinformatics is facing challenges in helping with finding the relationships between genetic structures and functions, analyzing genotype-phenotype associations, and understanding gene-environment interactions at the systems level. One of the most important issues in bioinformatics is data integration. The data integration methods introduced here can be used to organize and integrate both public and in-house data. With the volume of data and the high complexity, computational decision support is essential for integrative transporter studies in pharmacogenomics, nutrigenomics, epigenetics, and systems biology. For the development of such a decision support system, object-oriented (OO) models can be constructed using the Unified Modeling Language (UML). A methodology is developed to build biomedical models at different system levels and construct corresponding UML diagrams, including use case diagrams, class diagrams, and sequence diagrams. By OO modeling using UML, the problems of transporter pharmacogenomics and systems biology can be approached from different angles with a more complete view, which may greatly enhance the efforts in effective drug discovery and development. Bioinformatics resources of membrane transporters and general bioinformatics databases and tools that are frequently used in transporter studies are also collected here. An informatics decision support system based on the models presented here is available at http://www.pharmtao.com/transporter . The methodology developed here can also be used for other biomedical fields.

  1. Bioinformatics and Multiepitope DNA Immunization to Design Rational Snake Antivenom

    PubMed Central

    Wagstaff, Simon C; Laing, Gavin D; Theakston, R. David G; Papaspyridis, Christina; Harrison, Robert A

    2006-01-01

    Background: Snake venom is a potentially lethal and complex mixture of hundreds of functionally diverse proteins that are difficult to purify and hence difficult to characterize. These difficulties have inhibited the development of toxin-targeted therapy, and conventional antivenom is still generated from the sera of horses or sheep immunized with whole venom. Although life-saving, antivenoms contain an immunoglobulin pool of unknown antigen specificity and known redundancy, which necessitates the delivery of large volumes of heterologous immunoglobulin to the envenomed victim, thus increasing the risk of anaphylactoid and serum sickness adverse effects. Here we exploit recent molecular sequence analysis and DNA immunization tools to design more rational toxin-targeted antivenom. Methods and Findings: We developed a novel bioinformatic strategy that identified sequences encoding immunogenic and structurally significant epitopes from an expressed sequence tag database of a venom gland cDNA library of Echis ocellatus, the most medically important viper in Africa. Focusing upon snake venom metalloproteinases (SVMPs) that are responsible for the severe and frequently lethal hemorrhage in envenomed victims, we identified seven epitopes that we predicted would be represented in all isomers of this multimeric toxin and that we engineered into a single synthetic multiepitope DNA immunogen (epitope string). We compared the specificity and toxin-neutralizing efficacy of antiserum raised against the string to antisera raised against a single SVMP toxin (or domains) or antiserum raised by conventional (whole venom) immunization protocols. The SVMP string antiserum, as predicted in silico, contained antibody specificities to numerous SVMPs in E. ocellatus venom and venoms of several other African vipers. More significantly, the antiserum cross-specifically neutralized hemorrhage induced by E. ocellatus and Cerastes cerastes cerastes venoms. Conclusions: These data provide valuable…

  2. Entropyology: the application of bioinformatics and data modeling to digital virus and malware recognition

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger M.; Handley, James W.

    2010-04-01

    Malware are analogs of viruses. Viruses are composed of large numbers of polypeptide proteins. The shape and function of the protein strands determine the functionality of the segment, similar to a subroutine in malware. The full combination of subroutines is the malware organism, in the same way that a collection of polypeptides forms information-bearing protein structures. We propose to apply the methods of bioinformatics to the analysis of malware, providing a rich feature set for a unique and novel detection and classification scheme that was originally applied to amino acid sequencing in bioinformatics. Our proposed methods enable real-time in situ (in contrast to in vivo) detection applications.

  3. A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data.

    PubMed

    Roumpeka, Despoina D; Wallace, R John; Escalettes, Frank; Fotheringham, Ian; Watson, Mick

    2017-01-01

    The microbiome can be defined as the community of microorganisms that live in a particular environment. Metagenomics is the practice of sequencing DNA from the genomes of all organisms present in a particular sample, and has become a common method for the study of microbiome population structure and function. Increasingly, researchers are finding novel genes encoded within metagenomes, many of which may be of interest to the biotechnology and pharmaceutical industries. However, such "bioprospecting" requires a suite of sophisticated bioinformatics tools to make sense of the data. This review summarizes the most commonly used bioinformatics tools for the assembly and annotation of metagenomic sequence data with the aim of discovering novel genes.

  4. A Brief Review of Bioinformatics Tools for Glycosylation Analysis by Mass Spectrometry

    PubMed Central

    Tsai, Pei-Lun; Chen, Sung-Fang

    2017-01-01

    The purpose of this review is to provide updated information on bioinformatic software published since 2013 for use in the characterization of glycosylated structures. A comprehensive review by Woodin et al. (Analyst 138: 2793–2803, 2013; ref. 1) described two main approaches, introduced for researchers starting out in this area: analysis of released glycans and identification of glycopeptides in enzymatic digests. Complementary to that report, this review focuses on mass spectrometry-related bioinformatics tools for the characterization of N-linked and O-linked glycopeptides. It also provides information regarding automated tools that can be used for glycan profiling using mass spectrometry. PMID:28337402

  5. Comparing connected structures in ensemble of random fields

    NASA Astrophysics Data System (ADS)

    Rongier, Guillaume; Collon, Pauline; Renard, Philippe; Straubhaar, Julien; Sausse, Judith

    2016-10-01

    Different simulation methods or parameter sets may produce very different connectivity patterns, and therefore different flow properties. This paper proposes a systematic method to compare ensembles of categorical simulations from a static connectivity point of view. Differences in static connectivity cannot always be distinguished using two-point statistics. In addition, multiple-point histograms only provide a statistical comparison of patterns regardless of the connectivity. Thus, we propose to characterize the static connectivity from a set of 12 indicators based on the connected components of the realizations. Some indicators describe the spatial distribution of the connected components, others their global shape or their topology through the component skeletons. We also gather all the indicators into dissimilarity values to easily compare hundreds of realizations. Heat maps and multidimensional scaling then facilitate the dissimilarity analysis. The application to a synthetic case highlights the impact of the grid size on the connectivity and the indicators. This impact disappears when comparing samples of the realizations with the same sizes. The method is then able to rank realizations against a reference model based on their static connectivity. This application also gives rise to more practical advice. Multidimensional scaling appears as a powerful visualization tool, but it also induces dissimilarity misrepresentations: it should always be interpreted cautiously with a look at the point position confidence. The heat map displays the real dissimilarities and is more appropriate for a detailed analysis. The comparison with a multiple-point histogram method shows the benefit of the connected components: the large-scale connectivity seems better characterized by our indicators, especially the skeleton indicators.
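
    As a rough illustration of the idea in this abstract, the sketch below labels the connected components of one category in a small 2D grid, derives two simple indicators, and combines them into a dissimilarity value. Everything here is an assumption made for illustration: the two indicators, the 4-neighbour connectivity, the unnormalized Euclidean distance, and the toy grids are hypothetical and are not the 12 indicators or the dissimilarity measure of the paper.

```python
from collections import deque


def connected_components(grid, facies=1):
    """Return the 4-connected components of cells equal to `facies`."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != facies or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == facies):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(cells)
    return components


def simple_indicators(grid, facies=1):
    """Two illustrative indicators (hypothetical, not the paper's set)."""
    sizes = [len(comp) for comp in connected_components(grid, facies)]
    total = sum(sizes)
    return {
        "n_components": len(sizes),
        "largest_fraction": max(sizes) / total if total else 0.0,
    }


def dissimilarity(a, b):
    """Unnormalized Euclidean distance between two indicator dicts."""
    return sum((a[key] - b[key]) ** 2 for key in a) ** 0.5


# Hypothetical toy realizations: 1 marks the category of interest.
realization_a = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
realization_b = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]

ind_a = simple_indicators(realization_a)
ind_b = simple_indicators(realization_b)
print(ind_a, ind_b, dissimilarity(ind_a, ind_b))
```

    In practice the indicators would need to be rescaled before being combined, since a raw component count and a fraction are not on comparable scales.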

  6. Experimental and bioinformatic approaches for interrogating protein-protein interactions to determine protein function.

    PubMed

    Droit, Arnaud; Poirier, Guy G; Hunter, Joanna M

    2005-04-01

    An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. One strategy to determine protein function is to identify the protein-protein interactions. The increasing use of high-throughput and large-scale bioinformatics-based studies has generated a massive amount of data stored in a number of different databases. A challenge for bioinformatics is to explore this disparate data and to uncover biologically relevant interactions and pathways. In parallel, there is clearly a need for the development of approaches that can predict novel protein-protein interaction networks in silico. Here, we present an overview of different experimental and bioinformatic methods to elucidate protein-protein interactions.

  7. 2016 update on APBioNet's annual international conference on bioinformatics (InCoB).

    PubMed

    Schönbach, Christian; Verma, Chandra; Wee, Lawrence Jin Kiat; Bond, Peter John; Ranganathan, Shoba

    2016-12-22

    Since its inception in 2002, InCoB has become one of the largest annual bioinformatics conferences in the Asia-Pacific region, with attendance ranging between 150 and 250 delegates depending on the venue location. InCoB 2016 in Singapore was attended by almost 220 delegates. This year, sessions on structural bioinformatics, sequence and sequencing, and next-generation sequencing fielded the highest number of oral presentations. Forty-four out of 96 oral presentations were associated with an accepted manuscript in supplemental issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics or BMC Systems Biology. Articles with a genomics focus are reviewed in this editorial. Next year's InCoB will be held in Shenzhen, China from September 20 to 22, 2017.

  8. Associations between Input and Outcome Variables in an Online High School Bioinformatics Instructional Program

    NASA Astrophysics Data System (ADS)

    Lownsbery, Douglas S.

    Quantitative data from a completed year of an innovative online high school bioinformatics instructional program were analyzed as part of a descriptive research study. The online instructional program provided the opportunity for high school students to develop content understandings of molecular genetics and to use sophisticated bioinformatics tools and methodologies to conduct authentic research. Quantitative data were analyzed to identify potential associations between independent program variables including implementation setting, gender, and student educational backgrounds and dependent variables indicating success in the program including completion rates for analyzing DNA clones and performance gains from pre-to-post assessments of bioinformatics knowledge. Study results indicate that understanding associations between student educational backgrounds and level of success may be useful for structuring collaborative learning groups and enhancing scaffolding and support during the program to promote higher levels of success for participating students.

  9. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology.

    PubMed

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-12-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.

  10. MAPI: towards the integrated exploitation of bioinformatics Web Services

    PubMed Central

    2011-01-01

    Background: Bioinformatics is commonly featured as a well assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and heterogeneity complicate the integrated exploitation of such data processing capacity. Results: To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for uniform representation of Web Services metadata descriptors, including the management and invocation protocols of the services they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed for the client have to be installed, and that the module functionality can be extended without the need for re-writing the software client. Conclusions: The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation with advanced features such as workflow composition and asynchronous service calls to multiple types of Web Services, including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others). PMID:22032807

  11. Agonist Binding to Chemosensory Receptors: A Systematic Bioinformatics Analysis

    PubMed Central

    Fierro, Fabrizio; Suku, Eda; Alfonso-Prieto, Mercedes; Giorgetti, Alejandro; Cichon, Sven; Carloni, Paolo

    2017-01-01

    Human G-protein coupled receptors (hGPCRs) constitute a large and highly pharmaceutically relevant membrane receptor superfamily. About half of the hGPCRs' family members are chemosensory receptors, involved in bitter taste and olfaction, along with a variety of other physiological processes. Hence these receptors constitute promising targets for pharmaceutical intervention. Molecular modeling has been so far the most important tool to get insights on agonist binding and receptor activation. Here we investigate both aspects by bioinformatics-based predictions across all bitter taste and odorant receptors for which site-directed mutagenesis data are available. First, we observe that state-of-the-art homology modeling combined with previously used docking procedures turned out to reproduce only a limited fraction of ligand/receptor interactions inferred by experiments. This is most probably caused by the low sequence identity with available structural templates, which limits the accuracy of the protein model and in particular of the side-chains' orientations. Methods which transcend the limited sampling of the conformational space of docking may improve the predictions. As an example corroborating this, we review here multi-scale simulations from our lab and show that, for the three complexes studied so far, they significantly enhance the predictive power of the computational approach. Second, our bioinformatics analysis provides support to previous claims that several residues, including those at positions 1.50, 2.50, and 7.52, are involved in receptor activation. PMID:28932739

  12. Comparative study on the topological structure of China Education Network

    NASA Astrophysics Data System (ADS)

    Yu, Ming-Min; Zhang, Ning; Mao, Guo-Yong

    2017-07-01

    The China Education Network (CEN) of 2014 was studied as a complex network. By searching the “.edu.cn” domain and filtering out some unexpected results, we obtained a network with 14,100,628 pages and 213,513,401 links. The topology of this network was studied to obtain features such as the out-degree distribution, in-degree distribution and average shortest path length. These features were compared with those of 2007 and 2004 to observe the evolution mechanisms of CEN. According to the statistical results, some topological features of CEN, such as the out-degree distribution, in-degree distribution and average shortest path length, have changed considerably, and the reasons for these changes are given in this paper.
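
    The three topology features named in this abstract can be sketched on a toy directed graph as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the link list is hypothetical, and the average shortest path length is taken over reachable ordered pairs using breadth-first search, which would not scale unchanged to millions of pages.

```python
from collections import Counter, deque


def degree_distributions(edges):
    """Return (out-degree distribution, in-degree distribution).

    Each distribution maps a degree value to the number of nodes with it.
    """
    out_deg, in_deg = Counter(), Counter()
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    out_dist = Counter(out_deg[node] for node in nodes)
    in_dist = Counter(in_deg[node] for node in nodes)
    return out_dist, in_dist


def average_shortest_path(edges):
    """Mean shortest path length over all reachable ordered node pairs."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])
    total, pairs = 0, 0
    for start in adjacency:
        # Breadth-first search gives hop counts in an unweighted graph.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy link list: (source page, target page).
toy_links = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]

out_dist, in_dist = degree_distributions(toy_links)
avg_path = average_shortest_path(toy_links)
print(dict(out_dist), dict(in_dist), avg_path)
```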

  13. Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

    PubMed

    Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A

    2016-01-01

    One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. A Linked Series of Laboratory Exercises in Molecular Biology Utilizing Bioinformatics and GFP

    ERIC Educational Resources Information Center

    Medin, Carey L.; Nolin, Katie L.

    2011-01-01

    Molecular biologists commonly use bioinformatics to map and analyze DNA and protein sequences and to align different DNA and protein sequences for comparison. Additionally, biologists can create and view 3D models of protein structures to further understand intramolecular interactions. The primary goal of this 10-week laboratory was to introduce…

  15. CattleTickBase: An integrated Internet-based bioinformatics resource for Rhipicephalus (Boophilus) microplus

    USDA-ARS's Scientific Manuscript database

    The Rhipicephalus microplus genome is large and complex in structure, making a genome sequence difficult to assemble and costly to resource the required bioinformatics. In light of this, a consortium of international collaborators was formed to pool resources to begin sequencing this genome. We have...

  17. Evolving Strategies for the Incorporation of Bioinformatics within the Undergraduate Cell Biology Curriculum

    ERIC Educational Resources Information Center

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in…

  19. Hospital profitability and capital structure: a comparative analysis.

    PubMed Central

    Valvona, J; Sloan, F A

    1988-01-01

    This article compares the financial performance of hospitals by ownership type and of five publicly traded hospital companies with other industries, using such indicators as profit margins, return on equity (ROE) and total capitalization, and debt-to-equity ratios. We also examine stock returns to investors for the five hospital companies versus other industries, as well as the relative roles of debt and equity in new financing. Investor-owned hospitals had substantially greater margins and ROE than did other hospital types. In 1982, investor-owned chain hospitals had a ROE of 26 percent, 18 points above the average for all hospitals. Stock returns on the five selected hospital companies were more than twice as large as returns on other industries between 1972 and 1983. However, after 1983, returns for these companies fell dramatically in absolute terms and relative to other industries. We also found investor-owned hospitals to be much more highly levered than their government and voluntary counterparts, and more highly levered than other industries as well. PMID:3403274

  20. Comparative jet wake structure and swimming performance of salps.

    PubMed

    Sutherland, Kelly R; Madin, Laurence P

    2010-09-01

    Salps are barrel-shaped marine invertebrates that swim by jet propulsion. Morphological variations among species and life-cycle stages are accompanied by differences in swimming mode. The goal of this investigation was to compare propulsive jet wakes and swimming performance variables among morphologically distinct salp species (Pegea confoederata, Weelia (Salpa) cylindrica, Cyclosalpa sp.) and relate swimming patterns to ecological function. Using a combination of in situ dye visualization and particle image velocimetry (PIV) measurements, we describe properties of the jet wake and swimming performance variables including thrust, drag and propulsive efficiency. Locomotion by all species investigated was achieved via vortex ring propulsion. The slow-swimming P. confoederata produced the highest weight-specific thrust (T=53 N kg(-1)) and swam with the highest whole-cycle propulsive efficiency (eta(wc)=55%). The fast-swimming W. cylindrica had the most streamlined body shape but produced an intermediate weight-specific thrust (T=30 N kg(-1)) and swam with an intermediate whole-cycle propulsive efficiency (eta(wc)=52%). Weak swimming performance variables in the slow-swimming C. affinis, including the lowest weight-specific thrust (T=25 N kg(-1)) and lowest whole-cycle propulsive efficiency (eta(wc)=47%), may be compensated by low energetic requirements. Swimming performance variables are considered in the context of ecological roles and evolutionary relationships.

  1. A comparison of common programming languages used in bioinformatics

    PubMed Central

    Fourment, Mathieu; Gillings, Michael R

    2008-01-01

    Background The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Results Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from Conclusion This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language. PMID:18251993
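
    The Sellers algorithm named in this abstract is an approximate string-matching method based on edit distance. The sketch below is illustrative only and is not the benchmark code from the paper; the function name `sellers_distances` and the unit costs for substitution, insertion and deletion are assumptions made for this example.

```python
def sellers_distances(pattern, text):
    """For each position in `text`, return the minimum edit distance between
    `pattern` and any substring of `text` that ends at that position.

    Assumption: unit cost for substitutions, insertions and deletions.
    """
    m = len(pattern)
    # Column for the empty text prefix: aligning i pattern characters
    # against nothing costs i edits.
    col = list(range(m + 1))
    best = []
    for ch in text:
        # Row 0 stays 0 so that a match may start anywhere in the text.
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(col[i - 1] + cost,  # match or substitution
                         col[i] + 1,         # extra character in the text
                         new[i - 1] + 1)     # pattern character left unmatched
        col = new
        best.append(col[m])
    return best


if __name__ == "__main__":
    distances = sellers_distances("ACGT", "TTAGGTTT")
    print(min(distances))
```

    Only one column of the dynamic-programming table is kept at a time, so memory grows with the pattern length rather than with the text length.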

  2. mockrobiota: a Public Resource for Microbiome Bioinformatics Benchmarking

    PubMed Central

    Bokulich, Nicholas A.; Rideout, Jai Ram; Mercurio, William G.; Shiffer, Arron; Wolfe, Benjamin; Maurice, Corinne F.; Dutton, Rachel J.; Turnbaugh, Peter J.; Knight, Rob

    2016-01-01

    ABSTRACT Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community. PMID:27822553

  3. mockrobiota: a Public Resource for Microbiome Bioinformatics Benchmarking.

    PubMed

    Bokulich, Nicholas A; Rideout, Jai Ram; Mercurio, William G; Shiffer, Arron; Wolfe, Benjamin; Maurice, Corinne F; Dutton, Rachel J; Turnbaugh, Peter J; Knight, Rob; Caporaso, J Gregory

    2016-01-01

    Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community.

  4. [From bioinformatics to systems biology: account of the 12th international conference on intelligent systems in molecular biology].

    PubMed

    Ivakhno, S S

    2004-01-01

    The paper reviews the 12th International Conference on Intelligent Systems for Molecular Biology/Third European Conference on Computational Biology 2004, which was held in Glasgow, UK, from July 31 to August 4. A number of talks, papers and software demos from the conference in bioinformatics, genomics, proteomics, transcriptomics and systems biology are described. Recent applications of liquid chromatography-tandem mass spectrometry, comparative genomics and DNA microarrays are presented, along with a discussion of bioinformatics curricula in higher education.

  5. Modern bioinformatics meets traditional Chinese medicine.

    PubMed

    Gu, Peiqin; Chen, Huajun

    2014-11-01

    Traditional Chinese medicine (TCM) is gaining increasing attention with the emergence of integrative medicine and personalized medicine, characterized by pattern differentiation on individual variance and treatments based on natural herbal synergism. Investigating the effectiveness and safety of the potential mechanisms of TCM and the combination principles of drug therapies will bridge the cultural gap with Western medicine and improve the development of integrative medicine. Dealing with rapidly growing amounts of biomedical data and their heterogeneous nature are two important tasks among modern biomedical communities. Bioinformatics, as an emerging interdisciplinary field of computer science and biology, has become a useful tool for easing the data deluge pressure by automating the computation processes with informatics methods. Using these methods to retrieve, store and analyze the biomedical data can effectively reveal the associated knowledge hidden in the data, and thus promote the discovery of integrated information. Recently, these techniques of bioinformatics have been used for facilitating the interactional effects of both Western medicine and TCM. The analysis of TCM data using computational technologies provides biological evidence for the basic understanding of TCM mechanisms, safety and efficacy of TCM treatments. At the same time, the carrier and targets associated with TCM remedies can inspire the rethinking of modern drug development. This review summarizes the significant achievements of applying bioinformatics techniques to many aspects of the research in TCM, such as analysis of TCM-related '-omics' data and techniques for analyzing biological processes and pharmaceutical mechanisms of TCM, which have shown certain potential of bringing new thoughts to both sides. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  6. Translational Bioinformatics Approaches to Drug Development

    PubMed Central

    Readhead, Ben; Dudley, Joel

    2013-01-01

    Significance: A majority of therapeutic interventions occur late in the pathological process, when treatment outcome can be less predictable and effective, highlighting the need for new precise and preventive therapeutic development strategies that consider genomic and environmental context. Translational bioinformatics is well positioned to contribute to the many challenges inherent in bridging this gap between our current reactive methods of healthcare delivery and the intent of precision medicine, particularly in the area of drug development, which forms the focus of this review. Recent Advances: A variety of powerful informatics methods for organizing and leveraging the vast wealth of molecular measurements available for a broad range of disease contexts have recently emerged. These include methods for data-driven disease classification, drug repositioning, identification of disease biomarkers, and the creation of disease network models, each with significant impacts on drug development approaches. Critical Issues: An important bottleneck in the application of bioinformatics methods in translational research is the lack of investigators who are versed in both biomedical domains and informatics. Efforts to nurture both sets of competencies within individuals and to increase interfield visibility will help to accelerate the adoption and increased application of bioinformatics in translational research. Future Directions: It is possible to construct predictive, multiscale network models of disease by integrating genotype, gene expression, clinical traits, and other multiscale measures using causal network inference methods. This can enable the identification of the “key drivers” of pathology, which may represent novel therapeutic targets or biomarker candidates that play a more direct role in the etiology of disease. PMID:24527359

  7. Implementing bioinformatic workflows within the bioextract server.

    PubMed

    Lushbough, Carol M; Bergman, Michael K; Lawrence, Carolyn J; Jennewein, Doug; Brendel, Volker

    2008-01-01

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed service designed to provide researchers with the web ability to query multiple data sources, save results as searchable data sets, and execute analytic tools. As the researcher works with the system, their tasks are saved in the background. At any time these steps can be saved as a workflow that can then be executed again and/or modified later.

  8. Robust Bioinformatics Recognition with VLSI Biochip Microsystem

    NASA Technical Reports Server (NTRS)

    Lue, Jaw-Chyng L.; Fang, Wai-Chi

    2006-01-01

    A microsystem architecture for real-time, on-site, robust bioinformatic pattern recognition and analysis has been proposed. This system is compatible with on-chip DNA analysis means such as polymerase chain reaction (PCR) amplification. A corresponding novel artificial neural network (ANN) learning algorithm, which uses a new sigmoid-logarithmic transfer function and is based on the error backpropagation (EBP) algorithm, is invented. Our results show that the trained new ANN can recognize low-fluorescence patterns better than the conventional sigmoidal ANN does. A differential logarithmic imaging chip is designed for calculating the logarithm of relative intensities of fluorescence signals. The single-rail logarithmic circuit and a prototype ANN chip are designed, fabricated and characterized.

  9. Critical Issues in Bioinformatics and Computing

    PubMed Central

    Kesh, Someswa; Raghupathi, Wullianallur

    2004-01-01

    This article provides an overview of the field of bioinformatics and its implications for the various participants. Next-generation issues facing developers (programmers), users (molecular biologists), and the general public (patients) who would benefit from the potential applications are identified. The goal is to create awareness and debate on the opportunities (such as career paths) and the challenges such as privacy that arise. A triad model of the participants' roles and responsibilities is presented along with the identification of the challenges and possible solutions. PMID:18066389

  10. Mobyle: a new full web bioinformatics framework

    PubMed Central

    Néron, Bertrand; Ménager, Hervé; Maufrais, Corinne; Joly, Nicolas; Maupetit, Julien; Letort, Sébastien; Carrere, Sébastien; Tuffery, Pierre; Letondal, Catherine

    2009-01-01

    Motivation: For the biologist, running bioinformatics analyses involves a time-consuming management of data and tools. Users need support to organize their work, retrieve parameters and reproduce their analyses. They also need to be able to combine their analytic tools using a safe data flow software mechanism. Finally, given that scientific tools can be difficult to install, it is particularly helpful for biologists to be able to use these tools through a web user interface. However, providing a web interface for a set of tools raises the problem that a single web portal cannot offer all the existing and possible services: it is the user, again, who has to cope with data copy among a number of different services. A framework enabling portal administrators to build a network of cooperating services would therefore clearly be beneficial. Results: We have designed a system, Mobyle, to provide a flexible and usable Web environment for defining and running bioinformatics analyses. It embeds simple yet powerful data management features that allow the user to reproduce analyses and to combine tools using a hierarchical typing system. Mobyle offers invocation of services distributed over remote Mobyle servers, thus enabling a federated network of curated bioinformatics portals without the user having to learn complex concepts or to install sophisticated software. While being focused on the end user, the Mobyle system also addresses the need, for the bioinformatician, to automate remote services execution: PlayMOBY is a companion tool that automates the publication of BioMOBY web services, using Mobyle program definitions. Availability: The Mobyle system is distributed under the terms of the GNU GPLv2 on the project web site (http://bioweb2.pasteur.fr/projects/mobyle/). It is already deployed on three servers: http://mobyle.pasteur.fr, http://mobyle.rpbs.univ-paris-diderot.fr and http://lipm-bioinfo.toulouse.inra.fr/Mobyle. The PlayMOBY companion is distributed under the

  12. Translational Bioinformatics: Past, Present, and Future

    PubMed Central

    Tenenbaum, Jessica D.

    2016-01-01

    Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. Development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline’s brief history and past accomplishments, as well as current foci, and concludes with predictions of future directions in the field. PMID:26876718

  13. Microbial bioinformatics for food safety and production

    PubMed Central

    Alkema, Wynand; Boekhorst, Jos; Wels, Michiel

    2016-01-01

    In the production of fermented foods, microbes play an important role. Optimization of fermentation processes or starter culture production traditionally was a trial-and-error approach inspired by expert knowledge of the fermentation process. Current developments in high-throughput ‘omics’ technologies allow developing more rational approaches to improve fermentation processes both from the food functionality as well as from the food safety perspective. Here, the authors thematically review typical bioinformatics techniques and approaches to improve various aspects of the microbial production of fermented food products and food safety. PMID:26082168

  14. Microbial bioinformatics for food safety and production.

    PubMed

    Alkema, Wynand; Boekhorst, Jos; Wels, Michiel; van Hijum, Sacha A F T

    2016-03-01

    In the production of fermented foods, microbes play an important role. Optimization of fermentation processes or starter culture production traditionally was a trial-and-error approach inspired by expert knowledge of the fermentation process. Current developments in high-throughput 'omics' technologies allow developing more rational approaches to improve fermentation processes both from the food functionality as well as from the food safety perspective. Here, the authors thematically review typical bioinformatics techniques and approaches to improve various aspects of the microbial production of fermented food products and food safety.

  15. Multiobjective optimization in bioinformatics and computational biology.

    PubMed

    Handl, Julia; Kell, Douglas B; Knowles, Joshua

    2007-01-01

    This paper reviews the application of multiobjective optimization in the fields of bioinformatics and computational biology. A survey of existing work, organized by application area, forms the main body of the review, following an introduction to the key concepts in multiobjective optimization. An original contribution of the review is the identification of five distinct "contexts," giving rise to multiple objectives: These are used to explain the reasons behind the use of multiobjective optimization in each application area and also to point the way to potential future uses of the technique.
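
    A key concept in multiobjective optimization is Pareto dominance: one solution dominates another if it is no worse in every objective and strictly better in at least one. The sketch below is a minimal illustration and is not taken from the review; the function names `dominates` and `pareto_front`, and the choice to minimize every objective, are assumptions made for this example.

```python
def dominates(a, b):
    """Return True if solution `a` dominates solution `b`.

    Assumption: every objective is to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates (simple O(n^2) scan)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical two-objective scores, for example (error, model size).
    candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
    print(pareto_front(candidates))
```

    The quadratic scan is adequate for small candidate sets; larger problems would normally use a sorting-based method instead.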

  16. Teaching the ABCs of bioinformatics: a brief introduction to the Applied Bioinformatics Course

    PubMed Central

    2014-01-01

    With the development of the Internet and the growth of online resources, bioinformatics training for wet-lab biologists became necessary as a part of their education. This article describes a one-semester course ‘Applied Bioinformatics Course’ (ABC, http://abc.cbi.pku.edu.cn/) that the author has been teaching to biological graduate students at Peking University and the Chinese Academy of Agricultural Sciences for the past 13 years. ABC is a hands-on practical course to teach students to use online bioinformatics resources to solve biological problems related to their ongoing research projects in molecular biology. After a brief introduction to the background of the course, detailed information about its teaching strategies is outlined in the ‘How to teach’ section. The contents of the course are briefly described in the ‘What to teach’ section with some real examples. The author wishes to share his teaching experiences and the online teaching materials with colleagues working in bioinformatics education both in local and international universities. PMID:24008274

  17. Bioinformatics for cancer immunology and immunotherapy.

    PubMed

    Charoentong, Pornpimol; Angelova, Mihaela; Efremova, Mirjana; Gallasch, Ralf; Hackl, Hubert; Galon, Jerome; Trajanoski, Zlatko

    2012-11-01

    Recent mechanistic insights obtained from preclinical studies and the approval of the first immunotherapies have motivated an increasing number of academic investigators and pharmaceutical/biotech companies to further elucidate the role of immunity in tumor pathogenesis and to reconsider the role of immunotherapy. Additionally, technological advances (e.g., next-generation sequencing) are providing unprecedented opportunities to draw a comprehensive picture of the tumor genomics landscape and ultimately enable individualized treatment. However, the increasing complexity of the generated data and the plethora of bioinformatics methods and tools pose considerable challenges to both tumor immunologists and clinical oncologists. In this review, we describe current concepts and future challenges for the management and analysis of data for cancer immunology and immunotherapy. We first highlight publicly available databases with a specific focus on cancer immunology, including databases for somatic mutations and epitope databases. We then give an overview of the bioinformatics methods for the analysis of next-generation sequencing data (whole-genome and exome sequencing), epitope prediction tools, as well as methods for integrative data analysis and network modeling. Mathematical models are powerful tools that can predict and explain important patterns in the genetic and clinical progression of cancer. Therefore, a survey of mathematical models for tumor evolution and tumor-immune cell interaction is included. Finally, we discuss future challenges for individualized immunotherapy and suggest how combined computational/experimental approaches can lead to new insights into the molecular mechanisms of cancer, improve diagnosis and prognosis of the disease, and pinpoint novel therapeutic targets.

  18. Tools and collaborative environments for bioinformatics research

    PubMed Central

    Giugno, Rosalba; Pulvirenti, Alfredo

    2011-01-01

    Advanced research requires intensive interaction among a multitude of actors, often possessing different expertise and usually working at a distance from each other. The field of collaborative research aims to establish suitable models and technologies to properly support these interactions. In this article, we first present the reasons for an interest of Bioinformatics in this context by also suggesting some research domains that could benefit from collaborative research. We then review the principles and some of the most relevant applications of social networking, with a special attention to networks supporting scientific collaboration, by also highlighting some critical issues, such as identification of users and standardization of formats. We then introduce some systems for collaborative document creation, including wiki systems and tools for ontology development, and review some of the most interesting biological wikis. We also review the principles of Collaborative Development Environments for software and show some examples in Bioinformatics. Finally, we present the principles and some examples of Learning Management Systems. In conclusion, we try to devise some of the goals to be achieved in the short term for the exploitation of these technologies. PMID:21984743

  19. ExPASy: SIB bioinformatics resource portal.

    PubMed

    Artimo, Panu; Jonnalagedda, Manohar; Arnold, Konstantin; Baratin, Delphine; Csardi, Gabor; de Castro, Edouard; Duvaud, Séverine; Flegel, Volker; Fortier, Arnaud; Gasteiger, Elisabeth; Grosdidier, Aurélien; Hernandez, Céline; Ioannidis, Vassilios; Kuznetsov, Dmitry; Liechti, Robin; Moretti, Sébastien; Mostaguir, Khaled; Redaschi, Nicole; Rossier, Grégoire; Xenarios, Ioannis; Stockinger, Heinz

    2012-07-01

    ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can henceforth access seamlessly a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.

  20. OpenHelix: bioinformatics education outside of a different box

    PubMed Central

    Mangan, Mary E.; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C.

    2010-01-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review. PMID:20798181

  1. OpenHelix: bioinformatics education outside of a different box.

    PubMed

    Williams, Jennifer M; Mangan, Mary E; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C

    2010-11-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review.

  2. Translational bioinformatics: linking the molecular world to the clinical world.

    PubMed

    Altman, R B

    2012-06-01

    Translational bioinformatics represents the union of translational medicine and bioinformatics. Translational medicine moves basic biological discoveries from the research bench into the patient-care setting and uses clinical observations to inform basic biology. It focuses on patient care, including the creation of new diagnostics, prognostics, prevention strategies, and therapies based on biological discoveries. Bioinformatics involves algorithms to represent, store, and analyze basic biological data, including DNA sequence, RNA expression, and protein and small-molecule abundance within cells. Translational bioinformatics spans these two fields; it involves the development of algorithms to analyze basic molecular and cellular data with an explicit goal of affecting clinical care.

  3. BioJava: an open-source framework for bioinformatics.

    PubMed

    Holland, R C G; Down, T A; Pocock, M; Prlić, A; Huen, D; James, K; Foisy, S; Dräger, A; Yates, A; Heuer, M; Schreiber, M J

    2008-09-15

    BioJava is a mature open-source project that provides a framework for processing of biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats and packages for manipulating sequences and 3D structures. It enables rapid bioinformatics application development in the Java programming language. BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.5 or higher. All queries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists.

  4. Data Compression Concepts and Algorithms and their Applications to Bioinformatics

    PubMed Central

    Nalbantog̃lu, Ö. U.; Russell, D.J.; Sayood, K.

    2009-01-01

    Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences. PMID:20157640
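    The entropy and mutual-information notions named in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `shannon_entropy` and `mutual_information` are hypothetical, and the estimates are plain plug-in frequencies over short strings, assuming equal-length inputs for the mutual-information case.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Plug-in Shannon entropy, in bits per symbol, of the symbols in seq."""
    n = len(seq)
    counts = Counter(seq)
    # H = -sum p * log2(p) over observed symbols; an empty sequence gives 0.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(col_x: str, col_y: str) -> float:
    """Plug-in mutual information, in bits, between two equal-length strings.

    Position i of col_x is paired with position i of col_y, as for two
    columns of a sequence alignment (an illustrative assumption).
    """
    if len(col_x) != len(col_y):
        raise ValueError("inputs must have the same length")
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    # I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


if __name__ == "__main__":
    print(shannon_entropy("ACGTACGT"))         # uniform over four bases
    print(mutual_information("AACC", "GGTT"))  # perfectly coupled columns
```

    A uniform four-letter sequence reaches the 2-bit maximum for DNA, while a low-complexity run such as a homopolymer scores 0; that gap is what compression-based analyses exploit.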

  5. Bioinformatics in crosslinking chemistry of collagen with selective cross linkers

    PubMed Central

    2011-01-01

    Background: Identifying molecular interactions with bioinformatics tools before venturing into wet-lab studies saves considerable time and energy. The present study summarizes the molecular interactions and binding energy calculations made for the major structural protein, collagen of Type I and Type III, with the chosen cross-linkers, namely coenzyme Q10, dopaquinone, embelin, embelin complex-1 & 2, idebenone, 5-O-methyl embelin, potassium embelate and vilangin. Results: Molecular descriptive analyses suggest that dopaquinone, embelin, idebenone, 5-O-methyl embelin, and potassium embelate display no violations. Docking analyses revealed that the best affinity for Type I (-4.74 kcal/mol) and Type III (-4.94 kcal/mol) collagen was with dopaquinone. Conclusions: Among the selected cross-linkers, dopaquinone, embelin, potassium embelate and 5-O-methyl embelin were the suitable cross-linkers for both Type I and Type III collagen, and they stabilize the collagen at the expected level. PMID:21989371

  6. Comparative analysis of mt LSU rRNA secondary structures of Odonates: structural variability and phylogenetic signal.

    PubMed

    Misof, B; Fleck, G

    2003-12-01

    Secondary structures of the most conserved part of the mt 16S rRNA gene, domains IV and V, have been recently analysed in a comparative study. However, full secondary structures of the mt LSU rRNA molecule are published for only a few insect species. The present study presents full secondary structures of domains I, II, IV and V of Odonates and one representative of mayflies, Ephemera sp. The reconstructions are based on a comparative approach and minimal consensus structures derived from sequence alignments. The inferred structures exhibit remarkable similarities to the published Drosophila melanogaster model, which increases confidence in these structures. Structural variance within Odonates is homoplastic, and neighbour-joining trees based on tree edit distances do not correspond to any of the phylogenetically expected patterns. However, despite homoplastic quantitative structural variation, many similarities between Odonates and Ephemera sp. suggest promising character sets for higher order insect systematics that merit further investigations.

  7. [Bioinformatics Analysis of Clustered Regularly Interspaced Short Palindromic Repeats in the Genomes of Shigella].

    PubMed

    Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin

    2015-04-01

    This study aimed to explore the features of clustered regularly interspaced short palindromic repeat (CRISPR) structures in Shigella using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that CRISPRs existed in the four groups of Shigella, and the flanking sequences of upstream CRISPRs could be classified into the same group as those of the downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences fell into the same group as their corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain a "stem" and a "ring". Some spacers were found to be homologous to partial sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and that the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.
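    As an illustration of what a palindromic motif means at the sequence level, the sketch below scans a DNA string for reverse-complement palindromes (windows equal to their own reverse complement). It is not the method used in the study; the names `reverse_complement` and `find_palindromes` and the default length bounds are assumptions chosen for the example.

```python
# Complement table for unambiguous DNA bases only (assumption: no N or IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_palindromes(seq: str, min_len: int = 4, max_len: int = 12):
    """List (start, motif) pairs for reverse-complement palindromes in seq.

    Only even lengths are scanned: the central base of an odd-length
    window would have to equal its own complement, which never happens.
    """
    seq = seq.upper()
    first = min_len + (min_len % 2)  # round an odd lower bound up to even
    hits = []
    for length in range(first, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if window == reverse_complement(window):
                hits.append((start, window))
    return hits


if __name__ == "__main__":
    # The EcoRI site GAATTC is a familiar reverse-complement palindrome.
    print(find_palindromes("TTGAATTCAA", min_len=6, max_len=6))
```

    Results are grouped by motif length, shortest first, and by start position within each length.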

  8. Exploring DNA Structure with Cn3D

    PubMed Central

    Day, Joseph; McCarty, Richard E.; Shearn, Allen; Shingles, Richard; Fletcher, Linnea; Murphy, Stephanie; Pearlman, Rebecca

    2007-01-01

    Researchers in the field of bioinformatics have developed a number of analytical programs and databases that are increasingly important for advancing biological research. Because bioinformatics programs are used to analyze, visualize, and/or compare biological data, it is likely that the use of these programs will have a positive impact on biology education. Over the past years, we have been working to help biology instructors introduce bioinformatics activities into their curricula by providing them with instructional materials that use bioinformatics programs and databases as educational tools. In this study, we measured the impact of a set of these materials on student learning. The activities in these materials asked students to use the molecular structure visualization program Cn3D to locate, identify, or analyze diverse features in DNA structures. Both the experimental groups of college and high school students showed significant increases in learning relative to control groups. Further, learning gains by the college students were correlated with the number of activities assigned. We conclude that working with Cn3D was important for improving student understanding of DNA structure. This study is one example of how a bioinformatics program for visualization can be used to support student learning. PMID:17339395

  9. Analyzing the field of bioinformatics with the multi-faceted topic modeling technique.

    PubMed

    Heo, Go Eun; Kang, Keun Young; Song, Min; Lee, Jeong-Hoon

    2017-05-31

    Bioinformatics is an interdisciplinary field at the intersection of molecular biology and computing technology. To characterize the field as a convergent domain, researchers have used bibliometrics, augmented with text-mining techniques for content analysis. In previous studies, Latent Dirichlet Allocation (LDA) was the most representative topic modeling technique for identifying topic structure of subject areas. However, as opposed to revealing the topic structure in relation to metadata such as authors, publication date, and journals, LDA only displays the simple topic structure. In this paper, we adopt Tang et al.'s Author-Conference-Topic (ACT) model to study the field of bioinformatics from the perspective of keyphrases, authors, and journals. The ACT model is capable of incorporating the paper, author, and conference into the topic distribution simultaneously. To obtain more meaningful results, we use journals and keyphrases instead of conferences and bag-of-words. For analysis, we use PubMed to collect forty-six bioinformatics journals from the MEDLINE database. We conducted time series topic analysis over four periods from 1996 to 2015 to further examine the interdisciplinary nature of bioinformatics. We analyze the ACT Model results in each period. Additionally, for further integrated analysis, we conduct a time series analysis among the top-ranked keyphrases, journals, and authors according to their frequency. We also examine the patterns in the top journals by simultaneously identifying the topical probability in each period, as well as the top authors and keyphrases. The results indicate that in recent years diversified topics have become more prevalent and convergent topics have become more clearly represented. The results of our analysis imply that over time the field of bioinformatics becomes more interdisciplinary where there is a steady increase in peripheral fields such as conceptual, mathematical, and systems biology. These results are

  10. Developing sustainable software solutions for bioinformatics by the "Butterfly" paradigm.

    PubMed

    Ahmed, Zeeshan; Zeeshan, Saman; Dandekar, Thomas

    2014-01-01

    Software design and sustainable software engineering are essential for the long-term development of bioinformatics software. Typical challenges in an academic environment are short-term contracts, island solutions, pragmatic approaches and loose documentation. Upcoming new challenges are big data, complex data sets, software compatibility and rapid changes in data representation. Our approach to cope with these challenges consists of iterative intertwined cycles of development ("Butterfly" paradigm) for key steps in scientific software engineering. User feedback is valued as well as software planning in a sustainable and interoperable way. Tool usage should be easy and intuitive. A middleware supports a user-friendly Graphical User Interface (GUI) as well as a database/tool development independently. We validated the approach of our own software development and compared the different design paradigms in various software solutions.

  11. Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics.

    PubMed

    Siegwald, Léa; Touzet, Hélène; Lemoine, Yves; Hot, David; Audebert, Christophe; Caboche, Ségolène

    2017-01-01

    Targeted metagenomics, also known as metagenetics, is a high-throughput sequencing application focusing on a nucleotide target in a microbiome to describe its taxonomic content. A wide range of bioinformatics pipelines are available to analyze sequencing outputs, and the choice of an appropriate tool is crucial and not trivial. No standard evaluation method exists for estimating the accuracy of a pipeline for targeted metagenomics analyses. This article proposes an evaluation protocol containing real and simulated targeted metagenomics datasets, and adequate metrics allowing us to study the impact of different variables on the biological interpretation of results. This protocol was used to compare six different bioinformatics pipelines in the basic user context: three common ones (mothur, QIIME and BMP) based on a clustering-first approach and three emerging ones (Kraken, CLARK and One Codex) using an assignment-first approach. This study surprisingly reveals that the effect of sequencing errors has a bigger impact on the results than choosing different amplified regions. Moreover, increasing sequencing throughput increases richness overestimation, even more so for microbiota of high complexity. Finally, the choice of the reference database has a bigger impact on richness estimation for clustering-first pipelines, and on correct taxa identification for assignment-first pipelines. Using emerging assignment-first pipelines is a valid approach for targeted metagenomics analyses, with a quality of results comparable to popular clustering-first pipelines, even with an error-prone sequencing technology like Ion Torrent. However, those pipelines are highly sensitive to the quality of databases and their annotations, which makes clustering-first pipelines still the only reliable approach for studying microbiomes that are not well described.

  12. Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics

    PubMed Central

    Siegwald, Léa; Touzet, Hélène; Lemoine, Yves; Hot, David

    2017-01-01

    Targeted metagenomics, also known as metagenetics, is a high-throughput sequencing application focusing on a nucleotide target in a microbiome to describe its taxonomic content. A wide range of bioinformatics pipelines are available to analyze sequencing outputs, and the choice of an appropriate tool is crucial and not trivial. No standard evaluation method exists for estimating the accuracy of a pipeline for targeted metagenomics analyses. This article proposes an evaluation protocol containing real and simulated targeted metagenomics datasets, and adequate metrics allowing us to study the impact of different variables on the biological interpretation of results. This protocol was used to compare six different bioinformatics pipelines in the basic user context: three common ones (mothur, QIIME and BMP) based on a clustering-first approach and three emerging ones (Kraken, CLARK and One Codex) using an assignment-first approach. This study surprisingly reveals that the effect of sequencing errors has a bigger impact on the results than choosing different amplified regions. Moreover, increasing sequencing throughput increases richness overestimation, even more so for microbiota of high complexity. Finally, the choice of the reference database has a bigger impact on richness estimation for clustering-first pipelines, and on correct taxa identification for assignment-first pipelines. Using emerging assignment-first pipelines is a valid approach for targeted metagenomics analyses, with a quality of results comparable to popular clustering-first pipelines, even with an error-prone sequencing technology like Ion Torrent. However, those pipelines are highly sensitive to the quality of databases and their annotations, which makes clustering-first pipelines still the only reliable approach for studying microbiomes that are not well described. PMID:28052134

  13. Comparative modeling: the state of the art and protein drug target structure prediction.

    PubMed

    Liu, Tianyun; Tang, Grace W; Capriotti, Emidio

    2011-07-01

    The goal of computational protein structure prediction is to provide three-dimensional (3D) structures with resolution comparable to experimental results. Comparative modeling, which predicts the 3D structure of a protein based on its sequence similarity to homologous structures, is the most accurate computational method for structure prediction. In the last two decades, significant progress has been made on comparative modeling methods. Using the large number of protein structures deposited in the Protein Data Bank (~65,000), automatic prediction pipelines are generating a tremendous number of models (~1.9 million) for sequences whose structures have not been experimentally determined. Accurate models are suitable for a wide range of applications, such as prediction of protein binding sites, prediction of the effect of protein mutations, and structure-guided virtual screening. In particular, comparative modeling has enabled structure-based drug design against protein targets with unknown structures. In this review, we describe the theoretical basis of comparative modeling, the available automatic methods and databases, and the algorithms to evaluate the accuracy of predicted structures. Finally, we discuss relevant applications in the prediction of important drug target proteins, focusing on the G protein-coupled receptor (GPCR) and protein kinase families.

  14. Wrapping and interoperating bioinformatics resources using CORBA.

    PubMed

    Stevens, R; Miller, C

    2000-02-01

    Bioinformaticians seeking to provide services to working biologists are faced with the twin problems of distribution and diversity of resources. Bioinformatics databases are distributed around the world and exist in many kinds of storage forms, platforms and access paradigms. To provide adequate services to biologists, these distributed and diverse resources have to interoperate seamlessly within single applications. The Common Object Request Broker Architecture (CORBA) offers one technical solution to these problems. The key component of CORBA is its use of object orientation as an intermediate form to translate between different representations. This paper concentrates on an explanation of object orientation and how it can be used to overcome the problems of distribution and diversity by describing the interfaces between objects.

  15. Bioinformatics Resources for MicroRNA Discovery

    PubMed Central

    Moore, Alyssa C.; Winkjer, Jonathan S.; Tseng, Tsai-Tien

    2015-01-01

    Biomarker identification is often associated with the diagnosis and evaluation of various diseases. Recently, the role of microRNA (miRNA) has been implicated in the development of diseases, particularly cancer. With the advent of next-generation sequencing, the amount of data on miRNA has increased tremendously in the last decade, requiring new bioinformatics approaches for processing and storing new information. New strategies have been developed in mining these sequencing datasets to allow better understanding toward the actions of miRNAs. As a result, many databases have also been established to disseminate these findings. This review focuses on several curated databases of miRNAs and their targets from both predicted and validated sources. PMID:26819547

  16. Reliability-oriented bioinformatic networks visualization.

    PubMed

    Aladağ, Ahmet Emre; Erten, Cesim; Sözdinler, Melih

    2011-06-01

    We present our protein-protein interaction (PPI) network visualization system RobinViz (reliability-oriented bioinformatic networks visualization). Notable features of the system are clustering of the PPI network based on gene ontology (GO) annotations or biclustered gene expression data, a clustered visualization model based on a central/peripheral duality, layouts computed with algorithms specialized for interaction reliabilities represented as weights, and completely automated data acquisition and processing. RobinViz is free, open-source software licensed under the GPL. It is written in C++ and Python, and consists of almost 30 000 lines of code, excluding the employed libraries. Source code, user manual and other Supplementary Material are available for download at http://code.google.com/p/robinviz/.

  17. Survey: Translational Bioinformatics embraces Big Data

    PubMed Central

    Shah, Nigam H.

    2015-01-01

    Summary: We review the latest trends and major developments in translational bioinformatics in the year 2011–2012. Our emphasis is on highlighting the key events in the field and pointing at promising research areas for the future. The key take-home points are: Translational informatics is ready to revolutionize human health and healthcare using large-scale measurements on individuals. Data-centric approaches that compute on massive amounts of data (often called “Big Data”) to discover patterns and to make clinically relevant predictions will gain adoption. Research that bridges the latest multimodal measurement technologies with large amounts of electronic healthcare data is increasing, and is where new breakthroughs will occur. PMID:22890354

  18. The European Bioinformatics Institute's data resources.

    PubMed

    Brooksbank, Catherine; Cameron, Graham; Thornton, Janet

    2010-01-01

    The wide uptake of next-generation sequencing and other ultra-high throughput technologies by life scientists with a diverse range of interests, spanning fundamental biological research, medicine, agriculture and environmental science, has led to unprecedented growth in the amount of data generated. It has also put the need for unrestricted access to biological data at the centre of biology. The European Bioinformatics Institute (EMBL-EBI) is unique in Europe and is one of only two organisations worldwide providing access to a comprehensive, integrated set of these collections. Here, we describe how the EMBL-EBI's biomolecular databases are evolving to cope with increasing levels of submission, a growing and diversifying user base, and the demand for new types of data. All of the resources described here can be accessed from the EMBL-EBI website: http://www.ebi.ac.uk.

  19. The Effectiveness of Structured Input and Structured Output on the Acquisition of Japanese Comparative Sentences

    ERIC Educational Resources Information Center

    Yamashita, Taichi; Iizuka, Takehiro

    2017-01-01

    Discussion of the roles of input and output has been attracting a number of researchers in second language acquisition (e.g., DeKeyser, 2007; Doughty, 1991; Krashen, 1982; Long, 1983; Norris & Ortega, 2000; Swain, 2000), and VanPatten (2004) advocated that both structured input and structured output allow learners to process input properly.…

  20. Bioinformatics: towards new directions for public health.

    PubMed

    Maojo, V; Martin-Sanchez, F

    2004-01-01

    Epidemiologists are reformulating their classical approaches to diseases by considering various issues associated with "omics" areas and technologies. Traditional differences between epidemiology and genetics include background, training, terminologies, study designs and others. Public health and epidemiology are increasingly looking forward to using methodologies and informatics tools, facilitated by the Bioinformatics community, for managing genomic information. Our aim is to describe the most important implications of the increasing use of genomic information for public health practice, research and education. To review the contribution of bioinformatics to these issues, in terms of providing the methods and tools needed for processing genetic information from pathogens and patients. To analyze the research challenges in biomedical informatics related to the need to integrate clinical, environmental and genetic data and the new scenarios that have arisen in public health. Review of the literature, Internet resources and material and reports generated by internal and external research projects. New developments are needed to advance the study of the interactions between environmental agents and genetic factors involved in the development of diseases. The use of biomarkers, biobanks, and integrated genomic/clinical databases poses serious challenges for informaticians in order to extract useful information and knowledge for public health, biomedical research and healthcare. From an informatics perspective, integrated medical/biological ontologies and new semantic-based models for managing information provide new challenges for research in areas such as genetic epidemiology and the "omics" disciplines, among others. In this regard, there are various ethical, privacy, informed consent and social implications that should be carefully addressed by researchers, practitioners and policy makers.

  1. Generative Topic Modeling in Image Data Mining and Bioinformatics Studies

    ERIC Educational Resources Information Center

    Chen, Xin

    2012-01-01

    Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval, computer vision and bioinformatics. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…

  2. Is there room for ethics within bioinformatics education?

    PubMed

    Taneri, Bahar

    2011-07-01

    When bioinformatics education is considered, several issues are addressed. At the undergraduate level, the main issue revolves around conveying information from two main and different fields: biology and computer science. At the graduate level, the main issue is bridging the gap between biology students and computer science students. However, there is an educational component that is rarely addressed within the context of bioinformatics education: the ethics component. Here, a different perspective is provided on bioinformatics education, and the current status of ethics is analyzed within the existing bioinformatics programs. Analysis of the existing undergraduate and graduate programs, in both Europe and the United States, reveals the minimal attention given to ethics within bioinformatics education. Given that bioinformaticians speedily and effectively shape the biomedical sciences and hence their implications for society, here redesigning of the bioinformatics curricula is suggested in order to integrate the necessary ethics education. Unique ethical problems awaiting bioinformaticians and bioinformatics ethics as a separate field of study are discussed. In addition, a template for an "Ethics in Bioinformatics" course is provided.

  3. Assessment of a Bioinformatics across Life Science Curricula Initiative

    ERIC Educational Resources Information Center

    Howard, David R.; Miskowski, Jennifer A.; Grunwald, Sandra K.; Abler, Michael L.

    2007-01-01

    At the University of Wisconsin-La Crosse, we have undertaken a program to integrate the study of bioinformatics across the undergraduate life science curricula. Our efforts have included incorporating bioinformatics exercises into courses in the biology, microbiology, and chemistry departments, as well as coordinating the efforts of faculty within…

  4. Bioinformatics education dissemination with an evolutionary problem solving perspective.

    PubMed

    Jungck, John R; Donovan, Samuel S; Weisstein, Anton E; Khiripet, Noppadon; Everse, Stephen J

    2010-11-01

    Bioinformatics is central to biology education in the 21st century. With the generation of terabytes of data per day, the application of computer-based tools to stored and distributed data is fundamentally changing research and its application to problems in medicine, agriculture, conservation and forensics. In light of this 'information revolution,' undergraduate biology curricula must be redesigned to prepare the next generation of informed citizens as well as those who will pursue careers in the life sciences. The BEDROCK initiative (Bioinformatics Education Dissemination: Reaching Out, Connecting and Knitting together) has fostered an international community of bioinformatics educators. The initiative's goals are to: (i) Identify and support faculty who can take leadership roles in bioinformatics education; (ii) Highlight and distribute innovative approaches to incorporating evolutionary bioinformatics data and techniques throughout undergraduate education; (iii) Establish mechanisms for the broad dissemination of bioinformatics resource materials and teaching models; (iv) Emphasize phylogenetic thinking and problem solving; and (v) Develop and publish new software tools to help students develop and test evolutionary hypotheses. Since 2002, BEDROCK has offered more than 50 faculty workshops around the world, published many resources and supported an environment for developing and sharing bioinformatics education approaches. The BEDROCK initiative builds on the established pedagogical philosophy and academic community of the BioQUEST Curriculum Consortium to assemble the diverse intellectual and human resources required to sustain an international reform effort in undergraduate bioinformatics education.

  5. Evaluating an Inquiry-Based Bioinformatics Course Using Q Methodology

    ERIC Educational Resources Information Center

    Ramlo, Susan E.; McConnell, David; Duan, Zhong-Hui; Moore, Francisco B.

    2008-01-01

    Faculty at a Midwestern metropolitan public university recently developed a course on bioinformatics that emphasized collaboration and inquiry. Bioinformatics, essentially the application of computational tools to biological data, is inherently interdisciplinary. Thus part of the challenge of creating this course was serving the needs and…

  9. The 2015 Bioinformatics Open Source Conference (BOSC 2015).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Lapp, Hilmar; Chapman, Brad; Davey, Rob; Fields, Christopher; Hokamp, Karsten; Munoz-Torres, Monica

    2016-02-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule.

  10. Bioinformatics approaches to single-cell analysis in developmental biology.

    PubMed

    Yalcin, Dicle; Hakguder, Zeynep M; Otu, Hasan H

    2016-03-01

    Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to a robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single-cells and amplification of related macromolecules, which have enabled the use of imaging and omics techniques on single cells. There have been improvements in computational single-cell image analysis in developmental biology regarding feature extraction, segmentation, image enhancement and machine learning, handling limitations of optical resolution to gain new perspectives from the raw microscopy images. Omics approaches, such as transcriptomics, genomics and epigenomics, targeting gene and small RNA expression, single nucleotide and structural variations and methylation and histone modifications, rely heavily on high-throughput sequencing technologies. Although there are well-established bioinformatics methods for analysis of sequence data, there are limited bioinformatics approaches which address experimental design, sample size considerations, amplification bias, normalization, differential expression, coverage, clustering and classification issues, specifically applied at the single-cell level. In this review, we summarize biological and technological advancements, discuss challenges faced in the aforementioned data acquisition and analysis issues and present future prospects for application of single-cell analyses to developmental biology. © The Author 2015. Published by Oxford University Press on behalf of the European

  11. The Revolution in Viral Genomics as Exemplified by the Bioinformatic Analysis of Human Adenoviruses

    PubMed Central

    Torres, Sarah; Chodosh, James; Seto, Donald; Jones, Morris S.

    2010-01-01

    Over the past 30 years, genomic and bioinformatic analysis of human adenoviruses has been achieved using a variety of DNA sequencing methods; initially with the use of restriction enzymes and more recently with the use of the GS FLX pyrosequencing technology. Following the conception of DNA sequencing in the 1970s, analysis of adenoviruses has evolved from 100 base pair mRNA fragments to entire genomes. Comparative genomics of adenoviruses made its debut in 1984 when nucleotides and amino acids of coding sequences within the hexon genes of two human adenoviruses (HAdV), HAdV-C2 and HAdV-C5, were compared and analyzed. It was determined that there were three different zones (1–393, 394–1410, 1411–2910) within the hexon gene, of which HAdV-C2 and HAdV-C5 shared zones 1 and 3 with 95% and 89.5% nucleotide identity, respectively. In 1992, HAdV-C5 became the first adenovirus genome to be fully sequenced using the Sanger method. Over the next seven years, whole genome analysis and characterization was completed using bioinformatic tools such as blastn, tblastx, ClustalV and FASTA, in order to determine key proteins in species HAdV-A through HAdV-F. The bioinformatic revolution was initiated with the introduction of a novel species, HAdV-G, that was typed and named by the use of whole genome sequencing and phylogenetics as opposed to traditional serology. HAdV bioinformatics will continue to advance as the latest sequencing technology enables scientists to add to and expand the resource databases. As a result of these advancements, how novel HAdVs are typed has changed. Bioinformatic analysis has become the revolutionary tool that has significantly accelerated the in-depth study of HAdV microevolution through comparative genomics. PMID:21994684

  12. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.

    PubMed

    Cieślik, Marcin; Mura, Cameron

    2011-02-25

    Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage
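
    The batching trade-off described in this abstract can be illustrated with a minimal sketch. It does not use the PaPy API: the helper names (batched, parallel_stage, run_pipeline) and the two toy sequence functions are hypothetical, and a standard-library thread pool stands in for the pooled compute resources. Each stage pulls its input lazily and processes one batch at a time, so a larger batch size exposes more parallelism at the cost of holding more items in memory.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def parallel_stage(func, items, batch_size, pool):
    """Lazily apply `func` to `items`, one batch at a time, in input order."""
    for batch in batched(items, batch_size):
        # Items within a batch run concurrently; the next batch is not
        # pulled from upstream until this one has been consumed.
        yield from pool.map(func, batch)


def reverse_complement(seq):
    """Reverse-complement a DNA string (assumes only A, C, G, T)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_pipeline(sequences, batch_size=2, workers=2):
    """Chain two stages into a linear pipeline (the simplest possible DAG)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stage1 = parallel_stage(reverse_complement, sequences, batch_size, pool)
        stage2 = parallel_stage(gc_fraction, stage1, batch_size, pool)
        return list(stage2)
```

    Calling run_pipeline on a list of DNA strings returns one GC fraction per input sequence, in input order; changing batch_size alters only how much work is in flight at once, not the result.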

  13. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    PubMed Central

    2011-01-01

    Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions: PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive

  14. 4273π: bioinformatics education on low cost ARM hardware.

    PubMed

    Barker, Daniel; Ferrier, David Ek; Holland, Peter Wh; Mitchell, John Bo; Plaisier, Heleen; Ritchie, Michael G; Smart, Steven D

    2013-08-12

    Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012-2013. 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.

  15. 4273π: Bioinformatics education on low cost ARM hardware

    PubMed Central

    2013-01-01

    Background: Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. Results: We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012–2013. Conclusions: 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost. PMID:23937194

  16. E-MSD: an integrated data resource for bioinformatics.

    PubMed

    Golovin, A; Oldfield, T J; Tate, J G; Velankar, S; Barton, G J; Boutselakis, H; Dimitropoulos, D; Fillon, J; Hussain, A; Ionides, J M C; John, M; Keller, P A; Krissinel, E; McNeil, P; Naim, A; Newman, R; Pajon, A; Pineda, J; Rachedi, A; Copeland, J; Sitnov, A; Sobhany, S; Suarez-Uruena, A; Swaminathan, G J; Tagari, M; Tromm, S; Vranken, W; Henrick, K

    2004-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD 'atlas pages' show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www.ebi.ac.uk/msd-srv/docs/roadshow_tutorial/).

  17. E-MSD: an integrated data resource for bioinformatics

    PubMed Central

    Golovin, A.; Oldfield, T. J.; Tate, J. G.; Velankar, S.; Barton, G. J.; Boutselakis, H.; Dimitropoulos, D.; Fillon, J.; Hussain, A.; Ionides, J. M. C.; John, M.; Keller, P. A.; Krissinel, E.; McNeil, P.; Naim, A.; Newman, R.; Pajon, A.; Pineda, J.; Rachedi, A.; Copeland, J.; Sitnov, A.; Sobhany, S.; Suarez-Uruena, A.; Swaminathan, G. J.; Tagari, M.; Tromm, S.; Vranken, W.; Henrick, K.

    2004-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD ‘atlas pages’ show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www.ebi.ac.uk/msd-srv/docs/roadshow_tutorial/). PMID:14681397

  18. Bioinformatics and evolution of vertebrate nociceptin and opioid receptors.

    PubMed

    Stevens, Craig W

    2015-01-01

    G protein-coupled receptors (GPCRs) are ancestrally related membrane proteins on cells that mediate the pharmacological effect of most drugs and neurotransmitters. GPCRs are the largest group of membrane receptor proteins encoded in the human genome. One of the most famous types of GPCRs is the opioid receptors. Opioid family receptors consist of four closely related proteins expressed in all vertebrate brains and spinal cords examined to date. The three classical types of opioid receptors shown unequivocally to mediate analgesia in animal models and in humans are the mu- (MOR), delta- (DOR), and kappa- (KOR) opioid receptor proteins. The fourth and most recent member of the opioid receptor family discovered is the nociceptin or orphanin FQ receptor (ORL). The role of ORL and its ligands in producing analgesia is not as clear, with both analgesic and hyperalgesic effects reported. All four opioid family receptor genes were cloned from expressed mRNA in a number of vertebrate species, and there are enough sequences presently available to carry out bioinformatic analysis. This chapter presents the results of a comparative analysis of vertebrate opioid receptors using pharmacological studies, bioinformatics, and the latest data from human whole-genome studies. Results confirm our initial hypotheses that the four opioid receptor genes most likely arose by whole-genome duplication, that there is an evolutionary vector of opioid receptor type divergence in sequence and function, and that the hMOR gene shows evidence of positive selection or adaptive evolution in Homo sapiens. © 2015 Elsevier Inc. All rights reserved.

  19. A case study of tuning MapReduce for efficient Bioinformatics in the cloud

    SciTech Connect

    Shi, Lizhen; Wang, Zhong; Yu, Weikuan; Meng, Xiandong

    2016-10-06

    The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this paper, we aim to minimize the cloud computing cost spent on bioinformatics data analysis by optimizing the extracted significant Hadoop parameters. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our study case. Experimental results show that, with the fine-tuned parameters, we achieve a total of 4× speedup compared with the original performance (using the default settings). Finally, this paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.
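
    As an illustration of the study case named in this abstract, k-mer counting decomposes naturally into a map step and a reduce step. The sketch below is plain single-process Python, not Hadoop code and not the authors' implementation; the function names are hypothetical. It shows only the map/reduce decomposition, with none of the cluster parameters the paper tunes.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k):
    """Map step: count every overlapping k-mer in a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step: merge two partial k-mer count tables."""
    return left + right


def count_kmers(reads, k):
    """Run the map step per read, then fold the partial counts together."""
    partial_counts = (map_kmers(read, k) for read in reads)
    return reduce(reduce_counts, partial_counts, Counter())
```

    Because the reduce step is associative and commutative, the partial tables can be merged in any order, which is what lets a framework distribute the map step across many workers.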

  20. Fisher: a program for the detection of H/ACA snoRNAs using MFE secondary structure prediction and comparative genomics – assessment and update

    PubMed Central

    Freyhult, Eva; Edvardsson, Sverker; Tamas, Ivica; Moulton, Vincent; Poole, Anthony M

    2008-01-01

    Background: The H/ACA family of small nucleolar RNAs (snoRNAs) plays a central role in guiding the pseudouridylation of ribosomal RNA (rRNA). In an effort to systematically identify the complete set of rRNA-modifying H/ACA snoRNAs from the genome sequence of the budding yeast, Saccharomyces cerevisiae, we developed a program – Fisher – and previously presented several candidate snoRNAs based on our analysis [1]. Findings: In this report, we provide a brief update of this work, which was aborted after the publication of experimentally-identified snoRNAs [2] identical to candidates we had identified bioinformatically using Fisher. Our motivation for revisiting this work is to report on the status of the candidate snoRNAs described in [1], and secondly, to report that a modified version of Fisher together with the available multiple yeast genome sequences was able to correctly identify several H/ACA snoRNAs for modification sites not identified by the snoGPS program [3]. While we are no longer developing Fisher, we briefly consider the merits of the Fisher algorithm relative to snoGPS, which may be of use for workers considering pursuing a similar search strategy for the identification of small RNAs. The modified source code for Fisher is made available as supplementary material. Conclusion: Our results confirm the validity of using minimum free energy (MFE) secondary structure prediction to guide comparative genomic screening for RNA families with few sequence constraints. PMID:18710502

  1. Toward a Model of Knowledge Structure and a Comparative Analysis of Knowledge Structure Measurement Techniques

    DTIC Science & Technology

    1991-09-01

    POLYCON, INDSCAL/SINDSCAL, KYST, and MULTISCALE). Polzella and Reid (1989) employed MDS techniques to discover differences in performance characteristics...Reasoning and the structure of knowledge in biochemistry. Instructional Science, 17, 57-76. Polzella, D. J., and Reid, G. B. (1989

  2. Data and methods comparing social structure and vegetation structure of urban neighborhoods in Baltimore, Maryland

    Treesearch

    J. Morgan Grove; Mary L. Cadenasso; William R., Jr. Burch; Steward T. Pickett; Kirsten Schwarz; Jarlath O' Neil-Dunne; Matthew Wilson; Austin Troy; Christopher Boone

    2006-01-01

    Recent advances in remote sensing and the adoption of geographic information systems (GIS) have greatly increased the availability of high-resolution spatial and attribute data for examining the relationship between social and vegetation structure in urban areas. There are several motivations for understanding this relationship. First, the United States has experienced a...

  3. Structure-based classification of FAD binding sites: A comparative study of structural alignment tools.

    PubMed

    Garma, Leonardo D; Medina, Milagros; Juffer, André H

    2016-11-01

    A total of six different structural alignment tools (TM-Align, TriangleMatch, CLICK, ProBis, SiteEngine and GA-SI) were assessed for their ability to perform two particular tasks: (i) discriminating FAD (flavin adenine dinucleotide) from non-FAD binding sites, and (ii) performing an all-to-all comparison on a set of 883 FAD binding sites for the purpose of classifying them. For the first task, the consistency of each alignment method was evaluated, showing that every method is able to distinguish FAD and non-FAD binding sites with a high Matthews correlation coefficient. Additionally, GA-SI was found to provide alignments different from those of the other approaches. The results obtained for the second task revealed more significant differences among alignment methods, as reflected in the poor correlation of their results and highlighted clearly by the independent evaluation of the structural superimpositions generated by each method. The classification itself was performed using the combined results of all methods, using the best result found for each comparison of binding sites. A number of different clustering methods (Single-linkage, UPGMA, Complete-linkage, SPICKER and k-Means clustering) were also used. The groups of similar binding sites (proteins) or clusters generated by the best performing method were further analyzed in terms of local sequence identity, local structural similarity and conservation of analogous contacts with the FAD ligands. Each of the clusters was characterized by a unique set of structural features or patterns, demonstrating that the groups generated truly reflect the structural diversity of FAD binding sites. Proteins 2016; 84:1728-1747. © 2016 Wiley Periodicals, Inc.
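
    For readers unfamiliar with the Matthews correlation coefficient used in the first task, the following sketch computes it from the four cells of a binary confusion matrix using the standard definition. The function name and the convention of returning 0.0 when the denominator vanishes are assumptions made for illustration; the abstract does not describe how the authors computed the score.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives. The result lies in [-1, 1].
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an undefined score is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

    A value of 1 corresponds to perfect discrimination (here, FAD versus non-FAD sites), 0 to performance no better than chance, and -1 to complete disagreement.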

  4. Scalability and Validation of Big Data Bioinformatics Software.

    PubMed

    Yang, Andrian; Troup, Michael; Ho, Joshua W K

    2017-01-01

    This review examines two important aspects that are central to modern big data bioinformatics analysis - software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
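
    The multiple-executions idea behind metamorphic testing can be sketched as follows. When the correct output for an arbitrary input is unknown, one can still run the program on an input and on a transformed input and check a relation that must hold between the two outputs. The example relation (GC content is unchanged by reverse complementation) and all function names are illustrative assumptions, not taken from the review.

```python
import random


def gc_content(seq):
    """Program under test: fraction of G and C bases in a DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Input transformation used by the metamorphic relation."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_metamorphic_relation(program, trials=100, seed=0):
    """Return True if program(seq) == program(reverse_complement(seq))
    for every randomly generated sequence; no expected outputs are needed."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(1, 50)
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        if program(seq) != program(reverse_complement(seq)):
            return False
    return True
```

    A faulty implementation that counts only G bases violates the relation on any sequence with unequal numbers of G and C, so the check can expose the bug without a reference answer for any individual input.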

  5. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers.

    PubMed

    Brazas, Michelle D; Ouellette, B F Francis

    2016-06-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression.

  6. Online Tools for Bioinformatics Analyses in Nutrition Sciences

    PubMed Central

    Malkaram, Sridhar A.; Hassan, Yousef I.; Zempleni, Janos

    2012-01-01

    Recent advances in “omics” research have resulted in the creation of large datasets that were generated by consortiums and centers, small datasets that were generated by individual investigators, and bioinformatics tools for mining these datasets. It is important for nutrition laboratories to take full advantage of the analysis tools to interrogate datasets for information relevant to genomics, epigenomics, transcriptomics, proteomics, and metabolomics. This review provides guidance regarding bioinformatics resources that are currently available in the public domain, with the intent to provide a starting point for investigators who want to take advantage of the opportunities provided by the bioinformatics field. PMID:22983844

  7. Thriving in Multidisciplinary Research: Advice for New Bioinformatics Students

    PubMed Central

    Auerbach, Raymond K.

    2012-01-01

    The sciences have seen a large increase in demand for students in bioinformatics and multidisciplinary fields in general. Many new educational programs have been created to satisfy this demand, but navigating these programs requires a non-traditional outlook and emphasizes working in teams of individuals with distinct yet complementary skill sets. Written from the perspective of a current bioinformatics student, this article seeks to offer advice to prospective and current students in bioinformatics regarding what to expect in their educational program, how multidisciplinary fields differ from more traditional paths, and decisions that they will face on the road to becoming successful, productive bioinformaticists. PMID:23012580

  8. Thriving in multidisciplinary research: advice for new bioinformatics students.

    PubMed

    Auerbach, Raymond K

    2012-09-01

    The sciences have seen a large increase in demand for students in bioinformatics and multidisciplinary fields in general. Many new educational programs have been created to satisfy this demand, but navigating these programs requires a non-traditional outlook and emphasizes working in teams of individuals with distinct yet complementary skill sets. Written from the perspective of a current bioinformatics student, this article seeks to offer advice to prospective and current students in bioinformatics regarding what to expect in their educational program, how multidisciplinary fields differ from more traditional paths, and decisions that they will face on the road to becoming successful, productive bioinformaticists.

  9. Survey of MapReduce frame operation in bioinformatics.

    PubMed

    Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

    2014-07-01

    Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics.

  10. Comparative and integrative analysis of RNA structural profiling data: current practices and emerging questions.

    PubMed

    Choudhary, Krishna; Deng, Fei; Aviran, Sharon

    2017-03-01

    Structure profiling experiments provide single-nucleotide information on RNA structure. Recent advances in chemistry combined with application of high-throughput sequencing have enabled structure profiling at transcriptome scale and in living cells, creating unprecedented opportunities for RNA biology. Propelled by these experimental advances, massive data with ever-increasing diversity and complexity have been generated, which give rise to new challenges in interpreting and analyzing these data. We review current practices in analysis of structure profiling data with emphasis on comparative and integrative analysis as well as highlight emerging questions. Comparative analysis has revealed structural patterns across transcriptomes and has become an integral component of recent profiling studies. Additionally, profiling data can be integrated into traditional structure prediction algorithms to improve prediction accuracy. To keep pace with experimental developments, methods to facilitate, enhance and refine such analyses are needed. Parallel advances in analysis methodology will complement profiling technologies and help them reach their full potential.

  11. Impacts of bioinformatics to medicinal chemistry.

    PubMed

    Chou, Kuo-Chen

    2015-01-01

    Facing the explosive growth of biological sequence data, such as protein/peptide and DNA/RNA sequences, generated in the post-genomic age, many bioinformatical and mathematical approaches as well as physicochemical concepts have been introduced to derive useful information from these sequences in a timely manner, in order to stimulate the development of medical science and drug design. Meanwhile, because of the rapid penetration of these disciplines, medicinal chemistry is currently undergoing an unprecedented revolution. In this minireview, we summarize the progress by focusing on the following six aspects. (1) Using the pseudo amino acid composition (PseAAC) to predict various attributes of protein/peptide sequences that are useful for drug development. (2) Using the pseudo oligonucleotide composition (PseKNC) to do the same for DNA/RNA sequences. (3) Introducing the multi-label approach to study systems whose constituent elements bear multiple characters and functions. (4) Utilizing graphical rules and "wenxiang" diagrams to analyze complicated biomedical systems. (5) Recent developments in identifying the interactions of drugs with their various types of target proteins in cellular networking. (6) The distorted key theory and its application in developing peptide drugs.

  12. [piRNAs: Biology and Bioinformatics].

    PubMed

    Zharikova, A A; Mironov, A A

    2016-01-01

    The discovery of small noncoding RNAs and their roles in a variety of regulatory mechanisms has led many scientists to view the principles of cell function from a completely different perspective. Small RNA molecules play key roles in important processes such as the co- and posttranscriptional regulation of gene expression, epigenetic modification of DNA and histones, and antiviral protection. piRNAs are one of the most numerous, yet least-studied, classes of small noncoding RNAs. piRNAs are highly expressed in the germ line of most eukaryotes, and their main function is to regulate the activity of mobile elements during embryonic development. Moreover, recent studies reveal moderate activity of piRNAs in somatic cells. However, the mechanisms of piRNA biogenesis and function are still poorly understood and are the subject of intensive research. This review presents current information about the biogenesis and various functions of piRNAs, as well as bioinformatic aspects of this field of molecular biology.

  13. Bioinformatic tools for microRNA dissection

    PubMed Central

    Akhtar, Most Mauluda; Micolucci, Luigina; Islam, Md Soriful; Olivieri, Fabiola; Procopio, Antonio Domenico

    2016-01-01

    Recently, microRNAs (miRNAs) have emerged as important elements of gene regulatory networks. MiRNAs are endogenous single-stranded non-coding RNAs (∼22-nt long) that regulate gene expression at the post-transcriptional level. Through pairing with mRNA, miRNAs can down-regulate gene expression by inhibiting translation or stimulating mRNA degradation. In some cases they can also up-regulate the expression of a target gene. MiRNAs influence a variety of cellular pathways that range from development to carcinogenesis. The involvement of miRNAs in several human diseases, particularly cancer, makes them potential diagnostic and prognostic biomarkers. Recent technological advances, especially high-throughput sequencing, have led to an exponential growth in the generation of miRNA-related data. A number of bioinformatic tools and databases have been devised to manage this growing body of data. We analyze 129 miRNA tools that are being used in diverse areas of miRNA research, to assist investigators in choosing the most appropriate tools for their needs. PMID:26578605

  14. Evolution of web services in bioinformatics.

    PubMed

    Neerincx, Pieter B T; Leunissen, Jack A M

    2005-06-01

    Bioinformaticians have developed large collections of tools to make sense of the rapidly growing pool of molecular biological data. Biological systems tend to be complex and in order to understand them, it is often necessary to link many data sets and use more than one tool. Therefore, bioinformaticians have experimented with several strategies to try to integrate data sets and tools. Owing to the lack of standards for data sets and the interfaces of the tools this is not a trivial task. Over the past few years building services with web-based interfaces has become a popular way of sharing the data and tools that have resulted from many bioinformatics projects. This paper discusses the interoperability problem and how web services are being used to try to solve it, resulting in the evolution of tools with web interfaces from HTML/web form-based tools not suited for automatic workflow generation to a dynamic network of XML-based web services that can easily be used to create pipelines.

  15. Systems Biology: The Next Frontier for Bioinformatics

    PubMed Central

    Likić, Vladimir A.; McConville, Malcolm J.; Lithgow, Trevor; Bacic, Antony

    2010-01-01

    Biochemical systems biology augments more traditional disciplines, such as genomics, biochemistry and molecular biology, by championing (i) mathematical and computational modeling; (ii) the application of traditional engineering practices in the analysis of biochemical systems; and in the past decade increasingly (iii) the use of near-comprehensive data sets derived from ‘omics platform technologies, in particular “downstream” technologies relative to genome sequencing, including transcriptomics, proteomics and metabolomics. The future progress in understanding biological principles will increasingly depend on the development of temporal and spatial analytical techniques that will provide high-resolution data for systems analyses. To date, particularly successful were strategies involving (a) quantitative measurements of cellular components at the mRNA, protein and metabolite levels, as well as in vivo metabolic reaction rates, (b) development of mathematical models that integrate biochemical knowledge with the information generated by high-throughput experiments, and (c) applications to microbial organisms. The inevitable role bioinformatics plays in modern systems biology puts mathematical and computational sciences as an equal partner to analytical and experimental biology. Furthermore, mathematical and computational models are expected to become increasingly prevalent representations of our knowledge about specific biochemical systems. PMID:21331364

  16. The AnnoLite and AnnoLyze programs for comparative annotation of protein structures

    PubMed Central

    Marti-Renom, Marc A; Rossi, Andrea; Al-Shahrour, Fátima; Davis, Fred P; Pieper, Ursula; Dopazo, Joaquín; Sali, Andrej

    2007-01-01

    Background Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. Description AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of ~90% and average precision of ~80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of ~70% and average precision of ~30%, correctly localizing binding sites for small molecules in ~95% of its predictions. Conclusion The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at . PMID:17570147
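    The sensitivity and precision figures quoted above can be read through their standard definitions: sensitivity is the fraction of true annotations that were predicted, and precision is the fraction of predictions that were correct. The sketch below is a minimal illustration of those definitions only. The function name and the example labels are hypothetical, and the record does not state how the authors averaged the values across structures.

```python
def sensitivity_and_precision(predicted, actual):
    """Return (sensitivity, precision) for two collections of annotation labels.

    sensitivity = true positives / all actual annotations
    precision   = true positives / all predicted annotations
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    sensitivity = true_positives / len(actual) if actual else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Hypothetical example: 10 predicted binding-site residues, 3 of them correct,
# against 4 residues in the (assumed) true site.
predicted_site = {f"res{i}" for i in range(1, 11)}   # res1 .. res10
true_site = {"res1", "res2", "res3", "res99"}
print(sensitivity_and_precision(predicted_site, true_site))
```

    A predictor can therefore recover most of a true site (high sensitivity) while still over-predicting (low precision), which is the pattern reported for AnnoLyze.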

  17. Multilevel Structural Equation Models for the Analysis of Comparative Data on Educational Performance

    ERIC Educational Resources Information Center

    Goldstein, Harvey; Bonnet, Gerard; Rocher, Thierry

    2007-01-01

    The Programme for International Student Assessment comparative study of reading performance among 15-year-olds is reanalyzed using statistical procedures that allow the full complexity of the data structures to be explored. The article extends existing multilevel factor analysis and structural equation models and shows how this can extract richer…

  18. A Comparative Structural Equation Modeling Investigation of the Relationships among Teaching, Cognitive and Social Presence

    ERIC Educational Resources Information Center

    Kozan, Kadir

    2016-01-01

    The present study investigated the relationships among teaching, cognitive, and social presence through several structural equation models to see which model would better fit the data. To this end, the present study employed and compared several different structural equation models because different models could fit the data equally well. Among…

  19. Monte Carlo modelling of photodynamic therapy treatments comparing clustered three dimensional tumour structures with homogeneous tissue structures

    NASA Astrophysics Data System (ADS)

    Campbell, C. L.; Wood, K.; Brown, C. T. A.; Moseley, H.

    2016-07-01

    We explore the effects of three dimensional (3D) tumour structures on depth dependent fluence rates, photodynamic doses (PDD) and fluorescence images through Monte Carlo radiation transfer modelling of photodynamic therapy. The aim with this work was to compare the commonly used uniform tumour densities with non-uniform densities to determine the importance of including 3D models in theoretical investigations. It was found that fractal 3D models resulted in deeper penetration on average of therapeutic radiation and higher PDD. An increase in effective treatment depth of 1 mm was observed for one of the investigated fractal structures, when comparing to the equivalent smooth model. Wide field fluorescence images were simulated, revealing information about the relationship between tumour structure and the appearance of the fluorescence intensity. Our models indicate that the 3D tumour structure strongly affects the spatial distribution of therapeutic light, the PDD and the wide field appearance of surface fluorescence images.

  20. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.

    PubMed

    Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru

    2016-09-29

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.

  1. Creating Bioinformatic Workflows within the BioExtract Server

    USDA-ARS's Scientific Manuscript database

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows generally require access to multiple, distributed data sources and analytic tools. The requisite data sources may include large public data repositories, community...

  2. Microsoft Biology Initiative: .NET Bioinformatics Platform and Tools

    PubMed Central

    Diaz Acosta, B.

    2011-01-01

    The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative is comprised of two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework—initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, documentation and training materials are freely downloadable from http://research.microsoft.com/bio. MBT is a collection of tools that enable biology and bioinformatics researchers to be more productive in making scientific discoveries.

  3. Development of a cloud-based Bioinformatics Training Platform

    PubMed Central

    Revote, Jerico; Watson-Haigh, Nathan S.; Quenette, Steve; Bethwaite, Blair; McGrath, Annette

    2017-01-01

    Abstract The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. PMID:27084333

  4. The potential of translational bioinformatics approaches for pharmacology research.

    PubMed

    Li, Lang

    2015-10-01

    The field of bioinformatics has allowed the interpretation of massive amounts of biological data, ushering in the era of 'omics' to biomedical research. Its potential impact on pharmacology research is enormous and it has shown some emerging successes. A full realization of this potential, however, requires standardized data annotation for large health record databases and molecular data resources. Improved standardization will further stimulate the development of system pharmacology models, using translational bioinformatics methods. This new translational bioinformatics paradigm is highly complementary to current pharmacological research fields, such as personalized medicine, pharmacoepidemiology and drug discovery. In this review, I illustrate the application of translational bioinformatics to research in numerous pharmacology subdisciplines. © 2015 The British Pharmacological Society.

  5. The potential of translational bioinformatics approaches for pharmacology research

    PubMed Central

    Li, Lang

    2015-01-01

    The field of bioinformatics has allowed the interpretation of massive amounts of biological data, ushering in the era of ‘omics’ to biomedical research. Its potential impact on pharmacology research is enormous and it has shown some emerging successes. A full realization of this potential, however, requires standardized data annotation for large health record databases and molecular data resources. Improved standardization will further stimulate the development of system pharmacology models, using translational bioinformatics methods. This new translational bioinformatics paradigm is highly complementary to current pharmacological research fields, such as personalized medicine, pharmacoepidemiology and drug discovery. In this review, I illustrate the application of translational bioinformatics to research in numerous pharmacology subdisciplines. PMID:25753093

  6. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond

    PubMed Central

    Hiraoka, Satoshi; Yang, Ching-chia; Iwasaki, Wataru

    2016-01-01

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives. PMID:27383682

  7. Bioinformatics opportunities for identification and study of medicinal plants

    PubMed Central

    Sharma, Vivekanand

    2013-01-01

    Plants have been used as a source of medicine since historic times and several commercially important drugs are of plant-based origin. The traditional approach towards discovery of plant-based drugs often involves a significant amount of time and expenditure. These labor-intensive approaches have struggled to keep pace with the rapid development of high-throughput technologies. In the era of high volume, high-throughput data generation across the biosciences, bioinformatics plays a crucial role. This has generally been the case in the context of drug design and discovery. However, there has been limited attention to date to the potential application of bioinformatics approaches that can leverage plant-based knowledge. Here, we review bioinformatics studies that have contributed to medicinal plants research. In particular, we highlight areas in medicinal plant research where the application of bioinformatics methodologies may result in quicker and potentially cost-effective leads toward finding plant-based remedies. PMID:22589384

  8. Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization

    PubMed Central

    2011-01-01

    Background Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. Results The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Conclusion Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net. PMID:21846353

  9. Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization.

    PubMed

    Jung, Sang-Kyu; McDonald, Karen

    2011-08-16

    Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net.

  10. Advantages and disadvantages in usage of bioinformatic programs in promoter region analysis

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena E.; Skarzyńska, Agnieszka; Posyniak, Kacper; Ziąbska, Karolina; Pląder, Wojciech; Przybecki, Zbigniew

    2015-09-01

    An important computational challenge is finding the regulatory elements across the promoter region. In this work we present the advantages and disadvantages of applying different bioinformatics programs to localize transcription factor binding sites in the upstream regions of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promoter regions. The results are compared and the possible functions of selected motifs are described.

  11. Proceedings: the Applications of Bioinformatics in Cancer Detection Workshop.

    PubMed

    Kapetanovic, Izet M; Umar, Asad; Khan, Javed

    2004-05-01

    The Division of Cancer Prevention of the National Cancer Institute sponsored and organized the Applications of Bioinformatics in Cancer Detection Workshop on August 6-7, 2002. The goal of the workshop was to evaluate the state of the science of bioinformatics and determine how it may be used to assist early cancer detection, risk identification, risk assessment, and risk reduction. This paper summarizes the proceedings of this conference and points out future directions for research.

  12. A web services choreography scenario for interoperating bioinformatics applications

    PubMed Central

    de Knikker, Remko; Guo, Youjun; Li, Jin-long; Kwan, Albert KH; Yip, Kevin Y; Cheung, David W; Cheung, Kei-Hoi

    2004-01-01

    Background Very often genome-wide data analysis requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate interoperation because: 1) the platforms on which the applications run are heterogeneous, 2) their web interface is not machine-friendly, 3) they use a non-standard format for data input and output, 4) they do not exploit standards to define application interface and message exchange, and 5) existing protocols for remote messaging are often not firewall-friendly. To overcome these issues, web services have emerged as a standard XML-based model for message exchange between heterogeneous applications. Web services engines have been developed to manage the configuration and execution of a web services workflow. Results To demonstrate the benefit of using web services over traditional web interfaces, we compare the two implementations of HAPI, a gene expression analysis utility developed by the University of California San Diego (UCSD) that allows visual characterization of groups or clusters of genes based on the biomedical literature. This utility takes a set of microarray spot IDs as input and outputs a hierarchy of MeSH Keywords that correlates to the input and is grouped by Medical Subject Heading (MeSH) category. While the HTML output is easy for humans to visualize, it is difficult for computer applications to interpret semantically. To facilitate the capability of machine processing, we have created a workflow of three web services that replicates the HAPI functionality. These web services use document-style messages, which means that messages are encoded in an XML-based format. We compared three approaches to the implementation of an XML-based workflow: a hard coded Java application, Collaxa BPEL Server and Taverna Workbench. The Java program functions as a web services engine and interoperates with these web

  13. BOWS (bioinformatics open web services) to centralize bioinformatics tools in web services.

    PubMed

    Velloso, Henrique; Vialle, Ricardo A; Ortega, J Miguel

    2015-06-02

    Bioinformaticians face a range of difficulties in getting locally installed tools running and producing results; they would greatly benefit from a system that could centralize most of the tools, using an easy interface for input and output. Web services, due to their universal nature and widely known interface, constitute a very good option to achieve this goal. Bioinformatics open web services (BOWS) is a system based on generic web services produced to allow programmatic access to applications running on high-performance computing (HPC) clusters. BOWS mediates access to registered tools by providing front-end and back-end web services. Programmers can install applications in HPC clusters in any programming language and use the back-end service to check for new jobs and their parameters, and then to send the results to BOWS. Programs running on ordinary computers consume the BOWS front-end service to submit new processes and read results. BOWS compiles Java clients, which encapsulate the front-end web service requests, and automatically creates a web page that lists the registered applications and clients. Applications registered with BOWS can be accessed from virtually any programming language through web services, or using standard Java clients. The back-end can run in HPC clusters, allowing bioinformaticians to remotely run computationally demanding applications directly from their machines.
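    The front-end/back-end split described above follows a common job-broker pattern: clients submit jobs and read results, while workers on the cluster poll for new jobs and post results back. The sketch below is an in-memory illustration of that pattern only. `JobBroker` and all of its method names are hypothetical and are not the BOWS API, which the record does not specify.

```python
import itertools


class JobBroker:
    """Hypothetical in-memory stand-in for a front-end/back-end job service."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    # Front-end side: called by client programs.
    def submit(self, tool, params):
        job_id = next(self._ids)
        self._jobs[job_id] = {"tool": tool, "params": params,
                              "status": "new", "result": None}
        return job_id

    def read_result(self, job_id):
        job = self._jobs[job_id]
        return job["result"] if job["status"] == "done" else None

    # Back-end side: called by the application running on the cluster.
    def fetch_new(self, tool):
        claimed = []
        for job_id, job in self._jobs.items():
            if job["tool"] == tool and job["status"] == "new":
                job["status"] = "running"  # claim so it is not handed out twice
                claimed.append((job_id, job["params"]))
        return claimed

    def post_result(self, job_id, result):
        job = self._jobs[job_id]
        job["status"] = "done"
        job["result"] = result


broker = JobBroker()
job_id = broker.submit("reverse", {"seq": "ACGT"})    # client submits a job
for jid, params in broker.fetch_new("reverse"):       # worker polls for jobs
    broker.post_result(jid, params["seq"][::-1])      # worker posts the result
print(broker.read_result(job_id))                     # client reads the result
```

    Because the worker only ever polls the broker, it can be written in any language and sit behind a cluster firewall, which matches the motivation given in the abstract.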

  14. Whale song analyses using bioinformatics sequence analysis approaches

    NASA Astrophysics Data System (ADS)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, masters thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, and the standardized Euclidean distance and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
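    Two of the metric families named above can be sketched briefly: a local alignment score in the style of Smith-Waterman, and an alignment-free angle between word-frequency vectors. This is a minimal sketch under stated assumptions. The scoring values (match 2, mismatch -1, linear gap -1) and the word length `k=2` are illustrative choices rather than the study's parameters, and the "repeated procedure" variant mentioned in the abstract is not implemented here.

```python
from collections import Counter
from math import acos, sqrt


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def word_counts(seq, k=2):
    """Count overlapping words (k-mers) in a sequence of unit labels."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def angle_distance(a, b, k=2):
    """Angle in radians between the word-frequency vectors of a and b."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("both sequences must contain at least k units")
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return acos(cosine)


# Hypothetical themes written as strings of unit labels.
theme_1 = "ABABCD"
theme_2 = "ABCDCD"
print(smith_waterman_score(theme_1, theme_2))
print(angle_distance(theme_1, theme_2))
```

    Pairwise values from either function could fill a distance matrix for the kind of cluster analysis the abstract describes; alignment scores are similarities and would need converting to dissimilarities first.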

  15. PineappleDB: an online pineapple bioinformatics resource.

    PubMed

    Moyle, Richard L; Crowe, Mark L; Ripi-Koia, Jonni; Fairbairn, David J; Botella, José R

    2005-10-05

    A world first pineapple EST sequencing program has been undertaken to investigate genes expressed during non-climacteric fruit ripening and the nematode-plant interaction during root infection. Very little is known of how non-climacteric fruit ripening is controlled or of the molecular basis of the nematode-plant interaction. PineappleDB was developed to provide the research community with access to a curated bioinformatics resource housing the fruit, root and nematode infected gall expressed sequences. PineappleDB is an online, curated database providing integrated access to annotated expressed sequence tag (EST) data for cDNA clones isolated from pineapple fruit, root, and nematode infected root gall vascular cylinder tissues. The database currently houses over 5600 EST sequences, 3383 contig consensus sequences, and associated bioinformatic data including splice variants, Arabidopsis homologues, both MIPS based and Gene Ontology functional classifications, and clone distributions. The online resource can be searched by text or by BLAST sequence homology. The data outputs provide comprehensive sequence, bioinformatic and functional classification information. The online pineapple bioinformatic resource provides the research community with access to pineapple fruit and root/gall sequence and bioinformatic data in a user-friendly format. The search tools enable efficient data mining and present a wide spectrum of bioinformatic and functional classification information. PineappleDB will be of broad appeal to researchers investigating pineapple genetics, non-climacteric fruit ripening, root-knot nematode infection, crassulacean acid metabolism and alternative RNA splicing in plants.

  16. PineappleDB: An online pineapple bioinformatics resource

    PubMed Central

    Moyle, Richard L; Crowe, Mark L; Ripi-Koia, Jonni; Fairbairn, David J; Botella, José R

    2005-01-01

    Background A world first pineapple EST sequencing program has been undertaken to investigate genes expressed during non-climacteric fruit ripening and the nematode-plant interaction during root infection. Very little is known of how non-climacteric fruit ripening is controlled or of the molecular basis of the nematode-plant interaction. PineappleDB was developed to provide the research community with access to a curated bioinformatics resource housing the fruit, root and nematode infected gall expressed sequences. Description PineappleDB is an online, curated database providing integrated access to annotated expressed sequence tag (EST) data for cDNA clones isolated from pineapple fruit, root, and nematode infected root gall vascular cylinder tissues. The database currently houses over 5600 EST sequences, 3383 contig consensus sequences, and associated bioinformatic data including splice variants, Arabidopsis homologues, both MIPS based and Gene Ontology functional classifications, and clone distributions. The online resource can be searched by text or by BLAST sequence homology. The data outputs provide comprehensive sequence, bioinformatic and functional classification information. Conclusion The online pineapple bioinformatic resource provides the research community with access to pineapple fruit and root/gall sequence and bioinformatic data in a user-friendly format. The search tools enable efficient data mining and present a wide spectrum of bioinformatic and functional classification information. PineappleDB will be of broad appeal to researchers investigating pineapple genetics, non-climacteric fruit ripening, root-knot nematode infection, crassulacean acid metabolism and alternative RNA splicing in plants. PMID:16202174

  17. Bioinformatic prediction of the epitopes of Echinococcus granulosus antigen 5

    PubMed Central

    Pan, Wei; Chen, De-Sheng; Lu, Yun-Juan; Sun, Fen-Fen; Xu, Hui-Wen; Zhang, Ya-Wen; Yan, Chao; Fu, Lin-Lin; Zheng, Kui-Yang; Tang, Ren-Xian

    2017-01-01

    The aim of the present study was to predict and analyze the secondary structure, and B and T cell epitopes of Echinococcus granulosus antigen 5 (Ag5) using online software in order to investigate its immunogenicity and preliminarily evaluate its potential as an effective antigen peptide vaccine for cystic echinococcosis. The ProtParam program was used to analyze molecular weight, the theoretical isoelectric point, instability index and other physicochemical properties. The secondary structure of the Ag5 protein was predicted using Self-Optimized Prediction method With Alignment and the tertiary structure of the Ag5 protein was predicted using 3DLigandSite together with Center for Biological Sequence Analysis Prediction Servers. Furthermore, the Immune Epitope Database software was used to predict B cell epitopes, and T cell epitopes were predicted with the BioInformatics and Molecular Analysis Section and SYFPEITHI programs. The results demonstrated that α-helices, β-turns, random coils and extended strands account for 23.35, 10.95, 41.32, and 24.38% of the secondary structure of the Ag5 protein, respectively. Ten potential B cell epitopes of Ag5 were identified as the amino acid sequences 27–39, 70–80, 117–130, 146–168, 250–262, 284–293, 339–349, 359–371, 403–412 and 454–462, and seven potential T cell epitopes were identified as the amino acid sequences 52–60, 57–65, 182–190, 231–239, 273–281, 318–326 and 467–475. Thus, ten B cell epitopes and seven T cell epitopes were identified on Ag5, suggesting the strong immunogenicity of this protein, which could be applied to design antigen peptide vaccines for echinococcosis.

  18. A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data

    PubMed Central

    Roumpeka, Despoina D.; Wallace, R. John; Escalettes, Frank; Fotheringham, Ian; Watson, Mick

    2017-01-01

    The microbiome can be defined as the community of microorganisms that live in a particular environment. Metagenomics is the practice of sequencing DNA from the genomes of all organisms present in a particular sample, and has become a common method for the study of microbiome population structure and function. Increasingly, researchers are finding novel genes encoded within metagenomes, many of which may be of interest to the biotechnology and pharmaceutical industries. However, such “bioprospecting” requires a suite of sophisticated bioinformatics tools to make sense of the data. This review summarizes the most commonly used bioinformatics tools for the assembly and annotation of metagenomic sequence data with the aim of discovering novel genes. PMID:28321234

  19. Bioinformatics tools for genome mining of polyketide and non-ribosomal peptides.

    PubMed

    Boddy, Christopher N

    2014-02-01

    Microbial natural products have played a key role in the development of clinical agents in nearly all therapeutic areas. Recent advances in genome sequencing have revealed that there is an incredible wealth of new polyketide and non-ribosomal peptide natural product diversity to be mined from genetic data. The diversity and complexity of polyketide and non-ribosomal peptide biosynthesis have required the development of unique bioinformatics tools to identify, annotate, and predict the structures of these natural products from their biosynthetic gene clusters. This review highlights and evaluates web-based bioinformatics tools currently available to the natural product community for genome mining to discover new polyketides and non-ribosomal peptides.

  20. The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

    PubMed Central

    Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

    2006-01-01

    We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980