Sample records for effective bioinformatics programming

  1. Scalability and Validation of Big Data Bioinformatics Software.

    PubMed

    Yang, Andrian; Troup, Michael; Ho, Joshua W K

    2017-01-01

    This review examines two important aspects that are central to modern big data bioinformatics analysis - software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.

  2. Is there room for ethics within bioinformatics education?

    PubMed

    Taneri, Bahar

    2011-07-01

    When bioinformatics education is considered, several issues are addressed. At the undergraduate level, the main issue revolves around conveying information from two main and different fields: biology and computer science. At the graduate level, the main issue is bridging the gap between biology students and computer science students. However, there is an educational component that is rarely addressed within the context of bioinformatics education: the ethics component. Here, a different perspective is provided on bioinformatics education, and the current status of ethics is analyzed within the existing bioinformatics programs. Analysis of the existing undergraduate and graduate programs, in both Europe and the United States, reveals the minimal attention given to ethics within bioinformatics education. Given that bioinformaticians speedily and effectively shape the biomedical sciences and hence their implications for society, here redesigning of the bioinformatics curricula is suggested in order to integrate the necessary ethics education. Unique ethical problems awaiting bioinformaticians and bioinformatics ethics as a separate field of study are discussed. In addition, a template for an "Ethics in Bioinformatics" course is provided.

  3. "Extreme Programming" in a Bioinformatics Class

    ERIC Educational Resources Information Center

    Kelley, Scott; Alger, Christianna; Deutschman, Douglas

    2009-01-01

    The importance of Bioinformatics tools and methodology in modern biological research underscores the need for robust and effective courses at the college level. This paper describes such a course designed on the principles of cooperative learning based on a computer software industry production model called "Extreme Programming" (EP).…

  4. Integer Linear Programming in Computational Biology

    NASA Astrophysics Data System (ADS)

    Althaus, Ernst; Klau, Gunnar W.; Kohlbacher, Oliver; Lenhof, Hans-Peter; Reinert, Knut

    Computational molecular biology (bioinformatics) is a young research field that is rich in NP-hard optimization problems. The problem instances encountered are often huge and comprise thousands of variables. Since their introduction into the field of bioinformatics in 1997, integer linear programming (ILP) techniques have been successfully applied to many optimization problems. These approaches have added much momentum to development and progress in related areas. In particular, ILP-based approaches have become a standard optimization technique in bioinformatics. In this review, we present applications of ILP-based techniques developed by members and former members of Kurt Mehlhorn’s group. These techniques were introduced to bioinformatics in a series of papers and popularized by demonstration of their effectiveness and potential.

  5. Navigating the changing learning landscape: perspective from bioinformatics.ca

    PubMed Central

    Ouellette, B. F. Francis

    2013-01-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468

  6. Expanding roles in a library-based bioinformatics service program: a case study

    PubMed Central

    Li, Meng; Chen, Yi-Bu; Clintworth, William A

    2013-01-01

    Question: How can a library-based bioinformatics support program be implemented and expanded to continuously support the growing and changing needs of the research community? Setting: A program at a health sciences library serving a large academic medical center with a strong research focus is described. Methods: The bioinformatics service program was established at the Norris Medical Library in 2005. As part of program development, the library assessed users' bioinformatics needs, acquired additional funds, established and expanded service offerings, and explored additional roles in promoting on-campus collaboration. Results: Personnel and software have increased along with the number of registered software users and use of the provided services. Conclusion: With strategic efforts and persistent advocacy within the broader university environment, library-based bioinformatics service programs can become a key part of an institution's comprehensive solution to researchers' ever-increasing bioinformatics needs. PMID:24163602

  7. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    PubMed Central

    2010-01-01

    Background Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. Description An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Conclusions Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms. PMID:21210976

  8. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.

    PubMed

    Taylor, Ronald C

    2010-12-21

    Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms.

  9. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    PubMed Central

    Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  10. Bioinformatics education in high school: implications for promoting science, technology, engineering, and mathematics careers.

    PubMed

    Kovarik, Dina N; Patterson, Davis G; Cohen, Carolyn; Sanders, Elizabeth A; Peterson, Karen A; Porter, Sandra G; Chowning, Jeanne Ting

    2013-01-01

    We investigated the effects of our Bio-ITEST teacher professional development model and bioinformatics curricula on cognitive traits (awareness, engagement, self-efficacy, and relevance) in high school teachers and students that are known to accompany a developing interest in science, technology, engineering, and mathematics (STEM) careers. The program included best practices in adult education and diverse resources to empower teachers to integrate STEM career information into their classrooms. The introductory unit, Using Bioinformatics: Genetic Testing, uses bioinformatics to teach basic concepts in genetics and molecular biology, and the advanced unit, Using Bioinformatics: Genetic Research, utilizes bioinformatics to study evolution and support student research with DNA barcoding. Pre-post surveys demonstrated significant growth (n = 24) among teachers in their preparation to teach the curricula and infuse career awareness into their classes, and these gains were sustained through the end of the academic year. Introductory unit students (n = 289) showed significant gains in awareness, relevance, and self-efficacy. While these students did not show significant gains in engagement, advanced unit students (n = 41) showed gains in all four cognitive areas. Lessons learned during Bio-ITEST are explored in the context of recommendations for other programs that wish to increase student interest in STEM careers.

  11. Bioinformatics Education in High School: Implications for Promoting Science, Technology, Engineering, and Mathematics Careers

    PubMed Central

    Kovarik, Dina N.; Patterson, Davis G.; Cohen, Carolyn; Sanders, Elizabeth A.; Peterson, Karen A.; Porter, Sandra G.; Chowning, Jeanne Ting

    2013-01-01

    We investigated the effects of our Bio-ITEST teacher professional development model and bioinformatics curricula on cognitive traits (awareness, engagement, self-efficacy, and relevance) in high school teachers and students that are known to accompany a developing interest in science, technology, engineering, and mathematics (STEM) careers. The program included best practices in adult education and diverse resources to empower teachers to integrate STEM career information into their classrooms. The introductory unit, Using Bioinformatics: Genetic Testing, uses bioinformatics to teach basic concepts in genetics and molecular biology, and the advanced unit, Using Bioinformatics: Genetic Research, utilizes bioinformatics to study evolution and support student research with DNA barcoding. Pre–post surveys demonstrated significant growth (n = 24) among teachers in their preparation to teach the curricula and infuse career awareness into their classes, and these gains were sustained through the end of the academic year. Introductory unit students (n = 289) showed significant gains in awareness, relevance, and self-efficacy. While these students did not show significant gains in engagement, advanced unit students (n = 41) showed gains in all four cognitive areas. Lessons learned during Bio-ITEST are explored in the context of recommendations for other programs that wish to increase student interest in STEM careers. PMID:24006393

  12. A Summer Program Designed to Educate College Students for Careers in Bioinformatics

    ERIC Educational Resources Information Center

    Krilowicz, Beverly; Johnston, Wendie; Sharp, Sandra B.; Warter-Perez, Nancy; Momand, Jamil

    2007-01-01

    A summer program was created for undergraduates and graduate students that teaches bioinformatics concepts, offers skills in professional development, and provides research opportunities in academic and industrial institutions. We estimate that 34 of 38 graduates (89%) are in a career trajectory that will use bioinformatics. Evidence from…

  13. Computer Programming and Biomolecular Structure Studies: A Step beyond Internet Bioinformatics

    ERIC Educational Resources Information Center

    Likic, Vladimir A.

    2006-01-01

    This article describes the experience of teaching structural bioinformatics to third year undergraduate students in a subject titled "Biomolecular Structure and Bioinformatics." Students were introduced to computer programming and used this knowledge in a practical application as an alternative to the well established Internet bioinformatics…

  14. Evolving Strategies for the Incorporation of Bioinformatics Within the Undergraduate Cell Biology Curriculum

    PubMed Central

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489

  15. Bioinformatics in Middle East Program Curricula--A Focus on the Arabian Gulf

    ERIC Educational Resources Information Center

    Loucif, Samia

    2014-01-01

    The purpose of this paper is to investigate the inclusion of bioinformatics in program curricula in the Middle East, focusing on educational institutions in the Arabian Gulf. Bioinformatics is a multidisciplinary field which has emerged in response to the need for efficient data storage and retrieval, and accurate and fast computational and…

  16. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers

    PubMed Central

    Brazas, Michelle D.; Ouellette, B. F. Francis

    2016-01-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression. PMID:27281025

  17. Continuing Education Workshops in Bioinformatics Positively Impact Research and Careers.

    PubMed

    Brazas, Michelle D; Ouellette, B F Francis

    2016-06-01

    Bioinformatics.ca has been hosting continuing education programs in introductory and advanced bioinformatics topics in Canada since 1999 and has trained more than 2,000 participants to date. These workshops have been adapted over the years to keep pace with advances in both science and technology as well as the changing landscape in available learning modalities and the bioinformatics training needs of our audience. Post-workshop surveys have been a mandatory component of each workshop and are used to ensure appropriate adjustments are made to workshops to maximize learning. However, neither bioinformatics.ca nor others offering similar training programs have explored the long-term impact of bioinformatics continuing education training. Bioinformatics.ca recently initiated a look back on the impact its workshops have had on the career trajectories, research outcomes, publications, and collaborations of its participants. Using an anonymous online survey, bioinformatics.ca analyzed responses from those surveyed and discovered its workshops have had a positive impact on collaborations, research, publications, and career progression.

  18. Bellman’s GAP—a language and compiler for dynamic programming in sequence analysis

    PubMed Central

    Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert

    2013-01-01

    Motivation: Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman’s GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. Results: In Bellman’s GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive to carefully hand-crafted implementations. This article introduces the Bellman’s GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman’s GAP as an implementation platform of ‘real-world’ bioinformatics tools. Availability: Bellman’s GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics. Contact: robert@techfak.uni-bielefeld.de Supplementary information: Supplementary data are available at Bioinformatics online PMID:23355290

  19. Bioinformatics programs are 31-fold over-represented among the highest impact scientific papers of the past two decades.

    PubMed

    Wren, Jonathan D

    2016-09-01

    To analyze the relative proportion of bioinformatics papers and their non-bioinformatics counterparts in the top 20 most cited papers annually for the past two decades. When defining bioinformatics papers as encompassing both those that provide software for data analysis or methods underlying data analysis software, we find that over the past two decades, more than a third (34%) of the most cited papers in science were bioinformatics papers, which is approximately a 31-fold enrichment relative to the total number of bioinformatics papers published. More than half of the most cited papers during this span were bioinformatics papers. Yet, the average 5-year JIF of top 20 bioinformatics papers was 7.7, whereas the average JIF for top 20 non-bioinformatics papers was 25.8, significantly higher (P < 4.5 × 10(-29)). The 20-year trend in the average JIF between the two groups suggests the gap does not appear to be significantly narrowing. For a sampling of the journals producing top papers, bioinformatics journals tended to have higher Gini coefficients, suggesting that development of novel bioinformatics resources may be somewhat 'hit or miss'. That is, relative to other fields, bioinformatics produces some programs that are extremely widely adopted and cited, yet there are fewer of intermediate success. jdwren@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Bioinformatics for Exploration

    NASA Technical Reports Server (NTRS)

    Johnson, Kathy A.

    2006-01-01

    For the purpose of this paper, bioinformatics is defined as the application of computer technology to the management of biological information. It can be thought of as the science of developing computer databases and algorithms to facilitate and expedite biological research. This is a crosscutting capability that supports nearly all human health areas ranging from computational modeling, to pharmacodynamics research projects, to decision support systems within autonomous medical care. Bioinformatics serves to increase the efficiency and effectiveness of the life sciences research program. It provides data, information, and knowledge capture which further supports management of the bioastronautics research roadmap - identifying gaps that still remain and enabling the determination of which risks have been addressed.

  1. Bellman's GAP--a language and compiler for dynamic programming in sequence analysis.

    PubMed

    Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert

    2013-03-01

    Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman's GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. In Bellman's GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive to carefully hand-crafted implementations. This article introduces the Bellman's GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman's GAP as an implementation platform of 'real-world' bioinformatics tools. Bellman's GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics.

  2. Vignettes: diverse library staff offering diverse bioinformatics services*

    PubMed Central

    Osterbur, David L.; Alpi, Kristine; Canevari, Catharine; Corley, Pamela M.; Devare, Medha; Gaedeke, Nicola; Jacobs, Donna K.; Kirlew, Peter; Ohles, Janet A.; Vaughan, K.T.L.; Wang, Lili; Wu, Yongchun; Geer, Renata C.

    2006-01-01

    Objectives: The paper gives examples of the bioinformatics services provided in a variety of different libraries by librarians with a broad range of educational background and training. Methods: Two investigators sent an email inquiry to attendees of the “National Center for Biotechnology Information's (NCBI) Introduction to Molecular Biology Information Resources” or “NCBI Advanced Workshop for Bioinformatics Information Specialists (NAWBIS)” courses. The thirty-five-item questionnaire addressed areas such as educational background, library setting, types and numbers of users served, and bioinformatics training and support services provided. Answers were compiled into program vignettes. Discussion: The bioinformatics support services addressed in the paper are based in libraries with academic and clinical settings. Services have been established through different means: in collaboration with biology faculty as part of formal courses, through teaching workshops in the library, through one-on-one consultations, and by other methods. Librarians with backgrounds from art history to doctoral degrees in genetics have worked to establish these programs. Conclusion: Successful bioinformatics support programs can be established in libraries in a variety of different settings and by staff with a variety of different backgrounds and approaches. PMID:16888664

  3. Carving a niche: establishing bioinformatics collaborations

    PubMed Central

    Lyon, Jennifer A.; Tennant, Michele R.; Messner, Kevin R.; Osterbur, David L.

    2006-01-01

    Objectives: The paper describes collaborations and partnerships developed between library bioinformatics programs and other bioinformatics-related units at four academic institutions. Methods: A call for information on bioinformatics partnerships was made via email to librarians who have participated in the National Center for Biotechnology Information's Advanced Workshop for Bioinformatics Information Specialists. Librarians from Harvard University, the University of Florida, the University of Minnesota, and Vanderbilt University responded and expressed willingness to contribute information on their institutions, programs, services, and collaborating partners. Similarities and differences in programs and collaborations were identified. Results: The four librarians have developed partnerships with other units on their campuses that can be categorized into the following areas: knowledge management, instruction, and electronic resource support. All primarily support freely accessible electronic resources, while other campus units deal with fee-based ones. These demarcations are apparent in resource provision as well as in subsequent support and instruction. Conclusions and Recommendations: Through environmental scanning and networking with colleagues, librarians who provide bioinformatics support can develop fruitful collaborations. Visibility is key to building collaborations, as is broad-based thinking in terms of potential partners. PMID:16888668

  4. BioSmalltalk: a pure object system and library for bioinformatics.

    PubMed

    Morales, Hernán F; Giovambattista, Guillermo

    2013-09-15

    We have developed BioSmalltalk, a new environment system for pure object-oriented bioinformatics programming. Adaptive end-user programming systems tend to become more important for discovering biological knowledge, as is demonstrated by the emergence of open-source programming toolkits for bioinformatics in the past years. Our software is intended to bridge the gap between bioscientists and rapid software prototyping while preserving the possibility of scaling to whole-system biology applications. BioSmalltalk performs better in terms of execution time and memory usage than Biopython and BioPerl for some classical situations. BioSmalltalk is cross-platform and freely available (MIT license) through the Google Project Hosting at http://code.google.com/p/biosmalltalk hernan.morales@gmail.com Supplementary data are available at Bioinformatics online.

  5. Design and Implementation of an Interdepartmental Bioinformatics Program across Life Science Curricula

    ERIC Educational Resources Information Center

    Miskowski, Jennifer A.; Howard, David R.; Abler, Michael L.; Grunwald, Sandra K.

    2007-01-01

    Over the past 10 years, there has been a technical revolution in the life sciences leading to the emergence of a new discipline called bioinformatics. In response, bioinformatics-related topics have been incorporated into various undergraduate courses along with the development of new courses solely focused on bioinformatics. This report describes…

  6. Developing library bioinformatics services in context: the Purdue University Libraries bioinformationist program

    PubMed Central

    Rein, Diane C.

    2006-01-01

    Setting: Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. Program Components: After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. Evaluation Mechanisms: The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Next Steps/Future Directions: Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making. PMID:16888666

  7. Exploring DNA Structure with Cn3D

    ERIC Educational Resources Information Center

    Porter, Sandra G.; Day, Joseph; McCarty, Richard E.; Shearn, Allen; Shingles, Richard; Fletcher, Linnea; Murphy, Stephanie; Pearlman, Rebecca

    2007-01-01

    Researchers in the field of bioinformatics have developed a number of analytical programs and databases that are increasingly important for advancing biological research. Because bioinformatics programs are used to analyze, visualize, and/or compare biological data, it is likely that the use of these programs will have a positive impact on biology…

  8. Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics.

    PubMed

    Bonnal, Raoul J P; Aerts, Jan; Githinji, George; Goto, Naohisa; MacLean, Dan; Miller, Chase A; Mishima, Hiroyuki; Pagani, Massimiliano; Ramirez-Gonzalez, Ricardo; Smant, Geert; Strozzi, Francesco; Syme, Rob; Vos, Rutger; Wennblom, Trevor J; Woodcroft, Ben J; Katayama, Toshiaki; Prins, Pjotr

    2012-04-01

    Biogem provides a software development environment for the Ruby programming language, which encourages community-based software development for bioinformatics while lowering the barrier to entry and encouraging best practices. Biogem, with its targeted modular and decentralized approach, software generator, tools and tight web integration, is an improved general model for scaling up collaborative open source software development in bioinformatics. Biogem and modules are free and are OSS. Biogem runs on all systems that support recent versions of Ruby, including Linux, Mac OS X and Windows. Further information at http://www.biogems.info. A tutorial is available at http://www.biogems.info/howto.html bonnal@ingm.org.

  9. A comparison of common programming languages used in bioinformatics.

    PubMed

    Fourment, Mathieu; Gillings, Michael R

    2008-02-05

    The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.

  10. Buying in to bioinformatics: an introduction to commercial sequence analysis software

    PubMed Central

    2015-01-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. PMID:25183247

  11. Buying in to bioinformatics: an introduction to commercial sequence analysis software.

    PubMed

    Smith, David Roy

    2015-07-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. © The Author 2014. Published by Oxford University Press.

  12. Development of an undergraduate bioinformatics degree program at a liberal arts college.

    PubMed

    Bagga, Paramjeet S

    2012-09-01

    The highly interdisciplinary field of bioinformatics has emerged as a powerful modern science. There has been a great demand for undergraduate- and graduate-level trained bioinformaticists in the industry as well in the academia. In order to address the needs for trained bioinformaticists, its curriculum must be offered at the undergraduate level, especially at four-year colleges, where a majority of the United States gets its education. There are many challenges in developing an undergraduate-level bioinformatics program that needs to be carefully designed as a well-integrated and cohesive interdisciplinary curriculum that prepares the students for a wide variety of career options. This article describes the challenges of establishing a highly interdisciplinary undergraduate major, the development of an undergraduate bioinformatics degree program at Ramapo College of New Jersey, and lessons learned in the last 10 years during its management.

  13. A Mathematical Optimization Problem in Bioinformatics

    ERIC Educational Resources Information Center

    Heyer, Laurie J.

    2008-01-01

    This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…

  14. Assessment of a Bioinformatics across Life Science Curricula Initiative

    ERIC Educational Resources Information Center

    Howard, David R.; Miskowski, Jennifer A.; Grunwald, Sandra K.; Abler, Michael L.

    2007-01-01

    At the University of Wisconsin-La Crosse, we have undertaken a program to integrate the study of bioinformatics across the undergraduate life science curricula. Our efforts have included incorporating bioinformatics exercises into courses in the biology, microbiology, and chemistry departments, as well as coordinating the efforts of faculty within…

  15. Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets.

    PubMed

    Rideout, Jai Ram; Chase, John H; Bolyen, Evan; Ackermann, Gail; González, Antonio; Knight, Rob; Caporaso, J Gregory

    2016-06-13

    Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.

  16. [Integration of clinical and biological data in clinical practice using bioinformatics].

    PubMed

    Coltell, Oscar; Arregui, María; Fabregat, Antonio; Portolés, Olga

    2008-05-01

    The aim of our work is to describe essential aspects of Medical Informatics, Bioinformatics and Biomedical Informatics, that are used in biomedical research and clinical practice. These disciplines have emerged from the need to find new scientific and technical approaches to manage, store, analyze and report data generated in clinical practice and molecular biology and other medical specialties. It can be also useful to integrate research information generated in different areas of health care. Moreover, these disciplines are interdisciplinary and integrative, two key features not shared by other areas of medical knowledge. Finally, when Bioinformatics and Biomedical Informatics approach to medical investigation and practice are applied, a new discipline, called Clinical Bioinformatics, emerges. The latter requires a specific training program to create a new professional profile. We have not been able to find a specific training program in Clinical Bioinformatics in Spain.

  17. Implementing a Web-Based Introductory Bioinformatics Course for Non-Bioinformaticians That Incorporates Practical Exercises

    ERIC Educational Resources Information Center

    Vincent, Antony T.; Bourbonnais, Yves; Brouard, Jean-Simon; Deveau, Hélène; Droit, Arnaud; Gagné, Stéphane M.; Guertin, Michel; Lemieux, Claude; Rathier, Louis; Charette, Steve J.; Lagüe, Patrick

    2018-01-01

    A recent scientific discipline, bioinformatics, defined as using informatics for the study of biological problems, is now a requirement for the study of biological sciences. Bioinformatics has become such a powerful and popular discipline that several academic institutions have created programs in this field, allowing students to become…

  18. Implementation and Assessment of a Molecular Biology and Bioinformatics Undergraduate Degree Program

    ERIC Educational Resources Information Center

    Pham, Daphne Q. -D.; Higgs, David C.; Statham, Anne; Schleiter, Mary Kay

    2008-01-01

    The Department of Biological Sciences at the University of Wisconsin-Parkside has developed and implemented an innovative, multidisciplinary undergraduate curriculum in Molecular Biology and Bioinformatics (MBB). The objective of the MBB program is to give students a hands-on facility with molecular biology theories and laboratory techniques, an…

  19. BIRCH: a user-oriented, locally-customizable, bioinformatics system.

    PubMed

    Fristensky, Brian

    2007-02-09

    Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere.

  20. BIRCH: A user-oriented, locally-customizable, bioinformatics system

    PubMed Central

    Fristensky, Brian

    2007-01-01

    Background Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. Results BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for the BIRCH and locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step by step examples, with screen shots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. Conclusion BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere. PMID:17291351

  1. Scalable computing for evolutionary genomics.

    PubMed

    Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

    2012-01-01

    Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.

  2. An interdepartmental Ph.D. program in computational biology and bioinformatics: the Yale perspective.

    PubMed

    Gerstein, Mark; Greenbaum, Dov; Cheung, Kei; Miller, Perry L

    2007-02-01

    Computational biology and bioinformatics (CBB), the terms often used interchangeably, represent a rapidly evolving biological discipline. With the clear potential for discovery and innovation, and the need to deal with the deluge of biological data, many academic institutions are committing significant resources to develop CBB research and training programs. Yale formally established an interdepartmental Ph.D. program in CBB in May 2003. This paper describes Yale's program, discussing the scope of the field, the program's goals and curriculum, as well as a number of issues that arose in implementing the program. (Further updated information is available from the program's website, www.cbb.yale.edu.)

  3. A case study of tuning MapReduce for efficient Bioinformatics in the cloud

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Lizhen; Wang, Zhong; Yu, Weikuan

    The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this paper, we aim to minimize the cloud computing cost spent on bioinformatics datamore » analysis by optimizing the extracted significant Hadoop parameters. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our study case. Experimental results show that, with the fine-tuned parameters, we achieve a total of 4× speedup compared with the original performance (using the default settings). Finally, this paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.« less

  4. Broad issues to consider for library involvement in bioinformatics*

    PubMed Central

    Geer, Renata C.

    2006-01-01

    Background: The information landscape in biological and medical research has grown far beyond literature to include a wide variety of databases generated by research fields such as molecular biology and genomics. The traditional role of libraries to collect, organize, and provide access to information can expand naturally to encompass these new data domains. Methods: This paper discusses the current and potential role of libraries in bioinformatics using empirical evidence and experience from eleven years of work in user services at the National Center for Biotechnology Information. Findings: Medical and science libraries over the last decade have begun to establish educational and support programs to address the challenges users face in the effective and efficient use of a plethora of molecular biology databases and retrieval and analysis tools. As more libraries begin to establish a role in this area, the issues they face include assessment of user needs and skills, identification of existing services, development of plans for new services, recruitment and training of specialized staff, and establishment of collaborations with bioinformatics centers at their institutions. Conclusions: Increasing library involvement in bioinformatics can help address information needs of a broad range of students, researchers, and clinicians and ultimately help realize the power of bioinformatics resources in making new biological discoveries. PMID:16888662

  5. The GMOD Drupal bioinformatic server framework.

    PubMed

    Papanicolaou, Alexie; Heckel, David G

    2010-12-15

    Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com.

  6. Two interactive Bioinformatics courses at the Bielefeld University Bioinformatics Server.

    PubMed

    Sczyrba, Alexander; Konermann, Susanne; Giegerich, Robert

    2008-05-01

    Conferences in computational biology continue to provide tutorials on classical and new methods in the field. This can be taken as an indicator that education is still a bottleneck in our field's process of becoming an established scientific discipline. Bielefeld University has been one of the early providers of bioinformatics education, both locally and via the internet. The Bielefeld Bioinformatics Server (BiBiServ) offers a variety of older and new materials. Here, we report on two online courses made available recently, one introductory and one on the advanced level: (i) SADR: Sequence Analysis with Distributed Resources (http://bibiserv.techfak.uni-bielefeld.de/sadr/) and (ii) ADP: Algebraic Dynamic Programming in Bioinformatics (http://bibiserv.techfak.uni-bielefeld.de/dpcourse/).

  7. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

    PubMed

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians.

  8. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software

    PubMed Central

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054

  9. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  10. LXtoo: an integrated live Linux distribution for the bioinformatics community.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu

    2012-07-19

    Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.

  11. Developing library bioinformatics services in context: the Purdue University Libraries bioinformationist program.

    PubMed

    Rein, Diane C

    2006-07-01

    Purdue University is a major agricultural, engineering, biomedical, and applied life science research institution with an increasing focus on bioinformatics research that spans multiple disciplines and campus academic units. The Purdue University Libraries (PUL) hired a molecular biosciences specialist to discover, engage, and support bioinformatics needs across the campus. After an extended period of information needs assessment and environmental scanning, the specialist developed a week of focused bioinformatics instruction (Bioinformatics Week) to launch system-wide, library-based bioinformatics services. The specialist employed a two-tiered approach to assess user information requirements and expectations. The first phase involved careful observation and collection of information needs in-context throughout the campus, attending laboratory meetings, interviewing department chairs and individual researchers, and engaging in strategic planning efforts. Based on the information gathered during the integration phase, several survey instruments were developed to facilitate more critical user assessment and the recovery of quantifiable data prior to planning. Given information gathered while working with clients and through formal needs assessments, as well as the success of instructional approaches used in Bioinformatics Week, the specialist is developing bioinformatics support services for the Purdue community. The specialist is also engaged in training PUL faculty librarians in bioinformatics to provide a sustaining culture of library-based bioinformatics support and understanding of Purdue's bioinformatics-related decision and policy making.

  12. A comparison of common programming languages used in bioinformatics

    PubMed Central

    Fourment, Mathieu; Gillings, Michael R

    2008-01-01

    Background The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Results Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from Conclusion This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language. PMID:18251993

  13. Feasibility and effectiveness of a brief, intensive phylogenetics workshop in a middle-income country.

    PubMed

    Pollett, S; Leguia, M; Nelson, M I; Maljkovic Berry, I; Rutherford, G; Bausch, D G; Kasper, M; Jarman, R; Melendrez, M

    2016-01-01

    There is an increasing role for bioinformatic and phylogenetic analysis in tropical medicine research. However, scientists working in low- and middle-income regions may lack access to training opportunities in these methods. To help address this gap, a 5-day intensive bioinformatics workshop was offered in Lima, Peru. The syllabus is presented here for others who want to develop similar programs. To assess knowledge gained, a 20-point knowledge questionnaire was administered to participants (21 participants) before and after the workshop, covering topics on sequence quality control, alignment/formatting, database retrieval, models of evolution, sequence statistics, tree building, and results interpretation. Evolution/tree-building methods represented the lowest scoring domain at baseline and after the workshop. There was a considerable median gain in total knowledge scores (increase of 30%, p<0.001) with gains as high as 55%. A 5-day workshop model was effective in improving the pathogen-applied bioinformatics knowledge of scientists working in a middle-income country setting. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  14. A toolbox for developing bioinformatics software

    PubMed Central

    Potrzebowski, Wojciech; Puton, Tomasz; Rother, Magdalena; Wywial, Ewa; Bujnicki, Janusz M.

    2012-01-01

    Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availibility. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful to plan a project, support the involvement of experts (e.g. experimentalists), and to promote higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and to the training of scientific programmers. PMID:21803787

  15. The development and application of bioinformatics core competencies to improve bioinformatics training and education.

    PubMed

    Mulder, Nicola; Schwartz, Russell; Brazas, Michelle D; Brooksbank, Cath; Gaeta, Bruno; Morgan, Sarah L; Pauley, Mark A; Rosenwald, Anne; Rustici, Gabriella; Sierk, Michael; Warnow, Tandy; Welch, Lonnie

    2018-02-01

    Bioinformatics is recognized as part of the essential knowledge base of numerous career paths in biomedical research and healthcare. However, there is little agreement in the field over what that knowledge entails or how best to provide it. These disagreements are compounded by the wide range of populations in need of bioinformatics training, with divergent prior backgrounds and intended application areas. The Curriculum Task Force of the International Society of Computational Biology (ISCB) Education Committee has sought to provide a framework for training needs and curricula in terms of a set of bioinformatics core competencies that cut across many user personas and training programs. The initial competencies developed based on surveys of employers and training programs have since been refined through a multiyear process of community engagement. This report describes the current status of the competencies and presents a series of use cases illustrating how they are being applied in diverse training contexts. These use cases are intended to demonstrate how others can make use of the competencies and engage in the process of their continuing refinement and application. The report concludes with a consideration of remaining challenges and future plans.

  16. The development and application of bioinformatics core competencies to improve bioinformatics training and education

    PubMed Central

    Brooksbank, Cath; Morgan, Sarah L.; Rosenwald, Anne; Warnow, Tandy; Welch, Lonnie

    2018-01-01

    Bioinformatics is recognized as part of the essential knowledge base of numerous career paths in biomedical research and healthcare. However, there is little agreement in the field over what that knowledge entails or how best to provide it. These disagreements are compounded by the wide range of populations in need of bioinformatics training, with divergent prior backgrounds and intended application areas. The Curriculum Task Force of the International Society of Computational Biology (ISCB) Education Committee has sought to provide a framework for training needs and curricula in terms of a set of bioinformatics core competencies that cut across many user personas and training programs. The initial competencies developed based on surveys of employers and training programs have since been refined through a multiyear process of community engagement. This report describes the current status of the competencies and presents a series of use cases illustrating how they are being applied in diverse training contexts. These use cases are intended to demonstrate how others can make use of the competencies and engage in the process of their continuing refinement and application. The report concludes with a consideration of remaining challenges and future plans. PMID:29390004

  17. Models@Home: distributed computing in bioinformatics using a screensaver based approach.

    PubMed

    Krieger, Elmar; Vriend, Gert

    2002-02-01

    Due to the steadily growing computational demands in bioinformatics and related scientific disciplines, one is forced to make optimal use of the available resources. A straightforward solution is to build a network of idle computers and let each of them work on a small piece of a scientific challenge, as done by Seti@Home (http://setiathome.berkeley.edu), the world's largest distributed computing project. We developed a generally applicable distributed computing solution that uses a screensaver system similar to Seti@Home. The software exploits the coarse-grained nature of typical bioinformatics projects. Three major considerations for the design were: (1) often, many different programs are needed, while the time is lacking to parallelize them. Models@Home can run any program in parallel without modifications to the source code; (2) in contrast to the Seti project, bioinformatics applications are normally more sensitive to lost jobs. Models@Home therefore includes stringent control over job scheduling; (3) to allow use in heterogeneous environments, Linux and Windows based workstations can be combined with dedicated PCs to build a homogeneous cluster. We present three practical applications of Models@Home, running the modeling programs WHAT IF and YASARA on 30 PCs: force field parameterization, molecular dynamics docking, and database maintenance.

  18. Bioinformatics in translational drug discovery.

    PubMed

    Wooller, Sarah K; Benstead-Hume, Graeme; Chen, Xiangrong; Ali, Yusuf; Pearl, Frances M G

    2017-08-31

    Bioinformatics approaches are becoming ever more essential in translational drug discovery both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse 'big data' that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications. © 2017 The Author(s).

  19. GLAD: a system for developing and deploying large-scale bioinformatics grid.

    PubMed

    Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

    2005-03-01

    Grid computing is used to solve large-scale bioinformatics problems with gigabytes database by distributing the computation across multiple platforms. Until now in developing bioinformatics grid applications, it is extremely tedious to design and implement the component algorithms and parallelization techniques for different classes of problems, and to access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware, which exploits the task-based parallelism. Two bioinformatics benchmark applications, such as distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

  20. Bioinformatics and the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  1. Adapting bioinformatics curricula for big data.

    PubMed

    Greene, Anna C; Giffin, Kristine A; Greene, Casey S; Moore, Jason H

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. © The Author 2015. Published by Oxford University Press.

  2. Adapting bioinformatics curricula for big data

    PubMed Central

    Greene, Anna C.; Giffin, Kristine A.; Greene, Casey S.

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. PMID:25829469

  3. The GMOD Drupal Bioinformatic Server Framework

    PubMed Central

    Papanicolaou, Alexie; Heckel, David G.

    2010-01-01

    Motivation: Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). Results: We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Conclusion: Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Availability and implementation: Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com Contact: alexie@butterflybase.org PMID:20971988

  4. Establishing a master's degree programme in bioinformatics: challenges and opportunities.

    PubMed

    Sahinidis, N V; Harandi, M T; Heath, M T; Murphy, L; Snir, M; Wheeler, R P; Zukoski, C F

    2005-12-01

    The development of the Bioinformatics MS degree program at the University of Illinois, the challenges and opportunities associated with such a process, and the current structure of the program is described. This program has departed from earlier University practice in significant ways. Despite the existence of several interdisciplinary programs at the University, a few of which grant degrees, this is the first interdisciplinary program that grants degrees and formally recognises departmental specialisation areas. The program, which is not owned by any particular department but by the Graduate College itself, is operated in a franchise-like fashion via several departmental concentrations. With four different colleges and many more departments involved in establishing and operating the program, the logistics of the operation are of considerable complexity but result in significant interactions across the entire campus.

  5. Bioinformatics for Undergraduates: Steps toward a Quantitative Bioscience Curriculum

    ERIC Educational Resources Information Center

    Chapman, Barbara S.; Christmann, James L.; Thatcher, Eileen F.

    2006-01-01

    We describe an innovative bioinformatics course developed under grants from the National Science Foundation and the California State University Program in Research and Education in Biotechnology for undergraduate biology students. The project has been part of a continuing effort to offer students classroom experiences focused on principles and…

  6. Bioinformatics projects supporting life-sciences learning in high schools.

    PubMed

    Marques, Isabel; Almeida, Paulo; Alves, Renato; Dias, Maria João; Godinho, Ana; Pereira-Leal, José B

    2014-01-01

    The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called "Bioinformatics@school." It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools.

  7. Open discovery: An integrated live Linux platform of Bioinformatics tools.

    PubMed

    Vetrivel, Umashankar; Pilla, Kalabharath

    2008-01-01

    Historically, live linux distributions for Bioinformatics have paved way for portability of Bioinformatics workbench in a platform independent manner. Moreover, most of the existing live Linux distributions limit their usage to sequence analysis and basic molecular visualization programs and are devoid of data persistence. Hence, open discovery - a live linux distribution has been developed with the capability to perform complex tasks like molecular modeling, docking and molecular dynamics in a swift manner. Furthermore, it is also equipped with complete sequence analysis environment and is capable of running windows executable programs in Linux environment. Open discovery portrays the advanced customizable configuration of fedora, with data persistency accessible via USB drive or DVD. The Open Discovery is distributed free under Academic Free License (AFL) and can be downloaded from http://www.OpenDiscovery.org.in.

  8. Advantages and disadvantages in usage of bioinformatic programs in promoter region analysis

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena E.; Skarzyńska, Agnieszka; Posyniak, Kacper; ZiÄ bska, Karolina; PlÄ der, Wojciech; Przybecki, Zbigniew

    2015-09-01

    An important computational challenge is finding the regulatory elements across the promotor region. In this work we present the advantages and disadvantages from the application of different bioinformatics programs for localization of transcription factor binding sites in the upstream region of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promotor regions. The results have been compared and possible function of chosen motifs has been described.

  9. Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad

    Undergraduate life sciences education needs an overhaul, as clearly described in the National Research Council of the National Academies publication BIO 2010: Transforming Undergraduate Education for Future Research Biologists. Among BIO 2010's top recommendations is the need to involve students in working with real data and tools that reflect the nature of life sciences research in the 21st century. Education research studies support the importance of utilizing primary literature, designing and implementing experiments, and analyzing results in the context of a bona fide scientific question in cultivating the analytical skills necessary to become a scientist. Incorporating these basic scientific methodologiesmore » in undergraduate education leads to increased undergraduate and post-graduate retention in the sciences. Toward this end, many undergraduate teaching organizations offer training and suggestions for faculty to update and improve their teaching approaches to help students learn as scientists, through design and discovery (e.g., Council of Undergraduate Research [www.cur.org] and Project Kaleidoscope [www.pkal.org]). With the advent of genome sequencing and bioinformatics, many scientists now formulate biological questions and interpret research results in the context of genomic information. Just as the use of bioinformatic tools and databases changed the way scientists investigate problems, it must change how scientists teach to create new opportunities for students to gain experiences reflecting the influence of genomics, proteomics, and bioinformatics on modern life sciences research. Educators have responded by incorporating bioinformatics into diverse life science curricula. While these published exercises in, and guidelines for, bioinformatics curricula are helpful and inspirational, faculty new to the area of bioinformatics inevitably need training in the theoretical underpinnings of the algorithms. Moreover, effectively integrating bioinformatics into courses or independent research projects requires infrastructure for organizing and assessing student work. Here, we present a new platform for faculty to keep current with the rapidly changing field of bioinformatics, the Integrated Microbial Genomes Annotation Collaboration Toolkit (IMG-ACT). It was developed by instructors from both research-intensive and predominately undergraduate institutions in collaboration with the Department of Energy-Joint Genome Institute (DOE-JGI) as a means to innovate and update undergraduate education and faculty development. The IMG-ACT program provides a cadre of tools, including access to a clearinghouse of genome sequences, bioinformatics databases, data storage, instructor course management, and student notebooks for organizing the results of their bioinformatic investigations. In the process, IMG-ACT makes it feasible to provide undergraduate research opportunities to a greater number and diversity of students, in contrast to the traditional mentor-to-student apprenticeship model for undergraduate research, which can be too expensive and time-consuming to provide for every undergraduate. The IMG-ACT serves as the hub for the network of faculty and students that use the system for microbial genome analysis. Open access of the IMG-ACT infrastructure to participating schools ensures that all types of higher education institutions can utilize it. With the infrastructure in place, faculty can focus their efforts on the pedagogy of bioinformatics, involvement of students in research, and use of this tool for their own research agenda. What the original faculty members of the IMG-ACT development team present here is an overview of how the IMG-ACT program has affected our development in terms of teaching and research with the hopes that it will inspire more faculty to get involved.« less

  10. The growing need for microservices in bioinformatics.

    PubMed

    Williams, Christopher L; Sica, Jeffrey C; Killen, Robert T; Balis, Ulysses G J

    2016-01-01

    Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Bioinformatics relies on nimble IT framework which can adapt to changing requirements. To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting.

  11. The growing need for microservices in bioinformatics

    PubMed Central

    Williams, Christopher L.; Sica, Jeffrey C.; Killen, Robert T.; Balis, Ulysses G. J.

    2016-01-01

    Objective: Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, rendered overall solutions utilizing a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Context: Bioinformatics relies on nimble IT framework which can adapt to changing requirements. Aims: To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics Conclusions: Use of the microservices framework is an effective methodology for the fabrication and implementation of reliable and innovative software, made possible in a highly collaborative setting. PMID:27994937

  12. Bioinformatic training needs at a health sciences campus.

    PubMed

    Oliver, Jeffrey C

    2017-01-01

    Health sciences research is increasingly focusing on big data applications, such as genomic technologies and precision medicine, to address key issues in human health. These approaches rely on biological data repositories and bioinformatic analyses, both of which are growing rapidly in size and scope. Libraries play a key role in supporting researchers in navigating these and other information resources. With the goal of supporting bioinformatics research in the health sciences, the University of Arizona Health Sciences Library established a Bioinformation program. To shape the support provided by the library, I developed and administered a needs assessment survey to the University of Arizona Health Sciences campus in Tucson, Arizona. The survey was designed to identify the training topics of interest to health sciences researchers and the preferred modes of training. Survey respondents expressed an interest in a broad array of potential training topics, including "traditional" information seeking as well as interest in analytical training. Of particular interest were training in transcriptomic tools and the use of databases linking genotypes and phenotypes. Staff were most interested in bioinformatics training topics, while faculty were the least interested. Hands-on workshops were significantly preferred over any other mode of training. The University of Arizona Health Sciences Library is meeting those needs through internal programming and external partnerships. The results of the survey demonstrate a keen interest in a variety of bioinformatic resources; the challenge to the library is how to address those training needs. The mode of support depends largely on library staff expertise in the numerous subject-specific databases and tools. Librarian-led bioinformatic training sessions provide opportunities for engagement with researchers at multiple points of the research life cycle. When training needs exceed library capacity, partnering with intramural and extramural units will be crucial in library support of health sciences bioinformatic research.

  13. BOWS (bioinformatics open web services) to centralize bioinformatics tools in web services.

    PubMed

    Velloso, Henrique; Vialle, Ricardo A; Ortega, J Miguel

    2015-06-02

    Bioinformaticians face a range of difficulties to get locally-installed tools running and producing results; they would greatly benefit from a system that could centralize most of the tools, using an easy interface for input and output. Web services, due to their universal nature and widely known interface, constitute a very good option to achieve this goal. Bioinformatics open web services (BOWS) is a system based on generic web services produced to allow programmatic access to applications running on high-performance computing (HPC) clusters. BOWS intermediates the access to registered tools by providing front-end and back-end web services. Programmers can install applications in HPC clusters in any programming language and use the back-end service to check for new jobs and their parameters, and then to send the results to BOWS. Programs running in simple computers consume the BOWS front-end service to submit new processes and read results. BOWS compiles Java clients, which encapsulate the front-end web service requisitions, and automatically creates a web page that disposes the registered applications and clients. Bioinformatics open web services registered applications can be accessed from virtually any programming language through web services, or using standard java clients. The back-end can run in HPC clusters, allowing bioinformaticians to remotely run high-processing demand applications directly from their machines.

  14. ORBIT: an integrated environment for user-customized bioinformatics tools.

    PubMed

    Bellgard, M I; Hiew, H L; Hunter, A; Wiebrands, M

    1999-10-01

    There are a large number of computational programs freely available to bioinformaticians via a client/server, web-based environment. However, the client interface to these tools (typically an html form page) cannot be customized from the client side as it is created by the service provider. The form page is usually generic enough to cater for a wide range of users. However, this implies that a user cannot set as 'default' advanced program parameters on the form or even customize the interface to his/her specific requirements or preferences. Currently, there is a lack of end-user interface environments that can be modified by the user when accessing computer programs available on a remote server running on an intranet or over the Internet. We have implemented a client/server system called ORBIT (Online Researcher's Bioinformatics Interface Tools) where individual clients can have interfaces created and customized to command-line-driven, server-side programs. Thus, Internet-based interfaces can be tailored to a user's specific bioinformatic needs. As interfaces are created on the client machine independent of the server, there can be different interfaces to the same server-side program to cater for different parameter settings. The interface customization is relatively quick (between 10 and 60 min) and all client interfaces are integrated into a single modular environment which will run on any computer platform supporting Java. The system has been developed to allow for a number of future enhancements and features. ORBIT represents an important advance in the way researchers gain access to bioinformatics tools on the Internet.

  15. Establishing a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia.

    PubMed

    Schneider, Maria Victoria; Griffin, Philippa C; Tyagi, Sonika; Flannery, Madison; Dayalan, Saravanan; Gladman, Simon; Watson-Haigh, Nathan; Bayer, Philipp E; Charleston, Michael; Cooke, Ira; Cook, Rob; Edwards, Richard J; Edwards, David; Gorse, Dominique; McConville, Malcolm; Powell, David; Wilkins, Marc R; Lonie, Andrew

    2017-06-30

    EMBL Australia Bioinformatics Resource (EMBL-ABR) is a developing national research infrastructure, providing bioinformatics resources and support to life science and biomedical researchers in Australia. EMBL-ABR comprises 10 geographically distributed national nodes with one coordinating hub, with current funding provided through Bioplatforms Australia and the University of Melbourne for its initial 2-year development phase. The EMBL-ABR mission is to: (1) increase Australia's capacity in bioinformatics and data sciences; (2) contribute to the development of training in bioinformatics skills; (3) showcase Australian data sets at an international level and (4) enable engagement in international programs. The activities of EMBL-ABR are focussed in six key areas, aligning with comparable international initiatives such as ELIXIR, CyVerse and NIH Commons. These key areas-Tools, Data, Standards, Platforms, Compute and Training-are described in this article. © The Author 2017. Published by Oxford University Press.

  16. Why Choose This One? Factors in Scientists' Selection of Bioinformatics Tools

    ERIC Educational Resources Information Center

    Bartlett, Joan C.; Ishimura, Yusuke; Kloda, Lorie A.

    2011-01-01

    Purpose: The objective was to identify and understand the factors involved in scientists' selection of preferred bioinformatics tools, such as databases of gene or protein sequence information (e.g., GenBank) or programs that manipulate and analyse biological data (e.g., BLAST). Methods: Eight scientists maintained research diaries for a two-week…

  17. Bioinformatics Projects Supporting Life-Sciences Learning in High Schools

    PubMed Central

    Marques, Isabel; Almeida, Paulo; Alves, Renato; Dias, Maria João; Godinho, Ana; Pereira-Leal, José B.

    2014-01-01

    The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called “Bioinformatics@school.” It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools. PMID:24465192

  18. Open discovery: An integrated live Linux platform of Bioinformatics tools

    PubMed Central

    Vetrivel, Umashankar; Pilla, Kalabharath

    2008-01-01

    Historically, live linux distributions for Bioinformatics have paved way for portability of Bioinformatics workbench in a platform independent manner. Moreover, most of the existing live Linux distributions limit their usage to sequence analysis and basic molecular visualization programs and are devoid of data persistence. Hence, open discovery ‐ a live linux distribution has been developed with the capability to perform complex tasks like molecular modeling, docking and molecular dynamics in a swift manner. Furthermore, it is also equipped with complete sequence analysis environment and is capable of running windows executable programs in Linux environment. Open discovery portrays the advanced customizable configuration of fedora, with data persistency accessible via USB drive or DVD. Availability The Open Discovery is distributed free under Academic Free License (AFL) and can be downloaded from http://www.OpenDiscovery.org.in PMID:19238235

  19. XML schemas for common bioinformatic data types and their application in workflow systems

    PubMed Central

    Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

    2006-01-01

    Background Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios. PMID:17087823

  20. Vertical and horizontal integration of bioinformatics education: A modular, interdisciplinary approach.

    PubMed

    Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D Blaine; Langeland, James A

    2009-01-01

    Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option. Furthermore, we believe that a true interdisciplinary science experience would be best served by introduction of bioinformatics modules within existing courses in biology and chemistry and other complementary departments. To that end, with support from the Howard Hughes Medical Institute, we have developed over a dozen independent bioinformatics modules for our students that are incorporated into courses ranging from general chemistry and biology, advanced specialty courses, and classes in complementary disciplines such as computer science, mathematics, and physics. These activities have largely promoted active learning in our classrooms and have enhanced student understanding of course materials. Herein, we describe our program, the activities we have developed, and assessment of our endeavors in this area. Copyright © 2009 International Union of Biochemistry and Molecular Biology, Inc.

  1. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable in the analysis of millions of nucleic acids sequences, which are currently deposited in international databases (public or private); these databases contain information of genes, RNA, ORF, proteins, intergenic regions, including entire genomes from some species. The analysis of this information requires computer programs; which were renewed in the use of new mathematical methods, and the introduction of the use of artificial intelligence. In addition to the constant creation of supercomputing units trained to withstand the heavy workload of sequence analysis. However, it is still necessary the innovation on platforms that allow genomic analyses, faster and more effectively, with a technological understanding of all biological processes.

  2. Intrageneric Primer Design: Bringing Bioinformatics Tools to the Class

    ERIC Educational Resources Information Center

    Lima, Andre O. S.; Garces, Sergio P. S.

    2006-01-01

    Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private…

  3. The 2015 Bioinformatics Open Source Conference (BOSC 2015).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Lapp, Hilmar; Chapman, Brad; Davey, Rob; Fields, Christopher; Hokamp, Karsten; Munoz-Torres, Monica

    2016-02-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule.

  4. Visiting Scholars Program Application | FNLCR Staging

    Cancer.gov

    Below are scientific areas and programs that the Frederick National Labisactively seeking scholars to participate: Data Science and Information Technology (including Bioinformatics, Visualization, etc) Advanced Preclinical Researc

  5. OpenHelix: bioinformatics education outside of a different box.

    PubMed

    Williams, Jennifer M; Mangan, Mary E; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C

    2010-11-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review.

  6. OpenHelix: bioinformatics education outside of a different box

    PubMed Central

    Mangan, Mary E.; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C.

    2010-01-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review. PMID:20798181

  7. Component-Based Approach for Educating Students in Bioinformatics

    ERIC Educational Resources Information Center

    Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.

    2009-01-01

    There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…

  8. Making authentic science accessible—the benefits and challenges of integrating bioinformatics into a high-school science curriculum

    PubMed Central

    Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled ‘Bioinformatics in the Service of Biotechnology’. Students’ learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students’ difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students’ cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students’ scientific ‘toolbox’. For students, questions stemming from the ‘old world’ biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers’ prediction. Analysis of students’ affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher’s role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. PMID:26801769

  9. The 2015 Bioinformatics Open Source Conference (BOSC 2015)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J. A.; Lapp, Hilmar

    2016-01-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included “Data Science;” “Standards and Interoperability;” “Open Science and Reproducibility;” “Translational Bioinformatics;” “Visualization;” and “Bioinformatics Open Source Project Updates”. In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled “Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community,” that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule. PMID:26914653

  10. Making authentic science accessible-the benefits and challenges of integrating bioinformatics into a high-school science curriculum.

    PubMed

    Machluf, Yossy; Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled 'Bioinformatics in the Service of Biotechnology'. Students' learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students' difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students' cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students' scientific 'toolbox'. For students, questions stemming from the 'old world' biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers' prediction. Analysis of students' affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher's role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. © The Author 2016. Published by Oxford University Press.

  11. bioalcidae, samjs and vcffilterjs: object-oriented formatters and filters for bioinformatics files.

    PubMed

    Lindenbaum, Pierre; Redon, Richard

    2018-04-01

    Reformatting and filtering bioinformatics files are common tasks for bioinformaticians. Standard Linux tools and specific programs are usually used to perform such tasks but there is still a gap between using these tools and the programming interface of some existing libraries. In this study, we developed a set of tools namely bioalcidae, samjs and vcffilterjs that reformat or filter files using a JavaScript engine or a pure java expression and taking advantage of the java API for high-throughput sequencing data (htsjdk). https://github.com/lindenb/jvarkit. pierre.lindenbaum@univ-nantes.fr.

  12. Visiting Scholars Program Application | Frederick National Laboratory for Cancer Research

    Cancer.gov

    Below are scientific areas and programs that the Frederick National Labisactively seeking scholars to participate: Data Science and Information Technology (including Bioinformatics, Visualization, etc) Advanced Preclinical Researc

  13. XML schemas for common bioinformatic data types and their application in workflow systems.

    PubMed

    Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

    2006-11-06

    Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data--therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net, the BioDOM library can be obtained at http://biodom.sourceforge.net. The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.

  14. Design and implementation of a library-based information service in molecular biology and genetics at the University of Pittsburgh

    PubMed Central

    Chattopadhyay, Ansuman; Tannery, Nancy Hrinya; Silverman, Deborah A. L.; Bergen, Phillip; Epstein, Barbara A.

    2006-01-01

    Setting: In summer 2002, the Health Sciences Library System (HSLS) at the University of Pittsburgh initiated an information service in molecular biology and genetics to assist researchers with identifying and utilizing bioinformatics tools. Program Components: This novel information service comprises hands-on training workshops and consultation on the use of bioinformatics tools. The HSLS also provides an electronic portal and networked access to public and commercial molecular biology databases and software packages. Evaluation Mechanisms: Researcher feedback gathered during the first three years of workshops and individual consultation indicate that the information service is meeting user needs. Next Steps/Future Directions: The service's workshop offerings will expand to include emerging bioinformatics topics. A frequently asked questions database is also being developed to reuse advice on complex bioinformatics questions. PMID:16888665

  15. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, Ronald C.

    Bioinformatics researchers are increasingly confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBasemore » project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date.« less

  16. How the strengths of Lisp-family languages facilitate building complex and flexible bioinformatics applications

    PubMed Central

    Khomtchouk, Bohdan B; Weitz, Edmund; Karp, Peter D; Wahlestedt, Claes

    2018-01-01

    Abstract We present a rationale for expanding the presence of the Lisp family of programming languages in bioinformatics and computational biology research. Put simply, Lisp-family languages enable programmers to more quickly write programs that run faster than in other languages. Languages such as Common Lisp, Scheme and Clojure facilitate the creation of powerful and flexible software that is required for complex and rapidly evolving domains like biology. We will point out several important key features that distinguish languages of the Lisp family from other programming languages, and we will explain how these features can aid researchers in becoming more productive and creating better code. We will also show how these features make these languages ideal tools for artificial intelligence and machine learning applications. We will specifically stress the advantages of domain-specific languages (DSLs): languages that are specialized to a particular area, and thus not only facilitate easier research problem formulation, but also aid in the establishment of standards and best programming practices as applied to the specific research field at hand. DSLs are particularly easy to build in Common Lisp, the most comprehensive Lisp dialect, which is commonly referred to as the ‘programmable programming language’. We are convinced that Lisp grants programmers unprecedented power to build increasingly sophisticated artificial intelligence systems that may ultimately transform machine learning and artificial intelligence research in bioinformatics and computational biology. PMID:28040748

  17. How the strengths of Lisp-family languages facilitate building complex and flexible bioinformatics applications.

    PubMed

    Khomtchouk, Bohdan B; Weitz, Edmund; Karp, Peter D; Wahlestedt, Claes

    2018-05-01

    We present a rationale for expanding the presence of the Lisp family of programming languages in bioinformatics and computational biology research. Put simply, Lisp-family languages enable programmers to more quickly write programs that run faster than in other languages. Languages such as Common Lisp, Scheme and Clojure facilitate the creation of powerful and flexible software that is required for complex and rapidly evolving domains like biology. We will point out several important key features that distinguish languages of the Lisp family from other programming languages, and we will explain how these features can aid researchers in becoming more productive and creating better code. We will also show how these features make these languages ideal tools for artificial intelligence and machine learning applications. We will specifically stress the advantages of domain-specific languages (DSLs): languages that are specialized to a particular area, and thus not only facilitate easier research problem formulation, but also aid in the establishment of standards and best programming practices as applied to the specific research field at hand. DSLs are particularly easy to build in Common Lisp, the most comprehensive Lisp dialect, which is commonly referred to as the 'programmable programming language'. We are convinced that Lisp grants programmers unprecedented power to build increasingly sophisticated artificial intelligence systems that may ultimately transform machine learning and artificial intelligence research in bioinformatics and computational biology.

  18. Building a bioinformatics community of practice through library education programs.

    PubMed

    Moore, Margaret E; Vaughan, K T L; Hayes, Barrie E

    2004-01-01

    This paper addresses the following questions:What makes the community of practice concept an intriguing framework for developing library services for bioinformatics? What is the campus context and setting? What has been the Health Sciences Library's role in bioinformatics at the University of North Carolina (UNC) Chapel Hill? What are the Health Sciences Library's goals? What services are currently offered? How will these services be evaluated and developed? How can libraries demonstrate their value? Providing library services for an emerging community such as bioinformatics and computational biology presents special challenges for libraries including understanding needs, defining and communicating the library's role, building relationships within the community, preparing staff, and securing funding. Like many academic health sciences libraries, the University of North Carolina (UNC) at Chapel Hill Health Sciences Library is addressing these challenges in the context of its overall mission and goals.

  19. MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics.

    PubMed

    Egea, Jose A; Henriques, David; Cokelaer, Thomas; Villaverde, Alejandro F; MacNamara, Aidan; Danciu, Diana-Patricia; Banga, Julio R; Saez-Rodriguez, Julio

    2014-05-10

    Optimization is the key to solving many problems in computational biology. Global optimization methods, which provide a robust methodology, and metaheuristics in particular have proven to be the most efficient methods for many applications. Despite their utility, there is a limited availability of metaheuristic tools. We present MEIGO, an R and Matlab optimization toolbox (also available in Python via a wrapper of the R version), that implements metaheuristics capable of solving diverse problems arising in systems biology and bioinformatics. The toolbox includes the enhanced scatter search method (eSS) for continuous nonlinear programming (cNLP) and mixed-integer programming (MINLP) problems, and variable neighborhood search (VNS) for Integer Programming (IP) problems. Additionally, the R version includes BayesFit for parameter estimation by Bayesian inference. The eSS and VNS methods can be run on a single-thread or in parallel using a cooperative strategy. The code is supplied under GPLv3 and is available at http://www.iim.csic.es/~gingproc/meigo.html. Documentation and examples are included. The R package has been submitted to BioConductor. We evaluate MEIGO against optimization benchmarks, and illustrate its applicability to a series of case studies in bioinformatics and systems biology where it outperforms other state-of-the-art methods. MEIGO provides a free, open-source platform for optimization that can be applied to multiple domains of systems biology and bioinformatics. It includes efficient state of the art metaheuristics, and its open and modular structure allows the addition of further methods.

  20. MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics

    PubMed Central

    2014-01-01

    Background Optimization is the key to solving many problems in computational biology. Global optimization methods, which provide a robust methodology, and metaheuristics in particular have proven to be the most efficient methods for many applications. Despite their utility, there is a limited availability of metaheuristic tools. Results We present MEIGO, an R and Matlab optimization toolbox (also available in Python via a wrapper of the R version), that implements metaheuristics capable of solving diverse problems arising in systems biology and bioinformatics. The toolbox includes the enhanced scatter search method (eSS) for continuous nonlinear programming (cNLP) and mixed-integer programming (MINLP) problems, and variable neighborhood search (VNS) for Integer Programming (IP) problems. Additionally, the R version includes BayesFit for parameter estimation by Bayesian inference. The eSS and VNS methods can be run on a single-thread or in parallel using a cooperative strategy. The code is supplied under GPLv3 and is available at http://www.iim.csic.es/~gingproc/meigo.html. Documentation and examples are included. The R package has been submitted to BioConductor. We evaluate MEIGO against optimization benchmarks, and illustrate its applicability to a series of case studies in bioinformatics and systems biology where it outperforms other state-of-the-art methods. Conclusions MEIGO provides a free, open-source platform for optimization that can be applied to multiple domains of systems biology and bioinformatics. It includes efficient state of the art metaheuristics, and its open and modular structure allows the addition of further methods. PMID:24885957

  1. The University of Washington Health Sciences Library BioCommons: an evolving Northwest biomedical research information support infrastructure

    PubMed Central

    Minie, Mark; Bowers, Stuart; Tarczy-Hornoch, Peter; Roberts, Edward; James, Rose A.; Rambo, Neil; Fuller, Sherrilynne

    2006-01-01

    Setting: The University of Washington Health Sciences Libraries and Information Center BioCommons serves the bioinformatics needs of researchers at the university and in the vibrant for-profit and not-for-profit biomedical research sector in the Washington area and region. Program Components: The BioCommons comprises services addressing internal University of Washington, not-for-profit, for-profit, and regional and global clientele. The BioCommons is maintained and administered by the BioResearcher Liaison Team. The BioCommons architecture provides a highly flexible structure for adapting to rapidly changing resources and needs. Evaluation Mechanisms: BioCommons uses Web-based pre- and post-course evaluations and periodic user surveys to assess service effectiveness. Recent surveys indicate substantial usage of BioCommons services and a high level of effectiveness and user satisfaction. Next Steps/Future Directions: BioCommons is developing novel collaborative Web resources to distribute bioinformatics tools and is experimenting with Web-based competency training in bioinformation resource use. PMID:16888667

  2. BioStar: an online question & answer resource for the bioinformatics community

    USDA-ARS?s Scientific Manuscript database

    Although the era of big data has produced many bioinformatics tools and databases, using them effectively often requires specialized knowledge. Many groups lack bioinformatics expertise, and frequently find that software documentation is inadequate and local colleagues may be overburdened or unfamil...

  3. Teaching Bioinformatics in Concert

    PubMed Central

    Goodman, Anya L.; Dekhtyar, Alex

    2014-01-01

    Can biology students without programming skills solve problems that require computational solutions? They can if they learn to cooperate effectively with computer science students. The goal of the in-concert teaching approach is to introduce biology students to computational thinking by engaging them in collaborative projects structured around the software development process. Our approach emphasizes development of interdisciplinary communication and collaboration skills for both life science and computer science students. PMID:25411792

  4. JASPAR RESTful API: accessing JASPAR data from any programming language.

    PubMed

    Khan, Aziz; Mathelier, Anthony

    2018-05-01

    JASPAR is a widely used open-access database of curated, non-redundant transcription factor binding profiles. Currently, data from JASPAR can be retrieved as flat files or by using programming language-specific interfaces. Here, we present a programming language-independent application programming interface (API) to access JASPAR data using the Representational State Transfer (REST) architecture. The REST API enables programmatic access to JASPAR by most programming languages and returns data in eight widely used formats. Several endpoints are available to access the data and an endpoint is available to infer the TF binding profile(s) likely bound by a given DNA binding domain protein sequence. Additionally, it provides an interactive browsable interface for bioinformatics tool developers. This REST API is implemented in Python using the Django REST Framework. It is accessible at http://jaspar.genereg.net/api/ and the source code is freely available at https://bitbucket.org/CBGR/jaspar under GPL v3 license. aziz.khan@ncmm.uio.no or anthony.mathelier@ncmm.uio.no. Supplementary data are available at Bioinformatics online.

  5. GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics

    PubMed Central

    Samson, Delphine; Legeai, Fabrice; Karsenty, Emmanuelle; Reboux, Sébastien; Veyrieras, Jean-Baptiste; Just, Jeremy; Barillot, Emmanuel

    2003-01-01

    Génoplante is a partnership program between public French institutes (INRA, CIRAD, IRD and CNRS) and private companies (Biogemma, Bayer CropScience and Bioplante) that aims at developing genome analysis programs for crop species (corn, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis and rice). The outputs of these programs form a wealth of information (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, and mutation data) and tools (databases, interfaces, analysis software), that are being integrated and made public at the public bioinformatics resource centre of Génoplante: GénoPlante-Info (GPI). This continuous flood of data and tools is regularly updated and will grow continuously during the coming two years. Access to the GPI databases and tools is available at http://genoplante-info.infobiogen.fr/. PMID:12519976

  6. Combining medical informatics and bioinformatics toward tools for personalized medicine.

    PubMed

    Sarachan, B D; Simmons, M K; Subramanian, P; Temkin, J M

    2003-01-01

    Key bioinformatics and medical informatics research areas need to be identified to advance knowledge and understanding of disease risk factors and molecular disease pathology in the 21 st century toward new diagnoses, prognoses, and treatments. Three high-impact informatics areas are identified: predictive medicine (to identify significant correlations within clinical data using statistical and artificial intelligence methods), along with pathway informatics and cellular simulations (that combine biological knowledge with advanced informatics to elucidate molecular disease pathology). Initial predictive models have been developed for a pilot study in Huntington's disease. An initial bioinformatics platform has been developed for the reconstruction and analysis of pathways, and work has begun on pathway simulation. A bioinformatics research program has been established at GE Global Research Center as an important technology toward next generation medical diagnostics. We anticipate that 21 st century medical research will be a combination of informatics tools with traditional biology wet lab research, and that this will translate to increased use of informatics techniques in the clinic.

  7. Transcriptome Analysis of Human Immune Responses Following Live Vaccine Strain (LVS) Francisella Tularensis Vaccination

    DTIC Science & Technology

    2007-03-08

    with CD3D 50848 PAR1/UBE3A Prader–Willi syndrome chromosome region 1, GMCSFRalpha precursor, IL3Ralpha precursor (CD123) Brain development...intervention programs justifiable? Emerg. Infect. Dis. 3, 83–94. iebel, U., Kindler , B., Pepperkok, R., 2004. ‘Harvester’: a fast meta search engine of human...protein resources. Bioinformatics 20, 1962–1963. iebel, U., Kindler , B., Pepperkok, R., 2005. Bioinformatic “Harvester”: a search engine for genome

  8. Special Focus

    PubMed Central

    Nawrocki, Eric P.; Burge, Sarah W.

    2013-01-01

    The development of RNA bioinformatic tools began more than 30 y ago with the description of the Nussinov and Zuker dynamic programming algorithms for single sequence RNA secondary structure prediction. Since then, many tools have been developed for various RNA sequence analysis problems such as homology search, multiple sequence alignment, de novo RNA discovery, read-mapping, and many more. In this issue, we have collected a sampling of reviews and original research that demonstrate some of the many ways bioinformatics is integrated with current RNA biology research. PMID:23948768

  9. Evaluating the Effectiveness of a Practical Inquiry-Based Learning Bioinformatics Module on Undergraduate Student Engagement and Applied Skills

    ERIC Educational Resources Information Center

    Brown, James A. L.

    2016-01-01

    A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion,…

  10. Meeting review: 2002 O'Reilly Bioinformatics Technology Conference.

    PubMed

    Counsell, Damian

    2002-01-01

    At the end of January I travelled to the States to speak at and attend the first O'Reilly Bioinformatics Technology Conference. It was a large, well-organized and diverse meeting with an interesting history. Although the meeting was not a typical academic conference, its style will, I am sure, become more typical of meetings in both biological and computational sciences.Speakers at the event included prominent bioinformatics researchers such as Ewan Birney, Terry Gaasterland and Lincoln Stein; authors and leaders in the open source programming community like Damian Conway and Nat Torkington; and representatives from several publishing companies including the Nature Publishing Group, Current Science Group and the President of O'Reilly himself, Tim O'Reilly. There were presentations, tutorials, debates, quizzes and even a 'jam session' for musical bioinformaticists.

  11. WIWS: a protein structure bioinformatics Web service collection.

    PubMed

    Hekkelman, M L; Te Beek, T A H; Pettifer, S R; Thorne, D; Attwood, T K; Vriend, G

    2010-07-01

    The WHAT IF molecular-modelling and drug design program is widely distributed in the world of protein structure bioinformatics. Although originally designed as an interactive application, its highly modular design and inbuilt control language have recently enabled its deployment as a collection of programmatically accessible web services. We report here a collection of WHAT IF-based protein structure bioinformatics web services: these relate to structure quality, the use of symmetry in crystal structures, structure correction and optimization, adding hydrogens and optimizing hydrogen bonds and a series of geometric calculations. The freely accessible web services are based on the industry standard WS-I profile and the EMBRACE technical guidelines, and are available via both REST and SOAP paradigms. The web services run on a dedicated computational cluster; their function and availability is monitored daily.

  12. Meeting Review: 2002 O'Reilly Bioinformatics Technology Conference

    PubMed Central

    2002-01-01

    At the end of January I travelled to the States to speak at and attend the first O’Reilly Bioinformatics Technology Conference [14]. It was a large, well-organized and diverse meeting with an interesting history. Although the meeting was not a typical academic conference, its style will, I am sure, become more typical of meetings in both biological and computational sciences. Speakers at the event included prominent bioinformatics researchers such as Ewan Birney, Terry Gaasterland and Lincoln Stein; authors and leaders in the open source programming community like Damian Conway and Nat Torkington; and representatives from several publishing companies including the Nature Publishing Group, Current Science Group and the President of O’Reilly himself, Tim O’Reilly. There were presentations, tutorials, debates, quizzes and even a ‘jam session’ for musical bioinformaticists. PMID:18628852

  13. BioRuby: bioinformatics software for the Ruby programming language.

    PubMed

    Goto, Naohisa; Prins, Pjotr; Nakao, Mitsuteru; Bonnal, Raoul; Aerts, Jan; Katayama, Toshiaki

    2010-10-15

    The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser. BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. katayama@bioruby.org

  14. Bioinformatic approaches to augment study of epithelial-to-mesenchymal transition in lung cancer

    PubMed Central

    Beck, Tim N.; Chikwem, Adaeze J.; Solanki, Nehal R.

    2014-01-01

    Bioinformatic approaches are intended to provide systems level insight into the complex biological processes that underlie serious diseases such as cancer. In this review we describe current bioinformatic resources, and illustrate how they have been used to study a clinically important example: epithelial-to-mesenchymal transition (EMT) in lung cancer. Lung cancer is the leading cause of cancer-related deaths and is often diagnosed at advanced stages, leading to limited therapeutic success. While EMT is essential during development and wound healing, pathological reactivation of this program by cancer cells contributes to metastasis and drug resistance, both major causes of death from lung cancer. Challenges of studying EMT include its transient nature, its molecular and phenotypic heterogeneity, and the complicated networks of rewired signaling cascades. Given the biology of lung cancer and the role of EMT, it is critical to better align the two in order to advance the impact of precision oncology. This task relies heavily on the application of bioinformatic resources. Besides summarizing recent work in this area, we use four EMT-associated genes, TGF-β (TGFB1), NEDD9/HEF1, β-catenin (CTNNB1) and E-cadherin (CDH1), as exemplars to demonstrate the current capacities and limitations of probing bioinformatic resources to inform hypothesis-driven studies with therapeutic goals. PMID:25096367

  15. Cloudgene: A graphical execution platform for MapReduce programs on private and public clouds

    PubMed Central

    2012-01-01

    Background The MapReduce framework enables a scalable processing and analyzing of large datasets by distributing the computational load on connected computer nodes, referred to as a cluster. In Bioinformatics, MapReduce has already been adopted to various case scenarios such as mapping next generation sequencing data to a reference genome, finding SNPs from short read data or matching strings in genotype files. Nevertheless, tasks like installing and maintaining MapReduce on a cluster system, importing data into its distributed file system or executing MapReduce programs require advanced knowledge in computer science and could thus prevent scientists from usage of currently available and useful software solutions. Results Here we present Cloudgene, a freely available platform to improve the usability of MapReduce programs in Bioinformatics by providing a graphical user interface for the execution, the import and export of data and the reproducibility of workflows on in-house (private clouds) and rented clusters (public clouds). The aim of Cloudgene is to build a standardized graphical execution environment for currently available and future MapReduce programs, which can all be integrated by using its plug-in interface. Since Cloudgene can be executed on private clusters, sensitive datasets can be kept in house at all time and data transfer times are therefore minimized. Conclusions Our results show that MapReduce programs can be integrated into Cloudgene with little effort and without adding any computational overhead to existing programs. This platform gives developers the opportunity to focus on the actual implementation task and provides scientists a platform with the aim to hide the complexity of MapReduce. In addition to MapReduce programs, Cloudgene can also be used to launch predefined systems (e.g. Cloud BioLinux, RStudio) in public clouds. Currently, five different bioinformatic programs using MapReduce and two systems are integrated and have been successfully deployed. Cloudgene is freely available at http://cloudgene.uibk.ac.at. PMID:22888776

  16. cl-dash: rapid configuration and deployment of Hadoop clusters for bioinformatics research in the cloud.

    PubMed

    Hodor, Paul; Chawla, Amandeep; Clark, Andrew; Neal, Lauren

    2016-01-15

    : One of the solutions proposed for addressing the challenge of the overwhelming abundance of genomic sequence and other biological data is the use of the Hadoop computing framework. Appropriate tools are needed to set up computational environments that facilitate research of novel bioinformatics methodology using Hadoop. Here, we present cl-dash, a complete starter kit for setting up such an environment. Configuring and deploying new Hadoop clusters can be done in minutes. Use of Amazon Web Services ensures no initial investment and minimal operation costs. Two sample bioinformatics applications help the researcher understand and learn the principles of implementing an algorithm using the MapReduce programming pattern. Source code is available at https://bitbucket.org/booz-allen-sci-comp-team/cl-dash.git. hodor_paul@bah.com. © The Author 2015. Published by Oxford University Press.

  17. An integrated bioinformatics infrastructure essential for advancing pharmacogenomics and personalized medicine in the context of the FDA's Critical Path Initiative.

    PubMed

    Tong, Weida; Harris, Stephen C; Fang, Hong; Shi, Leming; Perkins, Roger; Goodsaid, Federico; Frueh, Felix W

    2007-01-01

    Pharmacogenomics (PGx) is identified in the FDA Critical Path document as a major opportunity for advancing medical product development and personalized medicine. An integrated bioinformatics infrastructure for use in FDA data review is crucial to realize the benefits of PGx for public health. We have developed an integrated bioinformatics tool, called ArrayTrack, for managing, analyzing and interpreting genomic and other biomarker data (e.g. proteomic and metabolomic data). ArrayTrack is a highly flexible and robust software platform, which allows evolving with technological advances and changing user needs. ArrayTrack is used in the routine review of genomic data submitted to the FDA; here, three hypothetical examples of its use in the Voluntary eXploratory Data Submission (VXDS) program are illustrated.: © Published by Elsevier Ltd.

  18. In silico methods for evaluating human allergenicity to novel proteins: International Bioinformatics Workshop Meeting Report, 23-24 February 2005.

    PubMed

    Thomas, Karluss; Bannon, Gary; Hefle, Susan; Herouet, Corinne; Holsapple, Michael; Ladics, Gregory; MacIntosh, Sue; Privalle, Laura

    2005-12-01

    The ILSI Health and Environmental Sciences Institute (HESI) hosted an expert workshop 22-24 February 2005 in Mallorca, Spain, to review the state-of-the-science for conducting a sequence homology/bioinformatics evaluation in the context of a comprehensive allergenicity assessment for novel proteins, to obtain consensus on the value and role of bioinformatics in evaluating novel proteins, and to discuss the utility and methods of allergen-specific IgE testing in the diagnosis of food allergy. The workshop participants included over forty international experts from academia, industry, and government. The workshop was hosted by the HESI Protein Allergenicity Technical committee, which has established a long-term program whose mission is to advance the scientific understanding of the relevant parameters for characterizing the allergenic potential of novel proteins.

  19. cl-dash: rapid configuration and deployment of Hadoop clusters for bioinformatics research in the cloud

    PubMed Central

    Hodor, Paul; Chawla, Amandeep; Clark, Andrew; Neal, Lauren

    2016-01-01

    Summary: One of the solutions proposed for addressing the challenge of the overwhelming abundance of genomic sequence and other biological data is the use of the Hadoop computing framework. Appropriate tools are needed to set up computational environments that facilitate research of novel bioinformatics methodology using Hadoop. Here, we present cl-dash, a complete starter kit for setting up such an environment. Configuring and deploying new Hadoop clusters can be done in minutes. Use of Amazon Web Services ensures no initial investment and minimal operation costs. Two sample bioinformatics applications help the researcher understand and learn the principles of implementing an algorithm using the MapReduce programming pattern. Availability and implementation: Source code is available at https://bitbucket.org/booz-allen-sci-comp-team/cl-dash.git. Contact: hodor_paul@bah.com PMID:26428290

  20. Using EMBL-EBI services via Web interface and programmatically via Web Services

    PubMed Central

    Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish

    2015-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. PMID:25501941

  1. KMC 3: counting and manipulating k-mer statistics.

    PubMed

    Kokot, Marek; Dlugosz, Maciej; Deorowicz, Sebastian

    2017-09-01

    Counting all k -mers in a given dataset is a standard procedure in many bioinformatics applications. We introduce KMC3, a significant improvement of the former KMC2 algorithm together with KMC tools for manipulating k -mer databases. Usefulness of the tools is shown on a few real problems. Program is freely available at http://sun.aei.polsl.pl/REFRESH/kmc . sebastian.deorowicz@polsl.pl. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  2. Evolving from bioinformatics in-the-small to bioinformatics in-the-large.

    PubMed

    Parker, D Stott; Gorlick, Michael M; Lee, Christopher J

    2003-01-01

    We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small-constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples, and extract lessons from them showing how it has helped. These examples both suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.

  3. HIV Therapy Simulator: a graphical user interface for comparing the effectiveness of novel therapy regimens.

    PubMed

    Lim, Huat Chye; Curlin, Marcel E; Mittler, John E

    2011-11-01

    Computer simulation models can be useful in exploring the efficacy of HIV therapy regimens in preventing the evolution of drug-resistant viruses. Current modeling programs, however, were designed by researchers with expertise in computational biology, limiting their accessibility to those who might lack such a background. We have developed a user-friendly graphical program, HIV Therapy Simulator (HIVSIM), that is accessible to non-technical users. The program allows clinicians and researchers to explore the effectiveness of various therapeutic strategies, such as structured treatment interruptions, booster therapies and induction-maintenance therapies. We anticipate that HIVSIM will be useful for evaluating novel drug-based treatment concepts in clinical research, and as an educational tool. HIV Therapy Simulator is freely available for Mac OS and Windows at http://sites.google.com/site/hivsimulator/. jmittler@uw.edu. Supplementary data are available at Bioinformatics online.

  4. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa

    PubMed Central

    Mulder, Nicola J.; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M.; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C. Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-01-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985

  5. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards combined with unfriendly interfaces make it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or 3rd-party viewers are built-in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.

  6. Composable languages for bioinformatics: the NYoSh experiment

    PubMed Central

    Simi, Manuele

    2014-01-01

    Language WorkBenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful to develop bioinformatics software solutions. In order to evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and Language Composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist the programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment, supporting context dependent intentions and can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system. The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh is distributed at http://nyosh.campagnelab.org. PMID:24482760

  7. Composable languages for bioinformatics: the NYoSh experiment.

    PubMed

    Simi, Manuele; Campagne, Fabien

    2014-01-01

    Language WorkBenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful to develop bioinformatics software solutions. In order to evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and Language Composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist the programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment, supporting context dependent intentions and can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system. The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh is distributed at http://nyosh.campagnelab.org.

  8. A novel program to design siRNAs simultaneously effective to highly variable virus genomes.

    PubMed

    Lee, Hui Sun; Ahn, Jeonghyun; Jun, Eun Jung; Yang, Sanghwa; Joo, Chul Hyun; Kim, Yoo Kyum; Lee, Heuiran

    2009-07-10

    A major concern of antiviral therapy using small interfering RNAs (siRNAs) targeting RNA viral genome is high sequence diversity and mutation rate due to genetic instability. To overcome this problem, it is indispensable to design siRNAs targeting highly conserved regions. We thus designed CAPSID (Convenient Application Program for siRNA Design), a novel bioinformatics program to identify siRNAs targeting highly conserved regions within RNA viral genomes. From a set of input RNAs of diverse sequences, CAPSID rapidly searches conserved patterns and suggests highly potent siRNA candidates in a hierarchical manner. To validate the usefulness of this novel program, we investigated the antiviral potency of universal siRNA for various Human enterovirus B (HEB) serotypes. Assessment of antiviral efficacy using Hela cells, clearly demonstrates that HEB-specific siRNAs exhibit protective effects against all HEBs examined. These findings strongly indicate that CAPSID can be applied to select universal antiviral siRNAs against highly divergent viral genomes.

  9. Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

    PubMed

    Mørk, Søren; Holmes, Ian

    2012-03-01

    Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures are best performing in terms of statistical information criteria or prediction performances, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.

  10. SOBA: sequence ontology bioinformatics analysis.

    PubMed

    Moore, Barry; Fan, Guozhen; Eilbeck, Karen

    2010-07-01

    The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.

  11. Development of a cloud-based Bioinformatics Training Platform.

    PubMed

    Revote, Jerico; Watson-Haigh, Nathan S; Quenette, Steve; Bethwaite, Blair; McGrath, Annette; Shang, Catherine A

    2017-05-01

    The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. © The Author 2016. Published by Oxford University Press.

  12. Development of a cloud-based Bioinformatics Training Platform

    PubMed Central

    Revote, Jerico; Watson-Haigh, Nathan S.; Quenette, Steve; Bethwaite, Blair; McGrath, Annette

    2017-01-01

    Abstract The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. PMID:27084333

  13. Computer Science Research Funding: How Much Is Too Little?

    DTIC Science & Technology

    2009-06-01

    Bioinformatics Parallel computing Computational biology Principles of programming Computational neuroscience Real-time and embedded systems Scientific...National Security Agency ( NSA ) • Missile Defense Agency (MDA) and others The various research programs have been coordinated through the DDR&E...DOD funding included only DARPA and OSD programs. FY07 and FY08 PBR funding included DARPA, NSA , some of the Services’ basic and applied research

  14. Using EMBL-EBI Services via Web Interface and Programmatically via Web Services.

    PubMed

    Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish

    2014-12-12

    The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. Copyright © 2014 John Wiley & Sons, Inc.

  15. ISMB 2016 offers outstanding science, networking, and celebration

    PubMed Central

    Fogg, Christiana

    2016-01-01

    The annual international conference on Intelligent Systems for Molecular Biology (ISMB) is the major meeting of the International Society for Computational Biology (ISCB). Over the past 23 years the ISMB conference has grown to become the world's largest bioinformatics/computational biology conference. ISMB 2016 will be the year's most important computational biology event globally. The conferences provide a multidisciplinary forum for disseminating the latest developments in bioinformatics/computational biology. ISMB brings together scientists from computer science, molecular biology, mathematics, statistics and related fields. Its principal focus is on the development and application of advanced computational methods for biological problems. ISMB 2016 offers the strongest scientific program and the broadest scope of any international bioinformatics/computational biology conference. Building on past successes, the conference is designed to cater to variety of disciplines within the bioinformatics/computational biology community.  ISMB 2016 takes place July 8 - 12 at the Swan and Dolphin Hotel in Orlando, Florida, United States. For two days preceding the conference, additional opportunities including Satellite Meetings, Student Council Symposium, and a selection of Special Interest Group Meetings and Applied Knowledge Exchange Sessions (AKES) are all offered to enable registered participants to learn more on the latest methods and tools within specialty research areas. PMID:27347392

  16. ISMB 2016 offers outstanding science, networking, and celebration.

    PubMed

    Fogg, Christiana

    2016-01-01

    The annual international conference on Intelligent Systems for Molecular Biology (ISMB) is the major meeting of the International Society for Computational Biology (ISCB). Over the past 23 years the ISMB conference has grown to become the world's largest bioinformatics/computational biology conference. ISMB 2016 will be the year's most important computational biology event globally. The conferences provide a multidisciplinary forum for disseminating the latest developments in bioinformatics/computational biology. ISMB brings together scientists from computer science, molecular biology, mathematics, statistics and related fields. Its principal focus is on the development and application of advanced computational methods for biological problems. ISMB 2016 offers the strongest scientific program and the broadest scope of any international bioinformatics/computational biology conference. Building on past successes, the conference is designed to cater to variety of disciplines within the bioinformatics/computational biology community.  ISMB 2016 takes place July 8 - 12 at the Swan and Dolphin Hotel in Orlando, Florida, United States. For two days preceding the conference, additional opportunities including Satellite Meetings, Student Council Symposium, and a selection of Special Interest Group Meetings and Applied Knowledge Exchange Sessions (AKES) are all offered to enable registered participants to learn more on the latest methods and tools within specialty research areas.

  17. Development of Bioinformatics Infrastructure for Genomics Research.

    PubMed

    Mulder, Nicola J; Adebiyi, Ezekiel; Adebiyi, Marion; Adeyemi, Seun; Ahmed, Azza; Ahmed, Rehab; Akanle, Bola; Alibi, Mohamed; Armstrong, Don L; Aron, Shaun; Ashano, Efejiro; Baichoo, Shakuntala; Benkahla, Alia; Brown, David K; Chimusa, Emile R; Fadlelmola, Faisal M; Falola, Dare; Fatumo, Segun; Ghedira, Kais; Ghouila, Amel; Hazelhurst, Scott; Isewon, Itunuoluwa; Jung, Segun; Kassim, Samar Kamal; Kayondo, Jonathan K; Mbiyavanga, Mamana; Meintjes, Ayton; Mohammed, Somia; Mosaku, Abayomi; Moussa, Ahmed; Muhammd, Mustafa; Mungloo-Dilmohamud, Zahra; Nashiru, Oyekanmi; Odia, Trust; Okafor, Adaobi; Oladipo, Olaleye; Osamor, Victor; Oyelade, Jellili; Sadki, Khalid; Salifu, Samson Pandam; Soyemi, Jumoke; Panji, Sumir; Radouani, Fouzia; Souiai, Oussama; Tastan Bishop, Özlem

    2017-06-01

    Although pockets of bioinformatics excellence have developed in Africa, generally, large-scale genomic data analysis has been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet's role has evolved in response to changing needs from the consortium and the African bioinformatics community. H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. Various resources have been developed to address genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types are also being developed and dockerized to enable execution on multiple computing infrastructures. In addition, new tools have been developed for analysis of the uniquely divergent African data and for downstream interpretation of prioritized variants. To provide support for these and other bioinformatics queries, an online bioinformatics helpdesk backed by broad consortium expertise has been established. Further support is provided by means of various modes of bioinformatics training. For the past 4 years, the development of infrastructure support and human capacity through H3ABioNet, have significantly contributed to the establishment of African scientific networks, data analysis facilities, and training programs. Here, we describe the infrastructure and how it has affected genomics and bioinformatics research in Africa. Copyright © 2017 World Heart Federation (Geneva). Published by Elsevier B.V. All rights reserved.

  18. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    PubMed Central

    2011-01-01

    Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538

  19. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.

    PubMed

    Cieślik, Marcin; Mura, Cameron

    2011-02-25

    Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.

  20. Bioinformatics and peptidomics approaches to the discovery and analysis of food-derived bioactive peptides.

    PubMed

    Agyei, Dominic; Tsopmo, Apollinaire; Udenigwe, Chibuike C

    2018-06-01

    There are emerging advancements in the strategies used for the discovery and development of food-derived bioactive peptides because of their multiple food and health applications. Bioinformatics and peptidomics are two computational and analytical techniques that have the potential to speed up the development of bioactive peptides from bench to market. Structure-activity relationships observed in peptides form the basis for bioinformatics and in silico prediction of bioactive sequences encrypted in food proteins. Peptidomics, on the other hand, relies on "hyphenated" (liquid chromatography-mass spectrometry-based) techniques for the detection, profiling, and quantitation of peptides. Together, bioinformatics and peptidomics approaches provide a low-cost and effective means of predicting, profiling, and screening bioactive protein hydrolysates and peptides from food. This article discuses the basis, strengths, and limitations of bioinformatics and peptidomics approaches currently used for the discovery and analysis of food-derived bioactive peptides.

  1. GALT protein database, a bioinformatics resource for the management and analysis of structural features of a galactosemia-related protein and its mutants.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2009-06-01

    We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and the analysis to each mutant was made with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.

  2. Biowep: a workflow enactment portal for bioinformatics applications.

    PubMed

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-03-08

    The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.

  3. Biowep: a workflow enactment portal for bioinformatics applications

    PubMed Central

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-01-01

    Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. Results We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563

  4. Johns Hopkins University Announces Frederick CREST Classes for Fall 2016 | Poster

    Cancer.gov

    Johns Hopkins University’s (JHU) Advanced Academic Programs (AAP) division recently announced two classes that will be hosted at the Frederick Center for Research and Education in Science and Technology (CREST) this fall. According to a JHU press release, the classes are Biochemistry, which is part of the M.S. in Biotechnology program at JHU AAP, and Molecular Biology, a part of the M.S. in Bioinformatics program at JHU AAP.

  5. GProX, a user-friendly platform for bioinformatics analysis and visualization of quantitative proteomics data.

    PubMed

    Rigbolt, Kristoffer T G; Vanselow, Jens T; Blagoev, Blagoy

    2011-08-01

    Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiment. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data we developed the Graphical Proteomics Data Explorer (GProX)(1). The program requires no special bioinformatics training, as all functions of GProX are accessible within its graphical user-friendly interface which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests for e.g. GO terms and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data is available and most analysis functions in GProX create customizable high quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net.

  6. GProX, a User-Friendly Platform for Bioinformatics Analysis and Visualization of Quantitative Proteomics Data*

    PubMed Central

    Rigbolt, Kristoffer T. G.; Vanselow, Jens T.; Blagoev, Blagoy

    2011-01-01

    Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiment. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data we developed the Graphical Proteomics Data Explorer (GProX)1. The program requires no special bioinformatics training, as all functions of GProX are accessible within its graphical user-friendly interface which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests for e.g. GO terms and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data is available and most analysis functions in GProX create customizable high quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net. PMID:21602510

  7. ClusterControl: a web interface for distributing and monitoring bioinformatics applications on a Linux cluster.

    PubMed

    Stocker, Gernot; Rieder, Dietmar; Trajanoski, Zlatko

    2004-03-22

    ClusterControl is a web interface to simplify distributing and monitoring bioinformatics applications on Linux cluster systems. We have developed a modular concept that enables integration of command line oriented program into the application framework of ClusterControl. The systems facilitate integration of different applications accessed through one interface and executed on a distributed cluster system. The package is based on freely available technologies like Apache as web server, PHP as server-side scripting language and OpenPBS as queuing system and is available free of charge for academic and non-profit institutions. http://genome.tugraz.at/Software/ClusterControl

  8. Open source tools and toolkits for bioinformatics: significance, and where are we?

    PubMed

    Stajich, Jason E; Lapp, Hilmar

    2006-09-01

    This review summarizes important work in open-source bioinformatics software that has occurred over the past couple of years. The survey is intended to illustrate how programs and toolkits whose source code has been developed or released under an Open Source license have changed informatics-heavy areas of life science research. Rather than creating a comprehensive list of all tools developed over the last 2-3 years, we use a few selected projects encompassing toolkit libraries, analysis tools, data analysis environments and interoperability standards to show how freely available and modifiable open-source software can serve as the foundation for building important applications, analysis workflows and resources.

  9. An Open-Source Sandbox for Increasing the Accessibility of Functional Programming to the Bioinformatics and Scientific Communities

    PubMed Central

    Fenwick, Matthew; Sesanker, Colbert; Schiller, Martin R.; Ellis, Heidi JC; Hinman, M. Lee; Vyas, Jay; Gryk, Michael R.

    2012-01-01

    Scientists are continually faced with the need to express complex mathematical notions in code. The renaissance of functional languages such as LISP and Haskell is often credited to their ability to implement complex data operations and mathematical constructs in an expressive and natural idiom. The slow adoption of functional computing in the scientific community does not, however, reflect the congeniality of these fields. Unfortunately, the learning curve for adoption of functional programming techniques is steeper than that for more traditional languages in the scientific community, such as Python and Java, and this is partially due to the relative sparseness of available learning resources. To fill this gap, we demonstrate and provide applied, scientifically substantial examples of functional programming, We present a multi-language source-code repository for software integration and algorithm development, which generally focuses on the fields of machine learning, data processing, bioinformatics. We encourage scientists who are interested in learning the basics of functional programming to adopt, reuse, and learn from these examples. The source code is available at: https://github.com/CONNJUR/CONNJUR-Sandbox (see also http://www.connjur.org). PMID:25328913

  10. Exploiting graphics processing units for computational biology and bioinformatics.

    PubMed

    Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H

    2010-09-01

    Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of generalpurpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.

  11. An Open-Source Sandbox for Increasing the Accessibility of Functional Programming to the Bioinformatics and Scientific Communities.

    PubMed

    Fenwick, Matthew; Sesanker, Colbert; Schiller, Martin R; Ellis, Heidi Jc; Hinman, M Lee; Vyas, Jay; Gryk, Michael R

    2012-01-01

    Scientists are continually faced with the need to express complex mathematical notions in code. The renaissance of functional languages such as LISP and Haskell is often credited to their ability to implement complex data operations and mathematical constructs in an expressive and natural idiom. The slow adoption of functional computing in the scientific community does not, however, reflect the congeniality of these fields. Unfortunately, the learning curve for adoption of functional programming techniques is steeper than that for more traditional languages in the scientific community, such as Python and Java, and this is partially due to the relative sparseness of available learning resources. To fill this gap, we demonstrate and provide applied, scientifically substantial examples of functional programming, We present a multi-language source-code repository for software integration and algorithm development, which generally focuses on the fields of machine learning, data processing, bioinformatics. We encourage scientists who are interested in learning the basics of functional programming to adopt, reuse, and learn from these examples. The source code is available at: https://github.com/CONNJUR/CONNJUR-Sandbox (see also http://www.connjur.org).

  12. The Ensembl REST API: Ensembl Data for Any Language

    PubMed Central

    Yates, Andrew; Beal, Kathryn; Keenan, Stephen; McLaren, William; Pignatelli, Miguel; Ritchie, Graham R. S.; Ruffier, Magali; Taylor, Kieron; Vullo, Alessandro; Flicek, Paul

    2015-01-01

    Motivation: We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language. Availability and implementation: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest. Contact: ayates@ebi.ac.uk or flicek@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25236461

  13. Building international genomics collaboration for global health security

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cui, Helen H.; Erkkila, Tracy; Chain, Patrick S. G.

    Genome science and technologies are transforming life sciences globally in many ways and becoming a highly desirable area for international collaboration to strengthen global health. The Genome Science Program at the Los Alamos National Laboratory is leveraging a long history of expertise in genomics research to assist multiple partner nations in advancing their genomics and bioinformatics capabilities. The capability development objectives focus on providing a molecular genomics-based scientific approach for pathogen detection, characterization, and biosurveillance applications. The general approaches include introduction of basic principles in genomics technologies, training on laboratory methodologies and bioinformatic analysis of resulting data, procurement, and installationmore » of next-generation sequencing instruments, establishing bioinformatics software capabilities, and exploring collaborative applications of the genomics capabilities in public health. Genome centers have been established with public health and research institutions in the Republic of Georgia, Kingdom of Jordan, Uganda, and Gabon; broader collaborations in genomics applications have also been developed with research institutions in many other countries.« less

  14. Building international genomics collaboration for global health security

    DOE PAGES

    Cui, Helen H.; Erkkila, Tracy; Chain, Patrick S. G.; ...

    2015-12-07

    Genome science and technologies are transforming life sciences globally in many ways and becoming a highly desirable area for international collaboration to strengthen global health. The Genome Science Program at the Los Alamos National Laboratory is leveraging a long history of expertise in genomics research to assist multiple partner nations in advancing their genomics and bioinformatics capabilities. The capability development objectives focus on providing a molecular genomics-based scientific approach for pathogen detection, characterization, and biosurveillance applications. The general approaches include introduction of basic principles in genomics technologies, training on laboratory methodologies and bioinformatic analysis of resulting data, procurement, and installationmore » of next-generation sequencing instruments, establishing bioinformatics software capabilities, and exploring collaborative applications of the genomics capabilities in public health. Genome centers have been established with public health and research institutions in the Republic of Georgia, Kingdom of Jordan, Uganda, and Gabon; broader collaborations in genomics applications have also been developed with research institutions in many other countries.« less

  15. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    PubMed

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.

  16. TeachEnG: a Teaching Engine for Genomics.

    PubMed

    Kim, Minji; Kim, Yeonsung; Qian, Lei; Song, Jun S

    2017-10-15

    Bioinformatics is a rapidly growing field that has emerged from the synergy of computer science, statistics and biology. Given the interdisciplinary nature of bioinformatics, many students from diverse fields struggle with grasping bioinformatic concepts only from classroom lectures. Interactive tools for helping students reinforce their learning would be thus desirable. Here, we present an interactive online educational tool called TeachEnG (acronym for Teaching Engine for Genomics) for reinforcing key concepts in sequence alignment and phylogenetic tree reconstruction. Our instructional games allow students to align sequences by hand, fill out the dynamic programming matrix in the Needleman-Wunsch global sequence alignment algorithm, and reconstruct phylogenetic trees via the maximum parsimony, Unweighted Pair Group Method with Arithmetic mean (UPGMA) and Neighbor-Joining algorithms. With an easily accessible interface and instant visual feedback, TeachEnG will help promote active learning in bioinformatics. TeachEnG is freely available at http://teacheng.illinois.edu. The source code is available from https://github.com/KnowEnG/TeachEnG under the Artistic License 2.0. It is written in JavaScript and compatible with Firefox, Safari, Chrome and Microsoft Edge. songj@illinois.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  17. A Bioinformatic Pipeline for Monitoring of the Mutational Stability of Viral Drug Targets with Deep-Sequencing Technology.

    PubMed

    Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai

    2017-11-23

    The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines the available programs and the ad hoc scripts based on an original algorithm of the search for the conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.

  18. BioMake: a GNU make-compatible utility for declarative workflow management.

    PubMed

    Holmes, Ian H; Mungall, Christopher J

    2017-11-01

    The Unix 'make' program is widely used in bioinformatics pipelines, but suffers from problems that limit its application to large analysis datasets. These include reliance on file modification times to determine whether a target is stale, lack of support for parallel execution on clusters, and restricted flexibility to extend the underlying logic program. We present BioMake, a make-like utility that is compatible with most features of GNU Make and adds support for popular cluster-based job-queue engines, MD5 signatures as an alternative to timestamps, and logic programming extensions in Prolog. BioMake is available for MacOSX and Linux systems from https://github.com/evoldoers/biomake under the BSD3 license. The only dependency is SWI-Prolog (version 7), available from http://www.swi-prolog.org/. ihholmes + biomake@gmail.com or cmungall + biomake@gmail.com. Feature table comparing BioMake to similar tools. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  19. KISS for STRAP: user extensions for a protein alignment editor.

    PubMed

    Gille, Christoph; Lorenzen, Stephan; Michalsky, Elke; Frömmel, Cornelius

    2003-12-12

    The Structural Alignment Program STRAP is a comfortable comprehensive editor and analyzing tool for protein alignments. A wide range of functions related to protein sequences and protein structures are accessible with an intuitive graphical interface. Recent features include mapping of mutations and polymorphisms onto structures and production of high quality figures for publication. Here we address the general problem of multi-purpose program packages to keep up with the rapid development of bioinformatical methods and the demand for specific program functions. STRAP was remade implementing a novel design which aims at Keeping Interfaces in STRAP Simple (KISS). KISS renders STRAP extendable to bio-scientists as well as to bio-informaticians. Scientists with basic computer skills are capable of implementing statistical methods or embedding existing bioinformatical tools in STRAP themselves. For bio-informaticians STRAP may serve as an environment for rapid prototyping and testing of complex algorithms such as automatic alignment algorithms or phylogenetic methods. Further, STRAP can be applied as an interactive web applet to present data related to a particular protein family and as a teaching tool. JAVA-1.4 or higher. http://www.charite.de/bioinf/strap/

  20. Workflows for microarray data processing in the Kepler environment.

    PubMed

    Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

    2012-05-17

    Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.

  1. A curricula-based comparison of biomedical and health informatics programs in the USA

    PubMed Central

    Hemminger, Bradley M

    2011-01-01

    Objective The field of Biomedical and Health Informatics (BMHI) continues to define itself, and there are many educational programs offering ‘informatics’ degrees with varied foci. The goal of this study was to develop a scheme for systematic comparison of programs across the entire BMHI spectrum and to identify commonalities among informatics curricula. Design Guided by several published competency sets, a grounded theory approach was used to develop a program/curricula categorization scheme based on the descriptions of 636 courses offered by 73 public health, nursing, health, medical, and bioinformatics programs in the USA. The scheme was then used to compare the programs in the aforementioned five informatics disciplines. Results The authors developed a Course-Based Informatics Program Categorization (CBIPC) scheme that can be used both to classify coursework for any BMHI educational program and to compare programs from the same or related disciplines. The application of CBIPC scheme to the analysis of public health, nursing, health, medical, and bioinformatics programs reveals distinct intradisciplinary curricular patterns and a common core of courses across the entire BMHI education domain. Limitations The study is based on descriptions of courses from the university's webpages. Thus, it is limited to sampling courses at one moment in time, and classification for the coding scheme is based primarily on course titles and course descriptions. Conclusion The CBIPC scheme combines empirical data about educational curricula from diverse informatics programs and several published competency sets. It also provides a foundation for discussion of BMHI education as a whole and can help define subdisciplinary competencies. PMID:21292707

  2. Program Bytes - Satellite Meetings, SIGs, and AKEs at ISMB 2016

    PubMed Central

    Fogg, Christiana

    2016-01-01

    The International Society for Computational Biology (ISCB) is gearing up for its flagship meeting, ISMB.  The conference is the world’s largest gathering of computational biology and bioinformatics researchers. The ISMB 2016 pre-conference program will feature 17 special interest group and satellite meetings over the course of two days. This article previews those special meetings at ISCB’s flagship meeting, ISMB. PMID:27347388

  3. Identification of amino acid residues involved in substrate specificity of plant acyl-ACP thioesterases using a bioinformatics-guided approach

    PubMed Central

    Mayer, Kimberly M; Shanklin, John

    2007-01-01

    Background The large amount of available sequence information for the plant acyl-ACP thioesterases (TEs) made it possible to use a bioinformatics-guided approach to identify amino acid residues involved in substrate specificity. The Conserved Property Difference Locator (CPDL) program allowed the identification of putative specificity-determining residues that differ between the FatA and FatB TE classes. Six of the FatA residue differences identified by CPDL were incorporated into the FatB-like parent via site-directed mutagenesis and the effect of each on TE activity was determined. Variants were expressed in E. coli strain K27 that allows determination of enzyme activity by GCMS analysis of fatty acids released into the medium. Results Substitutions at four of the positions (74, 86, 141, and 174) changed substrate specificity to varying degrees while changes at the remaining two positions, 110 and 221, essentially inactivated the thioesterase. The effects of substitutions at positions 74, 141, and 174 (3-MUT) or 74, 86, 141, 174 (4-MUT) were not additive with respect to specificity. Conclusion Four of six putative specificity determining positions in plant TEs, identified with the use of CPDL, were validated experimentally; a novel colorimetric screen that discriminates between active and inactive TEs is also presented. PMID:17201914

  4. G2LC: Resources Autoscaling for Real Time Bioinformatics Applications in IaaS.

    PubMed

    Hu, Rongdong; Liu, Guangming; Jiang, Jingfei; Wang, Lixin

    2015-01-01

    Cloud computing has started to change the way how bioinformatics research is being carried out. Researchers who have taken advantage of this technology can process larger amounts of data and speed up scientific discovery. The variability in data volume results in variable computing requirements. Therefore, bioinformatics researchers are pursuing more reliable and efficient methods for conducting sequencing analyses. This paper proposes an automated resource provisioning method, G2LC, for bioinformatics applications in IaaS. It enables application to output the results in a real time manner. Its main purpose is to guarantee applications performance, while improving resource utilization. Real sequence searching data of BLAST is used to evaluate the effectiveness of G2LC. Experimental results show that G2LC guarantees the application performance, while resource is saved up to 20.14%.

  5. G2LC: Resources Autoscaling for Real Time Bioinformatics Applications in IaaS

    PubMed Central

    Hu, Rongdong; Liu, Guangming; Jiang, Jingfei; Wang, Lixin

    2015-01-01

    Cloud computing has started to change the way how bioinformatics research is being carried out. Researchers who have taken advantage of this technology can process larger amounts of data and speed up scientific discovery. The variability in data volume results in variable computing requirements. Therefore, bioinformatics researchers are pursuing more reliable and efficient methods for conducting sequencing analyses. This paper proposes an automated resource provisioning method, G2LC, for bioinformatics applications in IaaS. It enables application to output the results in a real time manner. Its main purpose is to guarantee applications performance, while improving resource utilization. Real sequence searching data of BLAST is used to evaluate the effectiveness of G2LC. Experimental results show that G2LC guarantees the application performance, while resource is saved up to 20.14%. PMID:26504488

  6. The Topology Prediction of Membrane Proteins: A Web-Based Tutorial.

    PubMed

    Kandemir-Cavas, Cagin; Cavas, Levent; Alyuruk, Hakan

    2018-06-01

    There is a great need for development of educational materials on the transfer of current bioinformatics knowledge to undergraduate students in bioscience departments. In this study, it is aimed to prepare an example in silico laboratory tutorial on the topology prediction of membrane proteins by bioinformatics tools. This laboratory tutorial is prepared for biochemistry lessons at bioscience departments (biology, chemistry, biochemistry, molecular biology and genetics, and faculty of medicine). The tutorial is intended for students who have not taken a bioinformatics course yet or already have taken a course as an introduction to bioinformatics. The tutorial is based on step-by-step explanations with illustrations. It can be applied under supervision of an instructor in the lessons, or it can be used as a self-study guide by students. In the tutorial, membrane-spanning regions and α-helices of membrane proteins were predicted by internet-based bioinformatics tools. According to the results achieved from internet-based bioinformatics tools, the algorithms and parameters used were effective on the accuracy of prediction. The importance of this laboratory tutorial lies on the facts that it provides an introduction to the bioinformatics and that it also demonstrates an in silico laboratory application to the students at natural sciences. The presented example education material is applicable easily at all departments that have internet connection. This study presents an alternative education material to the students in biochemistry laboratories in addition to classical laboratory experiments.

  7. Evolving Strategies for the Incorporation of Bioinformatics within the Undergraduate Cell Biology Curriculum

    ERIC Educational Resources Information Center

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in…

  8. Bioinformatics Education in High School: Implications for Promoting Science, Technology, Engineering, and Mathematics Careers

    ERIC Educational Resources Information Center

    Kovarik, Dina N.; Patterson, Davis G.; Cohen, Carolyn; Sanders, Elizabeth A.; Peterson, Karen A.; Porter, Sandra G.; Chowning, Jeanne Ting

    2013-01-01

    We investigated the effects of our Bio-ITEST teacher professional development model and bioinformatics curricula on cognitive traits (awareness, engagement, self-efficacy, and relevance) in high school teachers and students that are known to accompany a developing interest in science, technology, engineering, and mathematics (STEM) careers. The…

  9. Strategies for Using Peer-Assisted Learning Effectively in an Undergraduate Bioinformatics Course

    ERIC Educational Resources Information Center

    Shapiro, Casey; Ayon, Carlos; Moberg-Parker, Jordan; Levis-Fitzgerald, Marc; Sanders, Erin R.

    2013-01-01

    This study used a mixed methods approach to evaluate hybrid peer-assisted learning approaches incorporated into a bioinformatics tutorial for a genome annotation research project. Quantitative and qualitative data were collected from undergraduates who enrolled in a research-based laboratory course during two different academic terms at UCLA.…

  10. Bioinformatics: indispensable, yet hidden in plain sight?

    PubMed

    Bartlett, Andrew; Penders, Bart; Lewis, Jamie

    2017-06-21

    Bioinformatics has multitudinous identities, organisational alignments and disciplinary links. This variety allows bioinformaticians and bioinformatic work to contribute to much (if not most) of life science research in profound ways. The multitude of bioinformatic work also translates into a multitude of credit-distribution arrangements, apparently dismissing that work. We report on the epistemic and social arrangements that characterise the relationship between bioinformatics and life science. We describe, in sociological terms, the character, power and future of bioinformatic work. The character of bioinformatic work is such that its cultural, institutional and technical structures allow for it to be black-boxed easily. The result is that bioinformatic expertise and contributions travel easily and quickly, yet remain largely uncredited. The power of bioinformatic work is shaped by its dependency on life science work, which combined with the black-boxed character of bioinformatic expertise further contributes to situating bioinformatics on the periphery of the life sciences. Finally, the imagined futures of bioinformatic work suggest that bioinformatics will become ever more indispensable without necessarily becoming more visible, forcing bioinformaticians into difficult professional and career choices. Bioinformatic expertise and labour is epistemically central but often institutionally peripheral. In part, this is a result of the ways in which the character, power distribution and potential futures of bioinformatics are constituted. However, alternative paths can be imagined.

  11. Aether: leveraging linear programming for optimal cloud computing in genomics.

    PubMed

    Luber, Jacob M; Tierney, Braden T; Cofer, Evan M; Patel, Chirag J; Kostic, Aleksandar D

    2018-05-01

    Across biology, we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities. Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective and scalable framework that uses linear programming to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users' existing HPC pipelines. Data utilized are available at https://pubs.broadinstitute.org/diabimmune and with EBI SRA accession ERP005989. Source code is available at (https://github.com/kosticlab/aether). Examples, documentation and a tutorial are available at http://aether.kosticlab.org. chirag_patel@hms.harvard.edu or aleksandar.kostic@joslin.harvard.edu. Supplementary data are available at Bioinformatics online.

  12. G2S: a web-service for annotating genomic variants on 3D protein structures.

    PubMed

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-06-01

    Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.

  13. A novel cross-disciplinary multi-institute approach to translational cancer research: lessons learned from Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC).

    PubMed

    Patel, Ashokkumar A; Gilbertson, John R; Showe, Louise C; London, Jack W; Ross, Eric; Ochs, Michael F; Carver, Joseph; Lazarus, Andrea; Parwani, Anil V; Dhir, Rajiv; Beck, J Robert; Liebman, Michael; Garcia, Fernando U; Prichard, Jeff; Wilkerson, Myra; Herberman, Ronald B; Becich, Michael J

    2007-06-08

    The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC, http://www.pcabc.upmc.edu) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance that was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective of this was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing a statewide data model (1) for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring the opportunity for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, for the purpose of developing collaborative research programs across cancer research institutions in Pennsylvania. The Consortium's intention was to establish a virtual repository of many clinical specimens residing in various centers across the state, in order to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers. The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations, and plans to expand to partners outside the State. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners - all working together to accomplish a common mission. The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collections for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This was the foundational work that has led to the development of a centralized data warehouse that has met each of the institutions' IRB/HIPAA standards. Currently, this "virtual biorepository" has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. The clinical annotation of tissue samples is either done manually over the internet or semi-automated batch modes through mapping of local data elements with PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer and 1543 cases (including 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. Furthermore, the group has also developed a central web-based tool that allows investigators to share their translational (genomics/proteomics) experiment data on research evaluating potential biomarkers via a central location on the Consortium's web site. The technological achievements and the statewide informatics infrastructure that have been established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer. Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy.

  14. A Novel Cross-Disciplinary Multi-Institute Approach to Translational Cancer Research: Lessons Learned from Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC)

    PubMed Central

    Patel, Ashokkumar A.; Gilbertson, John R.; Showe, Louise C.; London, Jack W.; Ross, Eric; Ochs, Michael F.; Carver, Joseph; Lazarus, Andrea; Parwani, Anil V.; Dhir, Rajiv; Beck, J. Robert; Liebman, Michael; Garcia, Fernando U.; Prichard, Jeff; Wilkerson, Myra; Herberman, Ronald B.; Becich, Michael J.

    2007-01-01

    Background: The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC, http://www.pcabc.upmc.edu) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance that was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective of this was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing a statewide data model (1) for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring the opportunity for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, for the purpose of developing collaborative research programs across cancer research institutions in Pennsylvania. The Consortium’s intention was to establish a virtual repository of many clinical specimens residing in various centers across the state, in order to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers. Methods: The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations, and plans to expand to partners outside the State. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners - all working together to accomplish a common mission. The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collections for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This was the foundational work that has led to the development of a centralized data warehouse that has met each of the institutions’ IRB/HIPAA standards. Results: Currently, this “virtual biorepository” has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. The clinical annotation of tissue samples is either done manually over the internet or semi-automated batch modes through mapping of local data elements with PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer and 1543 cases (including 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. Furthermore, the group has also developed a central web-based tool that allows investigators to share their translational (genomics/proteomics) experiment data on research evaluating potential biomarkers via a central location on the Consortium’s web site. Conclusions: The technological achievements and the statewide informatics infrastructure that have been established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer. Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy. PMID:19455246

  15. Using the QCM Biosensor-Based T7 Phage Display Combined with Bioinformatics Analysis for Target Identification of Bioactive Small Molecule.

    PubMed

    Takakusagi, Yoichi; Takakusagi, Kaori; Sugawara, Fumio; Sakaguchi, Kengo

    2018-01-01

    Identification of target proteins that directly bind to bioactive small molecule is of great interest in terms of clarifying the mode of action of the small molecule as well as elucidating the biological phenomena at the molecular level. Of the experimental technologies available, T7 phage display allows comprehensive screening of small molecule-recognizing amino acid sequence from the peptide libraries displayed on the T7 phage capsid. Here, we describe the T7 phage display strategy that is combined with quartz-crystal microbalance (QCM) biosensor for affinity selection platform and bioinformatics analysis for small molecule-recognizing short peptides. This method dramatically enhances efficacy and throughput of the screening for small molecule-recognizing amino acid sequences without repeated rounds of selection. Subsequent execution of bioinformatics programs allows combinatorial and comprehensive target protein discovery of small molecules with its binding site, regardless of protein sample insolubility, instability, or inaccessibility of the fixed small molecules to internally located binding site on larger target proteins when conventional proteomics approaches are used.

  16. Towards a career in bioinformatics

    PubMed Central

    2009-01-01

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection to tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to go impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508

  17. Towards a career in bioinformatics.

    PubMed

    Ranganathan, Shoba

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection to tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to go impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010.

  18. Application of virtual phase-shifting speckle-interferometry for detection of polymorphism in the Chlamydia trachomatis omp1 gene

    NASA Astrophysics Data System (ADS)

    Feodorova, Valentina A.; Saltykov, Yury V.; Zaytsev, Sergey S.; Ulyanov, Sergey S.; Ulianova, Onega V.

    2018-04-01

    Method of phase-shifting speckle-interferometry has been used as a new tool with high potency for modern bioinformatics. Virtual phase-shifting speckle-interferometry has been applied for detection of polymorphism in the of Chlamydia trachomatis omp1 gene. It has been shown, that suggested method is very sensitive to natural genetic mutations as single nucleotide polymorphism (SNP). Effectiveness of proposed method has been compared with effectiveness of the newest bioinformatic tools, based on nucleotide sequence alignment.

  19. An Introductory Bioinformatics Exercise to Reinforce Gene Structure and Expression and Analyze the Relationship between Gene and Protein Sequences

    ERIC Educational Resources Information Center

    Almeida, Craig A.; Tardiff, Daniel F.; De Luca, Jane P.

    2004-01-01

    We have developed an introductory bioinformatics exercise for sophomore biology and biochemistry students that reinforces the understanding of the structure of a gene and the principles and events involved in its expression. In addition, the activity illustrates the severe effect mutations in a gene sequence can have on the protein product.…

  20. Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem.

    PubMed

    Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan

    2007-01-01

    Modern video cards and game consoles typically have much better performance to price ratios than that of general purpose CPUs. The parallel processing capabilities of game hardware are well-suited for high throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.

  1. Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

    USDA-ARS?s Scientific Manuscript database

    Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...

  2. Assessing an effective undergraduate module teaching applied bioinformatics to biology students

    PubMed Central

    2018-01-01

    Applied bioinformatics skills are becoming ever more indispensable for biologists, yet incorporation of these skills into the undergraduate biology curriculum is lagging behind, in part due to a lack of instructors willing and able to teach basic bioinformatics in classes that don’t specifically focus on quantitative skill development, such as statistics or computer sciences. To help undergraduate course instructors who themselves did not learn bioinformatics as part of their own education and are hesitant to plunge into teaching big data analysis, a module was developed that is written in plain-enough language, using publicly available computing tools and data, to allow novice instructors to teach next-generation sequence analysis to upper-level undergraduate students. To determine if the module allowed students to develop a better understanding of and appreciation for applied bioinformatics, various tools were developed and employed to assess the impact of the module. This article describes both the module and its assessment. Students found the activity valuable for their education and, in focus group discussions, emphasized that they saw a need for more and earlier instruction of big data analysis as part of the undergraduate biology curriculum. PMID:29324777

  3. Testing and Validating Machine Learning Classifiers by Metamorphic Testing☆

    PubMed Central

    Xie, Xiaoyuan; Ho, Joshua W. K.; Murphy, Christian; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh

    2011-01-01

    Machine Learning algorithms have provided core functionality to many application domains - such as bioinformatics, computational linguistics, etc. However, it is difficult to detect faults in such applications because often there is no “test oracle” to verify the correctness of the computed outputs. To help address the software quality, in this paper we present a technique for testing the implementations of machine learning classification algorithms which support such applications. Our approach is based on the technique “metamorphic testing”, which has been shown to be effective to alleviate the oracle problem. Also presented include a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method has high effectiveness in killing mutants, and that observing expected cross-validation result alone is not sufficiently effective to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program. PMID:21532969

  4. Aether: leveraging linear programming for optimal cloud computing in genomics

    PubMed Central

    Luber, Jacob M; Tierney, Braden T; Cofer, Evan M; Patel, Chirag J

    2018-01-01

    Abstract Motivation Across biology, we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities. Results Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective and scalable framework that uses linear programming to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users’ existing HPC pipelines. Availability and implementation Data utilized are available at https://pubs.broadinstitute.org/diabimmune and with EBI SRA accession ERP005989. Source code is available at (https://github.com/kosticlab/aether). Examples, documentation and a tutorial are available at http://aether.kosticlab.org. Contact chirag_patel@hms.harvard.edu or aleksandar.kostic@joslin.harvard.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:29228186

  5. Evaluating the effectiveness of a practical inquiry-based learning bioinformatics module on undergraduate student engagement and applied skills.

    PubMed

    Brown, James A L

    2016-05-06

    A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion, qualitative student-based module evaluation and the novelty, scientific validity and quality of written student reports. Bioinformatics is often the starting point for laboratory-based research projects, therefore high importance was placed on allowing students to individually develop and apply processes and methods of scientific research. Students led a bioinformatic inquiry-based project (within a framework of inquiry), discovering, justifying and exploring individually discovered research targets. Detailed assessable reports were produced, displaying data generated and the resources used. Mimicking research settings, undergraduates were divided into small collaborative groups, with distinctive central themes. The module was evaluated by assessing the quality and originality of the students' targets through reports, reflecting students' use and understanding of concepts and tools required to generate their data. Furthermore, evaluation of the bioinformatic module was assessed semi-quantitatively using pre- and post-module quizzes (a non-assessable activity, not contributing to their grade), which incorporated process- and content-specific questions (indicative of their use of the online tools). Qualitative assessment of the teaching intervention was performed using post-module surveys, exploring student satisfaction and other module specific elements. Overall, a positive experience was found, as was a post module increase in correct process-specific answers. In conclusion, an inquiry-based peer-assisted learning module increased students' engagement, practical bioinformatic skills and process-specific knowledge. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:304-313 2016. © 2016 The International Union of Biochemistry and Molecular Biology.

  6. Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.

    PubMed

    Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis; Krampis, Konstantinos

    2017-08-01

    Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep, bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and from a user perspective, running the pipelines as simply as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enables developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. © The Authors 2017. Published by Oxford University Press.

  7. Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

    PubMed Central

    Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis

    2017-01-01

    Abstract Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep, bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and from a user perspective, running the pipelines as simply as running a single bioinformatics tool. This is achieved using a “meta-script” that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enables developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. PMID:28854616

  8. BioXSD: the common data-exchange format for everyday bioinformatics web services.

    PubMed

    Kalas, Matús; Puntervoll, Pål; Joseph, Alexandre; Bartaseviciūte, Edita; Töpfer, Armin; Venkataraman, Prabakar; Pettifer, Steve; Bryne, Jan Christian; Ison, Jon; Blanchet, Christophe; Rapacki, Kristoffer; Jonassen, Inge

    2010-09-15

    The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of the resources offer programmatic web-service interface. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. BioXSD has been developed as a candidate for standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a test-case workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source codes in common programming languages, an updated list of compatible web services and tools and a repository of feature requests from the community.

  9. A label distance maximum-based classifier for multi-label learning.

    PubMed

    Liu, Xiaoli; Bao, Hang; Zhao, Dazhe; Cao, Peng

    2015-01-01

    Multi-label classification is useful in many bioinformatics tasks such as gene function prediction and protein site localization. This paper presents an improved neural network algorithm, Max Label Distance Back Propagation Algorithm for Multi-Label Classification. The method was formulated by modifying the total error function of the standard BP by adding a penalty term, which was realized by maximizing the distance between the positive and negative labels. Extensive experiments were conducted to compare this method against state-of-the-art multi-label methods on three popular bioinformatic benchmark datasets. The results illustrated that this proposed method is more effective for bioinformatic multi-label classification compared to commonly used techniques.

  10. Challenges of Identifying Clinically Actionable Genetic Variants for Precision Medicine

    PubMed Central

    2016-01-01

    Advances in genomic medicine have the potential to change the way we treat human disease, but translating these advances into reality for improving healthcare outcomes depends essentially on our ability to discover disease- and/or drug-associated clinically actionable genetic mutations. Integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a big data infrastructure can provide an efficient and effective way to identify clinically actionable genetic variants for personalized treatments and reduce healthcare costs. We review bioinformatics processing of next-generation sequencing (NGS) data, bioinformatics infrastructures for implementing precision medicine, and bioinformatics approaches for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs. PMID:27195526

  11. Recognition of Protein-coding Genes Based on Z-curve Algorithms

    PubMed Central

    -Biao Guo, Feng; Lin, Yan; -Ling Chen, Ling

    2014-01-01

    Recognition of protein-coding genes, a classical bioinformatics issue, is an absolutely needed step for annotating newly sequenced genomes. The Z-curve algorithm, as one of the most effective methods on this issue, has been successfully applied in annotating or re-annotating many genomes, including those of bacteria, archaea and viruses. Two Z-curve based ab initio gene-finding programs have been developed: ZCURVE (for bacteria and archaea) and ZCURVE_V (for viruses and phages). ZCURVE_C (for 57 bacteria) and Zfisher (for any bacterium) are web servers for re-annotation of bacterial and archaeal genomes. The above four tools can be used for genome annotation or re-annotation, either independently or combined with the other gene-finding programs. In addition to recognizing protein-coding genes and exons, Z-curve algorithms are also effective in recognizing promoters and translation start sites. Here, we summarize the applications of Z-curve algorithms in gene finding and genome annotation. PMID:24822027

  12. Bioinformatics for transporter pharmacogenomics and systems biology: data integration and modeling with UML.

    PubMed

    Yan, Qing

    2010-01-01

    Bioinformatics is the rational study at an abstract level that can influence the way we understand biomedical facts and the way we apply the biomedical knowledge. Bioinformatics is facing challenges in helping with finding the relationships between genetic structures and functions, analyzing genotype-phenotype associations, and understanding gene-environment interactions at the systems level. One of the most important issues in bioinformatics is data integration. The data integration methods introduced here can be used to organize and integrate both public and in-house data. With the volume of data and the high complexity, computational decision support is essential for integrative transporter studies in pharmacogenomics, nutrigenomics, epigenetics, and systems biology. For the development of such a decision support system, object-oriented (OO) models can be constructed using the Unified Modeling Language (UML). A methodology is developed to build biomedical models at different system levels and construct corresponding UML diagrams, including use case diagrams, class diagrams, and sequence diagrams. By OO modeling using UML, the problems of transporter pharmacogenomics and systems biology can be approached from different angles with a more complete view, which may greatly enhance the efforts in effective drug discovery and development. Bioinformatics resources of membrane transporters and general bioinformatics databases and tools that are frequently used in transporter studies are also collected here. An informatics decision support system based on the models presented here is available at http://www.pharmtao.com/transporter . The methodology developed here can also be used for other biomedical fields.

  13. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    PubMed Central

    2002-01-01

    Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134

  14. Simulation of Two Dimensional Electrophoresis and Tandem Mass Spectrometry for Teaching Proteomics

    ERIC Educational Resources Information Center

    Fisher, Amanda; Sekera, Emily; Payne, Jill; Craig, Paul

    2012-01-01

    In proteomics, complex mixtures of proteins are separated (usually by chromatography or electrophoresis) and identified by mass spectrometry. We have created 2DE Tandem MS, a computer program designed for use in the biochemistry, proteomics, or bioinformatics classroom. It contains two simulations--2D electrophoresis and tandem mass spectrometry.…

  15. A Course-Based Research Experience: How Benefits Change with Increased Investment in Instructional Time

    ERIC Educational Resources Information Center

    Shaffer, Christopher D.; Alvarez, Consuelo J.; Bednarski, April E.; Dunbar, David; Goodman, Anya L.; Reinke, Catherine; Rosenwald, Anne G.; Wolyniak, Michael J.; Bailey, Cheryl; Barnard, Daron; Bazinet, Christopher; Beach, Dale L.; Bedard, James E. J.; Bhalla, Satish; Braverman, John; Burg, Martin; Chandrasekaran, Vidya; Chung, Hui-Min; Clase, Kari; DeJong, Randall J.; DiAngelo, Justin R.; Du, Chunguang; Eckdahl, Todd T.; Eisler, Heather; Emerson, Julia A.; Frary, Amy; Frohlich, Donald; Gosser, Yuying; Govind, Shubha; Haberman, Adam; Hark, Amy T.; Hauser, Charles; Hoogewerf, Arlene; Hoopes, Laura L. M.; Howell, Carina E.; Johnson, Diana; Jones, Christopher J.; Kadlec, Lisa; Kaehler, Marian; Key, S. Catherine Silver; Kleinschmit, Adam; Kokan, Nighat P.; Kopp, Olga; Kuleck, Gary; Leatherman, Judith; Lopilato, Jane; MacKinnon, Christy; Martinez-Cruzado, Juan Carlos; McNeil, Gerard; Mel, Stephanie; Mistry, Hemlata; Nagengast, Alexis; Overvoorde, Paul; Paetkau, Don W.; Parrish, Susan; Peterson, Celeste N.; Preuss, Mary; Reed, Laura K.; Revie, Dennis; Robic, Srebrenka; Roecklein-Canfield, Jennifer; Rubin, Michael R.; Saville, Kenneth; Schroeder, Stephanie; Sharif, Karim; Shaw, Mary; Skuse, Gary; Smith, Christopher D.; Smith, Mary A.; Smith, Sheryl T.; Spana, Eric; Spratt, Mary; Sreenivasan, Aparna; Stamm, Joyce; Szauter, Paul; Thompson, Jeffrey S.; Wawersik, Matthew; Youngblom, James; Zhou, Leming; Mardis, Elaine R.; Buhler, Jeremy; Leung, Wilson; Lopatto, David; Elgin, Sarah C. R.

    2014-01-01

    There is widespread agreement that science, technology, engineering, and mathematics programs should provide undergraduates with research experience. Practical issues and limited resources, however, make this a challenge. We have developed a bioinformatics project that provides a course-based research experience for students at a diverse group of…

  16. Coordinated effort to advance genomes-to-phenomes through the integration of bioinformatics with aquaculture research

    USDA-ARS?s Scientific Manuscript database

    Aquaculture is the fastest growing food production system in the world. The research program at the USDA-ARS-SNARC strives to improve the efficiency and sustainability of warmwater U.S. aquaculture. SNARC scientists have impacted the catfish (#1 U.S. aquaculture industry), tilapia (#3) and hybrid st...

  17. Bioengineering and Bioinformatics Summer Institutes: Meeting Modern Challenges in Undergraduate Summer Research

    ERIC Educational Resources Information Center

    Butler, Peter J.; Dong, Cheng; Snyder, Alan J.; Jones, A. Daniel; Sheets, Erin D.

    2008-01-01

    Summer undergraduate research programs in science and engineering facilitate research progress for faculty and provide a close-ended research experience for students, which can prepare them for careers in industry, medicine, and academia. However, ensuring these outcomes is a challenge when the students arrive ill-prepared for substantive research…

  18. A Teaching Approach from the Exhaustive Search Method to the Needleman-Wunsch Algorithm

    ERIC Educational Resources Information Center

    Xu, Zhongneng; Yang, Yayun; Huang, Beibei

    2017-01-01

    The Needleman-Wunsch algorithm has become one of the core algorithms in bioinformatics; however, this programming requires more suitable explanations for students with different major backgrounds. In supposing sample sequences and using a simple store system, the connection between the exhaustive search method and the Needleman-Wunsch algorithm…

  19. SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts.

    PubMed

    Ryan, Michael C; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N

    2012-09-15

    SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. mryan@insilico.us.com Supplementary data are available at Bioinformatics online.

  20. A review of bioinformatics training applied to research in molecular medicine, agriculture and biodiversity in Costa Rica and Central America.

    PubMed

    Orozco, Allan; Morera, Jessica; Jiménez, Sergio; Boza, Ricardo

    2013-09-01

    Today, Bioinformatics has become a scientific discipline with great relevance for the Molecular Biosciences and for the Omics sciences in general. Although developed countries have progressed with large strides in Bioinformatics education and research, in other regions, such as Central America, the advances have occurred in a gradual way and with little support from the Academia, either at the undergraduate or graduate level. To address this problem, the University of Costa Rica's Medical School, a regional leader in Bioinformatics in Central America, has been conducting a series of Bioinformatics workshops, seminars and courses, leading to the creation of the region's first Bioinformatics Master's Degree. The recent creation of the Central American Bioinformatics Network (BioCANET), associated to the deployment of a supporting computational infrastructure (HPC Cluster) devoted to provide computing support for Molecular Biology in the region, is providing a foundational stone for the development of Bioinformatics in the area. Central American bioinformaticians have participated in the creation of as well as co-founded the Iberoamerican Bioinformatics Society (SOIBIO). In this article, we review the most recent activities in education and research in Bioinformatics from several regional institutions. These activities have resulted in further advances for Molecular Medicine, Agriculture and Biodiversity research in Costa Rica and the rest of the Central American countries. Finally, we provide summary information on the first Central America Bioinformatics International Congress, as well as the creation of the first Bioinformatics company (Indromics Bioinformatics), spin-off the Academy in Central America and the Caribbean.

  1. Bioinformatics and systems biology research update from the 15th International Conference on Bioinformatics (InCoB2016).

    PubMed

    Schönbach, Christian; Verma, Chandra; Bond, Peter J; Ranganathan, Shoba

    2016-12-22

    The International Conference on Bioinformatics (InCoB) has been publishing peer-reviewed conference papers in BMC Bioinformatics since 2006. Of the 44 articles accepted for publication in supplement issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics and BMC Systems Biology, 24 articles with a bioinformatics or systems biology focus are reviewed in this editorial. InCoB2017 is scheduled to be held in Shenzen, China, September 20-22, 2017.

  2. Using Bioinformatics Approach to Explore the Pharmacological Mechanisms of Multiple Ingredients in Shuang-Huang-Lian

    PubMed Central

    Zhang, Bai-xia; Li, Jian; Gu, Hao; Li, Qiang; Zhang, Qi; Zhang, Tian-jiao; Wang, Yun; Cai, Cheng-ke

    2015-01-01

    Due to the proved clinical efficacy, Shuang-Huang-Lian (SHL) has developed a variety of dosage forms. However, the in-depth research on targets and pharmacological mechanisms of SHL preparations was scarce. In the presented study, the bioinformatics approaches were adopted to integrate relevant data and biological information. As a result, a PPI network was built and the common topological parameters were characterized. The results suggested that the PPI network of SHL exhibited a scale-free property and modular architecture. The drug target network of SHL was structured with 21 functional modules. According to certain modules and pharmacological effects distribution, an antitumor effect and potential drug targets were predicted. A biological network which contained 26 subnetworks was constructed to elucidate the antipneumonia mechanism of SHL. We also extracted the subnetwork to explicitly display the pathway where one effective component acts on the pneumonia related targets. In conclusions, a bioinformatics approach was established for exploring the drug targets, pharmacological activity distribution, effective components of SHL, and its mechanism of antipneumonia. Above all, we identified the effective components and disclosed the mechanism of SHL from the view of system. PMID:26495421

  3. Evolution of apoptosis-like programmed cell death in unicellular protozoan parasites.

    PubMed

    Kaczanowski, Szymon; Sajid, Mohammed; Reece, Sarah E

    2011-03-25

    Apoptosis-like programmed cell death (PCD) has recently been described in multiple taxa of unicellular protists, including the protozoan parasites Plasmodium, Trypanosoma and Leishmania. Apoptosis-like PCD in protozoan parasites shares a number of morphological features with programmed cell death in multicellular organisms. However, both the evolutionary explanations and mechanisms involved in parasite PCD are poorly understood. Explaining why unicellular organisms appear to undergo 'suicide' is a challenge for evolutionary biology and uncovering death executors and pathways is a challenge for molecular and cell biology. Bioinformatics has the potential to integrate these approaches by revealing homologies in the PCD machinery of diverse taxa and evaluating their evolutionary trajectories. As the molecular mechanisms of apoptosis in model organisms are well characterised, and recent data suggest similar mechanisms operate in protozoan parasites, key questions can now be addressed. These questions include: which elements of apoptosis machinery appear to be shared between protozoan parasites and multicellular taxa and, have these mechanisms arisen through convergent or divergent evolution? We use bioinformatics to address these questions and our analyses suggest that apoptosis mechanisms in protozoan parasites and other taxa have diverged during their evolution, that some apoptosis factors are shared across taxa whilst others have been replaced by proteins with similar biochemical activities.

  4. Evolution of apoptosis-like programmed cell death in unicellular protozoan parasites

    PubMed Central

    2011-01-01

    Apoptosis-like programmed cell death (PCD) has recently been described in multiple taxa of unicellular protists, including the protozoan parasites Plasmodium, Trypanosoma and Leishmania. Apoptosis-like PCD in protozoan parasites shares a number of morphological features with programmed cell death in multicellular organisms. However, both the evolutionary explanations and mechanisms involved in parasite PCD are poorly understood. Explaining why unicellular organisms appear to undergo 'suicide' is a challenge for evolutionary biology and uncovering death executors and pathways is a challenge for molecular and cell biology. Bioinformatics has the potential to integrate these approaches by revealing homologies in the PCD machinery of diverse taxa and evaluating their evolutionary trajectories. As the molecular mechanisms of apoptosis in model organisms are well characterised, and recent data suggest similar mechanisms operate in protozoan parasites, key questions can now be addressed. These questions include: which elements of apoptosis machinery appear to be shared between protozoan parasites and multicellular taxa and, have these mechanisms arisen through convergent or divergent evolution? We use bioinformatics to address these questions and our analyses suggest that apoptosis mechanisms in protozoan parasites and other taxa have diverged during their evolution, that some apoptosis factors are shared across taxa whilst others have been replaced by proteins with similar biochemical activities. PMID:21439063

  5. Interactive software tool to comprehend the calculation of optimal sequence alignments with dynamic programming.

    PubMed

    Ibarra, Ignacio L; Melo, Francisco

    2010-07-01

    Dynamic programming (DP) is a general optimization strategy that is successfully used across various disciplines of science. In bioinformatics, it is widely applied in calculating the optimal alignment between pairs of protein or DNA sequences. These alignments form the basis of new, verifiable biological hypothesis. Despite its importance, there are no interactive tools available for training and education on understanding the DP algorithm. Here, we introduce an interactive computer application with a graphical interface, for the purpose of educating students about DP. The program displays the DP scoring matrix and the resulting optimal alignment(s), while allowing the user to modify key parameters such as the values in the similarity matrix, the sequence alignment algorithm version and the gap opening/extension penalties. We hope that this software will be useful to teachers and students of bioinformatics courses, as well as researchers who implement the DP algorithm for diverse applications. The software is freely available at: http:/melolab.org/sat. The software is written in the Java computer language, thus it runs on all major platforms and operating systems including Windows, Mac OS X and LINUX. All inquiries or comments about this software should be directed to Francisco Melo at fmelo@bio.puc.cl.

  6. Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization.

    PubMed

    Jung, Sang-Kyu; McDonald, Karen

    2011-08-16

    Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net.

  7. Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization

    PubMed Central

    2011-01-01

    Background Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. Results The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Conclusion Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net. PMID:21846353

  8. Cancer Bioinformatics for Updating Anticancer Drug Developments and Personalized Therapeutics.

    PubMed

    Lu, Da-Yong; Qu, Rong-Xin; Lu, Ting-Ren; Wu, Hong-Ying

    2017-01-01

    Last two to three decades, this world witnesses a rapid progress of biomarkers and bioinformatics technologies. Cancer bioinformatics is one of such important omics branches for experimental/clinical studies and applications. Same as other biological techniques or systems, bioinformatics techniques will be widely used. But they are presently not omni-potent. Despite great popularity and improvements, cancer bioinformatics has its own limitations and shortcomings at this stage of technical advancements. This article will offer a panorama of bioinformatics in cancer researches and clinical therapeutic applications-possible advantages and limitations relating to cancer therapeutics. A lot of beneficial capabilities and outcomes have been described. As a result, a successful new era for cancer bioinformatics is waiting for us if we can adhere on scientific studies of cancer bioinformatics in malignant- origin mining, medical verifications and clinical diagnostic applications. Cancer bioinformatics gave a great significance in disease diagnosis and therapeutic predictions. Many creative ideas and future perspectives are highlighted. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  9. DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

    PubMed

    Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

    2004-09-09

    Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.

  10. Wrapping up BLAST and other applications for use on Unix clusters.

    PubMed

    Hokamp, Karsten; Shields, Denis C; Wolfe, Kenneth H; Caffrey, Daniel R

    2003-02-12

    We have developed two programs that speed up common bioinformatic applications by spreading them across a UNIX cluster.(1) BLAST.pm, a new module for the 'MOLLUSC' package. (2) WRAPID, a simple tool for parallelizing large numbers of small instances of programs such as BLAST, FASTA and CLUSTALW. The packages were developed in Perl on a 20-node Linux cluster and are provided together with a configuration script and documentation. They can be freely downloaded from http://wolfe.gen.tcd.ie/wrapper.

  11. Integration of bioinformatics into an undergraduate biology curriculum and the impact on development of mathematical skills.

    PubMed

    Wightman, Bruce; Hark, Amy T

    2012-01-01

    The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this study, we deliberately integrated bioinformatics instruction at multiple course levels into an existing biology curriculum. Students in an introductory biology course, intermediate lab courses, and advanced project-oriented courses all participated in new course components designed to sequentially introduce bioinformatics skills and knowledge, as well as computational approaches that are common to many bioinformatics applications. In each course, bioinformatics learning was embedded in an existing disciplinary instructional sequence, as opposed to having a single course where all bioinformatics learning occurs. We designed direct and indirect assessment tools to follow student progress through the course sequence. Our data show significant gains in both student confidence and ability in bioinformatics during individual courses and as course level increases. Despite evidence of substantial student learning in both bioinformatics and mathematics, students were skeptical about the link between learning bioinformatics and learning mathematics. While our approach resulted in substantial learning gains, student "buy-in" and engagement might be better in longer project-based activities that demand application of skills to research problems. Nevertheless, in situations where a concentrated focus on project-oriented bioinformatics is not possible or desirable, our approach of integrating multiple smaller components into an existing curriculum provides an alternative. Copyright © 2012 Wiley Periodicals, Inc.

  12. ReGaTE: Registration of Galaxy Tools in Elixir

    PubMed Central

    Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé

    2017-01-01

    Abstract Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. Findings: We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. Conclusions: ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE. PMID:28402416

  13. BioXSD: the common data-exchange format for everyday bioinformatics web services

    PubMed Central

    Kalaš, Matúš; Puntervoll, Pæl; Joseph, Alexandre; Bartaševičiūtė, Edita; Töpfer, Armin; Venkataraman, Prabakar; Pettifer, Steve; Bryne, Jan Christian; Ison, Jon; Blanchet, Christophe; Rapacki, Kristoffer; Jonassen, Inge

    2010-01-01

    Motivation: The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of the resources offer programmatic web-service interface. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. Results: BioXSD has been developed as a candidate for standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a test-case workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. Availability: The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source codes in common programming languages, an updated list of compatible web services and tools and a repository of feature requests from the community. Contact: matus.kalas@bccs.uib.no; developers@bioxsd.org; support@bioxsd.org PMID:20823319

  14. Computer Simulation of Embryonic Systems: What can a virtual embryo teach us about developmental toxicity? (LA Conference on Computational Biology & Bioinformatics)

    EPA Science Inventory

    This presentation will cover work at EPA under the CSS program for: (1) Virtual Tissue Models built from the known biology of an embryological system and structured to recapitulate key cell signals and responses; (2) running the models with real (in vitro) or synthetic (in silico...

  15. An Innovative Plant Genomics and Gene Annotation Program for High School, Community College, and University Faculty

    ERIC Educational Resources Information Center

    Hacisalihoglu, Gokhan; Hilgert, Uwe; Nash, E. Bruce; Micklos, David A.

    2008-01-01

    Today's biology educators face the challenge of training their students in modern molecular biology techniques including genomics and bioinformatics. The Dolan DNA Learning Center (DNALC) of Cold Spring Harbor Laboratory has developed and disseminated a bench- and computer-based plant genomics curriculum for biology faculty. In 2007, a five-day…

  16. Bioinformatics of cardiovascular miRNA biology.

    PubMed

    Kunz, Meik; Xiao, Ke; Liang, Chunguang; Viereck, Janika; Pachel, Christina; Frantz, Stefan; Thum, Thomas; Dandekar, Thomas

    2015-12-01

    MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs and are highly conserved among species. Moreover, miRNAs regulate gene expression of a large number of genes associated with important biological functions and signaling pathways. Recently, several miRNAs have been found to be associated with cardiovascular diseases. Thus, investigating the complex regulatory effect of miRNAs may lead to a better understanding of their functional role in the heart. To achieve this, bioinformatics approaches have to be coupled with validation and screening experiments to understand the complex interactions of miRNAs with the genome. This will boost the subsequent development of diagnostic markers and our understanding of the physiological and therapeutic role of miRNAs in cardiac remodeling. In this review, we focus on and explain different bioinformatics strategies and algorithms for the identification and analysis of miRNAs and their regulatory elements to better understand cardiac miRNA biology. Starting with the biogenesis of miRNAs, we present approaches such as LocARNA and miRBase for combining sequence and structure analysis including phylogenetic comparisons as well as detailed analysis of RNA folding patterns, functional target prediction, signaling pathway as well as functional analysis. We also show how far bioinformatics helps to tackle the unprecedented level of complexity and systemic effects by miRNA, underlining the strong therapeutic potential of miRNA and miRNA target structures in cardiovascular disease. In addition, we discuss drawbacks and limitations of bioinformatics algorithms and the necessity of experimental approaches for miRNA target identification. This article is part of a Special Issue entitled 'Non-coding RNAs'. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. MetCCS predictor: a web server for predicting collision cross-section values of metabolites in ion mobility-mass spectrometry based metabolomics.

    PubMed

    Zhou, Zhiwei; Xiong, Xin; Zhu, Zheng-Jiang

    2017-07-15

    In metabolomics, rigorous structural identification of metabolites presents a challenge for bioinformatics. The use of collision cross-section (CCS) values of metabolites derived from ion mobility-mass spectrometry effectively increases the confidence of metabolite identification, but this technique suffers from the limit number of available CCS values. Currently, there is no software available for rapidly generating the metabolites' CCS values. Here, we developed the first web server, namely, MetCCS Predictor, for predicting CCS values. It can predict the CCS values of metabolites using molecular descriptors within a few seconds. Common users with limited background on bioinformatics can benefit from this software and effectively improve the metabolite identification in metabolomics. The web server is freely available at: http://www.metabolomics-shanghai.org/MetCCS/ . jiangzhu@sioc.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  18. Computational thinking in life science education.

    PubMed

    Rubinstein, Amir; Chor, Benny

    2014-11-01

    We join the increasing call to take computational education of life science students a step further, beyond teaching mere programming and employing existing software tools. We describe a new course, focusing on enriching the curriculum of life science students with abstract, algorithmic, and logical thinking, and exposing them to the computational "culture." The design, structure, and content of our course are influenced by recent efforts in this area, collaborations with life scientists, and our own instructional experience. Specifically, we suggest that an effective course of this nature should: (1) devote time to explicitly reflect upon computational thinking processes, resisting the temptation to drift to purely practical instruction, (2) focus on discrete notions, rather than on continuous ones, and (3) have basic programming as a prerequisite, so students need not be preoccupied with elementary programming issues. We strongly recommend that the mere use of existing bioinformatics tools and packages should not replace hands-on programming. Yet, we suggest that programming will mostly serve as a means to practice computational thinking processes. This paper deals with the challenges and considerations of such computational education for life science students. It also describes a concrete implementation of the course and encourages its use by others.

  19. SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts

    PubMed Central

    Ryan, Michael C.; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N.

    2012-01-01

    Summary: SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. Availability and implementation: SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. Contact: mryan@insilico.us.com Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22820202

  20. Crowdsourcing for bioinformatics

    PubMed Central

    Good, Benjamin M.; Su, Andrew I.

    2013-01-01

    Motivation: Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Results: Here, we provide a framework for understanding and applying several different types of crowdsourcing. The framework considers two broad classes: systems for solving large-volume ‘microtasks’ and systems for solving high-difficulty ‘megatasks’. Within these classes, we discuss system types, including volunteer labor, games with a purpose, microtask markets and open innovation contests. We illustrate each system type with successful examples in bioinformatics and conclude with a guide for matching problems to crowdsourcing solutions that highlights the positives and negatives of different approaches. Contact: bgood@scripps.edu PMID:23782614

  1. Assemble worldwide biologists in a network construct a web services based architecture for bioinformatics.

    PubMed

    Tao, Yuan; Liu, Juan

    2005-01-01

    The Internet has already deflated our world of working and living into a very small scope, thus bringing out the concept of Earth Village, in which people could communicate and co-work though thousands' miles far away from each other. This paper describes a prototype, which is just like an Earth Lab for bioinformatics, based on Web services framework to build up a network architecture for bioinformatics research and for world wide biologists to easily implement enormous, complex processes, and effectively share and access computing resources and data, regardless of how heterogeneous the format of the data is and how decentralized and distributed these resources are around the world. A diminutive and simplified example scenario is given out to realize the prototype after that.

  2. Bioinformatics research in the Asia Pacific: a 2007 update.

    PubMed

    Ranganathan, Shoba; Gribskov, Michael; Tan, Tin Wee

    2008-01-01

    We provide a 2007 update on the bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB) bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2007 Conference was organized as the 6th annual conference of the Asia-Pacific Bioinformatics Network, on Aug. 27-30, 2007 at Hong Kong, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea) and New Delhi (India). Besides a scientific meeting at Hong Kong, satellite events organized are a pre-conference training workshop at Hanoi, Vietnam and a post-conference workshop at Nansha, China. This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. We have organized the papers into thematic areas, highlighting the growing contribution of research excellence from this region, to global bioinformatics endeavours.

  3. Self-consistency test reveals systematic bias in programs for prediction change of stability upon mutation.

    PubMed

    Usmanova, Dinara R; Bogatyreva, Natalya S; Ariño Bernad, Joan; Eremina, Aleksandra A; Gorshkova, Anastasiya A; Kanevskiy, German M; Lonishin, Lyubov R; Meister, Alexander V; Yakupova, Alisa G; Kondrashov, Fyodor A; Ivankov, Dmitry N

    2018-05-02

    Computational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods, but has not been investigated systematically. Presence of the bias may lead to misleading results especially when exploring the effects of combination of different mutations. Here we use a protocol to measure the bias as a function of the number of introduced mutations. It is based on a self-consistency test of the reciprocity the effect of a mutation. An advantage of the used approach is that it relies solely on crystal structures without experimentally measured stability values. We applied the protocol to four popular algorithms predicting change of protein stability upon mutation, FoldX, Eris, Rosetta, and I-Mutant, and found an inherent bias. For one program, FoldX, we manage to substantially reduce the bias using additional relaxation by Modeller. Authors using algorithms for predicting effects of mutations should be aware of the bias described here. ivankov13@gmail.com. Supplementary data are available at Bioinformatics online.

  4. Bioinformatics Education—Perspectives and Challenges out of Africa

    PubMed Central

    Adebiyi, Ezekiel F.; Alzohairy, Ahmed M.; Everett, Dean; Ghedira, Kais; Ghouila, Amel; Kumuthini, Judit; Mulder, Nicola J.; Panji, Sumir; Patterton, Hugh-G.

    2015-01-01

    The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of virtually every field in the life sciences. This has placed a scientific premium on the availability of skilled bioinformaticians, a qualification that is extremely scarce on the African continent. The reasons for this are numerous, although the absence of a skilled bioinformatician at academic institutions to initiate a training process and build sustained capacity seems to be a common African shortcoming. This dearth of bioinformatics expertise has had a knock-on effect on the establishment of many modern high-throughput projects at African institutes, including the comprehensive and systematic analysis of genomes from African populations, which are among the most genetically diverse anywhere on the planet. Recent funding initiatives from the National Institutes of Health and the Wellcome Trust are aimed at ameliorating this shortcoming. In this paper, we discuss the problems that have limited the establishment of the bioinformatics field in Africa, as well as propose specific actions that will help with the education and training of bioinformaticians on the continent. This is an absolute requirement in anticipation of a boom in high-throughput approaches to human health issues unique to data from African populations. PMID:24990350

  5. Emerging strengths in Asia Pacific bioinformatics.

    PubMed

    Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee

    2008-12-12

    The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20-23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts.

  6. Emerging strengths in Asia Pacific bioinformatics

    PubMed Central

    Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee

    2008-01-01

    The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20–23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts. PMID:19091008

  7. Extending Asia Pacific bioinformatics into new realms in the "-omics" era.

    PubMed

    Ranganathan, Shoba; Eisenhaber, Frank; Tong, Joo Chuan; Tan, Tin Wee

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation dating back to 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 7-11, 2009 at Biopolis, Singapore. Besides bringing together scientists from the field of bioinformatics in this region, InCoB has actively engaged clinicians and researchers from the area of systems biology, to facilitate greater synergy between these two groups. InCoB2009 followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India), Hong Kong and Taipei (Taiwan), with InCoB2010 scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. The Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and symposia on Clinical Bioinformatics (CBAS), the Singapore Symposium on Computational Biology (SYMBIO) and training tutorials were scheduled prior to the scientific meeting, and provided ample opportunity for in-depth learning and special interest meetings for educators, clinicians and students. We provide a brief overview of the peer-reviewed bioinformatics manuscripts accepted for publication in this supplement, grouped into thematic areas. In order to facilitate scientific reproducibility and accountability, we have, for the first time, introduced minimum information criteria for our pubilcations, including compliance to a Minimum Information about a Bioinformatics Investigation (MIABi). As the regional research expertise in bioinformatics matures, we have delineated a minimum set of bioinformatics skills required for addressing the computational challenges of the "-omics" era.

  8. Opportunities and challenges provided by cloud repositories for bioinformatics-enabled drug discovery.

    PubMed

    Dalpé, Gratien; Joly, Yann

    2014-09-01

    Healthcare-related bioinformatics databases are increasingly offering the possibility to maintain, organize, and distribute DNA sequencing data. Different national and international institutions are currently hosting such databases that offer researchers website platforms where they can obtain sequencing data on which they can perform different types of analysis. Until recently, this process remained mostly one-dimensional, with most analysis concentrated on a limited amount of data. However, newer genome sequencing technology is producing a huge amount of data that current computer facilities are unable to handle. An alternative approach has been to start adopting cloud computing services for combining the information embedded in genomic and model system biology data, patient healthcare records, and clinical trials' data. In this new technological paradigm, researchers use virtual space and computing power from existing commercial or not-for-profit cloud service providers to access, store, and analyze data via different application programming interfaces. Cloud services are an alternative to the need of larger data storage; however, they raise different ethical, legal, and social issues. The purpose of this Commentary is to summarize how cloud computing can contribute to bioinformatics-based drug discovery and to highlight some of the outstanding legal, ethical, and social issues that are inherent in the use of cloud services. © 2014 Wiley Periodicals, Inc.

  9. BioQueue: a novel pipeline framework to accelerate bioinformatics analysis.

    PubMed

    Yao, Li; Wang, Heming; Song, Yuanyuan; Sui, Guangchao

    2017-10-15

    With the rapid development of Next-Generation Sequencing, a large amount of data is now available for bioinformatics research. Meanwhile, the presence of many pipeline frameworks makes it possible to analyse these data. However, these tools concentrate mainly on their syntax and design paradigms, and dispatch jobs based on users' experience about the resources needed by the execution of a certain step in a protocol. As a result, it is difficult for these tools to maximize the potential of computing resources, and avoid errors caused by overload, such as memory overflow. Here, we have developed BioQueue, a web-based framework that contains a checkpoint before each step to automatically estimate the system resources (CPU, memory and disk) needed by the step and then dispatch jobs accordingly. BioQueue possesses a shell command-like syntax instead of implementing a new script language, which means most biologists without computer programming background can access the efficient queue system with ease. BioQueue is freely available at https://github.com/liyao001/BioQueue. The extensive documentation can be found at http://bioqueue.readthedocs.io. li_yao@outlook.com or gcsui@nefu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  10. eSBMTools 1.0: enhanced native structure-based modeling tools.

    PubMed

    Lutz, Benjamin; Sinner, Claude; Heuermann, Geertje; Verma, Abhinav; Schug, Alexander

    2013-11-01

    Molecular dynamics simulations provide detailed insights into the structure and function of biomolecular systems. Thus, they complement experimental measurements by giving access to experimentally inaccessible regimes. Among the different molecular dynamics techniques, native structure-based models (SBMs) are based on energy landscape theory and the principle of minimal frustration. Typically used in protein and RNA folding simulations, they coarse-grain the biomolecular system and/or simplify the Hamiltonian resulting in modest computational requirements while achieving high agreement with experimental data. eSBMTools streamlines running and evaluating SBM in a comprehensive package and offers high flexibility in adding experimental- or bioinformatics-derived restraints. We present a software package that allows setting up, modifying and evaluating SBM for both RNA and proteins. The implemented workflows include predicting protein complexes based on bioinformatics-derived inter-protein contact information, a standardized setup of protein folding simulations based on the common PDB format, calculating reaction coordinates and evaluating the simulation by free-energy calculations with weighted histogram analysis method or by phi-values. The modules interface with the molecular dynamics simulation program GROMACS. The package is open source and written in architecture-independent Python2. http://sourceforge.net/projects/esbmtools/. alexander.schug@kit.edu. Supplementary data are available at Bioinformatics online.

  11. New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation.

    PubMed

    Antczak, Maciej; Popenda, Mariusz; Zok, Tomasz; Zurkowski, Michal; Adamiak, Ryszard W; Szachniuk, Marta

    2018-04-15

    Understanding the formation, architecture and roles of pseudoknots in RNA structures are one of the most difficult challenges in RNA computational biology and structural bioinformatics. Methods predicting pseudoknots typically perform this with poor accuracy, often despite experimental data incorporation. Existing bioinformatic approaches differ in terms of pseudoknots' recognition and revealing their nature. A few ways of pseudoknot classification exist, most common ones refer to a genus or order. Following the latter one, we propose new algorithms that identify pseudoknots in RNA structure provided in BPSEQ format, determine their order and encode in dot-bracket-letter notation. The proposed encoding aims to illustrate the hierarchy of RNA folding. New algorithms are based on dynamic programming and hybrid (combining exhaustive search and random walk) approaches. They evolved from elementary algorithm implemented within the workflow of RNA FRABASE 1.0, our database of RNA structure fragments. They use different scoring functions to rank dissimilar dot-bracket representations of RNA structure. Computational experiments show an advantage of new methods over the others, especially for large RNA structures. Presented algorithms have been implemented as new functionality of RNApdbee webserver and are ready to use at http://rnapdbee.cs.put.poznan.pl. mszachniuk@cs.put.poznan.pl. Supplementary data are available at Bioinformatics online.

  12. Big data for big questions: it is time for data analysts to act

    PubMed Central

    Moscato, Pablo

    2015-01-01

    Pablo Moscato speaks to Francesca Lake, Managing Editor Australian Research Council Future Fellow Prof. Pablo Moscato was born in 1964 in La Plata, Argentina. Obtaining his B.Sc. in Physics at University of La Plata, his PhD was defended at UNICAMP, Brazil. While at the California Institute of Technology Concurrent Computation Program he developed, in collaboration with Michael Norman, the first application of a methodology later called ‘memetic algorithms’, which is now widely used internationally. He is the founding co-director of the Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-based Medicine (CIBM) (2006–present) and the funding director of the Newcastle Bioinformatics Initiative (2002–2006) of The University of Newcastle (Australia). He is also Chief Investigator of the Australian Research Council Centre in Bioinformatics. He is one of Australia's most cited computer scientists. Over the past 7 years, he has introduced a unifying hallmark of cancer progression based on the changes of information theory quantifiers, and developed a novel mathematical model and an associated solution procedure based on combinatorial optimization techniques to identify drug combinations for cancer therapeutics. In addition, he has identified proteomic signatures to predict the clinical symptoms of Alzheimer's disease, among other ‘firsts’. He is a member of the Editorial Board of Future Science OA. PMID:28031895

  13. An architecture for genomics analysis in a clinical setting using Galaxy and Docker

    PubMed Central

    Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A

    2017-01-01

    Abstract Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Along the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. PMID:29048555

  14. An architecture for genomics analysis in a clinical setting using Galaxy and Docker.

    PubMed

    Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A; Rance, B

    2017-11-01

    Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Along the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. © The Author 2017. Published by Oxford University Press.

  15. MicroRNAs as Potential Regulators of Glutathione Peroxidases Expression and Their Role in Obesity and Related Pathologies.

    PubMed

    Matoušková, Petra; Hanousková, Barbora; Skálová, Lenka

    2018-04-14

    Glutathione peroxidases (GPxs) belong to the eight-member family of phylogenetically related enzymes with different cellular localization, but distinct antioxidant function. Several GPxs are important selenoproteins. Dysregulated GPx expression is connected with severe pathologies, including obesity and diabetes. We performed a comprehensive bioinformatic analysis using the programs miRDB, miRanda, TargetScan, and Diana in the search for hypothetical microRNAs targeting 3'untranslated regions (3´UTR) of GPxs. We cross-referenced the literature for possible intersections between our results and available reports on identified microRNAs, with a special focus on the microRNAs related to oxidative stress, obesity, and related pathologies. We identified many microRNAs with an association with oxidative stress and obesity as putative regulators of GPxs. In particular, miR-185-5p was predicted by a larger number of programs to target six GPxs and thus could play the role as their master regulator. This microRNA was altered by selenium deficiency and can play a role as a feedback control of selenoproteins' expression. Through the bioinformatics analysis we revealed the potential connection of microRNAs, GPxs, obesity, and other redox imbalance related diseases.

  16. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome.

    PubMed

    McDonald, Daniel; Clemente, Jose C; Kuczynski, Justin; Rideout, Jai Ram; Stombaugh, Jesse; Wendel, Doug; Wilke, Andreas; Huse, Susan; Hufnagle, John; Meyer, Folker; Knight, Rob; Caporaso, J Gregory

    2012-07-12

    We present the Biological Observation Matrix (BIOM, pronounced "biome") format: a JSON-based file format for representing arbitrary observation by sample contingency tables with associated sample and observation metadata. As the number of categories of comparative omics data types (collectively, the "ome-ome") grows rapidly, a general format to represent and archive this data will facilitate the interoperability of existing bioinformatics tools and future meta-analyses. The BIOM file format is supported by an independent open-source software project (the biom-format project), which initially contains Python objects that support the use and manipulation of BIOM data in Python programs, and is intended to be an open development effort where developers can submit implementations of these objects in other programming languages. The BIOM file format and the biom-format project are steps toward reducing the "bioinformatics bottleneck" that is currently being experienced in diverse areas of biological sciences, and will help us move toward the next phase of comparative omics where basic science is translated into clinical and environmental applications. The BIOM file format is currently recognized as an Earth Microbiome Project Standard, and as a Candidate Standard by the Genomic Standards Consortium.

  17. Edge Bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Chien-Chi

    2015-08-03

    Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human-resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed based on following USE CASES and specific to illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen ormore » co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance« less

  18. Bioinformatics goes back to the future.

    PubMed

    Miller, Crispin J; Attwood, Teresa K

    2003-02-01

    The need to turn raw data into knowledge has led the bioinformatics field to focus increasingly on the manipulation of information. By drawing parallels with both cryptography and artificial intelligence, we can develop an understanding of the changes that are occurring in bioinformatics, and how these changes are likely to influence the bioinformatics job market.

  19. Introductory Bioinformatics Exercises Utilizing Hemoglobin and Chymotrypsin to Reinforce the Protein Sequence-Structure-Function Relationship

    ERIC Educational Resources Information Center

    Inlow, Jennifer K.; Miller, Paige; Pittman, Bethany

    2007-01-01

    We describe two bioinformatics exercises intended for use in a computer laboratory setting in an upper-level undergraduate biochemistry course. To introduce students to bioinformatics, the exercises incorporate several commonly used bioinformatics tools, including BLAST, that are freely available online. The exercises build upon the students'…

  20. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    ERIC Educational Resources Information Center

    Shachak, Aviv; Ophir, Ron; Rubin, Eitan

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of…

  1. Vertical and Horizontal Integration of Bioinformatics Education: A Modular, Interdisciplinary Approach

    ERIC Educational Resources Information Center

    Furge, Laura Lowe; Stevens-Truss, Regina; Moore, D. Blaine; Langeland, James A.

    2009-01-01

    Bioinformatics education for undergraduates has been approached primarily in two ways: introduction of new courses with largely bioinformatics focus or introduction of bioinformatics experiences into existing courses. For small colleges such as Kalamazoo, creation of new courses within an already resource-stretched setting has not been an option.…

  2. Computational biology and bioinformatics in Nigeria.

    PubMed

    Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-04-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  3. Computational Biology and Bioinformatics in Nigeria

    PubMed Central

    Fatumo, Segun A.; Adoga, Moses P.; Ojo, Opeolu O.; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-01-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries. PMID:24763310

  4. Technosciences in Academia: Rethinking a Conceptual Framework for Bioinformatics Undergraduate Curricula

    NASA Astrophysics Data System (ADS)

    Symeonidis, Iphigenia Sofia

    This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatics objectives rather than enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty in regards to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinfonnatics educational needs and goals as expressed by different authorities, five undergraduate bioinformatics degrees case-studies, educational implications of bioinformatics as a technoscience and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.

  5. Bioinformatics core competencies for undergraduate life sciences education.

    PubMed

    Wilson Sayres, Melissa A; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G; Smith, Todd M; Triplett, Eric W; Williams, Jason J; Dinsdale, Elizabeth; Morgan, William R; Burnette, James M; Donovan, Samuel S; Drew, Jennifer C; Elgin, Sarah C R; Fowlks, Edison R; Galindo-Gonzalez, Sebastian; Goodman, Anya L; Grandgenett, Nealy F; Goller, Carlos C; Jungck, John R; Newman, Jeffrey D; Pearson, William; Ryder, Elizabeth F; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C; Toro-Martínez, Arlín; Welch, Lonnie R; Wright, Robin; Barone, Lindsay; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C; Pauley, Mark A

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent's degree of training, time since degree earned, and/or the Carnegie Classification of the respondent's institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.

  6. Bioinformatics core competencies for undergraduate life sciences education

    PubMed Central

    Wilson Sayres, Melissa A.; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G.; Smith, Todd M.; Triplett, Eric W.; Williams, Jason J.; Dinsdale, Elizabeth; Morgan, William R.; Burnette, James M.; Donovan, Samuel S.; Drew, Jennifer C.; Elgin, Sarah C. R.; Fowlks, Edison R.; Galindo-Gonzalez, Sebastian; Goodman, Anya L.; Grandgenett, Nealy F.; Goller, Carlos C.; Jungck, John R.; Newman, Jeffrey D.; Pearson, William; Ryder, Elizabeth F.; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C.; Toro-Martínez, Arlín; Welch, Lonnie R.; Wright, Robin; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C.

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent’s degree of training, time since degree earned, and/or the Carnegie Classification of the respondent’s institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula. PMID:29870542

  7. MutSpec: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes.

    PubMed

    Ardin, Maude; Cahais, Vincent; Castells, Xavier; Bouaoun, Liacine; Byrnes, Graham; Herceg, Zdenko; Zavadil, Jiri; Olivier, Magali

    2016-04-18

    The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed within the web-based user-friendly platform Galaxy a first-of-its-kind package called MutSpec. MutSpec includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the COSMIC database and other sources. MutSpec offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. MutSpec may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool. MutSpec offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. MutSpec can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.

  8. Genomes2Drugs: Identifies Target Proteins and Lead Drugs from Proteome Data

    PubMed Central

    Toomey, David; Hoppe, Heinrich C.; Brennan, Marian P.; Nolan, Kevin B.; Chubb, Anthony J.

    2009-01-01

    Background Genome sequencing and bioinformatics have provided the full hypothetical proteome of many pathogenic organisms. Advances in microarray and mass spectrometry have also yielded large output datasets of possible target proteins/genes. However, the challenge remains to identify new targets for drug discovery from this wealth of information. Further analysis includes bioinformatics and/or molecular biology tools to validate the findings. This is time consuming and expensive, and could fail to yield novel drugs if protein purification and crystallography is impossible. To pre-empt this, a researcher may want to rapidly filter the output datasets for proteins that show good homology to proteins that have already been structurally characterised or proteins that are already targets for known drugs. Critically, those researchers developing novel antibiotics need to select out the proteins that show close homology to any human proteins, as future inhibitors are likely to cross-react with the host protein, causing off-target toxicity effects later in clinical trials. Methodology/Principal Findings To solve many of these issues, we have developed a free online resource called Genomes2Drugs which ranks sequences to identify proteins that are (i) homologous to previously crystallized proteins or (ii) targets of known drugs, but are (iii) not homologous to human proteins. When tested using the Plasmodium falciparum malarial genome the program correctly enriched the ranked list of proteins with known drug target proteins. Conclusions/Significance Genomes2Drugs rapidly identifies proteins that are likely to succeed in drug discovery pipelines. This free online resource helps in the identification of potential drug targets. Importantly, the program further highlights proteins that are likely to be inhibited by FDA-approved drugs. These drugs can then be rapidly moved into Phase IV clinical studies under ‘change-of-application’ patents. PMID:19593435

  9. ReGaTE: Registration of Galaxy Tools in Elixir.

    PubMed

    Doppelt-Azeroual, Olivia; Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé

    2017-06-01

    Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE . © The Author 2017. Published by Oxford University Press.

  10. Versatile and declarative dynamic programming using pair algebras.

    PubMed

    Steffen, Peter; Giegerich, Robert

    2005-09-12

    Dynamic programming is a widely used programming technique in bioinformatics. In sharp contrast to the simplicity of textbook examples, implementing a dynamic programming algorithm for a novel and non-trivial application is a tedious and error prone task. The algebraic dynamic programming approach seeks to alleviate this situation by clearly separating the dynamic programming recurrences and scoring schemes. Based on this programming style, we introduce a generic product operation of scoring schemes. This leads to a remarkable variety of applications, allowing us to achieve optimizations under multiple objective functions, alternative solutions and backtracing, holistic search space analysis, ambiguity checking, and more, without additional programming effort. We demonstrate the method on several applications for RNA secondary structure prediction. The product operation as introduced here adds a significant amount of flexibility to dynamic programming. It provides a versatile testbed for the development of new algorithmic ideas, which can immediately be put to practice.

  11. Report on the EMBER Project--A European Multimedia Bioinformatics Educational Resource

    ERIC Educational Resources Information Center

    Attwood, Terri K.; Selimas, Ioannis; Buis, Rob; Altenburg, Ruud; Herzog, Robert; Ledent, Valerie; Ghita, Viorica; Fernandes, Pedro; Marques, Isabel; Brugman, Marc

    2005-01-01

    EMBER was a European project aiming to develop bioinformatics teaching materials on the Web and CD-ROM to help address the recognised skills shortage in bioinformatics. The project grew out of pilot work on the development of an interactive web-based bioinformatics tutorial and the desire to repackage that resource with the help of a professional…

  12. The 2017 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather

    2017-01-01

    The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year’s theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest. PMID:29118973

  13. The 2017 Bioinformatics Open Source Conference (BOSC).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Munoz-Torres, Monica; Tzovaras, Bastian Greshake; Wiencko, Heather

    2017-01-01

    The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year's theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest.

  14. Rising Strengths Hong Kong SAR in Bioinformatics.

    PubMed

    Chakraborty, Chiranjib; George Priya Doss, C; Zhu, Hailong; Agoramoorthy, Govindasamy

    2017-06-01

    Hong Kong's bioinformatics sector is attaining new heights in combination with its economic boom and the predominance of the working-age group in its population. Factors such as a knowledge-based and free-market economy have contributed towards a prominent position on the world map of bioinformatics. In this review, we have considered the educational measures, landmark research activities and the achievements of bioinformatics companies and the role of the Hong Kong government in the establishment of bioinformatics as strength. However, several hurdles remain. New government policies will assist computational biologists to overcome these hurdles and further raise the profile of the field. There is a high expectation that bioinformatics in Hong Kong will be a promising area for the next generation.

  15. Cell growth inhibition and apoptosis in breast cancer cells induced by anti-FZD7 scFvs: involvement of bioinformatics-based design of novel epitopes.

    PubMed

    Zarei, Neda; Fazeli, Mehdi; Mohammadi, Mozafar; Nejatollahi, Foroogh

    2018-06-01

    FZD7 has a critical role as a surface receptor of Wnt/β-catenin signaling in cancer cells. Suppressing Wnt signaling through blocking FZD7 is shown to decrease cell viability, metastasis and invasion. Bioinformatic methods have been a powerful tool in epitope designing studies. Small size, high affinity and human origin of scFv antibodies have provided unique advantages for these recombinant antibodies. Two epitopes from extracellular domain of FZD7 were designed using bioinformatic methods. Specific anti-FZD7 scFvs were selected against these epitopes through panning process. The specificity of the scFvs was assessed by phage ELISA and the ability to bind to FZD7 expressing cell line (MDA-MB-231) was determined by flowcytometry. Antiproliferative and apoptotic effects of the scFvs were evaluated by MTT and Annexin V/PI assays. The effects of selected scFvs on expression level of Surivin, c-Myc and Dvl genes were also evaluated by real-time PCR. Results demonstrated selection of two specific scFvs (scFv-I and scFv-II) with frequencies of 35 and 20%. Both antibodies bound to the corresponding peptides and cell surface receptors as shown by phage ELISA and flowcytometry, respectively. The scFvs inhibited cell growth of MDA-MB-231 cells significantly as compared to untreated cells. Growth inhibition of 58.6 and 53.1% were detected for scFv-I and scFv-II, respectively. No significant growth inhibition was detected for SKBR-3 negative control cells. The scFvs induced apoptotic effects in the MDA-MB-231 treated cells after 48 h, which were 81.6 and 74.9% for scFv-I and scFv-II, respectively. Downregulation of Surivin, c-Myc and Dvl genes were also shown after 48h treatment of cells with either of scFvs (59.3-93.8%). ScFv-I showed significant higher antiproliferative and apoptotic effects than scFv-II. Bioinformatic methods could effectively select potential epitopes of FZD7 protein and suggest that epitope designing by bioinformatic methods could contribute to the selection of key antigens for cancer immunotherapy. The selected scFvs, especially scFv-I, with high antiproliferative and apoptotic effects could be considered as effective agents for immunotherapy of cancers expressing FZD7 receptor including triple negative breast cancer.

  16. An Introduction to Programming for Bioscientists: A Python-Based Primer

    PubMed Central

    Mura, Cameron

    2016-01-01

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in molecular biology, biochemistry, and other biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language’s usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a “variable,” the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. PMID:27271528

  17. An Introduction to Programming for Bioscientists: A Python-Based Primer.

    PubMed

    Ekmekci, Berk; McAnany, Charles E; Mura, Cameron

    2016-06-01

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in molecular biology, biochemistry, and other biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a "variable," the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences.

  18. Bringing computational science to the public.

    PubMed

    McDonagh, James L; Barker, Daniel; Alderson, Rosanna G

    2016-01-01

    The increasing use of computers in science allows for the scientific analyses of large datasets at an increasing pace. We provided examples and interactive demonstrations at Dundee Science Centre as part of the 2015 Women in Science festival, to present aspects of computational science to the general public. We used low-cost Raspberry Pi computers to provide hands on experience in computer programming and demonstrated the application of computers to biology. Computer games were used as a means to introduce computers to younger visitors. The success of the event was evaluated by voluntary feedback forms completed by visitors, in conjunction with our own self-evaluation. This work builds on the original work of the 4273π bioinformatics education program of Barker et al. (2013, BMC Bioinform. 14:243). 4273π provides open source education materials in bioinformatics. This work looks at the potential to adapt similar materials for public engagement events. It appears, at least in our small sample of visitors (n = 13), that basic computational science can be conveyed to people of all ages by means of interactive demonstrations. Children as young as five were able to successfully edit simple computer programs with supervision. This was, in many cases, their first experience of computer programming. The feedback is predominantly positive, showing strong support for improving computational science education, but also included suggestions for improvement. Our conclusions are necessarily preliminary. However, feedback forms suggest methods were generally well received among the participants; "Easy to follow. Clear explanation" and "Very easy. Demonstrators were very informative." Our event, held at a local Science Centre in Dundee, demonstrates that computer games and programming activities suitable for young children can be performed alongside a more specialised and applied introduction to computational science for older visitors.

  19. Better bioinformatics through usability analysis.

    PubMed

    Bolchini, Davide; Finkelstein, Anthony; Perrone, Vito; Nagl, Sylvia

    2009-02-01

    Improving the usability of bioinformatics resources enables researchers to find, interact with, share, compare and manipulate important information more effectively and efficiently. It thus enables researchers to gain improved insights into biological processes with the potential, ultimately, of yielding new scientific results. Usability 'barriers' can pose significant obstacles to a satisfactory user experience and force researchers to spend unnecessary time and effort to complete their tasks. The number of online biological databases available is growing and there is an expanding community of diverse users. In this context there is an increasing need to ensure the highest standards of usability. Using 'state-of-the-art' usability evaluation methods, we have identified and characterized a sample of usability issues potentially relevant to web bioinformatics resources, in general. These specifically concern the design of the navigation and search mechanisms available to the user. The usability issues we have discovered in our substantial case studies are undermining the ability of users to find the information they need in their daily research activities. In addition to characterizing these issues, specific recommendations for improvements are proposed leveraging proven practices from web and usability engineering. The methods and approach we exemplify can be readily adopted by the developers of bioinformatics resources.

  20. Bioinformatics education dissemination with an evolutionary problem solving perspective.

    PubMed

    Jungck, John R; Donovan, Samuel S; Weisstein, Anton E; Khiripet, Noppadon; Everse, Stephen J

    2010-11-01

    Bioinformatics is central to biology education in the 21st century. With the generation of terabytes of data per day, the application of computer-based tools to stored and distributed data is fundamentally changing research and its application to problems in medicine, agriculture, conservation and forensics. In light of this 'information revolution,' undergraduate biology curricula must be redesigned to prepare the next generation of informed citizens as well as those who will pursue careers in the life sciences. The BEDROCK initiative (Bioinformatics Education Dissemination: Reaching Out, Connecting and Knitting together) has fostered an international community of bioinformatics educators. The initiative's goals are to: (i) Identify and support faculty who can take leadership roles in bioinformatics education; (ii) Highlight and distribute innovative approaches to incorporating evolutionary bioinformatics data and techniques throughout undergraduate education; (iii) Establish mechanisms for the broad dissemination of bioinformatics resource materials and teaching models; (iv) Emphasize phylogenetic thinking and problem solving; and (v) Develop and publish new software tools to help students develop and test evolutionary hypotheses. Since 2002, BEDROCK has offered more than 50 faculty workshops around the world, published many resources and supported an environment for developing and sharing bioinformatics education approaches. The BEDROCK initiative builds on the established pedagogical philosophy and academic community of the BioQUEST Curriculum Consortium to assemble the diverse intellectual and human resources required to sustain an international reform effort in undergraduate bioinformatics education.

  1. Bioinformatics education in India.

    PubMed

    Kulkarni-Kale, Urmila; Sawant, Sangeeta; Chavan, Vishwas

    2010-11-01

    An account of bioinformatics education in India is presented along with future prospects. Establishment of BTIS network by Department of Biotechnology (DBT), Government of India in the 1980s had been a systematic effort in the development of bioinformatics infrastructure in India to provide services to scientific community. Advances in the field of bioinformatics underpinned the need for well-trained professionals with skills in information technology and biotechnology. As a result, programmes for capacity building in terms of human resource development were initiated. Educational programmes gradually evolved from the organisation of short-term workshops to the institution of formal diploma/degree programmes. A case study of the Master's degree course offered at the Bioinformatics Centre, University of Pune is discussed. Currently, many universities and institutes are offering bioinformatics courses at different levels with variations in the course contents and degree of detailing. BioInformatics National Certification (BINC) examination initiated in 2005 by DBT provides a common yardstick to assess the knowledge and skill sets of students passing out of various institutions. The potential for broadening the scope of bioinformatics to transform it into a data intensive discovery discipline is discussed. This necessitates introduction of amendments in the existing curricula to accommodate the upcoming developments.

  2. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology.

    PubMed

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-12-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.

  3. A Web-based assessment of bioinformatics end-user support services at US universities.

    PubMed

    Messersmith, Donna J; Benson, Dennis A; Geer, Renata C

    2006-07-01

    This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group.

  4. The Enzyme Portal: a case study in applying user-centred design methods in bioinformatics.

    PubMed

    de Matos, Paula; Cham, Jennifer A; Cao, Hong; Alcántara, Rafael; Rowland, Francis; Lopez, Rodrigo; Steinbeck, Christoph

    2013-03-20

    User-centred design (UCD) is a type of user interface design in which the needs and desires of users are taken into account at each stage of the design process for a service or product; often for software applications and websites. Its goal is to facilitate the design of software that is both useful and easy to use. To achieve this, you must characterise users' requirements, design suitable interactions to meet their needs, and test your designs using prototypes and real life scenarios.For bioinformatics, there is little practical information available regarding how to carry out UCD in practice. To address this we describe a complete, multi-stage UCD process used for creating a new bioinformatics resource for integrating enzyme information, called the Enzyme Portal (http://www.ebi.ac.uk/enzymeportal). This freely-available service mines and displays data about proteins with enzymatic activity from public repositories via a single search, and includes biochemical reactions, biological pathways, small molecule chemistry, disease information, 3D protein structures and relevant scientific literature.We employed several UCD techniques, including: persona development, interviews, 'canvas sort' card sorting, user workflows, usability testing and others. Our hope is that this case study will motivate the reader to apply similar UCD approaches to their own software design for bioinformatics. Indeed, we found the benefits included more effective decision-making for design ideas and technologies; enhanced team-working and communication; cost effectiveness; and ultimately a service that more closely meets the needs of our target audience.

  5. Bioinformatics Goes to School—New Avenues for Teaching Contemporary Biology

    PubMed Central

    Wood, Louisa; Gebhardt, Philipp

    2013-01-01

    Since 2010, the European Molecular Biology Laboratory's (EMBL) Heidelberg laboratory and the European Bioinformatics Institute (EMBL-EBI) have jointly run bioinformatics training courses developed specifically for secondary school science teachers within Europe and EMBL member states. These courses focus on introducing bioinformatics, databases, and data-intensive biology, allowing participants to explore resources and providing classroom-ready materials to support them in sharing this new knowledge with their students. In this article, we chart our progress made in creating and running three bioinformatics training courses, including how the course resources are received by participants and how these, and bioinformatics in general, are subsequently used in the classroom. We assess the strengths and challenges of our approach, and share what we have learned through our interactions with European science teachers. PMID:23785266

  6. The 2016 Bioinformatics Open Source Conference (BOSC).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Chapman, Brad; Fields, Christopher J; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science.

  7. Bioinformatics clouds for big data manipulation.

    PubMed

    Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang

    2012-11-28

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.

  8. A Bioinformatic Approach to Inter Functional Interactions within Protein Sequences

    DTIC Science & Technology

    2009-02-23

    AFOSR/AOARD Reference Number: USAFAOGA07: FA4869-07-1-4050 AFOSR/AOARD Program Manager : Hiroshi Motoda, Ph.D. Period of...Conference on Knowledge Discovery and Data Mining.) In a separate study we have applied our approaches to the problem of whole genome alignment. We have...SIGKDD Conference on Knowledge Discovery and Data Mining Attached. Interactions: Please list: (a) Participation/presentations at meetings

  9. Bioinformatics in the secondary science classroom: A study of state content standards and students' perceptions of, and performance in, bioinformatics lessons

    NASA Astrophysics Data System (ADS)

    Wefer, Stephen H.

    The proliferation of bioinformatics in modern Biology marks a new revolution in science, which promises to influence science education at all levels. This thesis examined state standards for content that articulated bioinformatics, and explored secondary students' affective and cognitive perceptions of, and performance in, a bioinformatics mini-unit. The results are presented as three studies. The first study analyzed secondary science standards of 49 U.S States (Iowa has no science framework) and the District of Columbia for content related to bioinformatics at the introductory high school biology level. The bionformatics content of each state's Biology standards were categorized into nine areas and the prevalence of each area documented. The nine areas were: The Human Genome Project, Forensics, Evolution, Classification, Nucleotide Variations, Medicine, Computer Use, Agriculture/Food Technology, and Science Technology and Society/Socioscientific Issues (STS/SSI). Findings indicated a generally low representation of bioinformatics related content, which varied substantially across the different areas. Recommendations are made for reworking existing standards to incorporate bioinformatics and to facilitate the goal of promoting science literacy in this emerging new field among secondary school students. The second study examined thirty-two students' affective responses to, and content mastery of, a two-week bioinformatics mini-unit. The findings indicate that the students generally were positive relative to their interest level, the usefulness of the lessons, the difficulty level of the lessons, likeliness to engage in additional bioinformatics, and were overall successful on the assessments. A discussion of the results and significance is followed by suggestions for future research and implementation for transferability. The third study presents a case study of individual differences among ten secondary school students, whose cognitive and affective percepts were analyzed in relation to their experience in learning a bioinformatics mini-unit. There were distinct individual differences among the participants, especially in the way they processed information and integrated procedural and analytical thought during bioinformatics learning. These differences may provide insights into some of the specific needs of students that educators and curriculum designers should consider when designing bioinformatics learning experiences. Implications for teacher education and curriculum design are presented in addition to some suggestions for further research.

  10. Learning nucleic acids solving by bioinformatics problems.

    PubMed

    Nunes, Rhewter; Barbosa de Almeida Júnior, Edivaldo; Pessoa Pinto de Menezes, Ivandilson; Malafaia, Guilherme

    2015-01-01

    The article describes the development of a new approach to teach molecular biology to undergraduate biology students. The 34 students who participated in this research belonged to the first period of the Biological Sciences teaching course of the Instituto Federal Goiano at Urutaí Campus, Brazil. They were registered in Cell Biology in the first semester of 2013. They received four 55 min-long expository/dialogued lectures that covered the content of "structure and functions of nucleic acids". Later the students were invited to attend four meetings (in a computer laboratory) in which some concepts of Bioinformatics were presented and some problems of the Rosalind platform were solved. The observations we report here are very useful as a broad groundwork to development new research. An interesting possibility is research into the effects of bioinformatics interventions that improve molecular biology learning. © 2015 The International Union of Biochemistry and Molecular Biology.

  11. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor.

    PubMed

    Davis, Sean; Meltzer, Paul S

    2007-07-15

    Microarray technology has become a standard molecular biology tool. Experimental data have been generated on a huge number of organisms, tissue types, treatment conditions and disease states. The Gene Expression Omnibus (Barrett et al., 2005), developed by the National Center for Bioinformatics (NCBI) at the National Institutes of Health is a repository of nearly 140,000 gene expression experiments. The BioConductor project (Gentleman et al., 2004) is an open-source and open-development software project built in the R statistical programming environment (R Development core Team, 2005) for the analysis and comprehension of genomic data. The tools contained in the BioConductor project represent many state-of-the-art methods for the analysis of microarray and genomics data. We have developed a software tool that allows access to the wealth of information within GEO directly from BioConductor, eliminating many the formatting and parsing problems that have made such analyses labor-intensive in the past. The software, called GEOquery, effectively establishes a bridge between GEO and BioConductor. Easy access to GEO data from BioConductor will likely lead to new analyses of GEO data using novel and rigorous statistical and bioinformatic tools. Facilitating analyses and meta-analyses of microarray data will increase the efficiency with which biologically important conclusions can be drawn from published genomic data. GEOquery is available as part of the BioConductor project.

  12. Bioinformatics clouds for big data manipulation

    PubMed Central

    2012-01-01

    Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475

  13. The 2016 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC ( http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science. PMID:27781083

  14. A bioinformatics potpourri.

    PubMed

    Schönbach, Christian; Li, Jinyan; Ma, Lan; Horton, Paul; Sjaugi, Muhammad Farhan; Ranganathan, Shoba

    2018-01-19

    The 16th International Conference on Bioinformatics (InCoB) was held at Tsinghua University, Shenzhen from September 20 to 22, 2017. The annual conference of the Asia-Pacific Bioinformatics Network featured six keynotes, two invited talks, a panel discussion on big data driven bioinformatics and precision medicine, and 66 oral presentations of accepted research articles or posters. Fifty-seven articles comprising a topic assortment of algorithms, biomolecular networks, cancer and disease informatics, drug-target interactions and drug efficacy, gene regulation and expression, imaging, immunoinformatics, metagenomics, next generation sequencing for genomics and transcriptomics, ontologies, post-translational modification, and structural bioinformatics are the subject of this editorial for the InCoB2017 supplement issues in BMC Genomics, BMC Bioinformatics, BMC Systems Biology and BMC Medical Genomics. New Delhi will be the location of InCoB2018, scheduled for September 26-28, 2018.

  15. Big Data Bioinformatics

    PubMed Central

    GREENE, CASEY S.; TAN, JIE; UNG, MATTHEW; MOORE, JASON H.; CHENG, CHAO

    2017-01-01

    Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the “big data” era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both “machine learning” algorithms as well as “unsupervised” and “supervised” examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. PMID:27908398

  16. Big Data Bioinformatics

    PubMed Central

    GREENE, CASEY S.; TAN, JIE; UNG, MATTHEW; MOORE, JASON H.; CHENG, CHAO

    2017-01-01

    Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the “big data” era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both “machine learning” algorithms as well as “unsupervised” and “supervised” examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. PMID:24799088

  17. Big data bioinformatics.

    PubMed

    Greene, Casey S; Tan, Jie; Ung, Matthew; Moore, Jason H; Cheng, Chao

    2014-12-01

    Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the "big data" era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both "machine learning" algorithms as well as "unsupervised" and "supervised" examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. © 2014 Wiley Periodicals, Inc.

  18. [Construction and application of bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer].

    PubMed

    Fang, Xiang; Li, Ning-qiu; Fu, Xiao-zhe; Li, Kai-bin; Lin, Qiang; Liu, Li-hui; Shi, Cun-bin; Wu, Shu-qin

    2015-07-01

    As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules, including genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on out membrane protein A of Aeromonas hydrophila, and the changes of system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogen has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platform for other subjects.

  19. InCoB2012 Conference: from biological data to knowledge to technological breakthroughs

    PubMed Central

    2012-01-01

    Ten years ago when Asia-Pacific Bioinformatics Network held the first International Conference on Bioinformatics (InCoB) in Bangkok its theme was North-South Networking. At that time InCoB aimed to provide biologists and bioinformatics researchers in the Asia-Pacific region a forum to meet, interact with, and disseminate knowledge about the burgeoning field of bioinformatics. Meanwhile InCoB has evolved into a major regional bioinformatics conference that attracts not only talented and established scientists from the region but increasingly also from East Asia, North America and Europe. Since 2006 InCoB yielded 114 articles in BMC Bioinformatics supplement issues that have been cited nearly 1,000 times to date. In part, these developments reflect the success of bioinformatics education and continuous efforts to integrate and utilize bioinformatics in biotechnology and biosciences in the Asia-Pacific region. A cross-section of research leading from biological data to knowledge and to technological applications, the InCoB2012 theme, is introduced in this editorial. Other highlights included sessions organized by the Pan-Asian Pacific Genome Initiative and a Machine Learning in Immunology competition. InCoB2013 is scheduled for September 18-21, 2013 at Suzhou, China. PMID:23281929

  20. Genetics of PCOS: A systematic bioinformatics approach to unveil the proteins responsible for PCOS.

    PubMed

    Panda, Pritam Kumar; Rane, Riya; Ravichandran, Rahul; Singh, Shrinkhla; Panchal, Hetalkumar

    2016-06-01

    Polycystic ovary syndrome (PCOS) is a hormonal imbalance in women, which causes problems during menstrual cycle and in pregnancy that sometimes results in fatality. Though the genetics of PCOS is not fully understood, early diagnosis and treatment can prevent long-term effects. In this study, we have studied the proteins involved in PCOS and the structural aspects of the proteins that are taken into consideration using computational tools. The proteins involved are modeled using Modeller 9v14 and Ab-initio programs. All the 43 proteins responsible for PCOS were subjected to phylogenetic analysis to identify the relatedness of the proteins. Further, microarray data analysis of PCOS datasets was analyzed that was downloaded from GEO datasets to find the significant protein-coding genes responsible for PCOS, which is an addition to the reported protein-coding genes. Various statistical analyses were done using R programming to get an insight into the structural aspects of PCOS that can be used as drug targets to treat PCOS and other related reproductive diseases.

  1. TreSpEx—Detection of Misleading Signal in Phylogenetic Reconstructions Based on Tree Information

    PubMed Central

    Struck, Torsten H

    2014-01-01

    Phylogenies of species or genes are commonplace nowadays in many areas of comparative biological studies. However, for phylogenetic reconstructions one must refer to artificial signals such as paralogy, long-branch attraction, saturation, or conflict between different datasets. These signals might eventually mislead the reconstruction even in phylogenomic studies employing hundreds of genes. Unfortunately, there has been no program allowing the detection of such effects in combination with an implementation into automatic process pipelines. TreSpEx (Tree Space Explorer) now combines different approaches (including statistical tests), which utilize tree-based information like nodal support or patristic distances (PDs) to identify misleading signals. The program enables the parallel analysis of hundreds of trees and/or predefined gene partitions, and being command-line driven, it can be integrated into automatic process pipelines. TreSpEx is implemented in Perl and supported on Linux, Mac OS X, and MS Windows. Source code, binaries, and additional material are freely available at http://www.annelida.de/research/bioinformatics/software.html. PMID:24701118

  2. Translational bioinformatics: linking the molecular world to the clinical world.

    PubMed

    Altman, R B

    2012-06-01

    Translational bioinformatics represents the union of translational medicine and bioinformatics. Translational medicine moves basic biological discoveries from the research bench into the patient-care setting and uses clinical observations to inform basic biology. It focuses on patient care, including the creation of new diagnostics, prognostics, prevention strategies, and therapies based on biological discoveries. Bioinformatics involves algorithms to represent, store, and analyze basic biological data, including DNA sequence, RNA expression, and protein and small-molecule abundance within cells. Translational bioinformatics spans these two fields; it involves the development of algorithms to analyze basic molecular and cellular data with an explicit goal of affecting clinical care.

  3. Integrating grant-funded research into the undergraduate biology curriculum using IMG-ACT.

    PubMed

    Ditty, Jayna L; Williams, Kayla M; Keller, Megan M; Chen, Grischa Y; Liu, Xianxian; Parales, Rebecca E

    2013-01-01

    It has become clear in current scientific pedagogy that the emersion of students in the scientific process in terms of designing, implementing, and analyzing experiments is imperative for their education; as such, it has been our goal to model this active learning process in the classroom and laboratory in the context of a genuine scientific question. Toward this objective, the National Science Foundation funded a collaborative research grant between a primarily undergraduate institution and a research-intensive institution to study the chemotactic responses of the bacterium Pseudomonas putida F1. As part of the project, a new Bioinformatics course was developed in which undergraduates annotate relevant regions of the P. putida F1 genome using Integrated Microbial Genomes Annotation Collaboration Toolkit, a bioinformatics interface specifically developed for undergraduate programs by the Department of Energy Joint Genome Institute. Based on annotations of putative chemotaxis genes in P. putida F1 and comparative genomics studies, undergraduate students from both institutions developed functional genomics research projects that evolved from the annotations. The purpose of this study is to describe the nature of the NSF grant, the development of the Bioinformatics lecture and wet laboratory course, and how undergraduate student involvement in the project that was initiated in the classroom has served as a springboard for independent undergraduate research projects. Copyright © 2012 International Union of Biochemistry and Molecular Biology, Inc.

  4. Deep learning in bioinformatics.

    PubMed

    Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh

    2017-09-01

    In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  5. A Web-based assessment of bioinformatics end-user support services at US universities

    PubMed Central

    Messersmith, Donna J.; Benson, Dennis A.; Geer, Renata C.

    2006-01-01

    Objectives: This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Methods: Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Results: Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. Conclusions: This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group. PMID:16888663

  6. PyPedia: using the wiki paradigm as crowd sourcing environment for bioinformatics protocols.

    PubMed

    Kanterakis, Alexandros; Kuiper, Joël; Potamias, George; Swertz, Morris A

    2015-01-01

    Today researchers can choose from many bioinformatics protocols for all types of life sciences research, computational environments and coding languages. Although the majority of these are open source, few of them possess all virtues to maximize reuse and promote reproducible science. Wikipedia has proven a great tool to disseminate information and enhance collaboration between users with varying expertise and background to author qualitative content via crowdsourcing. However, it remains an open question whether the wiki paradigm can be applied to bioinformatics protocols. We piloted PyPedia, a wiki where each article is both implementation and documentation of a bioinformatics computational protocol in the python language. Hyperlinks within the wiki can be used to compose complex workflows and induce reuse. A RESTful API enables code execution outside the wiki. Initial content of PyPedia contains articles for population statistics, bioinformatics format conversions and genotype imputation. Use of the easy to learn wiki syntax effectively lowers the barriers to bring expert programmers and less computer savvy researchers on the same page. PyPedia demonstrates how wiki can provide a collaborative development, sharing and even execution environment for biologists and bioinformaticians that complement existing resources, useful for local and multi-center research teams. PyPedia is available online at: http://www.pypedia.com. The source code and installation instructions are available at: https://github.com/kantale/PyPedia_server. The PyPedia python library is available at: https://github.com/kantale/pypedia. PyPedia is open-source, available under the BSD 2-Clause License.

  7. 4273π: Bioinformatics education on low cost ARM hardware

    PubMed Central

    2013-01-01

    Background Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. Results We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012–2013. Conclusions 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost. PMID:23937194

  8. 4273π: bioinformatics education on low cost ARM hardware.

    PubMed

    Barker, Daniel; Ferrier, David Ek; Holland, Peter Wh; Mitchell, John Bo; Plaisier, Heleen; Ritchie, Michael G; Smart, Steven D

    2013-08-12

    Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012-2013. 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.

  9. A decade of Web Server updates at the Bioinformatics Links Directory: 2003-2012.

    PubMed

    Brazas, Michelle D; Yim, David; Yeung, Winston; Ouellette, B F Francis

    2012-07-01

    The 2012 Bioinformatics Links Directory update marks the 10th special Web Server issue from Nucleic Acids Research. Beginning with content from their 2003 publication, the Bioinformatics Links Directory in collaboration with Nucleic Acids Research has compiled and published a comprehensive list of freely accessible, online tools, databases and resource materials for the bioinformatics and life science research communities. The past decade has exhibited significant growth and change in the types of tools, databases and resources being put forth, reflecting both technology changes and the nature of research over that time. With the addition of 90 web server tools and 12 updates from the July 2012 Web Server issue of Nucleic Acids Research, the Bioinformatics Links Directory at http://bioinformatics.ca/links_directory/ now contains an impressive 134 resources, 455 databases and 1205 web server tools, mirroring the continued activity and efforts of our field.

  10. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome

    PubMed Central

    2012-01-01

    Background We present the Biological Observation Matrix (BIOM, pronounced “biome”) format: a JSON-based file format for representing arbitrary observation by sample contingency tables with associated sample and observation metadata. As the number of categories of comparative omics data types (collectively, the “ome-ome”) grows rapidly, a general format to represent and archive this data will facilitate the interoperability of existing bioinformatics tools and future meta-analyses. Findings The BIOM file format is supported by an independent open-source software project (the biom-format project), which initially contains Python objects that support the use and manipulation of BIOM data in Python programs, and is intended to be an open development effort where developers can submit implementations of these objects in other programming languages. Conclusions The BIOM file format and the biom-format project are steps toward reducing the “bioinformatics bottleneck” that is currently being experienced in diverse areas of biological sciences, and will help us move toward the next phase of comparative omics where basic science is translated into clinical and environmental applications. The BIOM file format is currently recognized as an Earth Microbiome Project Standard, and as a Candidate Standard by the Genomic Standards Consortium. PMID:23587224

  11. MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation.

    PubMed

    Lugli, Gabriele Andrea; Milani, Christian; Mancabelli, Leonardo; van Sinderen, Douwe; Ventura, Marco

    2016-04-01

    Genome annotation is one of the key actions that must be undertaken in order to decipher the genetic blueprint of organisms. Thus, a correct and reliable annotation is essential in rendering genomic data valuable. Here, we describe a bioinformatics pipeline based on freely available software programs coordinated by a multithreaded script named MEGAnnotator (Multithreaded Enhanced prokaryotic Genome Annotator). This pipeline allows the generation of multiple annotated formats fulfilling the NCBI guidelines for assembled microbial genome submission, based on DNA shotgun sequencing reads, and minimizes manual intervention, while also reducing waiting times between software program executions and improving final quality of both assembly and annotation outputs. MEGAnnotator provides an efficient way to pre-arrange the assembly and annotation work required to process NGS genome sequence data. The script improves the final quality of microbial genome annotation by reducing ambiguous annotations. Moreover, the MEGAnnotator platform allows the user to perform a partial annotation of pre-assembled genomes and includes an option to accomplish metagenomic data set assemblies. MEGAnnotator platform will be useful for microbiologists interested in genome analyses of bacteria as well as those investigating the complexity of microbial communities that do not possess the necessary skills to prepare their own bioinformatics pipeline. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

    PubMed

    Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

    2011-06-01

    Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will ''become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 CodexAlimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth. Copyright © 2011 Elsevier Inc. All rights reserved.

  13. Systems Bioinformatics: increasing precision of computational diagnostics and therapeutics through network-based approaches.

    PubMed

    Oulas, Anastasis; Minadakis, George; Zachariou, Margarita; Sokratous, Kleitos; Bourdakou, Marilena M; Spyrou, George M

    2017-11-27

    Systems Bioinformatics is a relatively new approach, which lies in the intersection of systems biology and classical bioinformatics. It focuses on integrating information across different levels using a bottom-up approach as in systems biology with a data-driven top-down approach as in bioinformatics. The advent of omics technologies has provided the stepping-stone for the emergence of Systems Bioinformatics. These technologies provide a spectrum of information ranging from genomics, transcriptomics and proteomics to epigenomics, pharmacogenomics, metagenomics and metabolomics. Systems Bioinformatics is the framework in which systems approaches are applied to such data, setting the level of resolution as well as the boundary of the system of interest and studying the emerging properties of the system as a whole rather than the sum of the properties derived from the system's individual components. A key approach in Systems Bioinformatics is the construction of multiple networks representing each level of the omics spectrum and their integration in a layered network that exchanges information within and between layers. Here, we provide evidence on how Systems Bioinformatics enhances computational therapeutics and diagnostics, hence paving the way to precision medicine. The aim of this review is to familiarize the reader with the emerging field of Systems Bioinformatics and to provide a comprehensive overview of its current state-of-the-art methods and technologies. Moreover, we provide examples of success stories and case studies that utilize such methods and tools to significantly advance research in the fields of systems biology and systems medicine. © The Author 2017. Published by Oxford University Press.

  14. Interdisciplinary Introductory Course in Bioinformatics

    ERIC Educational Resources Information Center

    Kortsarts, Yana; Morris, Robert W.; Utell, Janine M.

    2010-01-01

    Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical and biophysical information. We present our experience in teaching an interdisciplinary course, Introduction to Bioinformatics, which was developed…

  15. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  16. Bioinformatics-driven discovery of rational combination for overcoming EGFR-mutant lung cancer resistance to EGFR therapy.

    PubMed

    Kim, Jihye; Vasu, Vihas T; Mishra, Rangnath; Singleton, Katherine R; Yoo, Minjae; Leach, Sonia M; Farias-Hesson, Eveline; Mason, Robert J; Kang, Jaewoo; Ramamoorthy, Preveen; Kern, Jeffrey A; Heasley, Lynn E; Finigan, James H; Tan, Aik Choon

    2014-09-01

    Non-small-cell lung cancer (NSCLC) is the leading cause of cancer death in the United States. Targeted tyrosine kinase inhibitors (TKIs) directed against the epidermal growth factor receptor (EGFR) have been widely and successfully used in treating NSCLC patients with activating EGFR mutations. Unfortunately, the duration of response is short-lived, and all patients eventually relapse by acquiring resistance mechanisms. We performed an integrative systems biology approach to determine essential kinases that drive EGFR-TKI resistance in cancer cell lines. We used a series of bioinformatics methods to analyze and integrate the functional genetics screen and RNA-seq data to identify a set of kinases that are critical in survival and proliferation in these TKI-resistant lines. By connecting the essential kinases to compounds using a novel kinase connectivity map (K-Map), we identified and validated bosutinib as an effective compound that could inhibit proliferation and induce apoptosis in TKI-resistant lines. A rational combination of bosutinib and gefitinib showed additive and synergistic effects in cancer cell lines resistant to EGFR TKI alone. We have demonstrated a bioinformatics-driven discovery roadmap for drug repurposing and development in overcoming resistance in EGFR-mutant NSCLC, which could be generalized to other cancer types in the era of personalized medicine. K-Map can be accessible at: http://tanlab.ucdenver.edu/kMap. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. The Enzyme Portal: a case study in applying user-centred design methods in bioinformatics

    PubMed Central

    2013-01-01

    User-centred design (UCD) is a type of user interface design in which the needs and desires of users are taken into account at each stage of the design process for a service or product; often for software applications and websites. Its goal is to facilitate the design of software that is both useful and easy to use. To achieve this, you must characterise users’ requirements, design suitable interactions to meet their needs, and test your designs using prototypes and real life scenarios. For bioinformatics, there is little practical information available regarding how to carry out UCD in practice. To address this we describe a complete, multi-stage UCD process used for creating a new bioinformatics resource for integrating enzyme information, called the Enzyme Portal (http://www.ebi.ac.uk/enzymeportal). This freely-available service mines and displays data about proteins with enzymatic activity from public repositories via a single search, and includes biochemical reactions, biological pathways, small molecule chemistry, disease information, 3D protein structures and relevant scientific literature. We employed several UCD techniques, including: persona development, interviews, ‘canvas sort’ card sorting, user workflows, usability testing and others. Our hope is that this case study will motivate the reader to apply similar UCD approaches to their own software design for bioinformatics. Indeed, we found the benefits included more effective decision-making for design ideas and technologies; enhanced team-working and communication; cost effectiveness; and ultimately a service that more closely meets the needs of our target audience. PMID:23514033

  18. Chapter 16: text mining for translational bioinformatics.

    PubMed

    Cohen, K Bretonnel; Hunter, Lawrence E

    2013-04-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

  19. Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT

    PubMed Central

    Wang, Dantong; Fong, Simon; Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Wong, Kelvin K. L.

    2017-01-01

    Outlier detection in bioinformatics data streaming mining has received significant attention by research communities in recent years. The problems of how to distinguish noise from an exception and deciding whether to discard it or to devise an extra decision path for accommodating it are causing dilemma. In this paper, we propose a novel algorithm called ODR with incrementally Optimized Very Fast Decision Tree (ODR-ioVFDT) for taking care of outliers in the progress of continuous data learning. By using an adaptive interquartile-range based identification method, a tolerance threshold is set. It is then used to judge if a data of exceptional value should be included for training or otherwise. This is different from the traditional outlier detection/removal approaches which are two separate steps in processing through the data. The proposed algorithm is tested using datasets of five bioinformatics scenarios and comparing the performance of our model and other ones without ODR. The results show that ODR-ioVFDT has better performance in classification accuracy, kappa statistics, and time consumption. The ODR-ioVFDT applied onto bioinformatics streaming data processing for detecting and quantifying the information of life phenomena, states, characters, variables and components of the organism can help to diagnose and treat disease more effectively. PMID:28230161

  20. Precision medicine needs pioneering clinical bioinformaticians.

    PubMed

    Gómez-López, Gonzalo; Dopazo, Joaquín; Cigudosa, Juan C; Valencia, Alfonso; Al-Shahrour, Fátima

    2017-10-25

    Success in precision medicine depends on accessing high-quality genetic and molecular data from large, well-annotated patient cohorts that couple biological samples to comprehensive clinical data, which in conjunction can lead to effective therapies. From such a scenario emerges the need for a new professional profile, an expert bioinformatician with training in clinical areas who can make sense of multi-omics data to improve therapeutic interventions in patients, and the design of optimized basket trials. In this review, we first describe the main policies and international initiatives that focus on precision medicine. Secondly, we review the currently ongoing clinical trials in precision medicine, introducing the concept of 'precision bioinformatics', and we describe current pioneering bioinformatics efforts aimed at implementing tools and computational infrastructures for precision medicine in health institutions around the world. Thirdly, we discuss the challenges related to the clinical training of bioinformaticians, and the urgent need for computational specialists capable of assimilating medical terminologies and protocols to address real clinical questions. We also propose some skills required to carry out common tasks in clinical bioinformatics and some tips for emergent groups. Finally, we explore the future perspectives and the challenges faced by precision medicine bioinformatics. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  1. ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis

    PubMed Central

    Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas

    2016-01-01

    Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475

  2. [Application of bioinformatics in researches of industrial biocatalysis].

    PubMed

    Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao

    2004-05-01

    Industrial biocatalysis is currently attracting much attention to rebuild or substitute traditional producing process of chemicals and drugs. One of key focuses in industrial biocatalysis is biocatalyst, which is usually one kind of microbial enzyme. In the recent, new technologies of bioinformatics have played and will continue to play more and more significant roles in researches of industrial biocatalysis in response to the waves of genomic revolution. One of the key applications of bioinformatics in biocatalysis is the discovery and identification of the new biocatalyst through advanced DNA and protein sequence search, comparison and analyses in Internet database using different algorithm and software. The unknown genes of microbial enzymes can also be simply harvested by primer design on the basis of bioinformatics analyses. The other key applications of bioinformatics in biocatalysis are the modification and improvement of existing industrial biocatalyst. In this aspect, bioinformatics is of great importance in both rational design and directed evolution of microbial enzymes. Based on the successful prediction of tertiary structures of enzymes using the tool of bioinformatics, the undermentioned experiments, i.e. site-directed mutagenesis, fusion protein construction, DNA family shuffling and saturation mutagenesis, etc, are usually of very high efficiency. On all accounts, bioinformatics will be an essential tool for either biologist or biological engineer in the future researches of industrial biocatalysis, due to its significant function in guiding and quickening the step of discovery and/or improvement of novel biocatalysts.

  3. Phosphoproteomics and Bioinformatics Analyses of Spinal Cord Proteins in Rats with Morphine Tolerance

    PubMed Central

    Liaw, Wen-Jinn; Tsao, Cheng-Ming; Huang, Go-Shine; Wu, Chin-Chen; Ho, Shung-Tai; Wang, Jhi-Joung; Tao, Yuan-Xiang; Shui, Hao-Ai

    2014-01-01

    Introduction Morphine is the most effective pain-relieving drug, but it can cause unwanted side effects. Direct neuraxial administration of morphine to spinal cord not only can provide effective, reliable pain relief but also can prevent the development of supraspinal side effects. However, repeated neuraxial administration of morphine may still lead to morphine tolerance. Methods To better understand the mechanism that causes morphine tolerance, we induced tolerance in rats at the spinal cord level by giving them twice-daily injections of morphine (20 µg/10 µL) for 4 days. We confirmed tolerance by measuring paw withdrawal latencies and maximal possible analgesic effect of morphine on day 5. We then carried out phosphoproteomic analysis to investigate the global phosphorylation of spinal proteins associated with morphine tolerance. Finally, pull-down assays were used to identify phosphorylated types and sites of 14-3-3 proteins, and bioinformatics was applied to predict biological networks impacted by the morphine-regulated proteins. Results Our proteomics data showed that repeated morphine treatment altered phosphorylation of 10 proteins in the spinal cord. Pull-down assays identified 2 serine/threonine phosphorylated sites in 14-3-3 proteins. Bioinformatics further revealed that morphine impacted on cytoskeletal reorganization, neuroplasticity, protein folding and modulation, signal transduction and biomolecular metabolism. Conclusions Repeated morphine administration may affect multiple biological networks by altering protein phosphorylation. These data may provide insight into the mechanism that underlies the development of morphine tolerance. PMID:24392096

  4. Influenza research database: an integrated bioinformatics resource for influenza virus research

    USDA-ARS?s Scientific Manuscript database

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics, an...

  5. Rapid Development of Bioinformatics Education in China

    ERIC Educational Resources Information Center

    Zhong, Yang; Zhang, Xiaoyan; Ma, Jian; Zhang, Liang

    2003-01-01

    As the Human Genome Project experiences remarkable success and a flood of biological data is produced, bioinformatics becomes a very "hot" cross-disciplinary field, yet experienced bioinformaticians are urgently needed worldwide. This paper summarises the rapid development of bioinformatics education in China, especially related…

  6. Bioinformatics.

    PubMed

    Moore, Jason H

    2007-11-01

    Bioinformatics is an interdisciplinary field that blends computer science and biostatistics with biological and biomedical sciences such as biochemistry, cell biology, developmental biology, genetics, genomics, and physiology. An important goal of bioinformatics is to facilitate the management, analysis, and interpretation of data from biological experiments and observational studies. The goal of this review is to introduce some of the important concepts in bioinformatics that must be considered when planning and executing a modern biological research study. We review database resources as well as data mining software tools.

  7. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package

    PubMed Central

    2012-01-01

    Background Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941

  8. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.

    PubMed

    El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter

    2012-01-01

    Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.

  9. Bioinformatics-based tools in drug discovery: the cartography from single gene to integrative biological networks.

    PubMed

    Ramharack, Pritika; Soliman, Mahmoud E S

    2018-06-01

    Originally developed for the analysis of biological sequences, bioinformatics has advanced into one of the most widely recognized domains in the scientific community. Despite this technological evolution, there is still an urgent need for nontoxic and efficient drugs. The onus now falls on the 'omics domain to meet this need by implementing bioinformatics techniques that will allow for the introduction of pioneering approaches in the rational drug design process. Here, we categorize an updated list of informatics tools and explore the capabilities of integrative bioinformatics in disease control. We believe that our review will serve as a comprehensive guide toward bioinformatics-oriented disease and drug discovery research. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…

  11. The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community

    PubMed Central

    D'Elia, Domenica; Gisel, Andreas; Eriksson, Nils-Einar; Kossida, Sophia; Mattila, Kimmo; Klucar, Lubos; Bongcam-Rudloff, Erik

    2009-01-01

    The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has been working to promote collaborative development of bioinformatics services and tools to serve the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses such as high-throughput sequencing and data managing, text and data-mining, ontologies and Grid technologies. Papers selected for publication, in this supplement to BMC Bioinformatics, cover a broad range of the topics treated, providing also an overview of the main bioinformatics research fields that the EMBnet community is involved in. PMID:19534734

  12. Scaffold Attachment Factor B1: A Novel Chromatin Regulator of Prostate Cancer Metabolism

    DTIC Science & Technology

    2016-10-01

    INVESTIGATOR: SUNGYONG YOU PhD CONTRACTING ORGANIZATION: Cedars-Sinai Medical Center Los Angeles, CA 90048 REPORT DATE: October 2016 TYPE OF REPORT...NUMBER CEDARS-SINAI MEDICAL CENTER 8700 BEVERLY BLVD LOS ANGELES CA 90048-1804 9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR...Bioinformatics. The Urologic Oncology Program, held in Cedars- Sinai Medical Center, Los Angeles, California, March 10, 2015. Oral Presentation: 1. You S

  13. Greater Philadelphia Bioinformatics Alliance (GPBA) 3rd Annual Retreat 2005

    DTIC Science & Technology

    2005-11-01

    Using Probabilistic Network Reliability. Genome Res. 14:1170-1175 [27] Batagelj , V. and Mrvar , A. (1998). Pajek: Program for large network analysis...and Neural Computation, Division of Informatics, Centre for Cognitive Science, University of Edinburgh, Scotland, April 1996 . www.anc.ed.ac.uk/-mjo...research center in the College of Engineering, and one of the foremost academic research centers in its field. From 1996 to 1998 he was the Founding

  14. Atlas - a data warehouse for integrative bioinformatics.

    PubMed

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

    2005-02-21

    We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/

  15. Online Bioinformatics Tutorials | Office of Cancer Genomics

    Cancer.gov

    Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.

  16. Evaluating an Inquiry-Based Bioinformatics Course Using Q Methodology

    ERIC Educational Resources Information Center

    Ramlo, Susan E.; McConnell, David; Duan, Zhong-Hui; Moore, Francisco B.

    2008-01-01

    Faculty at a Midwestern metropolitan public university recently developed a course on bioinformatics that emphasized collaboration and inquiry. Bioinformatics, essentially the application of computational tools to biological data, is inherently interdisciplinary. Thus part of the challenge of creating this course was serving the needs and…

  17. Contribution of bioinformatics prediction in microRNA-based cancer therapeutics.

    PubMed

    Banwait, Jasjit K; Bastola, Dhundy R

    2015-01-01

    Despite enormous efforts, cancer remains one of the most lethal diseases in the world. With the advancement of high throughput technologies massive amounts of cancer data can be accessed and analyzed. Bioinformatics provides a platform to assist biologists in developing minimally invasive biomarkers to detect cancer, and in designing effective personalized therapies to treat cancer patients. Still, the early diagnosis, prognosis, and treatment of cancer are an open challenge for the research community. MicroRNAs (miRNAs) are small non-coding RNAs that serve to regulate gene expression. The discovery of deregulated miRNAs in cancer cells and tissues has led many to investigate the use of miRNAs as potential biomarkers for early detection, and as a therapeutic agent to treat cancer. Here we describe advancements in computational approaches to predict miRNAs and their targets, and discuss the role of bioinformatics in studying miRNAs in the context of human cancer. Published by Elsevier B.V.

  18. Treetrimmer: a method for phylogenetic dataset size reduction.

    PubMed

    Maruyama, Shinichiro; Eveleigh, Robert J M; Archibald, John M

    2013-04-12

    With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, use of rigorous tree-building methods on such large datasets is prohibitive and manual 'pruning' of sequence alignments is time consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to objectively carry out such pruning procedures. Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis. TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.

  19. Bioinformatics in protein kinases regulatory network and drug discovery.

    PubMed

    Chen, Qingfeng; Luo, Haiqiong; Zhang, Chengqi; Chen, Yi-Ping Phoebe

    2015-04-01

    Protein kinases have been implicated in a number of diseases, where kinases participate many aspects that control cell growth, movement and death. The deregulated kinase activities and the knowledge of these disorders are of great clinical interest of drug discovery. The most critical issue is the development of safe and efficient disease diagnosis and treatment for less cost and in less time. It is critical to develop innovative approaches that aim at the root cause of a disease, not just its symptoms. Bioinformatics including genetic, genomic, mathematics and computational technologies, has become the most promising option for effective drug discovery, and has showed its potential in early stage of drug-target identification and target validation. It is essential that these aspects are understood and integrated into new methods used in drug discovery for diseases arisen from deregulated kinase activity. This article reviews bioinformatics techniques for protein kinase data management and analysis, kinase pathways and drug targets and describes their potential application in pharma ceutical industry. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. Role of remote sensing, geographical information system (GIS) and bioinformatics in kala-azar epidemiology

    PubMed Central

    Bhunia, Gouri Sankar; Dikhit, Manas Ranjan; Kesari, Shreekant; Sahoo, Ganesh Chandra; Das, Pradeep

    2011-01-01

    Visceral leishmaniasis or kala-azar is a potent parasitic infection causing death of thousands of people each year. Medicinal compounds currently available for the treatment of kala-azar have serious side effects and decreased efficacy owing to the emergence of resistant strains. The type of immune reaction is also to be considered in patients infected with Leishmania donovani (L. donovani). For complete eradication of this disease, a high level modern research is currently being applied both at the molecular level as well as at the field level. The computational approaches like remote sensing, geographical information system (GIS) and bioinformatics are the key resources for the detection and distribution of vectors, patterns, ecological and environmental factors and genomic and proteomic analysis. Novel approaches like GIS and bioinformatics have been more appropriately utilized in determining the cause of visearal leishmaniasis and in designing strategies for preventing the disease from spreading from one region to another. PMID:23554714

  1. Comprehensive decision tree models in bioinformatics.

    PubMed

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by so called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expected significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.

  2. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by so called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expected significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449

  3. DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data.

    PubMed

    Nakato, Ryuichiro; Itoh, Tahehiko; Shirahige, Katsuhiko

    2013-07-01

    Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although the development of next-generation sequencers offers the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. Specifically, DROMPA outputs a protein-binding profile map in pdf or png format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.

  4. Novel approaches to effects-based monitoring: 21st century tools for bio-effects prediction and surveillance

    EPA Science Inventory

    Effects-based monitoring (EBM) has been employed as a complement to chemical monitoring to help address knowledge gaps between chemical occurrence and biological effects. We have piloted several pathway-based approaches to EBM, that utilize modern bioinformatic and high throughpu...

  5. Generative Topic Modeling in Image Data Mining and Bioinformatics Studies

    ERIC Educational Resources Information Center

    Chen, Xin

    2012-01-01

    Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval and computer vision and bioinformatics domain. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…

  6. A Portable Bioinformatics Course for Upper-Division Undergraduate Curriculum in Sciences

    ERIC Educational Resources Information Center

    Floraino, Wely B.

    2008-01-01

    This article discusses the challenges that bioinformatics education is facing and describes a bioinformatics course that is successfully taught at the California State Polytechnic University, Pomona, to the fourth year undergraduate students in biological sciences, chemistry, and computer science. Information on lecture and computer practice…

  7. Incorporating a Collaborative Web-Based Virtual Laboratory in an Undergraduate Bioinformatics Course

    ERIC Educational Resources Information Center

    Weisman, David

    2010-01-01

    Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a…

  8. Biology in 'silico': The Bioinformatics Revolution.

    ERIC Educational Resources Information Center

    Bloom, Mark

    2001-01-01

    Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and considers it the genetics Swiss Army Knife, which has many different uses, for use in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…

  9. Green Fluorescent Protein-Focused Bioinformatics Laboratory Experiment Suitable for Undergraduates in Biochemistry Courses

    ERIC Educational Resources Information Center

    Rowe, Laura

    2017-01-01

    An introductory bioinformatics laboratory experiment focused on protein analysis has been developed that is suitable for undergraduate students in introductory biochemistry courses. The laboratory experiment is designed to be potentially used as a "stand-alone" activity in which students are introduced to basic bioinformatics tools and…

  10. Virtual Bioinformatics Distance Learning Suite

    ERIC Educational Resources Information Center

    Tolvanen, Martti; Vihinen, Mauno

    2004-01-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…

  11. Teaching Bioinformatics and Neuroinformatics by Using Free Web-Based Tools

    ERIC Educational Resources Information Center

    Grisham, William; Schottler, Natalie A.; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson

    2010-01-01

    This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with…

  12. When cloud computing meets bioinformatics: a review.

    PubMed

    Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong

    2013-10-01

    In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.

  13. Application of machine learning methods in bioinformatics

    NASA Astrophysics Data System (ADS)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technology have enabled biology to enter the era of big data. [1] Bioinformatics is an interdisciplinary, including the acquisition, management, analysis, interpretation and application of biological information, etc. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.

  14. 5th HUPO BPP Bioinformatics Meeting at the European Bioinformatics Institute in Hinxton, UK--Setting the analysis frame.

    PubMed

    Stephan, Christian; Hamacher, Michael; Blüggel, Martin; Körting, Gerhard; Chamrad, Daniel; Scheer, Christian; Marcus, Katrin; Reidegeld, Kai A; Lohaus, Christiane; Schäfer, Heike; Martens, Lennart; Jones, Philip; Müller, Michael; Auyeung, Kevin; Taylor, Chris; Binz, Pierre-Alain; Thiele, Herbert; Parkinson, David; Meyer, Helmut E; Apweiler, Rolf

    2005-09-01

    The Bioinformatics Committee of the HUPO Brain Proteome Project (HUPO BPP) meets regularly to execute the post-lab analyses of the data produced in the HUPO BPP pilot studies. On July 7, 2005 the members came together for the 5th time at the European Bioinformatics Institute (EBI) in Hinxton, UK, hosted by Rolf Apweiler. As a main result, the parameter set of the semi-automated data re-analysis of MS/MS spectra has been elaborated and the subsequent work steps have been defined.

  15. Preliminary Study of Bioinformatics Patents and Their Classifications Registered in the KIPRIS Database.

    PubMed

    Park, Hyun-Seok

    2012-12-01

    Whereas a vast amount of new information on bioinformatics is made available to the public through patents, only a small set of patents are cited in academic papers. A detailed analysis of registered bioinformatics patents, using the existing patent search system, can provide valuable information links between science and technology. However, it is extremely difficult to select keywords to capture bioinformatics patents, reflecting the convergence of several underlying technologies. No single word or even several words are sufficient to identify such patents. The analysis of patent subclasses can provide valuable information. In this paper, I did a preliminary study of the current status of bioinformatics patents and their International Patent Classification (IPC) groups registered in the Korea Intellectual Property Rights Information Service (KIPRIS) database.

  16. BIAS: Bioinformatics Integrated Application Software.

    PubMed

    Finak, G; Godin, N; Hallett, M; Pepin, F; Rajabi, Z; Srivastava, V; Tang, Z

    2005-04-15

    We introduce a development platform especially tailored to Bioinformatics research and software development. BIAS (Bioinformatics Integrated Application Software) provides the tools necessary for carrying out integrative Bioinformatics research requiring multiple datasets and analysis tools. It follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system and supports standards and data-exchange protocols common to Bioinformatics. BIAS is an OpenSource project and is freely available to all interested users at http://www.mcb.mcgill.ca/~bias/. This website also contains a paper containing a more detailed description of BIAS and a sample implementation of a Bayesian network approach for the simultaneous prediction of gene regulation events and of mRNA expression from combinations of gene regulation events. hallett@mcb.mcgill.ca.

  17. Analysis of galactosemia-linked mutations of GALT enzyme using a computational biology approach.

    PubMed

    Facchiano, A; Marabotti, A

    2010-02-01

    We describe the prediction of the structural and functional effects of mutations on the enzyme galactose-1-phosphate uridyltransferase related to the genetic disease galactosemia, using a fully computational approach. One hundred and seven single-point mutants were simulated starting from the structural model of the enzyme obtained by homology modeling methods. Several bioinformatics programs were then applied to each resulting mutant protein to analyze the effect of the mutations. The mutations have a direct effect on the active site, or on the dimer assembly and stability, or on the monomer stability. We describe how mutations may exert their effect at a molecular level by altering H-bonds, salt bridges, secondary structure or surface features. The alteration of protein stability, at level of monomer and/or dimer, is the main effect observed. We found an agreement between our results and the functional experimental data available in literature for some mutants. The data and analyses for all the mutants are fully available in the web-accessible database hosted at http://bioinformatica.isa.cnr.it/GALT.

  18. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks.

    PubMed

    Hanson, Jack; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi

    2017-03-01

    Capturing long-range interactions between structural but not sequence neighbors of proteins is a long-standing challenging problem in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification problems by remembering useful past information in long sequential events. Here, we have implemented deep bidirectional LSTM recurrent neural networks in the problem of protein intrinsic disorder prediction. The new method, named SPOT-Disorder, has steadily improved over a similar method using a traditional, window-based neural network (SPINE-D) in all datasets tested without separate training on short and long disordered regions. Independent tests on four other datasets including the datasets from critical assessment of structure prediction (CASP) techniques and >10 000 annotated proteins from MobiDB, confirmed SPOT-Disorder as one of the best methods in disorder prediction. Moreover, initial studies indicate that the method is more accurate in predicting functional sites in disordered regions. These results highlight the usefulness combining LSTM with deep bidirectional recurrent neural networks in capturing non-local, long-range interactions for bioinformatics applications. SPOT-disorder is available as a web server and as a standalone program at: http://sparks-lab.org/server/SPOT-disorder/index.php . j.hanson@griffith.edu.au or yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au. Supplementary data is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  19. Identification of novel candidate maternal serum protein markers for Down syndrome by integrated proteomic and bioinformatic analysis.

    PubMed

    Kang, Yuan; Dong, Xinran; Zhou, Qiongjie; Zhang, Ying; Cheng, Yan; Hu, Rong; Su, Cuihong; Jin, Hong; Liu, Xiaohui; Ma, Duan; Tian, Weidong; Li, Xiaotian

    2012-03-01

    This study aimed to identify candidate protein biomarkers from maternal serum for Down syndrome (DS) by integrated proteomic and bioinformatics analysis. A pregnancy DS group of 18 women and a control group with the same number were prepared, and the maternal serum proteins were analyzed by isobaric tags for relative and absolute quantitation and mass spectrometry, to identify DS differentially expressed maternal serum proteins (DS-DEMSPs). Comprehensive bioinformatics analysis was then employed to analyze DS-DEMSPs both in this paper and seven related publications. Down syndrome differentially expressed maternal serum proteins from different studies are significantly enriched with common Gene Ontology functions, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, transcription factor binding sites, and Pfam protein domains, However, the DS-DEMSPs are less functionally related to known DS-related genes. These evidences suggest that common molecular mechanisms induced by secondary effects may be present upon DS carrying. A simple scoring scheme revealed Alpha-2-macroglobulin, Apolipoprotein A1, Apolipoprotein E, Complement C1s subcomponent, Complement component 5, Complement component 8, alpha polypeptide, Complement component 8, beta polypeptide and Fibronectin as potential DS biomarkers. The integration of proteomics and bioinformatics studies provides a novel approach to develop new prenatal screening methods for noninvasive yet accurate diagnosis of DS. Copyright © 2012 John Wiley & Sons, Ltd.

  20. Food Safety in the Age of Next Generation Sequencing, Bioinformatics, and Open Data Access.

    PubMed

    Taboada, Eduardo N; Graham, Morag R; Carriço, João A; Van Domselaar, Gary

    2017-01-01

    Public health labs and food regulatory agencies globally are embracing whole genome sequencing (WGS) as a revolutionary new method that is positioned to replace numerous existing diagnostic and microbial typing technologies with a single new target: the microbial draft genome. The ability to cheaply generate large amounts of microbial genome sequence data, combined with emerging policies of food regulatory and public health institutions making their microbial sequences increasingly available and public, has served to open up the field to the general scientific community. This open data access policy shift has resulted in a proliferation of data being deposited into sequence repositories and of novel bioinformatics software designed to analyze these vast datasets. There also has been a more recent drive for improved data sharing to achieve more effective global surveillance, public health and food safety. Such developments have heightened the need for enhanced analytical systems in order to process and interpret this new type of data in a timely fashion. In this review we outline the emergence of genomics, bioinformatics and open data in the context of food safety. We also survey major efforts to translate genomics and bioinformatics technologies out of the research lab and into routine use in modern food safety labs. We conclude by discussing the challenges and opportunities that remain, including those expected to play a major role in the future of food safety science.

  1. Bioinformatics in High School Biology Curricula: A Study of State Science Standards

    ERIC Educational Resources Information Center

    Wefer, Stephen H.; Sheppard, Keith

    2008-01-01

    The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics…

  2. Exploring Cystic Fibrosis Using Bioinformatics Tools: A Module Designed for the Freshman Biology Course

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2011-01-01

    We incorporated a bioinformatics component into the freshman biology course that allows students to explore cystic fibrosis (CF), a common genetic disorder, using bioinformatics tools and skills. Students learn about CF through searching genetic databases, analyzing genetic sequences, and observing the three-dimensional structures of proteins…

  3. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    ERIC Educational Resources Information Center

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  4. Generalized Centroid Estimators in Bioinformatics

    PubMed Central

    Hamada, Michiaki; Kiryu, Hisanori; Iwasaki, Wataru; Asai, Kiyoshi

    2011-01-01

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accuracy measures (e.g. sensitivity, PPV, MCC and F-score) as well as it can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. Not only the concept presented in this paper gives a useful framework to design MEA-based estimators but also it is highly extendable and sheds new light on many problems in bioinformatics. PMID:21365017

  5. The Online Bioinformatics Resources Collection at the University of Pittsburgh Health Sciences Library System--a one-stop gateway to online bioinformatics databases and software tools.

    PubMed

    Chen, Yi-Bu; Chattopadhyay, Ansuman; Bergen, Phillip; Gadd, Cynthia; Tannery, Nancy

    2007-01-01

    To bridge the gap between the rising information needs of biological and medical researchers and the rapidly growing number of online bioinformatics resources, we have created the Online Bioinformatics Resources Collection (OBRC) at the Health Sciences Library System (HSLS) at the University of Pittsburgh. The OBRC, containing 1542 major online bioinformatics databases and software tools, was constructed using the HSLS content management system built on the Zope Web application server. To enhance the output of search results, we further implemented the Vivísimo Clustering Engine, which automatically organizes the search results into categories created dynamically based on the textual information of the retrieved records. As the largest online collection of its kind and the only one with advanced search results clustering, OBRC is aimed at becoming a one-stop guided information gateway to the major bioinformatics databases and software tools on the Web. OBRC is available at the University of Pittsburgh's HSLS Web site (http://www.hsls.pitt.edu/guides/genetics/obrc).

  6. BioShaDock: a community driven bioinformatics shared Docker-based tools registry

    PubMed Central

    Moreews, François; Sallou, Olivier; Ménager, Hervé; Le bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier

    2015-01-01

    Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the Elixir registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community. PMID:26913191

  7. BioShaDock: a community driven bioinformatics shared Docker-based tools registry.

    PubMed

    Moreews, François; Sallou, Olivier; Ménager, Hervé; Le Bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier

    2015-01-01

    Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the Elixir registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community.

  8. Toward Personalized Pressure Ulcer Care Planning: Development of a Bioinformatics System for Individualized Prioritization of Clinical Pratice Guideline

    DTIC Science & Technology

    2016-10-01

    and text data mining . A Spinal Cord Injury Pressure Ulcer and Deep tissue injury ontology, SCIPUDO, will be developed to ensure robust and extensive...on natural language programming and the need to convert text in to data for analysis. In progress c) Define Physio-MIMI based SCIPUD+ Resource...information extraction from the free text clinical note. 3) Significant Results Nothing to report 4) Other Achievements Nothing to report

  9. Bioinformatics workflows and web services in systems biology made easy for experimentalists.

    PubMed

    Jimenez, Rafael C; Corpas, Manuel

    2013-01-01

    Workflows are useful to perform data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge in programming and web services. However the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.

  10. SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

    PubMed

    Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B

    2016-02-04

    Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.

  11. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community

    PubMed Central

    Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius

    2016-01-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418

  12. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community.

    PubMed

    Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J

    2016-09-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.

  13. Bioinformatics in the orphan crops.

    PubMed

    Armstead, Ian; Huang, Lin; Ravagnani, Adriana; Robson, Paul; Ougham, Helen

    2009-11-01

    Orphan crops are those which are grown as food, animal feed or other crops of some importance in agriculture, but which have not yet received the investment of research effort or funding required to develop significant public bioinformatics resources. Where an orphan crop is related to a well-characterised model plant species, comparative genomics and bioinformatics can often, though not always, be exploited to assist research and crop improvement. This review addresses some challenges and opportunities presented by bioinformatics in the orphan crops, using three examples: forage grasses from the genera Lolium and Festuca, forage legumes and the second generation energy crop Miscanthus.

  14. An ontology-based framework for bioinformatics workflows.

    PubMed

    Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer

    2007-01-01

    The proliferation of bioinformatics activities brings new challenges - how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and to provide interoperability among data and tools. This paper describes an effort toward these directions. It is based on combining research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.

  15. A bioinformatics approach for identifying transgene insertion sites using whole genome sequencing data.

    PubMed

    Park, Doori; Park, Su-Hyun; Ban, Yong Wook; Kim, Youn Shic; Park, Kyoung-Cheul; Kim, Nam-Soo; Kim, Ju-Kon; Choi, Ik-Young

    2017-08-15

    Genetically modified crops (GM crops) have been developed to improve the agricultural traits of modern crop cultivars. Safety assessments of GM crops are of paramount importance in research at developmental stages and before releasing transgenic plants into the marketplace. Sequencing technology is developing rapidly, with higher output and labor efficiencies, and will eventually replace existing methods for the molecular characterization of genetically modified organisms. To detect the transgenic insertion locations in the three GM rice gnomes, Illumina sequencing reads are mapped and classified to the rice genome and plasmid sequence. The both mapped reads are classified to characterize the junction site between plant and transgene sequence by sequence alignment. Herein, we present a next generation sequencing (NGS)-based molecular characterization method, using transgenic rice plants SNU-Bt9-5, SNU-Bt9-30, and SNU-Bt9-109. Specifically, using bioinformatics tools, we detected the precise insertion locations and copy numbers of transfer DNA, genetic rearrangements, and the absence of backbone sequences, which were equivalent to results obtained from Southern blot analyses. NGS methods have been suggested as an effective means of characterizing and detecting transgenic insertion locations in genomes. Our results demonstrate the use of a combination of NGS technology and bioinformatics approaches that offers cost- and time-effective methods for assessing the safety of transgenic plants.

  16. Genetic and bioinformatics analysis of four novel GCK missense variants detected in Caucasian families with GCK-MODY phenotype.

    PubMed

    Costantini, S; Malerba, G; Contreas, G; Corradi, M; Marin Vargas, S P; Giorgetti, A; Maffeis, C

    2015-05-01

    Heterozygous loss-of-function mutations in the glucokinase (GCK) gene cause maturity-onset diabetes of the young (MODY) subtype GCK (GCK-MODY/MODY2). GCK sequencing revealed 16 distinct mutations (13 missense, 1 nonsense, 1 splice site, and 1 frameshift-deletion) co-segregating with hyperglycaemia in 23 GCK-MODY families. Four missense substitutions (c.718A>G/p.Asn240Asp, c.757G>T/p.Val253Phe, c.872A>C/p.Lys291Thr, and c.1151C>T/p.Ala384Val) were novel and a founder effect for the nonsense mutation (c.76C>T/p.Gln26*) was supposed. We tested whether an accurate bioinformatics approach could strengthen family-genetic evidence for missense variant pathogenicity in routine diagnostics, where wet-lab functional assays are generally unviable. In silico analyses of the novel missense variants, including orthologous sequence conservation, amino acid substitution (AAS)-pathogenicity predictors, structural modeling and splicing predictors, suggested that the AASs and/or the underlying nucleotide changes are likely to be pathogenic. This study shows how a careful bioinformatics analysis could provide effective suggestions to help molecular-genetic diagnosis in absence of wet-lab validations. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  17. SYMBIOmatics: synergies in Medical Informatics and Bioinformatics--exploring current scientific literature for emerging topics.

    PubMed

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-03-08

    The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to http://www.symbiomatics.org). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000-2005 ("recent") and 1990-1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science.

  18. SYMBIOmatics: Synergies in Medical Informatics and Bioinformatics – exploring current scientific literature for emerging topics

    PubMed Central

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-01-01

    Background The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to ). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. Results This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000–2005 ("recent") and 1990–1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. Conclusion We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science. PMID:17430562

  19. The implementation of e-learning tools to enhance undergraduate bioinformatics teaching and learning: a case study in the National University of Singapore

    PubMed Central

    2009-01-01

    Background The rapid advancement of computer and information technology in recent years has resulted in the rise of e-learning technologies to enhance and complement traditional classroom teaching in many fields, including bioinformatics. This paper records the experience of implementing e-learning technology to support problem-based learning (PBL) in the teaching of two undergraduate bioinformatics classes in the National University of Singapore. Results Survey results further established the efficiency and suitability of e-learning tools to supplement PBL in bioinformatics education. 63.16% of year three bioinformatics students showed a positive response regarding the usefulness of the Learning Activity Management System (LAMS) e-learning tool in guiding the learning and discussion process involved in PBL and in enhancing the learning experience by breaking down PBL activities into a sequential workflow. On the other hand, 89.81% of year two bioinformatics students indicated that their revision process was positively impacted with the use of LAMS for guiding the learning process, while 60.19% agreed that the breakdown of activities into a sequential step-by-step workflow by LAMS enhances the learning experience Conclusion We show that e-learning tools are useful for supplementing PBL in bioinformatics education. The results suggest that it is feasible to develop and adopt e-learning tools to supplement a variety of instructional strategies in the future. PMID:19958511

  20. Ten quick tips for machine learning in computational biology.

    PubMed

    Chicco, Davide

    2017-01-01

    Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and therefore can follow incorrect practices, that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects. We believe our ten suggestions can strongly help any machine learning practitioner to carry on a successful project in computational biology and related sciences.

  1. A Novel Partial Sequence Alignment Tool for Finding Large Deletions

    PubMed Central

    Aruk, Taner; Ustek, Duran; Kursun, Olcay

    2012-01-01

    Finding large deletions in genome sequences has become increasingly more useful in bioinformatics, such as in clinical research and diagnosis. Although there are a number of publically available next generation sequencing mapping and sequence alignment programs, these software packages do not correctly align fragments containing deletions larger than one kb. We present a fast alignment software package, BinaryPartialAlign, that can be used by wet lab scientists to find long structural variations in their experiments. For BinaryPartialAlign, we make use of the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps that we called partial alignment. BinaryPartialAlign implementation is compared with other straight-forward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (runtime and accuracy) of the proposed method. PMID:22566777

  2. Integration of Bioinformatics into an Undergraduate Biology Curriculum and the Impact on Development of Mathematical Skills

    ERIC Educational Resources Information Center

    Wightman, Bruce; Hark, Amy T.

    2012-01-01

    The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this…

  3. Making Bioinformatics Projects a Meaningful Experience in an Undergraduate Biotechnology or Biomedical Science Programme

    ERIC Educational Resources Information Center

    Sutcliffe, Iain C.; Cummings, Stephen P.

    2007-01-01

    Bioinformatics has emerged as an important discipline within the biological sciences that allows scientists to decipher and manage the vast quantities of data (such as genome sequences) that are now available. Consequently, there is an obvious need to provide graduates in biosciences with generic, transferable skills in bioinformatics. We present…

  4. The S-Star Trial Bioinformatics Course: An On-line Learning Success

    ERIC Educational Resources Information Center

    Lim, Yun Ping; Hoog, Jan-Olov; Gardner, Phyllis; Ranganathan, Shoba; Andersson, Siv; Subbiah, Subramanian; Tan, Tin Wee; Hide, Winston; Weiss, Anthony S.

    2003-01-01

    The S-Star Trial Bioinformatics on-line course (www.s-star.org) is a global experiment in bioinformatics distance education. Six universities from five continents have participated in this project. One hundred and fifty students participated in the first trial course of which 96 followed through the entire course and 70 fulfilled the overall…

  5. Reactome Pengine: A web-logic API to the homo sapiens reactome.

    PubMed

    Neaves, Samuel R; Tsoka, Sophia; Millard, Louise A C

    2018-03-30

    Existing ways of accessing data from the Reactome database are limited. Either a researcher is restricted to particular queries defined by a web application programming interface (API), or they have to download the whole database. Reactome Pengine is a web service providing a logic programming based API to the human reactome. This gives researchers greater flexibility in data access than existing APIs, as users can send their own small programs (alongside queries) to Reactome Pengine. The server and an example notebook can be found at https://apps.nms.kcl.ac.uk/reactome-pengine. Source code is available at https://github.com/samwalrus/reactome-pengine and a Docker image is available at https://hub.docker.com/r/samneaves/rp4/ . samuel.neaves@kcl.ac.uk. Supplementary data are available at Bioinformatics online.

  6. 78 FR 35936 - Statement of Organization, Functions, and Delegations of Authority

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-06-14

    ... to, laboratory information systems, quality management systems and bioinformatics; (3) ensures a safe working environment in NCIRD laboratories; and (4) collaborates effectively with other centers and offices...

  7. The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications

    PubMed Central

    2011-01-01

    Background The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Results Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Conclusions Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework. PMID:21806842

  8. The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications.

    PubMed

    Katayama, Toshiaki; Wilkinson, Mark D; Vos, Rutger; Kawashima, Takeshi; Kawashima, Shuichi; Nakao, Mitsuteru; Yamamoto, Yasunori; Chun, Hong-Woo; Yamaguchi, Atsuko; Kawano, Shin; Aerts, Jan; Aoki-Kinoshita, Kiyoko F; Arakawa, Kazuharu; Aranda, Bruno; Bonnal, Raoul Jp; Fernández, José M; Fujisawa, Takatomo; Gordon, Paul Mk; Goto, Naohisa; Haider, Syed; Harris, Todd; Hatakeyama, Takashi; Ho, Isaac; Itoh, Masumi; Kasprzyk, Arek; Kido, Nobuhiro; Kim, Young-Joo; Kinjo, Akira R; Konishi, Fumikazu; Kovarskaya, Yulia; von Kuster, Greg; Labarga, Alberto; Limviphuvadh, Vachiranee; McCarthy, Luke; Nakamura, Yasukazu; Nam, Yunsun; Nishida, Kozo; Nishimura, Kunihiro; Nishizawa, Tatsuya; Ogishima, Soichi; Oinn, Tom; Okamoto, Shinobu; Okuda, Shujiro; Ono, Keiichiro; Oshita, Kazuki; Park, Keun-Joon; Putnam, Nicholas; Senger, Martin; Severin, Jessica; Shigemoto, Yasumasa; Sugawara, Hideaki; Taylor, James; Trelles, Oswaldo; Yamasaki, Chisato; Yamashita, Riu; Satoh, Noriyuki; Takagi, Toshihisa

    2011-08-02

    The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.

  9. Microsoft Biology Initiative: .NET Bioinformatics Platform and Tools

    PubMed Central

    Diaz Acosta, B.

    2011-01-01

    The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative is comprised of two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework—initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, documentation and training materials are freely downloadable from http://research.microsoft.com/bio. MBT is a collection of tools that enable biology and bioinformatics researchers to be more productive in making scientific discoveries.

  10. Bioinformatics in high school biology curricula: a study of state science standards.

    PubMed

    Wefer, Stephen H; Sheppard, Keith

    2008-01-01

    The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students.

  11. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters.

    PubMed

    Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.

  12. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

    PubMed Central

    Cheng, Gong; Zhang, Guocai; Xu, Liang

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317

  13. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.

    PubMed

    Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru

    2016-09-29

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.

  14. Bioinformatics in High School Biology Curricula: A Study of State Science Standards

    PubMed Central

    Sheppard, Keith

    2008-01-01

    The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students. PMID:18316818

  15. MACBenAbim: A Multi-platform Mobile Application for searching keyterms in Computational Biology and Bioinformatics.

    PubMed

    Oluwagbemi, Olugbenga O; Adewumi, Adewole; Esuruoso, Abimbola

    2012-01-01

    Computational biology and bioinformatics are gradually gaining grounds in Africa and other developing nations of the world. However, in these countries, some of the challenges of computational biology and bioinformatics education are inadequate infrastructures, and lack of readily-available complementary and motivational tools to support learning as well as research. This has lowered the morale of many promising undergraduates, postgraduates and researchers from aspiring to undertake future study in these fields. In this paper, we developed and described MACBenAbim (Multi-platform Mobile Application for Computational Biology and Bioinformatics), a flexible user-friendly tool to search for, define and describe the meanings of keyterms in computational biology and bioinformatics, thus expanding the frontiers of knowledge of the users. This tool also has the capability of achieving visualization of results on a mobile multi-platform context. MACBenAbim is available from the authors for non-commercial purposes.

  16. A generalized global alignment algorithm.

    PubMed

    Huang, Xiaoqiu; Chao, Kun-Mao

    2003-01-22

    Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.

  17. No-boundary thinking in bioinformatics research

    PubMed Central

    2013-01-01

    Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT). PMID:24192339

  18. University-Level Practical Activities in Bioinformatics Benefit Voluntary Groups of Pupils in the Last 2 Years of School

    ERIC Educational Resources Information Center

    Barker, Daniel; Alderson, Rosanna G.; McDonagh, James L.; Plaisier, Heleen; Comrie, Muriel M.; Duncan, Leigh; Muirhead, Gavin T. P.; Sweeney, Stuart D.

    2015-01-01

    Background: Bioinformatics--the use of computers in biology--is of major and increasing importance to biological sciences and medicine. We conducted a preliminary investigation of the value of bringing practical, university-level bioinformatics education to the school level. We conducted voluntary activities for pupils at two schools in Scotland…

  19. The Air Force In Silico -- Computational Biology in 2025

    DTIC Science & Technology

    2007-11-01

    and chromosome) these new fields are commonly referred to as “~omics.” Proteomics , transcriptomics, metabolomics , epigenomics, physiomics... Bioinformatics , 2006, http://journal.imbio.de/ http://www-bm.ipk-gatersleben.de/stable/php/ journal /articles/pdf/jib-22.pdf (accessed 30 September...Chirino, G. Tansley and I. Dryden, “The implications for Bioinformatics of integration across physical scales,” Journal of Integrative Bioinformatics

  20. Online Tools for Bioinformatics Analyses in Nutrition Sciences12

    PubMed Central

    Malkaram, Sridhar A.; Hassan, Yousef I.; Zempleni, Janos

    2012-01-01

    Recent advances in “omics” research have resulted in the creation of large datasets that were generated by consortiums and centers, small datasets that were generated by individual investigators, and bioinformatics tools for mining these datasets. It is important for nutrition laboratories to take full advantage of the analysis tools to interrogate datasets for information relevant to genomics, epigenomics, transcriptomics, proteomics, and metabolomics. This review provides guidance regarding bioinformatics resources that are currently available in the public domain, with the intent to provide a starting point for investigators who want to take advantage of the opportunities provided by the bioinformatics field. PMID:22983844

  1. Biotool2Web: creating simple Web interfaces for bioinformatics applications.

    PubMed

    Shahid, Mohammad; Alam, Intikhab; Fuellen, Georg

    2006-01-01

    Currently there are many bioinformatics applications being developed, but there is no easy way to publish them on the World Wide Web. We have developed a Perl script, called Biotool2Web, which makes the task of creating web interfaces for simple ('home-made') bioinformatics applications quick and easy. Biotool2Web uses an XML document containing the parameters to run the tool on the Web, and generates the corresponding HTML and common gateway interface (CGI) files ready to be published on a web server. This tool is available for download at URL http://www.uni-muenster.de/Bioinformatics/services/biotool2web/ Georg Fuellen (fuellen@alum.mit.edu).

  2. India's Computational Biology Growth and Challenges.

    PubMed

    Chakraborty, Chiranjib; Bandyopadhyay, Sanghamitra; Agoramoorthy, Govindasamy

    2016-09-01

    India's computational science is growing swiftly due to the outburst of internet and information technology services. The bioinformatics sector of India has been transforming rapidly by creating a competitive position in global bioinformatics market. Bioinformatics is widely used across India to address a wide range of biological issues. Recently, computational researchers and biologists are collaborating in projects such as database development, sequence analysis, genomic prospects and algorithm generations. In this paper, we have presented the Indian computational biology scenario highlighting bioinformatics-related educational activities, manpower development, internet boom, service industry, research activities, conferences and trainings undertaken by the corporate and government sectors. Nonetheless, this new field of science faces lots of challenges.

  3. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    PubMed Central

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of Gagne's Conditions of Learning instructional design theory. This theory, although first published in the early 1970s, is still fundamental in instructional design and instructional technology. First, top-level as well as prerequisite learning objectives for a microarray analysis workshop and a primer design workshop were defined. Then a hierarchy of objectives for each workshop was created. Hands-on tutorials were designed to meet these objectives. Finally, events of learning proposed by Gagne's theory were incorporated into the hands-on tutorials. The resultant manuals were tested on a small number of trainees, revised, and applied in 1-day bioinformatics workshops. Based on this experience and on observations made during the workshops, we conclude that Gagne's Conditions of Learning instructional design theory provides a useful framework for developing bioinformatics training, but may not be optimal as a method for teaching it. PMID:16220141

  4. Incorporating a collaborative web-based virtual laboratory in an undergraduate bioinformatics course.

    PubMed

    Weisman, David

    2010-01-01

    Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a face-to-face lecture course with a web-based virtual laboratory presents new opportunities for collaborative learning of the conceptual material, and for fostering peer support of technical bioinformatics questions. To explore this combination, an in-person lecture-only undergraduate bioinformatics course was augmented with a remote web-based laboratory, and tested with a large class. This study hypothesized that the collaborative virtual lab would foster active learning and peer support, and tested this hypothesis by conducting a student survey near the end of the semester. Respondents broadly reported strong benefits from the online laboratory, and strong benefits from peer-provided technical support. In comparison with traditional in-person teaching labs, students preferred the virtual lab by a factor of two. Key aspects of the course architecture and design are described to encourage further experimentation in teaching collaborative online bioinformatics laboratories. Copyright © 2010 International Union of Biochemistry and Molecular Biology, Inc.

  5. Taking Bioinformatics to Systems Medicine.

    PubMed

    van Kampen, Antoine H C; Moerland, Perry D

    2016-01-01

    Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.

  6. Revealing biological information using data structuring and automated learning.

    PubMed

    Mohorianu, Irina; Moulton, Vincent

    2010-11-01

    The intermediary steps between a biological hypothesis, concretized in the input data, and meaningful results, validated using biological experiments, commonly employ bioinformatics tools. Starting with storage of the data and ending with a statistical analysis of the significance of the results, every step in a bioinformatics analysis has been intensively studied and the resulting methods and models patented. This review summarizes the bioinformatics patents that have been developed mainly for the study of genes, and points out the universal applicability of bioinformatics methods to other related studies such as RNA interference. More specifically, we overview the steps undertaken in the majority of bioinformatics analyses, highlighting, for each, various approaches that have been developed to reveal details from different perspectives. First we consider data warehousing, the first task that has to be performed efficiently, optimizing the structure of the database, in order to facilitate both the subsequent steps and the retrieval of information. Next, we review data mining, which occupies the central part of most bioinformatics analyses, presenting patents concerning differential expression, unsupervised and supervised learning. Last, we discuss how networks of interactions of genes or other players in the cell may be created, which help draw biological conclusions and have been described in several patents.

  7. A new paradigm for transcription factor TFIIB functionality

    PubMed Central

    Gelev, Vladimir; Zabolotny, Janice M.; Lange, Martin; Hiromura, Makoto; Yoo, Sang Wook; Orlando, Joseph S.; Kushnir, Anna; Horikoshi, Nobuo; Paquet, Eric; Bachvarov, Dimcho; Schaffer, Priscilla A.; Usheva, Anny

    2014-01-01

    Experimental and bioinformatic studies of transcription initiation by RNA polymerase II (RNAP2) have revealed a mechanism of RNAP2 transcription initiation less uniform across gene promoters than initially thought. However, the general transcription factor TFIIB is presumed to be universally required for RNAP2 transcription initiation. Based on bioinformatic analysis of data and effects of TFIIB knockdown in primary and transformed cell lines on cellular functionality and global gene expression, we report that TFIIB is dispensable for transcription of many human promoters, but is essential for herpes simplex virus-1 (HSV-1) gene transcription and replication. We report a novel cell cycle TFIIB regulation and localization of the acetylated TFIIB variant on the transcriptionally silent mitotic chromatids. Taken together, these results establish a new paradigm for TFIIB functionality in human gene expression, which when downregulated has potent anti-viral effects. PMID:24441171

  8. The OAuth 2.0 Web Authorization Protocol for the Internet Addiction Bioinformatics (IABio) Database.

    PubMed

    Choi, Jeongseok; Kim, Jaekwon; Lee, Dong Kyun; Jang, Kwang Soo; Kim, Dai-Jin; Choi, In Young

    2016-03-01

    Internet addiction (IA) has become a widespread and problematic phenomenon as smart devices pervade society. Moreover, internet gaming disorder leads to increases in social expenditures for both individuals and nations alike. Although the prevention and treatment of IA are getting more important, the diagnosis of IA remains problematic. Understanding the neurobiological mechanism of behavioral addictions is essential for the development of specific and effective treatments. Although there are many databases related to other addictions, a database for IA has not been developed yet. In addition, bioinformatics databases, especially genetic databases, require a high level of security and should be designed based on medical information standards. In this respect, our study proposes the OAuth standard protocol for database access authorization. The proposed IA Bioinformatics (IABio) database system is based on internet user authentication, which is a guideline for medical information standards, and uses OAuth 2.0 for access control technology. This study designed and developed the system requirements and configuration. The OAuth 2.0 protocol is expected to establish the security of personal medical information and be applied to genomic research on IA.

  9. Integration of data systems and technology improves research and collaboration for a superfund research center.

    PubMed

    Hobbie, Kevin A; Peterson, Elena S; Barton, Michael L; Waters, Katrina M; Anderson, Kim A

    2012-08-01

    Large collaborative centers are a common model for accomplishing integrated environmental health research. These centers often include various types of scientific domains (e.g., chemistry, biology, bioinformatics) that are integrated to solve some of the nation's key economic or public health concerns. The Superfund Research Center (SRP) at Oregon State University (OSU) is one such center established in 2008 to study the emerging health risks of polycyclic aromatic hydrocarbons while using new technologies both in the field and laboratory. With outside collaboration at remote institutions, success for the center as a whole depends on the ability to effectively integrate data across all research projects and support cores. Therefore, the OSU SRP center developed a system that integrates environmental monitoring data with analytical chemistry data and downstream bioinformatics and statistics to enable complete "source-to-outcome" data modeling and information management. This article describes the development of this integrated information management system that includes commercial software for operational laboratory management and sample management in addition to open-source custom-built software for bioinformatics and experimental data management.

  10. Integration of Data Systems and Technology Improves Research and Collaboration for a Superfund Research Center

    PubMed Central

    Hobbie, Kevin A.; Peterson, Elena S.; Barton, Michael L.; Waters, Katrina M.; Anderson, Kim A.

    2012-01-01

    Large collaborative centers are a common model for accomplishing integrated environmental health research. These centers often include various types of scientific domains (e.g. chemistry, biology, bioinformatics) that are integrated to solve some of the nation’s key economic or public health concerns. The Superfund Research Center (SRP) at Oregon State University (OSU) is one such center established in 2008 to study the emerging health risks of polycyclic aromatic hydrocarbons while utilizing new technologies both in the field and laboratory. With outside collaboration at remote institutions, success for the center as a whole depends on the ability to effectively integrate data across all research projects and support cores. Therefore, the OSU SRP center developed a system that integrates environmental monitoring data with analytical chemistry data and downstream bioinformatics and statistics to enable complete ‘source to outcome’ data modeling and information management. This article describes the development of this integrated information management system that includes commercial software for operational laboratory management and sample management in addition to open source custom built software for bioinformatics and experimental data management. PMID:22651935

  11. NOBLAST and JAMBLAST: New Options for BLAST and a Java Application Manager for BLAST results.

    PubMed

    Lagnel, Jacques; Tsigenopoulos, Costas S; Iliopoulos, Ioannis

    2009-03-15

    NOBLAST (New Options for BLAST) is an open source program that provides a new user-friendly tabular output format for various NCBI BLAST programs (Blastn, Blastp, Blastx, Tblastn, Tblastx, Mega BLAST and Psi BLAST) without any use of a parser and provides E-value correction in case of use of segmented BLAST database. JAMBLAST using the NOBLAST output allows the user to manage, view and filter the BLAST hits using a number of selection criteria. A distribution package of NOBLAST and JAMBLAST including detailed installation procedure is freely available from http://sourceforge.net/projects/JAMBLAST/ and http://sourceforge.net/projects/NOBLAST. Supplementary data are available at Bioinformatics online.

  12. A computational approach to extinction events in chemical reaction networks with discrete state spaces.

    PubMed

    Johnston, Matthew D

    2017-12-01

    Recent work of Johnston et al. has produced sufficient conditions on the structure of a chemical reaction network which guarantee that the corresponding discrete state space system exhibits an extinction event. The conditions consist of a series of systems of equalities and inequalities on the edges of a modified reaction network called a domination-expanded reaction network. In this paper, we present a computational implementation of these conditions written in Python and apply the program on examples drawn from the biochemical literature. We also run the program on 458 models from the European Bioinformatics Institute's BioModels Database and report our results. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. REDItools: high-throughput RNA editing detection made easy.

    PubMed

    Picardi, Ernesto; Pesole, Graziano

    2013-07-15

    The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools a suite of python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data. REDItools are in python programming language and freely available at http://code.google.com/p/reditools/. ernesto.picardi@uniba.it or graziano.pesole@uniba.it Supplementary data are available at Bioinformatics online.

  14. Distant plant homologues: don't throw out the baby.

    PubMed

    Gardiner, John; Overall, Robyn; Marc, Jan

    2012-03-01

    Plants and metazoans share many similarities in terms of conserved proteins. Antibodies have been used extensively to detect remote homologues, many of which are yet to be identified conclusively. Genome sequencing and the creation of novel sequence or structure comparison programs have assisted greatly in the identification of distant protein homologues. The continuing development of new software algorithms and the combining of bioinformatics with proteomics offer hope that remaining homologues will be soon identified. Copyright © 2011 Elsevier Ltd. All rights reserved.

  15. An open-source java platform for automated reaction mapping.

    PubMed

    Crabtree, John D; Mehta, Dinesh P; Kouri, Tina M

    2010-09-27

    This article presents software applications that have been built upon a modular, open-source, reaction mapping library that can be used in both cheminformatics and bioinformatics research. We first describe the theoretical underpinnings and modular architecture of the core software library. We then describe two applications that have been built upon that core. The first is a generic reaction viewer and mapper, and the second classifies reactions according to rules that can be modified by end users with little or no programming skills.

  16. Scientific Workflow Management in Proteomics

    PubMed Central

    de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus

    2012-01-01

    Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703

  17. Orchestrating high-throughput genomic analysis with Bioconductor

    PubMed Central

    Huber, Wolfgang; Carey, Vincent J.; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S.; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D.; Irizarry, Rafael A.; Lawrence, Michael; Love, Michael I.; MacDonald, James; Obenchain, Valerie; Oleś, Andrzej K.; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K.; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin

    2015-01-01

    Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors. PMID:25633503

  18. Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees.

    PubMed

    Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.

  19. Community annotation and bioinformatics workforce development in concert—Little Skate Genome Annotation Workshops and Jamborees

    PubMed Central

    Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832

  20. Missing "Links" in Bioinformatics Education: Expanding Students' Conceptions of Bioinformatics Using a Biodiversity Database of Living and Fossil Reef Corals

    ERIC Educational Resources Information Center

    Nehm, Ross H.; Budd, Ann F.

    2006-01-01

    NMITA is a reef coral biodiversity database that we use to introduce students to the expansive realm of bioinformatics beyond genetics. We introduce a series of lessons that have students use this database, thereby accessing real data that can be used to test hypotheses about biodiversity and evolution while targeting the "National Science …

  1. A high level interface to SCOP and ASTRAL implemented in python.

    PubMed

    Casbon, James A; Crooks, Gavin E; Saqi, Mansoor A S

    2006-01-10

    Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non redundant subsets of SCOP domains on the basis of sequence similarity such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small and easy to use API written in python to enable construction of datasets from these resources. We have designed a set of python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL. The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.

  2. Parallel evolutionary computation in bioinformatics applications.

    PubMed

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  3. Informatic analysis reveals Legionella as a source of novel natural products.

    PubMed

    Johnston, Chad W; Plumb, Jonathan; Li, Xiang; Grinstein, Sergio; Magarvey, Nathan A

    2016-06-01

    Microbial natural products are a crucial source of bioactive molecules and unique chemical scaffolds. Despite their importance, rediscovery of known natural products from established productive microbes has led to declining interest, even while emergent genomic data suggest that the majority of microbial natural products remain to be discovered. Now, new sources of microbial natural products must be defined in order to provide chemical scaffolds for the next generation of small molecules for therapeutic, agricultural, and industrial purposes. In this work, we use specialized bioinformatic programs, genetic knockouts, and comparative metabolomics to define the genus Legionella as a new source of novel natural products. We show that Legionella spp. hold a diverse collection of biosynthetic gene clusters for the production of polyketide and nonribosomal peptide natural products. To confirm this bioinformatic survey, we create targeted mutants of L. pneumophila and use comparative metabolomics to identify a novel polyketide surfactant. Using spectroscopic techniques, we show that this polyketide possesses a new chemical scaffold, and firmly demonstrate that this unexplored genus is a source for novel natural products.

  4. Identification of rare paired box 3 variant in strabismus by whole exome sequencing

    PubMed Central

    Gong, Hui-Min; Wang, Jing; Xu, Jing; Zhou, Zhan-Yu; Li, Jing-Wen; Chen, Shu-Fang

    2017-01-01

    AIM To identify the potentially pathogenic gene variants that contributes to the etiology of strabismus. METHODS A Chinese pedigree with strabismus was collected and the exomes of two affected individuals were sequenced using the next-generation sequencing technology. The resulting variants from exome sequencing were filtered by subsequent bioinformatics methods and the candidate mutation was verified as heterozygous in the affected proposita and her mother by sanger sequencing. RESULTS Whole exome sequencing and filtering identified a nonsynonymous mutation c.434G-T transition in paired box 3 (PAX3) in the two affected individuals, which were predicted to be deleterious by more than 4 bioinformatics programs. This altered amino acid residue was located in the conserved PAX domain of PAX3. This gene encodes a member of the PAX family of transcription factors, which play critical roles during fetal development. Mutations in PAX3 were associated with Waardenburg syndrome with strabismus. CONCLUSION Our results report that the c.434G-T mutation (p.R145L) in PAX3 may contribute to strabismus, expanding our understanding of the causally relevant genes for this disorder. PMID:28861346

  5. StrBioLib: a Java library for development of custom computational structural biology applications.

    PubMed

    Chandonia, John-Marc

    2007-08-01

    StrBioLib is a library of Java classes useful for developing software for computational structural biology research. StrBioLib contains classes to represent and manipulate protein structures, biopolymer sequences, sets of biopolymer sequences, and alignments between biopolymers based on either sequence or structure. Interfaces are provided to interact with commonly used bioinformatics applications, including (psi)-blast, modeller, muscle and Primer3, and tools are provided to read and write many file formats used to represent bioinformatic data. The library includes a general-purpose neural network object with multiple training algorithms, the Hooke and Jeeves non-linear optimization algorithm, and tools for efficient C-style string parsing and formatting. StrBioLib is the basis for the Pred2ary secondary structure prediction program, is used to build the astral compendium for sequence and structure analysis, and has been extensively tested through use in many smaller projects. Examples and documentation are available at the site below. StrBioLib may be obtained under the terms of the GNU LGPL license from http://strbio.sourceforge.net/

  6. ProQ3D: improved model quality assessments using deep learning.

    PubMed

    Uziela, Karolis; Menéndez Hurtado, David; Shu, Nanjiang; Wallner, Björn; Elofsson, Arne

    2017-05-15

    Protein quality assessment is a long-standing problem in bioinformatics. For more than a decade we have developed state-of-art predictors by carefully selecting and optimising inputs to a machine learning method. The correlation has increased from 0.60 in ProQ to 0.81 in ProQ2 and 0.85 in ProQ3 mainly by adding a large set of carefully tuned descriptions of a protein. Here, we show that a substantial improvement can be obtained using exactly the same inputs as in ProQ2 or ProQ3 but replacing the support vector machine by a deep neural network. This improves the Pearson correlation to 0.90 (0.85 using ProQ2 input features). ProQ3D is freely available both as a webserver and a stand-alone program at http://proq3.bioinfo.se/. arne@bioinfo.se. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  7. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.

  8. Identification of rare paired box 3 variant in strabismus by whole exome sequencing.

    PubMed

    Gong, Hui-Min; Wang, Jing; Xu, Jing; Zhou, Zhan-Yu; Li, Jing-Wen; Chen, Shu-Fang

    2017-01-01

    To identify the potentially pathogenic gene variants that contributes to the etiology of strabismus. A Chinese pedigree with strabismus was collected and the exomes of two affected individuals were sequenced using the next-generation sequencing technology. The resulting variants from exome sequencing were filtered by subsequent bioinformatics methods and the candidate mutation was verified as heterozygous in the affected proposita and her mother by sanger sequencing. Whole exome sequencing and filtering identified a nonsynonymous mutation c.434G-T transition in paired box 3 (PAX3) in the two affected individuals, which were predicted to be deleterious by more than 4 bioinformatics programs. This altered amino acid residue was located in the conserved PAX domain of PAX3. This gene encodes a member of the PAX family of transcription factors, which play critical roles during fetal development. Mutations in PAX3 were associated with Waardenburg syndrome with strabismus. Our results report that the c.434G-T mutation (p.R145L) in PAX3 may contribute to strabismus, expanding our understanding of the causally relevant genes for this disorder.

  9. SearchSmallRNA: a graphical interface tool for the assemblage of viral genomes using small RNA libraries data.

    PubMed

    de Andrade, Roberto R S; Vaslin, Maite F S

    2014-03-07

    Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping of the vast information obtained presents a bioinformatics challenge. In order to by pass the need of line command and basic bioinformatics knowledge, we develop a mapping software with a graphical interface to the assemblage of viral genomes from small RNA dataset obtained by NGS. SearchSmallRNA was developed in JAVA language version 7 using NetBeans IDE 7.1 software. The program also allows the analysis of the viral small interfering RNAs (vsRNAs) profile; providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. The program performs comparisons between each read sequenced present in a library and a chosen reference genome. Reads showing Hamming distances smaller or equal to an allowed mismatched will be selected as positives and used to the assemblage of a long nucleotide genome sequence. In order to validate the software, distinct analysis using NGS dataset obtained from HIV and two plant viruses were used to reconstruct viral whole genomes. SearchSmallRNA program was able to reconstructed viral genomes using NGS of small RNA dataset with high degree of reliability so it will be a valuable tool for viruses sequencing and discovery. It is accessible and free to all research communities and has the advantage to have an easy-to-use graphical interface. SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/.

  10. SearchSmallRNA: a graphical interface tool for the assemblage of viral genomes using small RNA libraries data

    PubMed Central

    2014-01-01

    Background Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping of the vast information obtained presents a bioinformatics challenge. Methods In order to by pass the need of line command and basic bioinformatics knowledge, we develop a mapping software with a graphical interface to the assemblage of viral genomes from small RNA dataset obtained by NGS. SearchSmallRNA was developed in JAVA language version 7 using NetBeans IDE 7.1 software. The program also allows the analysis of the viral small interfering RNAs (vsRNAs) profile; providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. Results The program performs comparisons between each read sequenced present in a library and a chosen reference genome. Reads showing Hamming distances smaller or equal to an allowed mismatched will be selected as positives and used to the assemblage of a long nucleotide genome sequence. In order to validate the software, distinct analysis using NGS dataset obtained from HIV and two plant viruses were used to reconstruct viral whole genomes. Conclusions SearchSmallRNA program was able to reconstructed viral genomes using NGS of small RNA dataset with high degree of reliability so it will be a valuable tool for viruses sequencing and discovery. It is accessible and free to all research communities and has the advantage to have an easy-to-use graphical interface. Availability and implementation SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/. PMID:24607237

  11. Bioinformatics/biostatistics: microarray analysis.

    PubMed

    Eichler, Gabriel S

    2012-01-01

    The quantity and complexity of the molecular-level data generated in both research and clinical settings require the use of sophisticated, powerful computational interpretation techniques. It is for this reason that bioinformatic analysis of complex molecular profiling data has become a fundamental technology in the development of personalized medicine. This chapter provides a high-level overview of the field of bioinformatics and outlines several, classic bioinformatic approaches. The highlighted approaches can be aptly applied to nearly any sort of high-dimensional genomic, proteomic, or metabolomic experiments. Reviewed technologies in this chapter include traditional clustering analysis, the Gene Expression Dynamics Inspector (GEDI), GoMiner (GoMiner), Gene Set Enrichment Analysis (GSEA), and the Learner of Functional Enrichment (LeFE).

  12. The Interactions Between Clinical Informatics and Bioinformatics

    PubMed Central

    Altman, Russ B.

    2000-01-01

    For the past decade, Stanford Medical Informatics has combined clinical informatics and bioinformatics research and training in an explicit way. The interest in applying informatics techniques to both clinical problems and problems in basic science can be traced to the Dendral project in the 1960s. Having bioinformatics and clinical informatics in the same academic unit is still somewhat unusual and can lead to clashes of clinical and basic science cultures. Nevertheless, the benefits of this organization have recently become clear, as the landscape of academic medicine in the next decades has begun to emerge. The author provides examples of technology transfer between clinical informatics and bioinformatics that illustrate how they complement each other. PMID:10984462

  13. Towards an open, collaborative, reusable framework for sharing hands-on bioinformatics training workshops

    PubMed Central

    Revote, Jerico; Suchecki, Radosław; Tyagi, Sonika; Corley, Susan M.; Shang, Catherine A.; McGrath, Annette

    2017-01-01

    Abstract There is a clear demand for hands-on bioinformatics training. The development of bioinformatics workshop content is both time-consuming and expensive. Therefore, enabling trainers to develop bioinformatics workshops in a way that facilitates reuse is becoming increasingly important. The most widespread practice for sharing workshop content is through making PDF, PowerPoint and Word documents available online. While this effort is to be commended, such content is usually not so easy to reuse or repurpose and does not capture all the information required for a third party to rerun a workshop. We present an open, collaborative framework for developing and maintaining, reusable and shareable hands-on training workshop content. PMID:26984618

  14. Biomedical informatics training at the University of Wisconsin-Madison.

    PubMed

    Severtson, D J; Pape, L; Page, C D; Shavlik, J W; Phillips, G N; Flatley Brennan, P

    2007-01-01

    The purpose of this paper is to describe biomedical informatics training at the University of Wisconsin-Madison (UW-Madison). We reviewed biomedical informatics training, research, and faculty/trainee participation at UW-Madison. There are three primary approaches to training 1) The Computation & Informatics in Biology & Medicine Training Program, 2) formal biomedical informatics offered by various campus departments, and 3) individualized programs. Training at UW-Madison embodies the features of effective biomedical informatics training recommended by the American College of Medical Informatics that were delineated as: 1) curricula that integrate experiences among computational sciences and application domains, 2) individualized and interdisciplinary cross-training among a diverse cadre of trainees to develop key competencies that he or she does not initially possess, 3) participation in research and development activities, and 4) exposure to a range of basic informational and computational sciences. The three biomedical informatics training approaches immerse students in multidisciplinary training and education that is supported by faculty trainers who participate in collaborative research across departments. Training is provided across a range of disciplines and available at different training stages. Biomedical informatics training at UW-Madison illustrates how a large research University, with multiple departments across biological, computational and health fields, can provide effective and productive biomedical informatics training via multiple bioinformatics training approaches.

  15. Data-driven advice for applying machine learning to bioinformatics problems

    PubMed Central

    Olson, Randal S.; La Cava, William; Mustahsan, Zairah; Varik, Akshay; Moore, Jason H.

    2017-01-01

    As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems. PMID:29218881

  16. An integrated bioinformatics approach to improve two-color microarray quality-control: impact on biological conclusions.

    PubMed

    van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A

    2009-06-01

    Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work pointed out the need of an extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point to point standard analysis procedure, based on a combination of clustering and data visualization for the analysis of microarray data.

  17. The plant leaf movement analyzer (PALMA): a simple tool for the analysis of periodic cotyledon and leaf movement in Arabidopsis thaliana.

    PubMed

    Wagner, Lucas; Schmal, Christoph; Staiger, Dorothee; Danisman, Selahattin

    2017-01-01

    The analysis of circadian leaf movement rhythms is a simple yet effective method to study effects of treatments or gene mutations on the circadian clock of plants. Currently, leaf movements are analysed using time lapse photography and subsequent bioinformatics analyses of leaf movements. Programs that are used for this purpose either are able to perform one function (i.e. leaf tip detection or rhythm analysis) or their function is limited to specific computational environments. We developed a leaf movement analysis tool-PALMA-that works in command line and combines image extraction with rhythm analysis using Fast Fourier transformation and non-linear least squares fitting. We validated PALMA in both simulated time series and in experiments using the known short period mutant sensitivity to red light reduced 1 ( srr1 - 1 ). We compared PALMA with two established leaf movement analysis tools and found it to perform equally well. Finally, we tested the effect of reduced iron conditions on the leaf movement rhythms of wild type plants. Here, we found that PALMA successfully detected period lengthening under reduced iron conditions. PALMA correctly estimated the period of both simulated and real-life leaf movement experiments. As a platform-independent console-program that unites both functions needed for the analysis of circadian leaf movements it is a valid alternative to existing leaf movement analysis tools.

  18. Extracting patterns of database and software usage from the bioinformatics literature

    PubMed Central

    Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L.; Stevens, Robert

    2014-01-01

    Motivation: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. Results: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. Availability and implementation: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/ Contact: robert.stevens@manchester.ac.uk PMID:25161253

  19. 'Students-as-partners' scheme enhances postgraduate students' employability skills while addressing gaps in bioinformatics education.

    PubMed

    Mello, Luciane V; Tregilgas, Luke; Cowley, Gwen; Gupta, Anshul; Makki, Fatima; Jhutty, Anjeet; Shanmugasundram, Achchuthan

    2017-01-01

    Teaching bioinformatics is a longstanding challenge for educators who need to demonstrate to students how skills developed in the classroom may be applied to real world research. This study employed an action research methodology which utilised student-staff partnership and peer-learning. It was centred on the experiences of peer-facilitators, students who had previously taken a postgraduate bioinformatics module, and had applied knowledge and skills gained from it to their own research. It aimed to demonstrate to peer-receivers, current students, how bioinformatics could be used in their own research while developing peer-facilitators' teaching and mentoring skills. This student-centred approach was well received by the peer-receivers, who claimed to have gained improved understanding of bioinformatics and its relevance to research. Equally, peer-facilitators also developed a better understanding of the subject and appreciated that the activity was a rare and invaluable opportunity to develop their teaching and mentoring skills, enhancing their employability.

  20. Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics

    PubMed Central

    Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.

    2012-01-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849

  1. Agents in bioinformatics, computational and systems biology.

    PubMed

    Merelli, Emanuela; Armano, Giuliano; Cannata, Nicola; Corradini, Flavio; d'Inverno, Mark; Doms, Andreas; Lord, Phillip; Martin, Andrew; Milanesi, Luciano; Möller, Steffen; Schroeder, Michael; Luck, Michael

    2007-01-01

    The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meeting provided an opportunity for seeding collaborations between the agent and bioinformatics communities to develop a different (agent-based) approach of computational frameworks both for data analysis and management in bioinformatics and for systems modelling and simulation in computational and systems biology. The collaborations gave rise to applications and integrated tools that we summarize and discuss in context of the state of the art in this area. We investigate on future challenges and argue that the field should still be explored from many perspectives ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages to be used by information agents, and to the adoption of agents for computational grids.

  2. ‘Students-as-partners’ scheme enhances postgraduate students’ employability skills while addressing gaps in bioinformatics education

    PubMed Central

    Mello, Luciane V.; Tregilgas, Luke; Cowley, Gwen; Gupta, Anshul; Makki, Fatima; Jhutty, Anjeet; Shanmugasundram, Achchuthan

    2017-01-01

    Abstract Teaching bioinformatics is a longstanding challenge for educators who need to demonstrate to students how skills developed in the classroom may be applied to real world research. This study employed an action research methodology which utilised student–staff partnership and peer-learning. It was centred on the experiences of peer-facilitators, students who had previously taken a postgraduate bioinformatics module, and had applied knowledge and skills gained from it to their own research. It aimed to demonstrate to peer-receivers, current students, how bioinformatics could be used in their own research while developing peer-facilitators’ teaching and mentoring skills. This student-centred approach was well received by the peer-receivers, who claimed to have gained improved understanding of bioinformatics and its relevance to research. Equally, peer-facilitators also developed a better understanding of the subject and appreciated that the activity was a rare and invaluable opportunity to develop their teaching and mentoring skills, enhancing their employability. PMID:29098185

  3. E-Learning as a new tool in bioinformatics teaching

    PubMed Central

    Saravanan, Vijayakumar; Shanmughavel, Piramanayagam

    2007-01-01

    In recent years, virtual learning is growing rapidly. Universities, colleges, and secondary schools are now delivering training and education over the internet. Beside this, resources available over the WWW are huge and understanding the various techniques employed in the field of Bioinformatics is increasingly complex for students during implementation. Here, we discuss its importance in developing and delivering an educational system in Bioinformatics based on e-learning environment. PMID:18292800

  4. Teaching bioinformatics and neuroinformatics by using free web-based tools.

    PubMed

    Grisham, William; Schottler, Natalie A; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson

    2010-01-01

    This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with anatomy (Mouse Brain Library), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and gene expression analyses (University of California, Santa Cruz Genome Browser, National Center for Biotechnology Information's Entrez Gene, and the Allen Brain Atlas), and information resources (PubMed). Instructors can use these various websites in concert to teach genetics from the phenotypic level to the molecular level, aspects of neuroanatomy and histology, statistics, quantitative trait locus analysis, and molecular biology (including in situ hybridization and microarray analysis), and to introduce bioinformatic resources. Students use these resources to discover 1) the region(s) of chromosome(s) influencing the phenotypic trait, 2) a list of candidate genes-narrowed by expression data, 3) the in situ pattern of a given gene in the region of interest, 4) the nucleotide sequence of the candidate gene, and 5) articles describing the gene. Teaching materials such as a detailed student/instructor's manual, PowerPoints, sample exams, and links to free Web resources can be found at http://mdcune.psych.ucla.edu/modules/bioinformatics.

  5. Analysis of requirements for teaching materials based on the course bioinformatics for plant metabolism

    NASA Astrophysics Data System (ADS)

    Balqis, Widodo, Lukiati, Betty; Amin, Mohamad

    2017-05-01

    A way to improve the quality of learning in the course of Plant Metabolism in the Department of Biology, State University of Malang, is to develop teaching materials. This research evaluates the needs of bioinformatics-based teaching material in the course Plant Metabolism by the Analyze, Design, Develop, Implement, and Evaluate (ADDIE) development model. Data were collected through questionnaires distributed to the students in the Plant Metabolism course of the Department of Biology, University of Malang, and analysis of the plan of lectures semester (RPS). Learning gains of this course show that it is not yet integrated into the field of bioinformatics. All respondents stated that plant metabolism books do not include bioinformatics and fail to explain the metabolism of a chemical compound of a local plant in Indonesia. Respondents thought that bioinformatics can explain examples and metabolism of a secondary metabolite analysis techniques and discuss potential medicinal compounds from local plants. As many as 65% of the respondents said that the existing metabolism book could not be used to understand secondary metabolism in lectures of plant metabolism. Therefore, the development of teaching materials including plant metabolism-based bioinformatics is important to improve the understanding of the lecture material in plant metabolism.

  6. Planning bioinformatics workflows using an expert system.

    PubMed

    Chen, Xiaoling; Chang, Jeffrey T

    2017-04-15

    Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. https://github.com/jefftc/changlab. jeffrey.t.chang@uth.tmc.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  7. Planning bioinformatics workflows using an expert system

    PubMed Central

    Chen, Xiaoling; Chang, Jeffrey T.

    2017-01-01

    Abstract Motivation: Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. Results: To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability and Implementation: https://github.com/jefftc/changlab Contact: jeffrey.t.chang@uth.tmc.edu PMID:28052928

  8. Pancreas Cancer Precision Treatment Using Avatar Mice from a Bioinformatics Perspective.

    PubMed

    Perales-Patón, Javier; Piñeiro-Yañez, Elena; Tejero, Héctor; López-Casas, Pedro P; Hidalgo, Manuel; Gómez-López, Gonzalo; Al-Shahrour, Fátima

    2017-01-01

    Pancreatic ductal adenocarcinoma (PDAC) is a leading cause of cancer-related death among solid malignancies. Unfortunately, PDAC lethality has not substantially decreased over the past 20 years. This aggressiveness is related to the genomic complexity and heterogeneity of PDAC, but also to the absence of an effective screening for the detection of early-stage tumors and a lack of efficient therapeutic options. Therefore, there is an urgent need to improve the arsenal of anti-PDAC drugs for an effective treatment of these patients. Patient-derived xenograft (PDX) mouse models represent a promising strategy to personalize PDAC treatment, offering a bench testing of candidate treatments and helping to select empirical treatments in PDAC patients with no therapeutic targets. Moreover, bioinformatics-based approaches have the potential to offer systematic insights into PDAC etiology predicting putatively actionable tumor-specific genomic alterations, identifying novel biomarkers and generating disease-associated gene expression signatures. This review focuses on recent efforts to individualize PDAC treatments using PDX models. Additionally, we discuss the current understanding of the PDAC genomic landscape and the putative druggable targets derived from mutational studies. PDAC molecular subclassifications and gene expression profiling studies are reviewed as well. Finally, latest bioinformatics methodologies based on somatic variant detection and prioritization, in silico drug response prediction, and drug repositioning to improve the treatment of advanced PDAC tumors are also covered. © 2017 S. Karger AG, Basel.

  9. HHsvm: fast and accurate classification of profile–profile matches identified by HHsearch

    PubMed Central

    Dlakić, Mensur

    2009-01-01

    Motivation: Recently developed profile–profile methods rival structural comparisons in their ability to detect homology between distantly related proteins. Despite this tremendous progress, many genuine relationships between protein families cannot be recognized as comparisons of their profiles result in scores that are statistically insignificant. Results: Using known evolutionary relationships among protein superfamilies in SCOP database, support vector machines were trained on four sets of discriminatory features derived from the output of HHsearch. Upon validation, it was shown that the automatic classification of all profile–profile matches was superior to fixed threshold-based annotation in terms of sensitivity and specificity. The effectiveness of this approach was demonstrated by annotating several domains of unknown function from the Pfam database. Availability: Programs and scripts implementing the methods described in this manuscript are freely available from http://hhsvm.dlakiclab.org/. Contact: mdlakic@montana.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19773335

  10. 3D-SURFER: software for high-throughput protein surface comparison and analysis

    PubMed Central

    La, David; Esquivel-Rodríguez, Juan; Venkatraman, Vishwesh; Li, Bin; Sael, Lee; Ueng, Stephen; Ahrendt, Steven; Kihara, Daisuke

    2009-01-01

    Summary: We present 3D-SURFER, a web-based tool designed to facilitate high-throughput comparison and characterization of proteins based on their surface shape. As each protein is effectively represented by a vector of 3D Zernike descriptors, comparison times for a query protein against the entire PDB take, on an average, only a couple of seconds. The web interface has been designed to be as interactive as possible with displays showing animated protein rotations, CATH codes and structural alignments using the CE program. In addition, geometrically interesting local features of the protein surface, such as pockets that often correspond to ligand binding sites as well as protrusions and flat regions can also be identified and visualized. Availability: 3D-SURFER is a web application that can be freely accessed from: http://dragon.bio.purdue.edu/3d-surfer Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19759195

  11. 3D-SURFER: software for high-throughput protein surface comparison and analysis.

    PubMed

    La, David; Esquivel-Rodríguez, Juan; Venkatraman, Vishwesh; Li, Bin; Sael, Lee; Ueng, Stephen; Ahrendt, Steven; Kihara, Daisuke

    2009-11-01

    We present 3D-SURFER, a web-based tool designed to facilitate high-throughput comparison and characterization of proteins based on their surface shape. As each protein is effectively represented by a vector of 3D Zernike descriptors, comparison times for a query protein against the entire PDB take, on an average, only a couple of seconds. The web interface has been designed to be as interactive as possible with displays showing animated protein rotations, CATH codes and structural alignments using the CE program. In addition, geometrically interesting local features of the protein surface, such as pockets that often correspond to ligand binding sites as well as protrusions and flat regions can also be identified and visualized. 3D-SURFER is a web application that can be freely accessed from: http://dragon.bio.purdue.edu/3d-surfer dkihara@purdue.edu Supplementary data are available at Bioinformatics online.

  12. Long non-coding RNA HOTTIP promotes prostate cancer cells proliferation and migration by sponging miR-216a-5p.

    PubMed

    Yang, Bin; Gao, Ge; Wang, Zhixin; Sun, Daju; Wei, Xin; Ma, Yanan; Ding, Youpeng

    2018-06-08

    Long non-coding RNAs (lncRNAs) are a class of ncRNAs with > 200 nucleotides in length that regulate gene expression. The HOXA transcript at the distal tip (HOTTIP) lncRNA plays an important role in carcinogenesis, however, the underlying role of HOTTIP in prostate cancer (PCa) remain unknown. The aim of the present study was to evaluate the expression and function of HOTTIP in PCa. In the present study, we analyzed HOTTIP expression levels of 86 PCa patients in tumor and adjacent normal tissue by real-time quantitative PCR. Knockdown or overexpression of HOTTIP was performed to explore its roles in cell proliferation, migration, invasion, and cell cycle. Furthermore, bioinformatics online programs predicted and luciferase reporter assay were used to validate the association of HOTTIP and miR-216a-5p in PCa cells. Our results found that HOTTIP was up-regulated in human primary PCa tissues with lymph node metastasis. Knockdown of HOTTIP inhibited PCa cell proliferation, migration and invasion. Overexpression of HOTTIP promoted cell proliferation, migration and invasion of PCa cells. Bioinformatics online programs predicted that HOTTIP sponge miR-216a-5p at 3'-UTR with complementary binding sites, which was validated using luciferase reporter assay. HOTTIP could negatively regulate the expression of miR-216a-5p in PCa cells. Above all, knockdown of HOTTIP could represent a rational therapeutic strategy for PCa. ©2018 The Author(s).

  13. GeoSymbio: a hybrid, cloud-based web application of global geospatial bioinformatics and ecoinformatics for Symbiodinium-host symbioses.

    PubMed

    Franklin, Erik C; Stat, Michael; Pochon, Xavier; Putnam, Hollie M; Gates, Ruth D

    2012-03-01

    The genus Symbiodinium encompasses a group of unicellular, photosynthetic dinoflagellates that are found free living or in hospite with a wide range of marine invertebrate hosts including scleractinian corals. We present GeoSymbio, a hybrid web application that provides an online, easy to use and freely accessible interface for users to discover, explore and utilize global geospatial bioinformatic and ecoinformatic data on Symbiodinium-host symbioses. The novelty of this application lies in the combination of a variety of query and visualization tools, including dynamic searchable maps, data tables with filter and grouping functions, and interactive charts that summarize the data. Importantly, this application is hosted remotely or 'in the cloud' using Google Apps, and therefore does not require any specialty GIS, web programming or data programming expertise from the user. The current version of the application utilizes Symbiodinium data based on the ITS2 genetic marker from PCR-based techniques, including denaturing gradient gel electrophoresis, sequencing and cloning of specimens collected during 1982-2010. All data elements of the application are also downloadable as spatial files, tables and nucleic acid sequence files in common formats for desktop analysis. The application provides a unique tool set to facilitate research on the basic biology of Symbiodinium and expedite new insights into their ecology, biogeography and evolution in the face of a changing global climate. GeoSymbio can be accessed at https://sites.google.com/site/geosymbio/. © 2011 Blackwell Publishing Ltd.

  14. JABAWS 2.2 distributed web services for Bioinformatics: protein disorder, conservation and RNA secondary structure.

    PubMed

    Troshin, Peter V; Procter, James B; Sherstnev, Alexander; Barton, Daniel L; Madeira, Fábio; Barton, Geoffrey J

    2018-06-01

    JABAWS 2.2 is a computational framework that simplifies the deployment of web services for Bioinformatics. In addition to the five multiple sequence alignment (MSA) algorithms in JABAWS 1.0, JABAWS 2.2 includes three additional MSA programs (Clustal Omega, MSAprobs, GLprobs), four protein disorder prediction methods (DisEMBL, IUPred, Ronn, GlobPlot), 18 measures of protein conservation as implemented in AACon, and RNA secondary structure prediction by the RNAalifold program. JABAWS 2.2 can be deployed on a variety of in-house or hosted systems. JABAWS 2.2 web services may be accessed from the Jalview multiple sequence analysis workbench (Version 2.8 and later), as well as directly via the JABAWS command line interface (CLI) client. JABAWS 2.2 can be deployed on a local virtual server as a Virtual Appliance (VA) or simply as a Web Application Archive (WAR) for private use. Improvements in JABAWS 2.2 also include simplified installation and a range of utility tools for usage statistics collection, and web services querying and monitoring. The JABAWS CLI client has been updated to support all the new services and allow integration of JABAWS 2.2 services into conventional scripts. A public JABAWS 2 server has been in production since December 2011 and served over 800 000 analyses for users worldwide. JABAWS 2.2 is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws. g.j.barton@dundee.ac.uk.

  15. Decision tree and ensemble learning algorithms with their applications in bioinformatics.

    PubMed

    Che, Dongsheng; Liu, Qi; Rasheed, Khaled; Tao, Xiuping

    2011-01-01

    Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.

  16. Survey of MapReduce frame operation in bioinformatics.

    PubMed

    Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

    2014-07-01

    Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  17. 2016 update on APBioNet's annual international conference on bioinformatics (InCoB).

    PubMed

    Schönbach, Christian; Verma, Chandra; Wee, Lawrence Jin Kiat; Bond, Peter John; Ranganathan, Shoba

    2016-12-22

    InCoB became since its inception in 2002 one of the largest annual bioinformatics conferences in the Asia-Pacific region with attendance ranging between 150 and 250 delegates depending on the venue location. InCoB 2016 in Singapore was attended by almost 220 delegates. This year, sessions on structural bioinformatics, sequence and sequencing, and next-generation sequencing fielded the highest number of oral presentation. Forty-four out 96 oral presentations were associated with an accepted manuscript in supplemental issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics or BMC Systems Biology. Articles with a genomics focus are reviewed in this editorial. Next year's InCoB will be held in Shenzen, China from September 20 to 22, 2017.

  18. Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.

    PubMed

    Newberg, Lee A

    2008-08-15

    A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++-code for optimal backtrace is available in the Supplementary Materials. Supplementary data is available at Bioinformatics online.

  19. Accuracy of different bioinformatics methods in detecting antibiotic resistance and virulence factors from Staphylococcus aureus whole genome sequences.

    PubMed

    Mason, Amy; Foster, Dona; Bradley, Phelim; Golubchik, Tanya; Doumith, Michel; Gordon, N Claire; Pichon, Bruno; Iqbal, Zamin; Staves, Peter; Crook, Derrick; Walker, A Sarah; Kearns, Angela; Peto, Tim

    2018-06-20

    Background : In principle, whole genome sequencing (WGS) can predict phenotypic resistance directly from genotype, replacing laboratory-based tests. However, the contribution of different bioinformatics methods to genotype-phenotype discrepancies has not been systematically explored to date. Methods : We compared three WGS-based bioinformatics methods (Genefinder (read-based), Mykrobe (de Bruijn graph-based) and Typewriter (BLAST-based)) for predicting presence/absence of 83 different resistance determinants and virulence genes, and overall antimicrobial susceptibility, in 1379 Staphylococcus aureus isolates previously characterised by standard laboratory methods (disc diffusion, broth and/or agar dilution and PCR). Results : 99.5% (113830/114457) of individual resistance-determinant/virulence gene predictions were identical between all three methods, with only 627 (0.5%) discordant predictions, demonstrating high overall agreement (Fliess-Kappa=0.98, p<0.0001). Discrepancies when identified were in only one of the three methods for all genes except the cassette recombinase, ccrC(b ). Genotypic antimicrobial susceptibility prediction matched laboratory phenotype in 98.3% (14224/14464) cases (2720 (18.8%) resistant, 11504 (79.5%) susceptible). There was greater disagreement between the laboratory phenotypes and the combined genotypic predictions (97 (0.7%) phenotypically-susceptible but all bioinformatic methods reported resistance; 89 (0.6%) phenotypically-resistant, but all bioinformatics methods reported susceptible) than within the three bioinformatics methods (54 (0.4%) cases, 16 phenotypically-resistant, 38 phenotypically-susceptible). However, in 36/54 (67%), the consensus genotype matched the laboratory phenotype. Conclusions : In this study, the choice between these three specific bioinformatic methods to identify resistance-determinants or other genes in S. aureus did not prove critical, with all demonstrating high concordance with each other and phenotypic/molecular methods. However, each has some limitations and therefore consensus methods provide some assurance. Copyright © 2018 American Society for Microbiology.

  20. Skate Genome Project: Cyber-Enabled Bioinformatics Collaboration

    PubMed Central

    Vincent, J.

    2011-01-01

    The Skate Genome Project, a pilot project of the North East Cyber infrastructure Consortium, aims to produce a draft genome sequence of Leucoraja erinacea, the Little Skate. The pilot project was designed to also develop expertise in large scale collaborations across the NECC region. An overview of the bioinformatics and infrastructure challenges faced during the first year of the project will be presented. Results to date and lessons learned from the perspective of a bioinformatics core will be highlighted.

  1. Protein Bioinformatics Databases and Resources

    PubMed Central

    Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.

    2017-01-01

    Many publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231

  2. Bioconductor: open software development for computational biology and bioinformatics

    PubMed Central

    Gentleman, Robert C; Carey, Vincent J; Bates, Douglas M; Bolstad, Ben; Dettling, Marcel; Dudoit, Sandrine; Ellis, Byron; Gautier, Laurent; Ge, Yongchao; Gentry, Jeff; Hornik, Kurt; Hothorn, Torsten; Huber, Wolfgang; Iacus, Stefano; Irizarry, Rafael; Leisch, Friedrich; Li, Cheng; Maechler, Martin; Rossini, Anthony J; Sawitzki, Gunther; Smith, Colin; Smyth, Gordon; Tierney, Luke; Yang, Jean YH; Zhang, Jianhua

    2004-01-01

    The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples. PMID:15461798

  3. Prediction of Acute Mountain Sickness using a Blood-Based Test

    DTIC Science & Technology

    2016-01-01

    2015): In quarter 17 we focused on two major tasks: getting the RNA purified and ready for chip analysis and working on the bioinformatics ... bioinformatics organization of all the data we will examine for this study. To remind the reviewer, we have a primary dataset of ~120 subjects who were studied...companion study, AltitudeOmics, to the database of gene studies to be analyzed for AMS prediction • expansion of a bioinformatics team to include an

  4. Teaching the bioinformatics of signaling networks: an integrated approach to facilitate multi-disciplinary learning.

    PubMed

    Korcsmaros, Tamas; Dunai, Zsuzsanna A; Vellai, Tibor; Csermely, Peter

    2013-09-01

    The number of bioinformatics tools and resources that support molecular and cell biology approaches is continuously expanding. Moreover, systems and network biology analyses are accompanied more and more by integrated bioinformatics methods. Traditional information-centered university teaching methods often fail, as (1) it is impossible to cover all existing approaches in the frame of a single course, and (2) a large segment of the current bioinformation can become obsolete in a few years. Signaling network offers an excellent example for teaching bioinformatics resources and tools, as it is both focused and complex at the same time. Here, we present an outline of a university bioinformatics course with four sample practices to demonstrate how signaling network studies can integrate biochemistry, genetics, cell biology and network sciences. We show that several bioinformatics resources and tools, as well as important concepts and current trends, can also be integrated to signaling network studies. The research-type hands-on experiences we show enable the students to improve key competences such as teamworking, creative and critical thinking and problem solving. Our classroom course curriculum can be re-formulated as an e-learning material or applied as a part of a specific training course. The multi-disciplinary approach and the mosaic setup of the course have the additional benefit to support the advanced teaching of talented students.

  5. An overview of topic modeling and its current applications in bioinformatics.

    PubMed

    Liu, Lin; Tang, Lin; Dong, Wen; Yao, Shaowen; Zhou, Wei

    2016-01-01

    With the rapid accumulation of biological datasets, machine learning methods designed to automate data analysis are urgently needed. In recent years, so-called topic models that originated from the field of natural language processing have been receiving much attention in bioinformatics because of their interpretability. Our aim was to review the application and development of topic models for bioinformatics. This paper starts with the description of a topic model, with a focus on the understanding of topic modeling. A general outline is provided on how to build an application in a topic model and how to develop a topic model. Meanwhile, the literature on application of topic models to biological data was searched and analyzed in depth. According to the types of models and the analogy between the concept of document-topic-word and a biological object (as well as the tasks of a topic model), we categorized the related studies and provided an outlook on the use of topic models for the development of bioinformatics applications. Topic modeling is a useful method (in contrast to the traditional means of data reduction in bioinformatics) and enhances researchers' ability to interpret biological information. Nevertheless, due to the lack of topic models optimized for specific biological data, the studies on topic modeling in biological data still have a long and challenging road ahead. We believe that topic models are a promising method for various applications in bioinformatics research.

  6. A generally applicable lightweight method for calculating a value structure for tools and services in bioinformatics infrastructure projects.

    PubMed

    Mayer, Gerhard; Quast, Christian; Felden, Janine; Lange, Matthias; Prinz, Manuel; Pühler, Alfred; Lawerenz, Chris; Scholz, Uwe; Glöckner, Frank Oliver; Müller, Wolfgang; Marcus, Katrin; Eisenacher, Martin

    2017-10-30

    Sustainable noncommercial bioinformatics infrastructures are a prerequisite to use and take advantage of the potential of big data analysis for research and economy. Consequently, funders, universities and institutes as well as users ask for a transparent value model for the tools and services offered. In this article, a generally applicable lightweight method is described by which bioinformatics infrastructure projects can estimate the value of tools and services offered without determining exactly the total costs of ownership. Five representative scenarios for value estimation from a rough estimation to a detailed breakdown of costs are presented. To account for the diversity in bioinformatics applications and services, the notion of service-specific 'service provision units' is introduced together with the factors influencing them and the main underlying assumptions for these 'value influencing factors'. Special attention is given on how to handle personnel costs and indirect costs such as electricity. Four examples are presented for the calculation of the value of tools and services provided by the German Network for Bioinformatics Infrastructure (de.NBI): one for tool usage, one for (Web-based) database analyses, one for consulting services and one for bioinformatics training events. Finally, from the discussed values, the costs of direct funding and the costs of payment of services by funded projects are calculated and compared. © The Author 2017. Published by Oxford University Press.

  7. COMPUTATIONAL TOXICOLOGY: AN APPROACH FOR PRIORITIZING CHEMICAL RISK ASSESSMENTS

    EPA Science Inventory

    Characterizing toxic effects for industrial chemicals carries the challenge of focusing resources on the greatest potential risks for human health and the environment. The union of molecular modeling, bioinformatics and simulation of complex systems with emerging technologies suc...

  8. Bioinformatic approaches to interrogating vitamin D receptor signaling.

    PubMed

    Campbell, Moray J

    2017-09-15

    Bioinformatics applies unbiased approaches to develop statistically-robust insight into health and disease. At the global, or "20,000 foot" view bioinformatic analyses of vitamin D receptor (NR1I1/VDR) signaling can measure where the VDR gene or protein exerts a genome-wide significant impact on biology; VDR is significantly implicated in bone biology and immune systems, but not in cancer. With a more VDR-centric, or "2000 foot" view, bioinformatic approaches can interrogate events downstream of VDR activity. Integrative approaches can combine VDR ChIP-Seq in cell systems where significant volumes of publically available data are available. For example, VDR ChIP-Seq studies can be combined with genome-wide association studies to reveal significant associations to immune phenotypes. Similarly, VDR ChIP-Seq can be combined with data from Cancer Genome Atlas (TCGA) to infer the impact of VDR target genes in cancer progression. Therefore, bioinformatic approaches can reveal what aspects of VDR downstream networks are significantly related to disease or phenotype. Copyright © 2017 The Author. Published by Elsevier B.V. All rights reserved.

  9. [Factors affecting the adoption of ICT tools in experiments with bioinformatics in biopharmaceutical organizations: a case study in the Brazilian Cancer Institute].

    PubMed

    Pitassi, Claudio; Gonçalves, Antonio Augusto; Moreno Júnior, Valter de Assis

    2014-01-01

    The scope of this article is to identify and analyze the factors that influence the adoption of ICT tools in experiments with bioinformatics at the Brazilian Cancer Institute (INCA). It involves a descriptive and exploratory qualitative field study. Evidence was collected mainly based on in-depth interviews with the management team at the Research Center and the IT Division. The answers were analyzed using the categorical content method. The categories were selected from the scientific literature and consolidated in the Technology-Organization-Environment (TOE) framework created for this study. The model proposed made it possible to demonstrate how the factors selected impacted INCA´s adoption of bioinformatics systems and tools, contributing to the investigation of two critical areas for the development of the health industry in Brazil, namely technological innovation and bioinformatics. Based on the evidence collected, a research question was posed: to what extent can the alignment of the factors related to the adoption of ICT tools in experiments with bioinformatics increase the innovation capacity of a Brazilian biopharmaceutical organization?

  10. Relax with CouchDB--into the non-relational DBMS era of bioinformatics.

    PubMed

    Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R

    2012-07-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.

  11. Deorphanizing the human transmembrane genome: A landscape of uncharacterized membrane proteins.

    PubMed

    Babcock, Joseph J; Li, Min

    2014-01-01

    The sequencing of the human genome has fueled the last decade of work to functionally characterize genome content. An important subset of genes encodes membrane proteins, which are the targets of many drugs. They reside in lipid bilayers, restricting their endogenous activity to a relatively specialized biochemical environment. Without a reference phenotype, the application of systematic screens to profile candidate membrane proteins is not immediately possible. Bioinformatics has begun to show its effectiveness in focusing the functional characterization of orphan proteins of a particular functional class, such as channels or receptors. Here we discuss integration of experimental and bioinformatics approaches for characterizing the orphan membrane proteome. By analyzing the human genome, a landscape reference for the human transmembrane genome is provided.

  12. Bioinformatics Pipelines for Targeted Resequencing and Whole-Exome Sequencing of Human and Mouse Genomes: A Virtual Appliance Approach for Instant Deployment

    PubMed Central

    Saeed, Isaam; Wong, Stephen Q.; Mar, Victoria; Goode, David L.; Caramia, Franco; Doig, Ken; Ryland, Georgina L.; Thompson, Ella R.; Hunter, Sally M.; Halgamuge, Saman K.; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G.; Papenfuss, Anthony T.; McArthur, Grant A.; Tothill, Richard W.

    2014-01-01

    Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/. PMID:24752294

  13. Influenza Research Database: An integrated bioinformatics resource for influenza virus research.

    PubMed

    Zhang, Yun; Aevermann, Brian D; Anderson, Tavis K; Burke, David F; Dauphin, Gwenaelle; Gu, Zhiping; He, Sherry; Kumar, Sanjeev; Larsen, Christopher N; Lee, Alexandra J; Li, Xiaomei; Macken, Catherine; Mahaffey, Colin; Pickett, Brett E; Reardon, Brian; Smith, Thomas; Stewart, Lucy; Suloway, Christian; Sun, Guangyu; Tong, Lei; Vincent, Amy L; Walters, Bryan; Zaremba, Sam; Zhao, Hongtao; Zhou, Liwei; Zmasek, Christian; Klem, Edward B; Scheuermann, Richard H

    2017-01-04

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics and therapeutics against influenza virus by providing a comprehensive collection of influenza-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, personal workbench spaces for data storage and sharing, and active user community support. Here, we describe the recent improvements in IRD including the use of cloud and high performance computing resources, analysis and visualization of user-provided sequence data with associated metadata, predictions of novel variant proteins, annotations of phenotype-associated sequence markers and their predicted phenotypic effects, hemagglutinin (HA) clade classifications, an automated tool for HA subtype numbering conversion, linkouts to disease event data and the addition of host factor and antiviral drug components. All data and tools are freely available without restriction from the IRD website at https://www.fludb.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Array data extractor (ADE): a LabVIEW program to extract and merge gene array data.

    PubMed

    Kurtenbach, Stefan; Kurtenbach, Sarah; Zoidl, Georg

    2013-12-01

    Large data sets from gene expression array studies are publicly available offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly advanced bioinformatics tools have been made available to researchers, but a demand for user-friendly software allowing researchers to quickly extract expression information for multiple genes from multiple studies persists. Here, we present a user-friendly LabVIEW program to automatically extract gene expression data for a list of genes from multiple normalized microarray datasets. Functionality was tested for 288 class A G protein-coupled receptors (GPCRs) and expression data from 12 studies comparing normal and diseased human hearts. Results confirmed known regulation of a beta 1 adrenergic receptor and further indicate novel research targets. Although existing software allows for complex data analyses, the LabVIEW based program presented here, "Array Data Extractor (ADE)", provides users with a tool to retrieve meaningful information from multiple normalized gene expression datasets in a fast and easy way. Further, the graphical programming language used in LabVIEW allows applying changes to the program without the need of advanced programming knowledge.

  15. Bioinformatics and Cancer

    Cancer.gov

    Researchers take on challenges and opportunities to mine "Big Data" for answers to complex biological questions. Learn how bioinformatics uses advanced computing, mathematics, and technological platforms to store, manage, analyze, and understand data.

  16. Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance

    PubMed Central

    Squires, R. Burke; Noronha, Jyothi; Hunt, Victoria; García‐Sastre, Adolfo; Macken, Catherine; Baumgarth, Nicole; Suarez, David; Pickett, Brett E.; Zhang, Yun; Larsen, Christopher N.; Ramsey, Alvin; Zhou, Liwei; Zaremba, Sam; Kumar, Sanjeev; Deitrich, Jon; Klem, Edward; Scheuermann, Richard H.

    2012-01-01

    Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416. Background  The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design  The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly‐accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user‐friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in‐protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results  To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross‐protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. Conclusions  The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics. PMID:22260278

  17. Seahawk: moving beyond HTML in Web-based bioinformatics analysis.

    PubMed

    Gordon, Paul M K; Sensen, Christoph W

    2007-06-18

    Traditional HTML interfaces for input to and output from Bioinformatics analysis on the Web are highly variable in style, content and data formats. Combining multiple analyses can therefore be an onerous task for biologists. Semantic Web Services allow automated discovery of conceptual links between remote data analysis servers. A shared data ontology and service discovery/execution framework is particularly attractive in Bioinformatics, where data and services are often both disparate and distributed. Instead of biologists copying, pasting and reformatting data between various Web sites, Semantic Web Service protocols such as MOBY-S hold out the promise of seamlessly integrating multi-step analysis. We have developed a program (Seahawk) that allows biologists to intuitively and seamlessly chain together Web Services using a data-centric, rather than the customary service-centric approach. The approach is illustrated with a ferredoxin mutation analysis. Seahawk concentrates on lowering entry barriers for biologists: no prior knowledge of the data ontology, or relevant services is required. In stark contrast to other MOBY-S clients, in Seahawk users simply load Web pages and text files they already work with. Underlying the familiar Web-browser interaction is an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements which import existing user data into the MOBY-S format. As an easily accessible applet, Seahawk moves beyond standard Web browser interaction, providing mechanisms for the biologist to concentrate on the analytical task rather than on the technical details of data formats and Web forms. As the MOBY-S protocol nears a 1.0 specification, we expect more biologists to adopt these new semantic-oriented ways of doing Web-based analysis, which empower them to do more complicated, ad hoc analysis workflow creation without the assistance of a programmer.

  18. Seahawk: moving beyond HTML in Web-based bioinformatics analysis

    PubMed Central

    Gordon, Paul MK; Sensen, Christoph W

    2007-01-01

    Background Traditional HTML interfaces for input to and output from Bioinformatics analysis on the Web are highly variable in style, content and data formats. Combining multiple analyses can therfore be an onerous task for biologists. Semantic Web Services allow automated discovery of conceptual links between remote data analysis servers. A shared data ontology and service discovery/execution framework is particularly attractive in Bioinformatics, where data and services are often both disparate and distributed. Instead of biologists copying, pasting and reformatting data between various Web sites, Semantic Web Service protocols such as MOBY-S hold out the promise of seamlessly integrating multi-step analysis. Results We have developed a program (Seahawk) that allows biologists to intuitively and seamlessly chain together Web Services using a data-centric, rather than the customary service-centric approach. The approach is illustrated with a ferredoxin mutation analysis. Seahawk concentrates on lowering entry barriers for biologists: no prior knowledge of the data ontology, or relevant services is required. In stark contrast to other MOBY-S clients, in Seahawk users simply load Web pages and text files they already work with. Underlying the familiar Web-browser interaction is an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements which import existing user data into the MOBY-S format. Conclusion As an easily accessible applet, Seahawk moves beyond standard Web browser interaction, providing mechanisms for the biologist to concentrate on the analytical task rather than on the technical details of data formats and Web forms. As the MOBY-S protocol nears a 1.0 specification, we expect more biologists to adopt these new semantic-oriented ways of doing Web-based analysis, which empower them to do more complicated, ad hoc analysis workflow creation without the assistance of a programmer. PMID:17577405

  19. The International Conference on Intelligent Biology and Medicine (ICIBM) 2016: from big data to big analytical tools.

    PubMed

    Liu, Zhandong; Zheng, W Jim; Allen, Genevera I; Liu, Yin; Ruan, Jianhua; Zhao, Zhongming

    2017-10-03

    The 2016 International Conference on Intelligent Biology and Medicine (ICIBM 2016) was held on December 8-10, 2016 in Houston, Texas, USA. ICIBM included eight scientific sessions, four tutorials, one poster session, four highlighted talks and four keynotes that covered topics on 3D genomics structural analysis, next generation sequencing (NGS) analysis, computational drug discovery, medical informatics, cancer genomics, and systems biology. Here, we present a summary of the nine research articles selected from ICIBM 2016 program for publishing in BMC Bioinformatics.

  20. Cake: a bioinformatics pipeline for the integrated analysis of somatic variants in cancer genomes

    PubMed Central

    Rashid, Mamunur; Robles-Espinoza, Carla Daniela; Rust, Alistair G.; Adams, David J.

    2013-01-01

    Summary: We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a stand-alone application. Availabilty: Cake is open-source and is available from http://cakesomatic.sourceforge.net/ Contact: da1@sanger.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:23803469

  1. Glossary of bioinformatics terms.

    PubMed

    2007-06-01

    This collection of terms and definitions commonly encountered in the bioinformatics literature will be updated periodically as Current Protocols in Bioinformatics grows. In addition, an extensive glossary of genetic terms can be found on the Web site of the National Human Genome Research Institute (http://www.genome.gov/glossary.cfm). The entries in that online glossary provide a brief written definition of the term; the user can also listen to an informative explanation of the term using RealAudio or the Windows Media Player.

  2. Indentification and Analysis of Occludin Phosphosites: A Combined Mass Spectroscoy and Bioinformatics Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sundstrom, J.; Tash, B; Murakami, T

    2009-01-01

    The molecular function of occludin, an integral membrane component of tight junctions, remains unclear. VEGF-induced phosphorylation sites were mapped on occludin by combining MS data analysis with bioinformatics. In vivo phosphorylation of Ser490 was validated and protein interaction studies combined with crystal structure analysis suggest that Ser490 phosphorylation attenuates the interaction between occludin and ZO-1. This study demonstrates that combining MS data and bioinformatics can successfully identify novel phosphorylation sites from limiting samples.

  3. CymeR: cytometry analysis using KNIME, docker and R

    PubMed Central

    Muchmore, B.; Alarcón-Riquelme, M.E.

    2017-01-01

    Abstract Summary: Here we present open-source software for the analysis of high-dimensional cytometry data using state of the art algorithms. Importantly, use of the software requires no programming ability, and output files can either be interrogated directly in CymeR or they can be used downstream with any other cytometric data analysis platform. Also, because we use Docker to integrate the multitude of components that form the basis of CymeR, we have additionally developed a proof-of-concept of how future open-source bioinformatic programs with graphical user interfaces could be developed. Availability and Implementation: CymeR is open-source software that ties several components into a single program that is perhaps best thought of as a self-contained data analysis operating system. Please see https://github.com/bmuchmore/CymeR/wiki for detailed installation instructions. Contact: brian.muchmore@genyo.es or marta.alarcon@genyo.es PMID:27998935

  4. CymeR: cytometry analysis using KNIME, docker and R.

    PubMed

    Muchmore, B; Alarcón-Riquelme, M E

    2017-03-01

    Here we present open-source software for the analysis of high-dimensional cytometry data using state of the art algorithms. Importantly, use of the software requires no programming ability, and output files can either be interrogated directly in CymeR or they can be used downstream with any other cytometric data analysis platform. Also, because we use Docker to integrate the multitude of components that form the basis of CymeR, we have additionally developed a proof-of-concept of how future open-source bioinformatic programs with graphical user interfaces could be developed. CymeR is open-source software that ties several components into a single program that is perhaps best thought of as a self-contained data analysis operating system. Please see https://github.com/bmuchmore/CymeR/wiki for detailed installation instructions. brian.muchmore@genyo.es or marta.alarcon@genyo.es. © The Author 2016. Published by Oxford University Press.

  5. Neuroproteomics and Environmental Chemical-induced Adverse Effects

    EPA Science Inventory

    Technological advances in science have aided the field of neuroproteomics with refined tools for the study of the expression, interaction, and function of proteins in the nervous system. With the aid of bioinformatics, neuroproteomics can reveal the organization of dynamic, funct...

  6. Understanding Chemical-induced Adverse Effects with Neuroproteins

    EPA Science Inventory

    Technological advances in science have aided the field of neuroproteomics with refined tools for the study of the expression, interaction, and function of proteins in the nervous system. With the aid of bioinformatics, neuroproteomics can reveal the organization of dynamic, funct...

  7. On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics

    PubMed Central

    Calcagno, Cristina; Coppo, Mario

    2014-01-01

    The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed. PMID:25050327

  8. On designing multicore-aware simulators for systems biology endowed with OnLine statistics.

    PubMed

    Aldinucci, Marco; Calcagno, Cristina; Coppo, Mario; Damiani, Ferruccio; Drocco, Maurizio; Sciacca, Eva; Spinella, Salvatore; Torquati, Massimo; Troina, Angelo

    2014-01-01

    The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.

  9. BioPCD - A Language for GUI Development Requiring a Minimal Skill Set.

    PubMed

    Alvare, Graham Gm; Roche-Lima, Abiel; Fristensky, Brian

    2012-11-01

    BioPCD is a new language whose purpose is to simplify the creation of Graphical User Interfaces (GUIs) by biologists with minimal programming skills. The first step in developing BioPCD was to create a minimal superset of the language referred to as PCD (Pythonesque Command Description). PCD defines the core of terminals and high-level nonterminals required to describe data of almost any type. BioPCD adds to PCD the constructs necessary to describe GUI components and the syntax for executing system commands. BioPCD is implemented using JavaCC to convert the grammar into code. BioPCD is designed to be terse and readable and simple enough to be learned by copying and modifying existing BioPCD files. We demonstrate that BioPCD can easily be used to generate GUIs for existing command line programs. Although BioPCD was designed to make it easier to run bioinformatics programs, it could be used in any domain in which many useful command line programs exist that do not have GUI interfaces.

  10. Senior Computational Scientist | Center for Cancer Research

    Cancer.gov

    The Basic Science Program (BSP) pursues independent, multidisciplinary research in basic and applied molecular biology, immunology, retrovirology, cancer biology, and human genetics. Research efforts and support are an integral part of the Center for Cancer Research (CCR) at the Frederick National Laboratory for Cancer Research (FNLCR). The Cancer & Inflammation Program (CIP), Basic Science Program, HLA Immunogenetics Section, under the leadership of Dr. Mary Carrington, studies the influence of human leukocyte antigens (HLA) and specific KIR/HLA genotypes on risk of and outcomes to infection, cancer, autoimmune disease, and maternal-fetal disease. Recent studies have focused on the impact of HLA gene expression in disease, the molecular mechanism regulating expression levels, and the functional basis for the effect of differential expression on disease outcome. The lab’s further focus is on the genetic basis for resistance/susceptibility to disease conferred by immunogenetic variation. KEY ROLES/RESPONSIBILITIES The Senior Computational Scientist will provide research support to the CIP-BSP-HLA Immunogenetics Section performing bio-statistical design, analysis and reporting of research projects conducted in the lab. This individual will be involved in the implementation of statistical models and data preparation. Successful candidate should have 5 or more years of competent, innovative biostatistics/bioinformatics research experience, beyond doctoral training Considerable experience with statistical software, such as SAS, R and S-Plus Sound knowledge, and demonstrated experience of theoretical and applied statistics Write program code to analyze data using statistical analysis software Contribute to the interpretation and publication of research results

  11. Bioinformatics in the Netherlands: the value of a nationwide community.

    PubMed

    van Gelder, Celia W G; Hooft, Rob W W; van Rijswijk, Merlijn N; van den Berg, Linda; Kok, Ruben G; Reinders, Marcel; Mons, Barend; Heringa, Jaap

    2017-09-15

    This review provides a historical overview of the inception and development of bioinformatics research in the Netherlands. Rooted in theoretical biology by foundational figures such as Paulien Hogeweg (at Utrecht University since the 1970s), the developments leading to organizational structures supporting a relatively large Dutch bioinformatics community will be reviewed. We will show that the most valuable resource that we have built over these years is the close-knit national expert community that is well engaged in basic and translational life science research programmes. The Dutch bioinformatics community is accustomed to facing the ever-changing landscape of data challenges and working towards solutions together. In addition, this community is the stable factor on the road towards sustainability, especially in times where existing funding models are challenged and change rapidly. © The Author 2017. Published by Oxford University Press.

  12. Generations of interdisciplinarity in bioinformatics

    PubMed Central

    Bartlett, Andrew; Lewis, Jamie; Williams, Matthew L.

    2016-01-01

    Bioinformatics, a specialism propelled into relevance by the Human Genome Project and the subsequent -omic turn in the life science, is an interdisciplinary field of research. Qualitative work on the disciplinary identities of bioinformaticians has revealed the tensions involved in work in this “borderland.” As part of our ongoing work on the emergence of bioinformatics, between 2010 and 2011, we conducted a survey of United Kingdom-based academic bioinformaticians. Building on insights drawn from our fieldwork over the past decade, we present results from this survey relevant to a discussion of disciplinary generation and stabilization. Not only is there evidence of an attitudinal divide between the different disciplinary cultures that make up bioinformatics, but there are distinctions between the forerunners, founders and the followers; as inter/disciplines mature, they face challenges that are both inter-disciplinary and inter-generational in nature. PMID:27453689

  13. BioContainers: an open-source and community-driven framework for software standardization.

    PubMed

    da Veiga Leprevost, Felipe; Grüning, Björn A; Alves Aflitos, Saulo; Röst, Hannes L; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I; Perez-Riverol, Yasset

    2017-08-15

    BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on popular open-source projects Docker and rkt frameworks, that allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). The software is freely available at github.com/BioContainers/. yperez@ebi.ac.uk. © The Author(s) 2017. Published by Oxford University Press.

  14. BioContainers: an open-source and community-driven framework for software standardization

    PubMed Central

    da Veiga Leprevost, Felipe; Grüning, Björn A.; Alves Aflitos, Saulo; Röst, Hannes L.; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C.; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I.; Perez-Riverol, Yasset

    2017-01-01

    Abstract Motivation BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on popular open-source projects Docker and rkt frameworks, that allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability and Implementation The software is freely available at github.com/BioContainers/. Contact yperez@ebi.ac.uk PMID:28379341

  15. InCoB celebrates its tenth anniversary as first joint conference with ISCB-Asia

    PubMed Central

    2011-01-01

    In 2009 the International Society for Computational Biology (ISCB) started to roll out regional bioinformatics conferences in Africa, Latin America and Asia. The open and competitive bid for the first meeting in Asia (ISCB-Asia) was awarded to Asia-Pacific Bioinformatics Network (APBioNet) which has been running the International Conference on Bioinformatics (InCoB) in the Asia-Pacific region since 2002. InCoB/ISCB-Asia 2011 is held from November 30 to December 2, 2011 in Kuala Lumpur, Malaysia. Of 104 manuscripts submitted to BMC Genomics and BMC Bioinformatics conference supplements, 49 (47.1%) were accepted. The strong showing of Asia among submissions (82.7%) and acceptances (81.6%) signals the success of this tenth InCoB anniversary meeting, and bodes well for the future of ISCB-Asia. PMID:22369160

  16. AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis

    PubMed Central

    Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud

    2003-01-01

    Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565

  17. Bioinformatics training: selecting an appropriate learning content management system--an example from the European Bioinformatics Institute.

    PubMed

    Wright, Victoria Ann; Vaughan, Brendan W; Laurent, Thomas; Lopez, Rodrigo; Brooksbank, Cath; Schneider, Maria Victoria

    2010-11-01

    Today's molecular life scientists are well educated in the emerging experimental tools of their trade, but when it comes to training on the myriad of resources and tools for dealing with biological data, a less ideal situation emerges. Often bioinformatics users receive no formal training on how to make the most of the bioinformatics resources and tools available in the public domain. The European Bioinformatics Institute, which is part of the European Molecular Biology Laboratory (EMBL-EBI), holds the world's most comprehensive collection of molecular data, and training the research community to exploit this information is embedded in the EBI's mission. We have evaluated eLearning, in parallel with face-to-face courses, as a means of training users of our data resources and tools. We anticipate that eLearning will become an increasingly important vehicle for delivering training to our growing user base, so we have undertaken an extensive review of Learning Content Management Systems (LCMSs). Here, we describe the process that we used, which considered the requirements of trainees, trainers and systems administrators, as well as taking into account our organizational values and needs. This review describes the literature survey, user discussions and scripted platform testing that we performed to narrow down our choice of platform from 36 to a single platform. We hope that it will serve as guidance for others who are seeking to incorporate eLearning into their bioinformatics training programmes.

  18. What is bioinformatics? A proposed definition and overview of the field.

    PubMed

    Luscombe, N M; Greenbaum, D; Gerstein, M

    2001-01-01

    The recent flood of data from genome sequences and functional genomics has given rise to new field, bioinformatics, which combines elements of biology and computer science. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.

  19. Designing a course model for distance-based online bioinformatics training in Africa: The H3ABioNet experience

    PubMed Central

    Panji, Sumir; Fernandes, Pedro L.; Judge, David P.; Ghouila, Amel; Salifu, Samson P.; Ahmed, Rehab; Kayondo, Jonathan; Ssemwanga, Deogratius

    2017-01-01

    Africa is not unique in its need for basic bioinformatics training for individuals from a diverse range of academic backgrounds. However, particular logistical challenges in Africa, most notably access to bioinformatics expertise and internet stability, must be addressed in order to meet this need on the continent. H3ABioNet (www.h3abionet.org), the Pan African Bioinformatics Network for H3Africa, has therefore developed an innovative, free-of-charge “Introduction to Bioinformatics” course, taking these challenges into account as part of its educational efforts to provide on-site training and develop local expertise inside its network. A multiple-delivery–mode learning model was selected for this 3-month course in order to increase access to (mostly) African, expert bioinformatics trainers. The content of the course was developed to include a range of fundamental bioinformatics topics at the introductory level. For the first iteration of the course (2016), classrooms with a total of 364 enrolled participants were hosted at 20 institutions across 10 African countries. To ensure that classroom success did not depend on stable internet, trainers pre-recorded their lectures, and classrooms downloaded and watched these locally during biweekly contact sessions. The trainers were available via video conferencing to take questions during contact sessions, as well as via online “question and discussion” forums outside of contact session time. This learning model, developed for a resource-limited setting, could easily be adapted to other settings. PMID:28981516

  20. KAnalyze: a fast versatile pipelined K-mer toolkit

    PubMed Central

    Audano, Peter; Vannberg, Fredrik

    2014-01-01

    Motivation: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language. Results: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes. Availability and implementation: KAnalyze is available on SourceForge: https://sourceforge.net/projects/kanalyze/ Contact: fredrik.vannberg@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24642064

  1. Bioengineering and Bioinformatics Summer Institutes: Meeting Modern Challenges in Undergraduate Summer Research

    PubMed Central

    Dong, Cheng; Snyder, Alan J.; Jones, A. Daniel; Sheets, Erin D.

    2008-01-01

    Summer undergraduate research programs in science and engineering facilitate research progress for faculty and provide a close-ended research experience for students, which can prepare them for careers in industry, medicine, and academia. However, ensuring these outcomes is a challenge when the students arrive ill-prepared for substantive research or if projects are ill-defined or impractical for a typical 10-wk summer. We describe how the new Bioengineering and Bioinformatics Summer Institutes (BBSI), developed in response to a call for proposals by the National Institutes of Health (NIH) and the National Science Foundation (NSF), provide an impetus for the enhancement of traditional undergraduate research experiences with intense didactic training in particular skills and technologies. Such didactic components provide highly focused and qualified students for summer research with the goal of ensuring increased student satisfaction with research and mentor satisfaction with student productivity. As an example, we focus on our experiences with the Penn State Biomaterials and Bionanotechnology Summer Institute (PSU-BBSI), which trains undergraduates in core technologies in surface characterization, computational modeling, cell biology, and fabrication to prepare them for student-centered research projects in the role of materials in guiding cell biology. PMID:18316807

  2. DASS: efficient discovery and p-value calculation of substructures in unordered data.

    PubMed

    Hollunder, Jens; Friedel, Maik; Beyer, Andreas; Workman, Christopher T; Wilhelm, Thomas

    2007-01-01

    Pattern identification in biological sequence data is one of the main objectives of bioinformatics research. However, few methods are available for detecting patterns (substructures) in unordered datasets. Data mining algorithms mainly developed outside the realm of bioinformatics have been adapted for that purpose, but typically do not determine the statistical significance of the identified patterns. Moreover, these algorithms do not exploit the often modular structure of biological data. We present the algorithm DASS (Discovery of All Significant Substructures) that first identifies all substructures in unordered data (DASS(Sub)) in a manner that is especially efficient for modular data. In addition, DASS calculates the statistical significance of the identified substructures, for sets with at most one element of each type (DASS(P(set))), or for sets with multiple occurrence of elements (DASS(P(mset))). The power and versatility of DASS is demonstrated by four examples: combinations of protein domains in multi-domain proteins, combinations of proteins in protein complexes (protein subcomplexes), combinations of transcription factor target sites in promoter regions and evolutionarily conserved protein interaction subnetworks. The program code and additional data are available at http://www.fli-leibniz.de/tsb/DASS

  3. Two C++ Libraries for Counting Trees on a Phylogenetic Terrace.

    PubMed

    Biczok, R; Bozsoky, P; Eisenmann, P; Ernst, J; Ribizel, T; Scholz, F; Trefzer, A; Weber, F; Hamann, M; Stamatakis, A

    2018-05-08

    The presence of terraces in phylogenetic tree space, that is, a potentially large number of distinct tree topologies that have exactly the same analytical likelihood score, was first described by Sanderson et al. (2011). However, popular software tools for maximum likelihood and Bayesian phylogenetic inference do not yet routinely report, if inferred phylogenies reside on a terrace, or not. We believe, this is due to the lack of an efficient library to (i) determine if a tree resides on a terrace, (ii) calculate how many trees reside on a terrace, and (iii) enumerate all trees on a terrace. In our bioinformatics practical that is set up as a programming contest we developed two efficient and independent C++ implementations of the SUPERB algorithm by Constantinescu and Sankoff (1995) for counting and enumerating trees on a terrace. Both implementations yield exactly the same results, are more than one order of magnitude faster, and require one order of magnitude less memory than a previous 3rd party python implementation. The source codes are available under GNU GPL at https://github.com/terraphast. Alexandros.Stamatakis@h-its.org. Supplementary data are available at Bioinformatics online.

  4. Genomics and bioinformatics in undergraduate curricula: Contexts for hybrid laboratory/lecture courses for entering and advanced science students.

    PubMed

    Temple, Louise; Cresawn, Steven G; Monroe, Jonathan D

    2010-01-01

    Emerging interest in genomics in the scientific community prompted biologists at James Madison University to create two courses at different levels to modernize the biology curriculum. The courses are hybrids of classroom and laboratory experiences. An upper level class uses raw sequence of a genome (plasmid or virus) as the subject on which to base the experience of genomic analysis. Students also learn bioinformatics and software programs needed to support a project linking structure and function in proteins and showing evolutionary relatedness of similar genes. An optional entry-level course taken in addition to the required first-year curriculum and sponsored in part by the Howard Hughes Medical Institute, engages first year students in a primary research project. In the first semester, they isolate and characterize novel bacteriophages that infect soil bacteria. In the second semester, these young scientists annotate the genes on one or more of the unique viruses they discovered. These courses are demanding but exciting for both faculty and students and should be accessible to any interested faculty member. Copyright © 2010 International Union of Biochemistry and Molecular Biology, Inc.

  5. COMAN: a web server for comprehensive metatranscriptomics analysis.

    PubMed

    Ni, Yueqiong; Li, Jun; Panagiotou, Gianni

    2016-08-11

    Microbiota-oriented studies based on metagenomic or metatranscriptomic sequencing have revolutionised our understanding on microbial ecology and the roles of both clinical and environmental microbes. The analysis of massive metatranscriptomic data requires extensive computational resources, a collection of bioinformatics tools and expertise in programming. We developed COMAN (Comprehensive Metatranscriptomics Analysis), a web-based tool dedicated to automatically and comprehensively analysing metatranscriptomic data. COMAN pipeline includes quality control of raw reads, removal of reads derived from non-coding RNA, followed by functional annotation, comparative statistical analysis, pathway enrichment analysis, co-expression network analysis and high-quality visualisation. The essential data generated by COMAN are also provided in tabular format for additional analysis and integration with other software. The web server has an easy-to-use interface and detailed instructions, and is freely available at http://sbb.hku.hk/COMAN/ CONCLUSIONS: COMAN is an integrated web server dedicated to comprehensive functional analysis of metatranscriptomic data, translating massive amount of reads to data tables and high-standard figures. It is expected to facilitate the researchers with less expertise in bioinformatics in answering microbiota-related biological questions and to increase the accessibility and interpretation of microbiota RNA-Seq data.

  6. A comparative proteomic strategy for subcellular proteome research: ICAT approach coupled with bioinformatics prediction to ascertain rat liver mitochondrial proteins and indication of mitochondrial localization for catalase.

    PubMed

    Jiang, Xiao-Sheng; Dai, Jie; Sheng, Quan-Hu; Zhang, Lei; Xia, Qi-Chang; Wu, Jia-Rui; Zeng, Rong

    2005-01-01

    Subcellular proteomics, as an important step to functional proteomics, has been a focus in proteomic research. However, the co-purification of "contaminating" proteins has been the major problem in all the subcellular proteomic research including all kinds of mitochondrial proteome research. It is often difficult to conclude whether these "contaminants" represent true endogenous partners or artificial associations induced by cell disruption or incomplete purification. To solve such a problem, we applied a high-throughput comparative proteome experimental strategy, ICAT approach performed with two-dimensional LC-MS/MS analysis, coupled with combinational usage of different bioinformatics tools, to study the proteome of rat liver mitochondria prepared with traditional centrifugation (CM) or further purified with a Nycodenz gradient (PM). A total of 169 proteins were identified and quantified convincingly in the ICAT analysis, in which 90 proteins have an ICAT ratio of PM:CM>1.0, while another 79 proteins have an ICAT ratio of PM:CM<1.0. Almost all the proteins annotated as mitochondrial according to Swiss-Prot annotation, bioinformatics prediction, and literature reports have a ratio of PM:CM>1.0, while proteins annotated as extracellular or secreted, cytoplasmic, endoplasmic reticulum, ribosomal, and so on have a ratio of PM:CM<1.0. Catalase and AP endonuclease 1, which have been known as peroxisomal and nuclear, respectively, have shown a ratio of PM:CM>1.0, confirming the reports about their mitochondrial location. Moreover, the 125 proteins with subcellular location annotation have been used as a testing dataset to evaluate the efficiency for ascertaining mitochondrial proteins by ICAT analysis and the bioinformatics tools such as PSORT, TargetP, SubLoc, MitoProt, and Predotar. The results indicated that ICAT analysis coupled with combinational usage of different bioinformatics tools could effectively ascertain mitochondrial proteins and distinguish contaminant proteins and even multilocation proteins. Using such a strategy, many novel proteins, known proteins without subcellular location annotation, and even known proteins that have been annotated as other locations have been strongly indicated for their mitochondrial location.

  7. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses

    PubMed Central

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi

    2018-01-01

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625

  8. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses.

    PubMed

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V; Ma'ayan, Avi

    2018-02-27

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.

  9. Opportunities at the Intersection of Bioinformatics and Health Informatics

    PubMed Central

    Miller, Perry L.

    2000-01-01

    This paper provides a “viewpoint discussion” based on a presentation made to the 2000 Symposium of the American College of Medical Informatics. It discusses potential opportunities for researchers in health informatics to become involved in the rapidly growing field of bioinformatics, using the activities of the Yale Center for Medical Informatics as a case study. One set of opportunities occurs where bioinformatics research itself intersects with the clinical world. Examples include the correlations between individual genetic variation with clinical risk factors, disease presentation, and differential response to treatment; and the implications of including genetic test results in the patient record, which raises clinical decision support issues as well as legal and ethical issues. A second set of opportunities occurs where bioinformatics research can benefit from the technologic expertise and approaches that informaticians have used extensively in the clinical arena. Examples include database organization and knowledge representation, data mining, and modeling and simulation. Microarray technology is discussed as a specific potential area for collaboration. Related questions concern how best to establish collaborations with bioscientists so that the interests and needs of both sets of researchers can be met in a synergistic fashion, and the most appropriate home for bioinformatics in an academic medical center. PMID:10984461

  10. Mathematics and evolutionary biology make bioinformatics education comprehensible.

    PubMed

    Jungck, John R; Weisstein, Anton E

    2013-09-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes-the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software-the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a 'two-culture' problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses.

  11. Mathematics and evolutionary biology make bioinformatics education comprehensible

    PubMed Central

    Weisstein, Anton E.

    2013-01-01

    The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes—the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software—the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a ‘two-culture’ problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses. PMID:23821621

  12. Nanoinformatics: an emerging area of information technology at the intersection of bioinformatics, computational chemistry and nanobiotechnology.

    PubMed

    González-Nilo, Fernando; Pérez-Acle, Tomás; Guínez-Molinos, Sergio; Geraldo, Daniela A; Sandoval, Claudia; Yévenes, Alejandro; Santos, Leonardo S; Laurie, V Felipe; Mendoza, Hegaly; Cachau, Raúl E

    2011-01-01

    After the progress made during the genomics era, bioinformatics was tasked with supporting the flow of information generated by nanobiotechnology efforts. This challenge requires adapting classical bioinformatic and computational chemistry tools to store, standardize, analyze, and visualize nanobiotechnological information. Thus, old and new bioinformatic and computational chemistry tools have been merged into a new sub-discipline: nanoinformatics. This review takes a second look at the development of this new and exciting area as seen from the perspective of the evolution of nanobiotechnology applied to the life sciences. The knowledge obtained at the nano-scale level implies answers to new questions and the development of new concepts in different fields. The rapid convergence of technologies around nanobiotechnologies has spun off collaborative networks and web platforms created for sharing and discussing the knowledge generated in nanobiotechnology. The implementation of new database schemes suitable for storage, processing and integrating physical, chemical, and biological properties of nanoparticles will be a key element in achieving the promises in this convergent field. In this work, we will review some applications of nanobiotechnology to life sciences in generating new requirements for diverse scientific fields, such as bioinformatics and computational chemistry.

  13. Wenyi Wang, Statistical Bioinformatics Expert, Visits DCEG

    Cancer.gov

    In March 2018, Wenyi Wang, Ph.D., Associate Professor in the Department of Bioinformatics and Computational Biology at the University of Texas MD Anderson Cancer Center, visited DCEG to give a seminar and meet with staff.

  14. EPIGEN-Brazil Initiative resources: a Latin American imputation panel and the Scientific Workflow.

    PubMed

    Magalhães, Wagner C S; Araujo, Nathalia M; Leal, Thiago P; Araujo, Gilderlanio S; Viriato, Paula J S; Kehdy, Fernanda S; Costa, Gustavo N; Barreto, Mauricio L; Horta, Bernardo L; Lima-Costa, Maria Fernanda; Pereira, Alexandre C; Tarazona-Santos, Eduardo; Rodrigues, Maíra R

    2018-06-14

    EPIGEN-Brazil is one of the largest Latin American initiatives at the interface of human genomics, public health, and computational biology. Here, we present two resources to address two challenges to the global dissemination of precision medicine and the development of the bioinformatics know-how to support it. To address the underrepresentation of non-European individuals in human genome diversity studies, we present the EPIGEN-5M+1KGP imputation panel-the fusion of the public 1000 Genomes Project (1KGP) Phase 3 imputation panel with haplotypes derived from the EPIGEN-5M data set (a product of the genotyping of 4.3 million SNPs in 265 admixed individuals from the EPIGEN-Brazil Initiative). When we imputed a target SNPs data set (6487 admixed individuals genotyped for 2.2 million SNPs from the EPIGEN-Brazil project) with the EPIGEN-5M+1KGP panel, we gained 140,452 more SNPs in total than when using the 1KGP Phase 3 panel alone and 788,873 additional high confidence SNPs ( info score ≥ 0.8). Thus, the major effect of the inclusion of the EPIGEN-5M data set in this new imputation panel is not only to gain more SNPs but also to improve the quality of imputation. To address the lack of transparency and reproducibility of bioinformatics protocols, we present a conceptual Scientific Workflow in the form of a website that models the scientific process (by including publications, flowcharts, masterscripts, documents, and bioinformatics protocols), making it accessible and interactive. Its applicability is shown in the context of the development of our EPIGEN-5M+1KGP imputation panel. The Scientific Workflow also serves as a repository of bioinformatics resources. © 2018 Magalhães et al.; Published by Cold Spring Harbor Laboratory Press.

  15. The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

    PubMed

    Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

    2012-03-15

    Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. Especially database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.

  16. Integration of Ethics across the Curriculum: From First Year through Senior Seminar†

    PubMed Central

    Gasparich, Gail E.; Wimmers, Larry

    2014-01-01

    The Fisher College of Science and Mathematics (FCSM) at Towson University (TU) has integrated authentic research experiences throughout the curriculum from first year STEM courses through advanced upper-level classes and independent research. Our observation is that training in both responsible conduct of research (RCR) and bioethics throughout the curriculum was an effective strategy to advance the cognitive and psychosocial development of the students. As students enter TU they generally lack the experience and tools to assess their own competence, to apply ethical debates, to investigate scientific topics from an ethical perspective, or to integrate ethics into final conclusions. Student behavior and development follow cognitive models such as described in the theories put forth by Piaget, Kohlberg, and Erikson, both for initial learning and for how concepts are understood and adopted. Three examples of this ethics training integration are described, including a cohort-based course for first year students in the STEM Residential Learning Community, a cohort-based course for community college students that are involved in an NIH-funded Bridges to the Baccalaureate program, and a senior seminar in Bioethics in the Molecular Biology, Biochemistry and Bioinformatics Program. All three focus on different aspects of RCR and bioethics training, providing opportunities for students to learn about the principles of effective decision-making, critical and analytical thinking, problem solving, and communication with increasing degrees of complexity as they move through the curriculum. PMID:25574282

  17. Integration of Ethics across the Curriculum: From First Year through Senior Seminar.

    PubMed

    Gasparich, Gail E; Wimmers, Larry

    2014-12-01

    The Fisher College of Science and Mathematics (FCSM) at Towson University (TU) has integrated authentic research experiences throughout the curriculum from first year STEM courses through advanced upper-level classes and independent research. Our observation is that training in both responsible conduct of research (RCR) and bioethics throughout the curriculum was an effective strategy to advance the cognitive and psychosocial development of the students. As students enter TU they generally lack the experience and tools to assess their own competence, to apply ethical debates, to investigate scientific topics from an ethical perspective, or to integrate ethics into final conclusions. Student behavior and development follow cognitive models such as described in the theories put forth by Piaget, Kohlberg, and Erikson, both for initial learning and for how concepts are understood and adopted. Three examples of this ethics training integration are described, including a cohort-based course for first year students in the STEM Residential Learning Community, a cohort-based course for community college students that are involved in an NIH-funded Bridges to the Baccalaureate program, and a senior seminar in Bioethics in the Molecular Biology, Biochemistry and Bioinformatics Program. All three focus on different aspects of RCR and bioethics training, providing opportunities for students to learn about the principles of effective decision-making, critical and analytical thinking, problem solving, and communication with increasing degrees of complexity as they move through the curriculum.

  18. ETE: a python Environment for Tree Exploration.

    PubMed

    Huerta-Cepas, Jaime; Dopazo, Joaquín; Gabaldón, Toni

    2010-01-13

    Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.

  19. Generational comparisons (F1 versus F3) of vinclozolin induced epigenetic transgenerational inheritance of sperm differential DNA methylation regions (epimutations) using MeDIP-Seq.

    PubMed

    Beck, Daniel; Sadler-Riggleman, Ingrid; Skinner, Michael K

    2017-07-01

    Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation has been shown to involve DNA methylation alterations in the germline (e.g. sperm). These differential DNA methylation regions (DMRs) are termed epimutations and in part transmit the transgenerational phenotypes. The agricultural fungicide vinclozolin exposure of a gestating female rat has previously been shown to promote transgenerational disease and epimutations in F3 generation (great-grand-offspring) animals. The current study was designed to investigate the actions of direct fetal exposure on the F1 generation rat sperm DMRs compared to the F3 transgenerational sperm DMRs. A protocol involving methylated DNA immunoprecipitation (MeDIP) followed by next-generation sequencing (Seq) was used in the current study. Bioinformatics analysis of the MeDIP-Seq data was developed and several different variations in the bioinformatic analysis were evaluated. Observations indicate needs to be considered. Interestingly, the F1 generation DMRs were found to be fewer in number and for the most part distinct from the F3 generation epimutations. Observations suggest the direct exposure induced F1 generation sperm DMRs appear to promote in subsequent generations alterations in the germ cell developmental programming that leads to the distinct epimutations in the F3 generation. This may help explain the differences in disease and phenotypes between the direct exposure F1 generation and transgenerational F3 generation. Observations demonstrate a distinction between the direct exposure versus transgenerational epigenetic programming induced by environmental exposures and provide insights into the molecular mechanisms involved in the epigenetic transgenerational inheritance phenomenon.

  20. ETE: a python Environment for Tree Exploration

    PubMed Central

    2010-01-01

    Background Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org. PMID:20070885

Top