An integrative model for in-silico clinical-genomics discovery science.
Lussier, Yves A; Sarkar, Indra Neil; Cantor, Michael
2002-01-01
Human Genome discovery research has set the pace for post-genomic discovery research. While post-genomic fields focused on the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems with current bioinformatics genomic discovery science. This paper presents an original model enabling novel "in-silico" clinical-genomic discovery science and demonstrates its feasibility. The model is designed to mediate queries among clinical and genomic knowledge bases and relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical-decision-support-system-driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.
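To give a concrete flavor of the clustering step described in this record, the sketch below hierarchically clusters a toy trait-gene matrix, with scipy standing in for GeneCluster/TreeView; the traits, genes and matrix values are illustrative assumptions, not data from the study.

```python
# Illustrative sketch only: hierarchical clustering of clinical observations
# around genes, standing in for the GeneCluster/TreeView step described above.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical binary matrix: rows = clinical observations (traits),
# columns = genes; 1 means the trait-gene link appears in the knowledge bases.
traits = ["tall stature", "lens dislocation", "aortic dilation", "seizures"]
genes = ["FBN1", "GJB2", "MECP2"]
matrix = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 1],
])

# Average-linkage clustering of traits by their gene-association profiles.
links = linkage(matrix, method="average", metric="hamming")
tree = dendrogram(links, labels=traits, no_plot=True)
print(tree["ivl"])  # leaf order of the trait dendrogram
```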
A Bioinformatic Approach to Inter Functional Interactions within Protein Sequences
2009-02-23
AFOSR/AOARD Reference Number: USAFAOGA07 (FA4869-07-1-4050). AFOSR/AOARD Program Manager: Hiroshi Motoda, Ph.D. In a separate study, the approaches were applied to the problem of whole-genome alignment; results were presented at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
Bioinformatics in protein kinases regulatory network and drug discovery.
Chen, Qingfeng; Luo, Haiqiong; Zhang, Chengqi; Chen, Yi-Ping Phoebe
2015-04-01
Protein kinases have been implicated in a number of diseases, as they participate in many aspects of the control of cell growth, movement and death. Deregulated kinase activities, and knowledge of the resulting disorders, are of great clinical interest for drug discovery. The most critical issue is the development of safe and efficient disease diagnosis and treatment at lower cost and in less time. It is critical to develop innovative approaches that aim at the root cause of a disease, not just its symptoms. Bioinformatics, encompassing genetic, genomic, mathematical and computational technologies, has become the most promising option for effective drug discovery, and has shown its potential in the early stages of drug-target identification and target validation. It is essential that these aspects are understood and integrated into new methods used in drug discovery for diseases arising from deregulated kinase activity. This article reviews bioinformatics techniques for protein kinase data management and analysis, kinase pathways and drug targets, and describes their potential application in the pharmaceutical industry.
Protein Bioinformatics Databases and Resources
Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.
2017-01-01
Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231
BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Pafilis, Evangelos; Theodosiou, Theodosios; Schneider, Reinhard; Satagopam, Venkata P; Ouzounis, Christos A; Eliopoulos, Aristides G; Promponas, Vasilis J; Iliopoulos, Ioannis
2014-11-15
The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. Contact: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be. Supplementary data are available at Bioinformatics online.
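As an aside on how per-cluster tag-cloud statistics of the kind described above can be computed in principle, here is a generic sketch; it is not BioTextQuest(+) code, and the clusters and tokens are hypothetical.

```python
# Generic sketch of per-cluster term frequency and co-occurrence counting,
# the kind of statistic behind tag clouds like those described above.
from collections import Counter
from itertools import combinations

# Hypothetical clusters of already-tokenized, bioentity-tagged abstracts.
clusters = {
    "obesity": [["leptin", "obesity", "BMI"], ["leptin", "adipose", "obesity"]],
    "ageing":  [["telomere", "ageing"], ["telomere", "senescence", "ageing"]],
}

for name, docs in clusters.items():
    # Document frequency of each term within the cluster.
    term_freq = Counter(t for doc in docs for t in set(doc))
    # Within-document term co-occurrence counts.
    cooc = Counter(pair for doc in docs
                   for pair in combinations(sorted(set(doc)), 2))
    print(name, term_freq.most_common(3), cooc.most_common(2))
```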
A Virtual Bioinformatics Knowledge Environment for Early Cancer Detection
NASA Technical Reports Server (NTRS)
Crichton, Daniel; Srivastava, Sudhir; Johnsey, Donald
2003-01-01
Discovery of disease biomarkers for cancer is a leading focus of early detection. The National Cancer Institute created a network of collaborating institutions focused on the discovery and validation of cancer biomarkers called the Early Detection Research Network (EDRN). Informatics plays a key role in enabling a virtual knowledge environment that provides scientists real-time access to distributed data sets located at research institutions across the nation. The distributed and heterogeneous nature of the collaboration makes data sharing across institutions very difficult. EDRN has undertaken a comprehensive informatics effort focused on developing a national infrastructure enabling seamless access, sharing and discovery of science data resources across all EDRN sites. This paper will discuss the EDRN knowledge system architecture, its objectives and its accomplishments.
Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi
2018-01-01
Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets; it also indexes 4,901 published bioinformatics software tools and all of the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625
Bio-TDS: bioscience query tool discovery system.
Gnimpieba, Etienne Z; VanDiermen, Menno S; Gustafson, Shayla M; Conn, Bill; Lushbough, Carol M
2017-01-04
Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery System, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 12,000 analytic tool descriptions integrated from well-established community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS's scientific impact was evaluated using sample questions posed by researchers retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems, outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process.
Bioinformatics in translational drug discovery.
Wooller, Sarah K; Benstead-Hume, Graeme; Chen, Xiangrong; Ali, Yusuf; Pearl, Frances M G
2017-08-31
Bioinformatics approaches are becoming ever more essential in translational drug discovery both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse 'big data' that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications.
The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery
2014-01-01
The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level comprising essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a Creative Commons Attribution license. See website for further information: http://sio.semanticscience.org. PMID:24602174
Bioinformatics education in India.
Kulkarni-Kale, Urmila; Sawant, Sangeeta; Chavan, Vishwas
2010-11-01
An account of bioinformatics education in India is presented along with future prospects. Establishment of the BTIS network by the Department of Biotechnology (DBT), Government of India in the 1980s was a systematic effort in the development of bioinformatics infrastructure in India to provide services to the scientific community. Advances in the field of bioinformatics underpinned the need for well-trained professionals with skills in information technology and biotechnology. As a result, programmes for capacity building in terms of human resource development were initiated. Educational programmes gradually evolved from the organisation of short-term workshops to the institution of formal diploma/degree programmes. A case study of the Master's degree course offered at the Bioinformatics Centre, University of Pune is discussed. Currently, many universities and institutes are offering bioinformatics courses at different levels, with variations in the course contents and degree of detailing. The BioInformatics National Certification (BINC) examination, initiated in 2005 by DBT, provides a common yardstick to assess the knowledge and skill sets of students graduating from various institutions. The potential for broadening the scope of bioinformatics to transform it into a data-intensive discovery discipline is discussed. This necessitates introduction of amendments in the existing curricula to accommodate the upcoming developments.
A Ramble through the Cell: How Can We Clear Such a Complicated Trail?
ERIC Educational Resources Information Center
Bobich, Joseph A.
2006-01-01
The arrangement of course information in a logical sequence for molecular life science (MLS) courses remains a matter of some controversy, even within a single subdiscipline such as biochemistry. This is due to the explosion of knowledge, the latest bioinformatic revelations, and the observation that new discoveries sometimes reveal specific…
A collaborative filtering-based approach to biomedical knowledge discovery.
Lever, Jake; Gakkhar, Sitanshu; Gottlieb, Michael; Rashnavadi, Tahereh; Lin, Santina; Siu, Celia; Smith, Maia; Jones, Martin R; Krzywinski, Martin; Jones, Steven J M; Wren, Jonathan
2018-02-15
The increase in publication rates makes it challenging for an individual researcher to stay abreast of all relevant research in order to find novel research hypotheses. Literature-based discovery methods make use of knowledge graphs built using text mining and can infer future associations between biomedical concepts that will likely occur in new publications. These predictions are a valuable resource for researchers to explore a research topic. Current methods for prediction are based on the local structure of the knowledge graph. A method that uses global knowledge from across the knowledge graph needs to be developed in order to make knowledge discovery a frequently used tool by researchers. We propose an approach based on the singular value decomposition (SVD) that is able to combine data from across the knowledge graph through a reduced representation. Using co-occurrence data extracted from published literature, we show that SVD performs better than the leading methods for scoring discoveries. We also show the diminishing predictive power of knowledge discovery as we compare our predictions with real associations that appear further into the future. Finally, we examine the strengths and weaknesses of the SVD approach against another well-performing system using several predicted associations. All code and results files for this analysis can be accessed at https://github.com/jakelever/knowledgediscovery. Contact: sjones@bcgsc.ca. Supplementary data are available at Bioinformatics online.
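To make the SVD idea concrete, here is a minimal sketch assuming a small symmetric term co-occurrence matrix; it is not the authors' implementation (which is available at the GitHub link above), and the terms and counts are invented for illustration.

```python
# Minimal sketch of SVD-based scoring of candidate term associations,
# in the spirit of the approach described above (not the authors' code).
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy co-occurrence counts between biomedical concepts (rows = columns = terms).
terms = ["geneA", "geneB", "diseaseX", "drugY", "pathwayZ"]
cooc = np.array([
    [0, 4, 2, 0, 3],
    [4, 0, 0, 1, 2],
    [2, 0, 0, 5, 0],
    [0, 1, 5, 0, 1],
    [3, 2, 0, 1, 0],
], dtype=float)

# Rank-k truncated SVD gives a global, low-rank view of the knowledge graph.
k = 2
u, s, vt = svds(csr_matrix(cooc), k=k)
recon = u @ np.diag(s) @ vt  # reconstructed association strengths

# Score a pair that never co-occurred; higher = more plausible future association.
i, j = terms.index("geneB"), terms.index("diseaseX")
print(f"predicted association geneB-diseaseX: {recon[i, j]:.3f}")
```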
Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong
2010-01-18
Background: The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results: In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion: High-quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high-throughput gene expression data. PMID:20122245
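A toy sketch of the general evidence-combination idea follows, assuming expression correlation as the microarray signal and a binary knowledge-base prior; the actual algorithm in the paper is more sophisticated, and every name and value below is illustrative.

```python
# Hedged sketch: combine a microarray-derived correlation score with a
# knowledge-base prior to rank candidate TF->target pairs (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_arrays = 6, 40
expr = rng.normal(size=(n_genes, n_arrays))  # rows: genes, cols: conditions

# Pairwise absolute Pearson correlation as the expression evidence.
corr = np.abs(np.corrcoef(expr))

# Prior knowledge: 1.0 if the KB reports any interaction for the pair, else a
# small baseline; a real system would grade evidence quality and directness.
kb_pairs = {(0, 2), (1, 3)}                  # hypothetical curated interactions
prior = np.full((n_genes, n_genes), 0.1)
for i, j in kb_pairs:
    prior[i, j] = prior[j, i] = 1.0

score = corr * prior                          # simple evidence combination
tf = 0                                        # rank targets of a putative TF
ranking = np.argsort(-score[tf])
print("candidate targets of gene 0:", [g for g in ranking if g != tf])
```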
Discovery of novel bacterial toxins by genomics and computational biology.
Doxey, Andrew C; Mansfield, Michael J; Montecucco, Cesare
2018-06-01
Many hundreds of bacterial protein toxins are presently known. Traditionally, toxin identification begins with pathological studies of bacterial infectious disease. Following identification and cultivation of a bacterial pathogen, the protein toxin is purified from the culture medium and its pathogenic activity is studied using the methods of biochemistry and structural biology, cell biology, tissue and organ biology, and appropriate animal models, supplemented by bioimaging techniques. The ongoing and explosive development of high-throughput DNA sequencing and bioinformatic approaches has set in motion a revolution in many fields of biology, including microbiology. One consequence is that genes encoding novel bacterial toxins can be identified by bioinformatic and computational methods based on previous knowledge accumulated from studies of the biology and pathology of thousands of known bacterial protein toxins. Starting from the paradigmatic cases of diphtheria toxin, tetanus and botulinum neurotoxins, this review discusses traditional experimental approaches as well as bioinformatics and genomics-driven approaches that facilitate the discovery of novel bacterial toxins. We discuss recent work on the identification of novel botulinum-like toxins from genera such as Weissella, Chryseobacterium, and Enterococcus, and the implications of these computationally identified toxins in the field. Finally, we discuss the promise of metagenomics in the discovery of novel toxins and their ecological niches, and present data suggesting the existence of uncharacterized, botulinum-like toxin genes in insect gut metagenomes.
Hassani-Pak, Keywan; Rawlings, Christopher
2017-06-13
Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
Cellular automata and its applications in protein bioinformatics.
Xiao, Xuan; Wang, Pu; Chou, Kuo-Chen
2011-09-01
With the explosion of protein sequences generated in the postgenomic era, it is highly desirable to develop high-throughput tools for rapidly and reliably identifying various attributes of uncharacterized proteins based on their sequence information alone. The knowledge thus obtained can help us utilize these newly found protein sequences in a timely manner for both basic research and drug discovery. Many bioinformatics tools have been developed by means of machine learning methods. This review is focused on the applications of a new kind of science (cellular automata) in protein bioinformatics. A cellular automaton (CA) is an open, flexible and discrete dynamic model that holds enormous potential for modeling complex systems, in spite of the simplicity of the model itself. Researchers, scientists and practitioners from different fields have utilized cellular automata for visualizing protein sequences, investigating their evolution processes, and predicting their various attributes. Owing to its impressive power, intuitiveness and relative simplicity, the CA approach has great potential for use as a tool for bioinformatics.
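As a purely illustrative example of the CA idea applied to a protein sequence (the residue encoding and rule choice below are assumptions, not taken from the review), the sketch renders a sequence as an evolving 1-D cellular automaton.

```python
# Illustrative sketch (not from the review): render a protein sequence as a
# cellular-automaton image by mapping residues to binary cells and evolving
# them with an elementary CA rule, a common visualization idea in CA work.
import numpy as np

RULE = 90  # elementary CA rule number (assumption; any rule 0-255 works)

def step(cells: np.ndarray, rule: int = RULE) -> np.ndarray:
    """One synchronous update of a 1-D binary CA with periodic boundaries."""
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    idx = (left << 2) | (cells << 1) | right        # neighborhood as 0..7
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    return table[idx]

# Map residues to cells, e.g. hydrophobic -> 1 (a deliberately crude encoding).
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
hydrophobic = set("AVLIMFWYC")
cells = np.array([1 if aa in hydrophobic else 0 for aa in seq], dtype=np.uint8)

image = [cells]
for _ in range(16):                                  # evolve a few generations
    image.append(step(image[-1]))
print(np.array(image))                               # rows = time steps
```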
Agyei, Dominic; Tsopmo, Apollinaire; Udenigwe, Chibuike C
2018-06-01
There are emerging advancements in the strategies used for the discovery and development of food-derived bioactive peptides because of their multiple food and health applications. Bioinformatics and peptidomics are two computational and analytical techniques that have the potential to speed up the development of bioactive peptides from bench to market. Structure-activity relationships observed in peptides form the basis for bioinformatics and in silico prediction of bioactive sequences encrypted in food proteins. Peptidomics, on the other hand, relies on "hyphenated" (liquid chromatography-mass spectrometry-based) techniques for the detection, profiling, and quantitation of peptides. Together, bioinformatics and peptidomics approaches provide a low-cost and effective means of predicting, profiling, and screening bioactive protein hydrolysates and peptides from food. This article discusses the basis, strengths, and limitations of bioinformatics and peptidomics approaches currently used for the discovery and analysis of food-derived bioactive peptides.
Open discovery: An integrated live Linux platform of Bioinformatics tools.
Vetrivel, Umashankar; Pilla, Kalabharath
2008-01-01
Historically, live Linux distributions for bioinformatics have paved the way for portability of the bioinformatics workbench in a platform-independent manner. However, most of the existing live Linux distributions limit their usage to sequence analysis and basic molecular visualization programs and are devoid of data persistence. Hence, Open Discovery, a live Linux distribution, has been developed with the capability to perform complex tasks like molecular modeling, docking and molecular dynamics in a swift manner. Furthermore, it is also equipped with a complete sequence analysis environment and is capable of running Windows executable programs in a Linux environment. Open Discovery portrays an advanced customizable configuration of Fedora, with data persistence accessible via USB drive or DVD. Open Discovery is distributed free under the Academic Free License (AFL) and can be downloaded from http://www.OpenDiscovery.org.in.
Translational bioinformatics: linking the molecular world to the clinical world.
Altman, R B
2012-06-01
Translational bioinformatics represents the union of translational medicine and bioinformatics. Translational medicine moves basic biological discoveries from the research bench into the patient-care setting and uses clinical observations to inform basic biology. It focuses on patient care, including the creation of new diagnostics, prognostics, prevention strategies, and therapies based on biological discoveries. Bioinformatics involves algorithms to represent, store, and analyze basic biological data, including DNA sequence, RNA expression, and protein and small-molecule abundance within cells. Translational bioinformatics spans these two fields; it involves the development of algorithms to analyze basic molecular and cellular data with an explicit goal of affecting clinical care.
Integration of cardiac proteome biology and medicine by a specialized knowledgebase.
Zong, Nobel C; Li, Haomin; Li, Hua; Lam, Maggie P Y; Jimenez, Rafael C; Kim, Christina S; Deng, Ning; Kim, Allen K; Choi, Jeong Ho; Zelaya, Ivette; Liem, David; Meyer, David; Odeberg, Jacob; Fang, Caiyun; Lu, Hao-Jie; Xu, Tao; Weiss, James; Duan, Huilong; Uhlen, Mathias; Yates, John R; Apweiler, Rolf; Ge, Junbo; Hermjakob, Henning; Ping, Peipei
2013-10-12
Omics sciences enable a systems-level perspective in characterizing cardiovascular biology. Integration of diverse proteomics data via a computational strategy will catalyze the assembly of contextualized knowledge, foster discoveries through multidisciplinary investigations, and minimize unnecessary redundancy in research efforts. The goal of this project is to develop a consolidated cardiac proteome knowledgebase with a novel bioinformatics pipeline and Web portals, thereby serving as a new resource to advance cardiovascular biology and medicine. We created the Cardiac Organellar Protein Atlas Knowledgebase (COPaKB; www.HeartProteome.org), a centralized platform of high-quality cardiac proteomic data, bioinformatics tools, and relevant cardiovascular phenotypes. Currently, COPaKB features 8 organellar modules, comprising 4203 LC-MS/MS experiments from human, mouse, Drosophila, and Caenorhabditis elegans, as well as expression images of 10,924 proteins in human myocardium. In addition, the Java-coded bioinformatics tools provided by COPaKB enable cardiovascular investigators in all disciplines to retrieve and analyze pertinent organellar protein properties of interest. COPaKB provides an innovative and interactive resource that connects research interests with new biological discoveries in protein sciences. With an array of intuitive tools in this unified Web server, nonproteomics investigators can conveniently collaborate with proteomics specialists to dissect the molecular signatures of cardiovascular phenotypes.
Ramharack, Pritika; Soliman, Mahmoud E S
2018-06-01
Originally developed for the analysis of biological sequences, bioinformatics has advanced into one of the most widely recognized domains in the scientific community. Despite this technological evolution, there is still an urgent need for nontoxic and efficient drugs. The onus now falls on the 'omics domain to meet this need by implementing bioinformatics techniques that will allow for the introduction of pioneering approaches in the rational drug design process. Here, we categorize an updated list of informatics tools and explore the capabilities of integrative bioinformatics in disease control. We believe that our review will serve as a comprehensive guide toward bioinformatics-oriented disease and drug discovery research.
X-ray crystallography over the past decade for novel drug discovery - where are we heading next?
Zheng, Heping; Handing, Katarzyna B; Zimmerman, Matthew D; Shabalin, Ivan G; Almo, Steven C; Minor, Wladek
2015-01-01
Macromolecular X-ray crystallography has been the primary methodology for determining the three-dimensional structures of proteins, nucleic acids and viruses. Structural information has paved the way for structure-guided drug discovery and laid the foundations for structural bioinformatics. However, X-ray crystallography still has a few fundamental limitations, some of which may be overcome and complemented using emerging methods and technologies in other areas of structural biology. This review describes how structural knowledge gained from X-ray crystallography has been used to advance other biophysical methods for structure determination (and vice versa). This article also covers current practices for integrating data generated by other biochemical and biophysical methods with those obtained from X-ray crystallography. Finally, the authors articulate their vision about how a combination of structural and biochemical/biophysical methods may improve our understanding of biological processes and interactions. X-ray crystallography has been, and will continue to serve as, the central source of experimental structural biology data used in the discovery of new drugs. However, other structural biology techniques are useful not only to overcome the major limitation of X-ray crystallography, but also to provide complementary structural data that is useful in drug discovery. The use of recent advancements in biochemical, spectroscopy and bioinformatics methods may revolutionize drug discovery, albeit only when these data are combined and analyzed with effective data management systems. Accurate and complete data management is crucial for developing experimental procedures that are robust and reproducible.
USDA-ARS's Scientific Manuscript database
Scientific data integration and computational service discovery are challenges for the bioinformatic community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of scientific data between information resources difficu...
Lipidomics from an analytical perspective.
Sandra, Koen; Sandra, Pat
2013-10-01
The global non-targeted analysis of various biomolecules in a variety of sample sources has gained momentum in recent years. Defined as the study of the full lipid complement of cells, tissues and organisms, lipidomics is currently evolving out of the shadow of the more established omics sciences, including genomics, transcriptomics, proteomics and metabolomics. In analogy to the latter, lipidomics has the potential to impact biomarker discovery, drug discovery/development and systems knowledge, amongst others. The tools developed by lipid researchers in the past, complemented with the enormous advancements made in recent years in mass spectrometry and chromatography and the implementation of sophisticated (bio)informatics tools, form the basis of current lipidomics technologies.
Collection, Culturing, and Genome Analyses of Tropical Marine Filamentous Benthic Cyanobacteria.
Moss, Nathan A; Leao, Tiago; Glukhov, Evgenia; Gerwick, Lena; Gerwick, William H
2018-01-01
Decreasing sequencing costs have sparked widespread investigation of the use of microbial genomics to accelerate the discovery and development of natural products for therapeutic uses. Tropical marine filamentous cyanobacteria have historically produced many structurally novel natural products, and therefore present an excellent opportunity for the systematic discovery of new metabolites via the information derived from genomics and molecular genetics. Adequate knowledge transfer and institutional know-how are important to maintain the capability for studying filamentous cyanobacteria due to their unusual microbial morphology and characteristics. Here, we describe workflows, procedures, and commentary on sample collection, cultivation, genomic DNA generation, bioinformatics tools, and biosynthetic pathway analysis concerning filamentous cyanobacteria.
BioTextQuest: a web-based biomedical text mining suite for concept discovery.
Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J
2011-12-01
BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. Tag-cloud-based illustrations of the terms labeling each document cluster are semantically annotated according to the biological entity, and a list of document titles enables users to simultaneously compare terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms and exclusion of documents/significant terms, to better match the biological question addressed. Availability: http://biotextquest.biol.ucy.ac.cy. Contact: vprobon@ucy.ac.cy; iliopj@med.uoc.gr. Supplementary data are available at Bioinformatics online.
Yan, Qing
2010-01-01
Bioinformatics is the rational study, at an abstract level, of information that can influence the way we understand biomedical facts and the way we apply biomedical knowledge. Bioinformatics is facing challenges in helping to find the relationships between genetic structures and functions, analyzing genotype-phenotype associations, and understanding gene-environment interactions at the systems level. One of the most important issues in bioinformatics is data integration. The data integration methods introduced here can be used to organize and integrate both public and in-house data. Given the volume and high complexity of the data, computational decision support is essential for integrative transporter studies in pharmacogenomics, nutrigenomics, epigenetics, and systems biology. For the development of such a decision support system, object-oriented (OO) models can be constructed using the Unified Modeling Language (UML). A methodology is developed to build biomedical models at different system levels and construct corresponding UML diagrams, including use case diagrams, class diagrams, and sequence diagrams. By OO modeling using UML, the problems of transporter pharmacogenomics and systems biology can be approached from different angles with a more complete view, which may greatly enhance efforts in effective drug discovery and development. Bioinformatics resources on membrane transporters, and general bioinformatics databases and tools that are frequently used in transporter studies, are also collected here. An informatics decision support system based on the models presented here is available at http://www.pharmtao.com/transporter. The methodology developed here can also be used for other biomedical fields.
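As a toy illustration of how such a UML class diagram might translate into code, here is a minimal object model; all class, attribute and example names below are hypothetical, not taken from the paper.

```python
# Hypothetical translation of a UML class diagram into code: a minimal
# object model for transporter pharmacogenomics (names are illustrative).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Variant:
    rsid: str
    effect: str            # e.g. "reduced transport activity"

@dataclass
class Transporter:
    symbol: str            # e.g. "ABCB1"
    substrates: List[str] = field(default_factory=list)
    variants: List[Variant] = field(default_factory=list)

    def variants_affecting(self, phenotype_keyword: str) -> List[Variant]:
        """Association navigated in the class diagram: Transporter -> Variant."""
        return [v for v in self.variants if phenotype_keyword in v.effect]

abcb1 = Transporter("ABCB1", substrates=["digoxin"],
                    variants=[Variant("rs1045642", "reduced transport activity")])
print([v.rsid for v in abcb1.variants_affecting("reduced")])
```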
A bioinformatics knowledge discovery in text application for grid computing
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-01-01
Background: A fundamental activity in biomedical research is Knowledge Discovery, which has the ability to search through large amounts of biomedical information such as documents and data. High-performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution that exploits knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods: The development of a grid application for Knowledge Discovery in Text using a middleware-based methodology is presented. The system must be able to model a user application and process jobs, creating many parallel jobs to distribute over the computational nodes. Finally, the system must be aware of the computational resources available and their status, and must be able to monitor the execution of parallel jobs. These operative requirements led to the design of a middleware to be specialized using user application modules. It included a graphical user interface for access to a node search system, a load-balancing system and a transfer optimizer to reduce communication costs. Results: A middleware solution prototype and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion: In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Databases computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749
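The split-into-independent-jobs pattern at the heart of this grid application can be mimicked on one machine with a process pool; the dictionary matcher below is a stand-in for the real NER service, and the vocabularies and documents are illustrative, not from the paper.

```python
# Single-machine sketch of the split-into-parallel-jobs pattern described
# above, with a toy dictionary matcher standing in for the NER service.
from multiprocessing import Pool

SYMPTOMS = {"fever", "cough", "fatigue"}
PATHOLOGIES = {"pneumonia", "influenza"}

def extract(doc: str) -> dict:
    """One independent job: tag symptom/pathology mentions in one document."""
    tokens = {t.strip(".,;").lower() for t in doc.split()}
    return {"symptoms": tokens & SYMPTOMS, "pathologies": tokens & PATHOLOGIES}

if __name__ == "__main__":
    docs = [
        "Patient presented with fever and cough.",
        "Influenza complicated by pneumonia; fatigue reported.",
    ]
    with Pool() as pool:                  # the grid would distribute these jobs
        for hit in pool.map(extract, docs):
            print(hit)
```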
Bioinformatic perspectives on NRPS/PKS megasynthases: advances and challenges.
Jenke-Kodama, Holger; Dittmann, Elke
2009-07-01
The increased understanding of both fundamental principles and mechanistic variations of NRPS/PKS megasynthases, along with the unprecedented availability of microbial sequences, has inspired a number of in silico studies of both enzyme families. The insights that can be extracted from these analyses go far beyond a rough classification of data and have turned bioinformatics into a frontier field of natural products research. As databases are flooded with NRPS/PKS gene sequences from microbial genomes and metagenomes, increasingly reliable structural prediction methods can help to uncover hidden treasures. Already, phylogenetic analyses have revealed that NRPS/PKS pathways should not simply be regarded as enzyme complexes specifically evolved to produce a selected natural product. Rather, they represent a collection of genetic options, allowing biosynthetic pathways to be shuffled in a process of perpetual chemical innovation, and pathway diversification in nature can give impulses for predicting specificities, studying protein interactions and genetically engineering libraries of novel peptides and polyketides. The successful translation of the knowledge obtained from bioinformatic dissection of NRPS/PKS megasynthases into new techniques for drug discovery and design remains a challenge for the future.
An Integrative Bioinformatics Approach for Knowledge Discovery
NASA Astrophysics Data System (ADS)
Peña-Castillo, Lourdes; Phan, Sieu; Famili, Fazel
The vast amount of data being generated by large-scale omics projects, and the computational approaches developed to deal with these data, have the potential to accelerate the advancement of our understanding of the molecular basis of genetic diseases. This better understanding may have profound clinical implications and transform medical practice; for instance, therapeutic management could be prescribed based on the patient’s genetic profile instead of being based on aggregate data. Current efforts have established the feasibility and utility of integrating and analysing heterogeneous genomic data to identify molecular associations to pathogenesis. However, since these initiatives are data-centric, they either restrict the research community to specific data sets or to a certain application domain, or force researchers to develop their own analysis tools. To fully exploit the potential of omics technologies, robust computational approaches need to be developed and made available to the community. This research addresses this challenge and proposes an integrative approach to facilitate knowledge discovery from diverse datasets and contribute to the advancement of genomic medicine.
[Application of bioinformatics in industrial biocatalysis research].
Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao
2004-05-01
Industrial biocatalysis is currently attracting much attention as a way to rebuild or substitute traditional production processes for chemicals and drugs. A key focus in industrial biocatalysis is the biocatalyst itself, usually a microbial enzyme. Recently, new bioinformatics technologies have played, and will continue to play, increasingly significant roles in industrial biocatalysis research in response to the genomic revolution. One key application of bioinformatics in biocatalysis is the discovery and identification of new biocatalysts through advanced DNA and protein sequence search, comparison and analysis in Internet databases using different algorithms and software. Unknown genes of microbial enzymes can also be harvested simply by primer design on the basis of bioinformatics analyses. The other key application of bioinformatics in biocatalysis is the modification and improvement of existing industrial biocatalysts. In this respect, bioinformatics is of great importance in both the rational design and the directed evolution of microbial enzymes. When the tertiary structure of an enzyme can be successfully predicted using bioinformatics tools, the subsequent experiments, i.e. site-directed mutagenesis, fusion protein construction, DNA family shuffling and saturation mutagenesis, etc., are usually highly efficient. In all respects, bioinformatics will be an essential tool for biologists and biological engineers in future industrial biocatalysis research, owing to its significant role in guiding and accelerating the discovery and improvement of novel biocatalysts.
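For the sequence-search step described above, a common concrete starting point is a remote BLAST query via Biopython; this sketch is illustrative rather than taken from the article, and the query sequence is a placeholder.

```python
# Sketch of the database-search step for biocatalyst discovery using
# Biopython's remote BLAST interface; the query sequence is a placeholder.
from Bio.Blast import NCBIWWW, NCBIXML

query = "MKWVTFISLLFLFSSAYS"                     # hypothetical enzyme fragment
handle = NCBIWWW.qblast("blastp", "nr", query)   # remote NCBI BLAST search
record = NCBIXML.read(handle)

# Report the top hits as candidate homologous biocatalysts.
for alignment in record.alignments[:5]:
    best = alignment.hsps[0]
    print(f"{alignment.title[:60]}  E={best.expect:.2e}")
```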
Discovery of the leinamycin family of natural products by mining actinobacterial genomes.
Pan, Guohui; Xu, Zhengren; Guo, Zhikai; Hindra; Ma, Ming; Yang, Dong; Zhou, Hao; Gansemans, Yannick; Zhu, Xiangcheng; Huang, Yong; Zhao, Li-Xing; Jiang, Yi; Cheng, Jinhua; Van Nieuwerburgh, Filip; Suh, Joo-Won; Duan, Yanwen; Shen, Ben
2017-12-26
Nature's ability to generate diverse natural products from simple building blocks has inspired combinatorial biosynthesis. The knowledge-based approach to combinatorial biosynthesis has allowed the production of designer analogs by rational metabolic pathway engineering. While successful, structural alterations are limited, with designer analogs often produced in compromised titers. The discovery-based approach to combinatorial biosynthesis complements the knowledge-based approach by exploring the vast combinatorial biosynthesis repertoire found in Nature. Here we showcase the discovery-based approach to combinatorial biosynthesis by targeting the domain of unknown function and cysteine lyase domain (DUF-SH) didomain, specific for sulfur incorporation from the leinamycin (LNM) biosynthetic machinery, to discover the LNM family of natural products. By mining bacterial genomes from public databases and the actinomycetes strain collection at The Scripps Research Institute, we discovered 49 potential producers that could be grouped into 18 distinct clades based on phylogenetic analysis of the DUF-SH didomains. Further analysis of the representative genomes from each of the clades identified 28 lnm-type gene clusters. Structural diversities encoded by the LNM-type biosynthetic machineries were predicted based on bioinformatics and confirmed by in vitro characterization of selected adenylation proteins and isolation and structural elucidation of the guangnanmycins and weishanmycins. These findings demonstrate the power of the discovery-based approach to combinatorial biosynthesis for natural product discovery and structural diversity and highlight Nature's rich biosynthetic repertoire. Comparative analysis of the LNM-type biosynthetic machineries provides outstanding opportunities to dissect Nature's biosynthetic strategies and apply these findings to combinatorial biosynthesis for natural product discovery and structural diversity.
Irizarry, Kristopher J L; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L; Barrett, Gini; Barr, Margaret C
2016-01-01
Many endangered captive populations exhibit reduced genetic diversity, resulting in health issues that impact reproductive fitness and quality of life. Numerous cost-effective genomic sequencing and genotyping technologies provide an unparalleled opportunity for incorporating genomics knowledge in the management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms, in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provide value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enables an informed approach to endangered species management.
Irizarry, Kristopher J. L.; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L.; Barrett, Gini; Barr, Margaret C.
2016-01-01
Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost-effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enables an informed approach to endangered species management. PMID:27376076
The web server of IBM's Bioinformatics and Pattern Discovery group
Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo
2003-01-01
We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/. PMID:12824385
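For a flavor of what "discovery of patterns in streams of events" involves, here is a toy frequent-window counter in Python. It is far simpler than the group's published methods; the parameters and data are illustrative only.

    from collections import Counter

    def frequent_patterns(events, k=3, min_support=2):
        """Count every length-k window over an event stream and keep those
        that recur at least min_support times."""
        windows = Counter(tuple(events[i:i + k])
                          for i in range(len(events) - k + 1))
        return {w: n for w, n in windows.items() if n >= min_support}

    # Example: the window ('a', 'b', 'c') occurs three times.
    print(frequent_patterns(list("abcabcabc")))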
Ontology-Based Search of Genomic Metadata.
Fernandez, Javier D; Lenzerini, Maurizio; Masseroli, Marco; Venco, Francesco; Ceri, Stefano
2016-01-01
The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; unknown biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet the search for datasets relevant to knowledge discovery is only minimally supported: the metadata describing ENCODE datasets are simple and incomplete, and are not organized under a coherent ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata searching approach which uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base by starting with concepts extracted from ENCODE metadata, matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference method is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This allows correctly finding more datasets than those extracted by a purely syntactic search, as supported by the other available systems.
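The core mechanism, expanding a biologist's query with ontology concepts before matching dataset metadata, can be sketched in a few lines of Python. The synonym map below is an invented stand-in for the UMLS-backed Semantic Knowledge Base, and the dataset identifier is fabricated for illustration.

    # Invented stand-in for an ontology: concept -> synonyms.
    ONTOLOGY = {
        "myocardial infarction": {"heart attack", "mi"},
        "h3k4me3": {"histone h3 lysine 4 trimethylation"},
    }

    def expand(terms):
        """Add every ontology concept and synonym reachable from the query."""
        expanded = {t.lower() for t in terms}
        for concept, synonyms in ONTOLOGY.items():
            if concept in expanded or expanded & synonyms:
                expanded |= {concept} | synonyms
        return expanded

    def search(datasets, terms):
        """Return datasets whose metadata shares a term with the expanded query."""
        query = expand(terms)
        return [d["id"] for d in datasets if query & d["metadata"]]

    datasets = [{"id": "ENCSR000EXAMPLE", "metadata": {"heart attack", "rna-seq"}}]
    print(search(datasets, ["myocardial infarction"]))  # finds the dataset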
Seahawk: moving beyond HTML in Web-based bioinformatics analysis
Gordon, Paul MK; Sensen, Christoph W
2007-01-01
Background Traditional HTML interfaces for input to and output from Bioinformatics analysis on the Web are highly variable in style, content and data formats. Combining multiple analyses can therefore be an onerous task for biologists. Semantic Web Services allow automated discovery of conceptual links between remote data analysis servers. A shared data ontology and service discovery/execution framework is particularly attractive in Bioinformatics, where data and services are often both disparate and distributed. Instead of biologists copying, pasting and reformatting data between various Web sites, Semantic Web Service protocols such as MOBY-S hold out the promise of seamlessly integrating multi-step analysis. Results We have developed a program (Seahawk) that allows biologists to intuitively and seamlessly chain together Web Services using a data-centric, rather than the customary service-centric approach. The approach is illustrated with a ferredoxin mutation analysis. Seahawk concentrates on lowering entry barriers for biologists: no prior knowledge of the data ontology, or relevant services is required. In stark contrast to other MOBY-S clients, in Seahawk users simply load Web pages and text files they already work with. Underlying the familiar Web-browser interaction is an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements which import existing user data into the MOBY-S format. Conclusion As an easily accessible applet, Seahawk moves beyond standard Web browser interaction, providing mechanisms for the biologist to concentrate on the analytical task rather than on the technical details of data formats and Web forms. As the MOBY-S protocol nears a 1.0 specification, we expect more biologists to adopt these new semantic-oriented ways of doing Web-based analysis, which empower them to do more complicated, ad hoc analysis workflow creation without the assistance of a programmer. PMID:17577405
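The data-engine idea, pulling text out of a loaded page with XPath, recognizing a biological data type with a regular expression, and wrapping the result as XML, can be sketched as follows in Python with lxml. The element name and recognition rule are illustrative, not the actual MOBY-S schema.

    import re
    from lxml import etree

    page = etree.fromstring(
        "<html><body><p>Sequence: MKTAYIAKQRQISFVKSHFSRQ</p></body></html>")

    # XPath collects candidate text; a regex recognizes a protein-like string.
    text = " ".join(page.xpath("//p/text()"))
    match = re.search(r"\b[ACDEFGHIKLMNPQRSTVWY]{10,}\b", text)

    if match:
        # Wrap the recognized datum as an XML object for a downstream service.
        obj = etree.Element("AminoAcidSequence")  # illustrative element name
        obj.text = match.group(0)
        print(etree.tostring(obj).decode())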
Chapter 1: Biomedical Knowledge Integration
Payne, Philip R. O.
2012-01-01
The modern biomedical research and healthcare delivery domains have seen an unparalleled increase in the rate of innovation and novel technologies over the past several decades. Catalyzed by paradigm-shifting public and private programs focusing upon the formation and delivery of genomic and personalized medicine, the need for high-throughput and integrative approaches to the collection, management, and analysis of heterogeneous data sets has become imperative. This need is particularly pressing in the translational bioinformatics domain, where many fundamental research questions require the integration of large scale, multi-dimensional clinical phenotype and bio-molecular data sets. Modern biomedical informatics theory and practice has demonstrated the distinct benefits associated with the use of knowledge-based systems in such contexts. A knowledge-based system can be defined as an intelligent agent that employs a computationally tractable knowledge base or repository in order to reason upon data in a targeted domain and reproduce expert performance relative to such reasoning operations. The ultimate goal of the design and use of such agents is to increase the reproducibility, scalability, and accessibility of complex reasoning tasks. Examples of the application of knowledge-based systems in biomedicine span a broad spectrum, from the execution of clinical decision support, to epidemiologic surveillance of public data sets for the purposes of detecting emerging infectious diseases, to the discovery of novel hypotheses in large-scale research data sets. In this chapter, we will review the basic theoretical frameworks that define core knowledge types and reasoning operations with particular emphasis on the applicability of such conceptual models within the biomedical domain, and then go on to introduce a number of prototypical data integration requirements and patterns relevant to the conduct of translational bioinformatics that can be addressed via the design and use of knowledge-based systems. PMID:23300416
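As a deliberately tiny illustration of the knowledge-based systems discussed here, the following Python sketch forward-chains if-then rules over a fact base until no new conclusions can be derived; the facts and rules are invented placeholders, not clinical guidance.

    # Facts and rules are invented for illustration only.
    facts = {"fever", "cough"}
    rules = [
        ({"fever", "cough"}, "suspect_influenza"),
        ({"suspect_influenza"}, "recommend_influenza_test"),
    ]

    # Forward chaining: fire any rule whose conditions hold, until no rule
    # adds a new conclusion (a fixed point).
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True

    print(sorted(facts))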
Visualising "Junk" DNA through Bioinformatics
ERIC Educational Resources Information Center
Elwess, Nancy L.; Latourelle, Sandra M.; Cauthorn, Olivia
2005-01-01
One of the hottest areas of science today is the field in which biology, information technology,and computer science are merged into a single discipline called bioinformatics. This field enables the discovery and analysis of biological data, including nucleotide and amino acid sequences that are easily accessed through the use of computers. As…
Anthelmintics: From discovery to resistance II (San Diego, 2016).
Martin, Richard J; Wolstenholme, Adrian J; Caffrey, Conor R
2016-12-01
The second scientific meeting in the series: "Anthelmintics: From Discovery to Resistance" was held in San Diego in February, 2016. The focus topics of the meeting, related to anthelmintic discovery and resistance, were novel technologies, bioinformatics, commercial interests, anthelmintic modes of action and anthelmintic resistance. Basic scientific, human and veterinary interests were addressed in oral and poster presentations. The delegates were from universities and industries in the US, Europe, Australia and New Zealand. The papers were a great representation of the field, and included the use of C. elegans for lead discovery, mechanisms of anthelmintic resistance, nematode neuropeptides, proteases, B. thuringiensis crystal protein, nicotinic receptors, emodepside, benzimidazoles, P-glycoproteins, natural products, microfluidic techniques and bioinformatics approaches. The NIH also presented NIAID-specific parasite genomic priorities and initiatives. From these papers we introduce below selected papers with a focus on anthelmintic drug screening and development. Copyright © 2016. Published by Elsevier Ltd.
Insights into Antimicrobial Peptides from Spiders and Scorpions
Wang, Xiuqing; Wang, Guangshun
2015-01-01
The venoms of spiders and scorpions contain a variety of chemical compounds. Antimicrobial peptides (AMPs) from these organisms were first discovered in the 1990s. As of May 2015, there were 42 spider and 63 scorpion AMPs in the Antimicrobial Peptide Database (http://aps.unmc.edu/AP). These peptides have demonstrated broad or narrow-spectrum activities against bacteria, fungi, viruses, and parasites. In addition, they can be toxic to cancer cells, insects and erythrocytes. To provide insight into such an activity spectrum, this article discusses the discovery, classification, structure and activity relationships, bioinformatics analysis, and potential applications of spider and scorpion AMPs. Our analysis reveals that, in the case of linear peptides, spiders use both glycine-rich and helical peptide models for defense, whereas scorpions use two distinct helical peptide models with different amino acid compositions to exert the observed antimicrobial activities and hemolytic toxicity. Our structural bioinformatics study improves the knowledge in the field and can be used to design more selective peptides to combat tumors, parasites, and viruses. PMID:27165405
Current Developments in Machine Learning Techniques in Biological Data Mining.
Dumancas, Gerard G; Adrianto, Indra; Bello, Ghalib; Dozmorov, Mikhail
2017-01-01
This supplement is intended to focus on the use of machine learning techniques to generate meaningful information from biological data. This supplement under Bioinformatics and Biology Insights aims to provide scientists and researchers working in this rapidly evolving field with online, open-access articles authored by leading international experts in this field. Advances in the field of biology have generated massive opportunities to allow the implementation of modern computational and statistical techniques. Machine learning methods in particular, a subfield of computer science, have evolved as an indispensable tool applied to a wide spectrum of bioinformatics applications. Thus, they are broadly used to investigate the underlying mechanisms leading to a specific disease, as well as the biomarker discovery process. With a growth in this specific area of science comes the need to access up-to-date, high-quality scholarly articles that will leverage the knowledge of scientists and researchers in the various applications of machine learning techniques in mining biological data.
Web-based services for drug design and discovery.
Frey, Jeremy G; Bird, Colin L
2011-09-01
Reviews of the development of drug discovery through the 20th century recognised the importance of chemistry and increasingly bioinformatics, but had relatively little to say about the importance of computing and networked computing in particular. However, the design and discovery of new drugs is arguably the most significant single application of bioinformatics and cheminformatics to have benefitted from the increases in the range and power of the computational techniques since the emergence of the World Wide Web, commonly now referred to as simply 'the Web'. Web services have enabled researchers to access shared resources and to deploy standardized calculations in their search for new drugs. This article first considers the fundamental principles of Web services and workflows, and then explores the facilities and resources that have evolved to meet the specific needs of chem- and bio-informatics. This strategy leads to a more detailed examination of the basic components that characterise molecules and the essential predictive techniques, followed by a discussion of the emerging networked services that transcend the basic provisions, and the growing trend towards embracing modern techniques, in particular the Semantic Web. In the opinion of the authors, the issues that require community action are: increasing the amount of chemical data available for open access; validating the data as provided; and developing more efficient links between the worlds of cheminformatics and bioinformatics. The goal is to create ever better drug design services.
Semantic Data Integration and Knowledge Management to Represent Biological Network Associations.
Losko, Sascha; Heumann, Klaus
2017-01-01
The vast quantities of information generated by academic and industrial research groups are reflected in a rapidly growing body of scientific literature and exponentially expanding resources of formalized data, including experimental data, originating from a multitude of "-omics" platforms, phenotype information, and clinical data. For bioinformatics, the challenge remains to structure this information so that scientists can identify relevant information, to integrate this information as specific "knowledge bases," and to formalize this knowledge across multiple scientific domains to facilitate hypothesis generation and validation. Here we report on progress made in building a generic knowledge management environment capable of representing and mining both explicit and implicit knowledge and, thus, generating new knowledge. Risk management in drug discovery and clinical research is used as a typical example to illustrate this approach. In this chapter we introduce techniques and concepts (such as ontologies, semantic objects, typed relationships, contexts, graphs, and information layers) that are used to represent complex biomedical networks. The BioXM™ Knowledge Management Environment is used as an example to demonstrate how a domain such as oncology is represented and how this representation is utilized for research.
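The representational idea here, typed relationships among semantic objects forming a queryable graph, can be sketched with a tiny triple store in Python; the entities and relation names are invented for illustration.

    # Invented typed-relationship store: (subject, relation, object) triples.
    TRIPLES = [
        ("geneA", "encodes", "proteinA"),
        ("proteinA", "inhibits", "proteinB"),
        ("proteinB", "associated_with", "diseaseX"),
    ]

    def neighbors(entity, relation=None):
        """Follow typed edges out of an entity, optionally filtered by type."""
        return [(r, o) for s, r, o in TRIPLES
                if s == entity and (relation is None or r == relation)]

    # Walk typed edges outward from a gene to its downstream relations.
    for _, protein in neighbors("geneA", "encodes"):
        for rel, target in neighbors(protein):
            print("geneA ->", protein, "->", rel, target)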
Integrated Approaches to Drug Discovery for Oxidative Stress-Related Retinal Diseases
Nishimura, Yuhei; Hara, Hideaki
2016-01-01
Excessive oxidative stress induces dysregulation of functional networks in the retina, resulting in retinal diseases such as glaucoma, age-related macular degeneration, and diabetic retinopathy. Although various therapies have been developed to reduce oxidative stress in retinal diseases, most have failed to show efficacy in clinical trials. This may be due to oversimplification of target selection for such a complex network as oxidative stress. Recent advances in high-throughput technologies have facilitated the collection of multilevel omics data, which has driven growth in public databases and in the development of bioinformatics tools. Integration of the knowledge gained from omics databases can be used to generate disease-related biological networks and to identify potential therapeutic targets within the networks. Here, we provide an overview of integrative approaches in the drug discovery process and provide simple examples of how the approaches can be exploited to identify oxidative stress-related targets for retinal diseases. PMID:28053689
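One simple reading of "identify potential therapeutic targets within the networks" is to rank genes by centrality in a disease network. The sketch below uses networkx on a toy oxidative-stress network; the edges are illustrative, not curated interactions.

    import networkx as nx

    # Toy network; in practice edges come from integrated omics databases.
    g = nx.Graph()
    g.add_edges_from([
        ("NFE2L2", "KEAP1"), ("NFE2L2", "HMOX1"),
        ("KEAP1", "CUL3"), ("HMOX1", "BLVRA"),
    ])

    # Betweenness centrality as one heuristic for nodes that broker
    # information flow, i.e. candidate intervention points.
    ranking = sorted(nx.betweenness_centrality(g).items(),
                     key=lambda kv: -kv[1])
    for gene, score in ranking:
        print(gene, round(score, 3))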
Application of bioinformatics tools and databases in microbial dehalogenation research (a review).
Satpathy, R; Konkimalla, V B; Ratha, J
2015-01-01
Microbial dehalogenation is a biochemical process in which halogenated substances are enzymatically converted into their non-halogenated form. Microorganisms have a wide range of organohalogen degradation abilities, both specific and non-specific in nature. Most of these halogenated organic compounds, being pollutants, need to be remediated; therefore, current approaches explore the potential of microbes at a molecular level for effective biodegradation of these substances. Several microorganisms with dehalogenation activity have been identified and characterized. Here, bioinformatics plays a key role in gaining deeper knowledge of dehalogenation. To facilitate data mining, many tools have been developed to annotate these data from databases. With the discovery of a microorganism, one can predict genes and proteins, perform sequence analysis, structural modelling, metabolic pathway analysis, biodegradation studies and so on. This review highlights bioinformatics approaches, describing the application of various databases and specific tools in the microbial dehalogenation field, with special focus on dehalogenase enzymes. Attempts have also been made to decipher some recent applications of in silico modeling methods that comprise gene finding, protein modelling, Quantitative Structure Biodegradability Relationship (QSBR) studies and reconstruction of metabolic pathways employed in dehalogenation research.
ERIC Educational Resources Information Center
Brown, James A. L.
2016-01-01
A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion,…
The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update
Huynh, Tien; Rigoutsos, Isidore
2004-01-01
In this report, we provide an update on the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server, which is operational around the clock, provides access to a large number of methods that have been developed and published by the group's members. There is an increasing number of problems that these tools can help tackle; these problems range from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences, the identification—directly from sequence—of structural deviations from α-helicity and the annotation of amino acid sequences for antimicrobial activity. Additionally, annotations for more than 130 archaeal, bacterial, eukaryotic and viral genomes are now available on-line and can be searched interactively. The tools and code bundles continue to be accessible from http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/. PMID:15215340
A Decade of Genetic and Metabolomic Contributions to Type 2 Diabetes Risk Prediction
Merino, Jordi; Leong, Aaron; Meigs, James B.
2018-01-01
Purpose of Review The purpose of this review was to summarize and reflect on advances over the past decade in human genetic and metabolomic discovery with particular focus on their contributions to type 2 diabetes (T2D) risk prediction. Recent Findings In the past 10 years, a combination of advances in genotyping efficiency, metabolomic profiling, bioinformatics approaches, and international collaboration has moved T2D genetics and metabolomics from a state of frustration to an abundance of new knowledge. Summary Efforts to control and prevent T2D have failed to stop this global epidemic. New approaches are needed, and although neither genetic nor metabolomic profiling yet have a clear clinical role, the rapid pace of accumulating knowledge offers the possibility for “multi-omic” prediction to improve health. PMID:29103096
InCoB2012 Conference: from biological data to knowledge to technological breakthroughs
2012-01-01
Ten years ago when Asia-Pacific Bioinformatics Network held the first International Conference on Bioinformatics (InCoB) in Bangkok its theme was North-South Networking. At that time InCoB aimed to provide biologists and bioinformatics researchers in the Asia-Pacific region a forum to meet, interact with, and disseminate knowledge about the burgeoning field of bioinformatics. Meanwhile InCoB has evolved into a major regional bioinformatics conference that attracts not only talented and established scientists from the region but increasingly also from East Asia, North America and Europe. Since 2006 InCoB yielded 114 articles in BMC Bioinformatics supplement issues that have been cited nearly 1,000 times to date. In part, these developments reflect the success of bioinformatics education and continuous efforts to integrate and utilize bioinformatics in biotechnology and biosciences in the Asia-Pacific region. A cross-section of research leading from biological data to knowledge and to technological applications, the InCoB2012 theme, is introduced in this editorial. Other highlights included sessions organized by the Pan-Asian Pacific Genome Initiative and a Machine Learning in Immunology competition. InCoB2013 is scheduled for September 18-21, 2013 at Suzhou, China. PMID:23281929
Computational Studies of Snake Venom Toxins
Ojeda, Paola G.; Caballero, Julio; Kaas, Quentin; González, Wendy
2017-01-01
Most snake venom toxins are proteins, and contribute to envenomation through a diverse array of bioactivities, such as bleeding, inflammation, pain, and cytotoxic, cardiotoxic or neurotoxic effects. The venom of a single snake species contains hundreds of toxins, and the venoms of the 725 species of venomous snakes represent a large pool of potentially bioactive proteins. Despite considerable discovery efforts, most of the snake venom toxins are still uncharacterized. Modern bioinformatics tools have been recently developed to mine snake venoms, helping focus experimental research on the most potentially interesting toxins. Some computational techniques predict toxin molecular targets, and the binding mode to these targets. This review gives an overview of current knowledge on the ~2200 sequences, and more than 400 three-dimensional structures of snake toxins deposited in public repositories, as well as of molecular modeling studies of the interaction between these toxins and their molecular targets. We also describe how modern bioinformatics have been used to study the snake venom protein phospholipase A2, the small basic myotoxin Crotamine, and the three-finger peptide Mambalgin. PMID:29271884
Reanalysis of RNA-Sequencing Data Reveals Several Additional Fusion Genes with Multiple Isoforms
Kangaspeska, Sara; Hultsch, Susanne; Edgren, Henrik; Nicorici, Daniel; Murumägi, Astrid; Kallioniemi, Olli
2012-01-01
RNA-sequencing and tailored bioinformatic methodologies have paved the way for identification of expressed fusion genes from the chaotic genomes of solid tumors. We have recently successfully exploited RNA-sequencing for the discovery of 24 novel fusion genes in breast cancer. Here, we demonstrate the importance of continuous optimization of the bioinformatic methodology for this purpose, and report the discovery and experimental validation of 13 additional fusion genes from the same samples. Integration of copy number profiling with the RNA-sequencing results revealed that the majority of the gene fusions were promoter-donating events that occurred at copy number transition points or involved high-level DNA-amplifications. Sequencing of genomic fusion break points confirmed that DNA-level rearrangements underlie selected fusion transcripts. Furthermore, a significant portion (>60%) of the fusion genes were alternatively spliced. This illustrates the importance of reanalyzing sequencing data as gene definitions change and bioinformatic methods improve, and highlights the previously unforeseen isoform diversity among fusion transcripts. PMID:23119097
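The integration step described above, checking whether genomic fusion breakpoints coincide with copy-number transition points, reduces to a proximity test. A minimal Python sketch with invented coordinates:

    def near_transition(pos, transitions, tol=50_000):
        """True if a fusion breakpoint lies within tol bp of any
        copy-number transition on the same chromosome."""
        return any(abs(pos - t) <= tol for t in transitions)

    # Coordinates below are invented for illustration.
    cn_transitions = {"chr17": [37_800_000, 41_200_000]}
    fusions = [("FUSION_A", "chr17", 37_820_000),
               ("FUSION_B", "chr17", 10_000_000)]

    for name, chrom, pos in fusions:
        print(name, near_transition(pos, cn_transitions.get(chrom, [])))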
Kent, Jack W
2016-02-03
New technologies for acquisition of genomic data, while offering unprecedented opportunities for genetic discovery, also impose severe burdens of interpretation and penalties for multiple testing. The Pathway-based Analyses Group of the Genetic Analysis Workshop 19 (GAW19) sought reduction of multiple-testing burden through various approaches to aggregation of high-dimensional data in pathways informed by prior biological knowledge. Experimental methods tested included the use of "synthetic pathways" (random sets of genes) to estimate power and false-positive error rate of methods applied to simulated data; data reduction via independent components analysis, single-nucleotide polymorphism (SNP)-SNP interaction, and use of gene sets to estimate genetic similarity; and general assessment of the efficacy of prior biological knowledge to reduce the dimensionality of complex genomic data. The work of this group explored several promising approaches to managing high-dimensional data, with the caveat that these methods are necessarily constrained by the quality of external bioinformatic annotation.
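The "synthetic pathways" idea, estimating a method's false-positive rate by running it on random gene sets that carry no true signal, can be sketched as follows; pathway_test stands for any association test that returns a p-value.

    import random

    def empirical_fpr(all_genes, pathway_test,
                      n_sets=1000, set_size=50, alpha=0.05):
        """Fraction of random ('synthetic') gene sets the test calls
        significant; with pure noise this should be close to alpha."""
        hits = sum(
            pathway_test(random.sample(all_genes, set_size)) < alpha
            for _ in range(n_sets))
        return hits / n_sets

    # With a well-calibrated test on null data, expect roughly 0.05.
    genes = list(range(20_000))
    print(empirical_fpr(genes, lambda gene_set: random.random()))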
Bioinformatics goes back to the future.
Miller, Crispin J; Attwood, Teresa K
2003-02-01
The need to turn raw data into knowledge has led the bioinformatics field to focus increasingly on the manipulation of information. By drawing parallels with both cryptography and artificial intelligence, we can develop an understanding of the changes that are occurring in bioinformatics, and how these changes are likely to influence the bioinformatics job market.
Microsoft Biology Initiative: .NET Bioinformatics Platform and Tools
Diaz Acosta, B.
2011-01-01
The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative is comprised of two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework—initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, documentation and training materials are freely downloadable from http://research.microsoft.com/bio. MBT is a collection of tools that enable biology and bioinformatics researchers to be more productive in making scientific discoveries.
Machluf, Yossy; Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat
2017-01-01
Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled ‘Bioinformatics in the Service of Biotechnology’. Students’ learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students’ difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students’ cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students’ scientific ‘toolbox’. For students, questions stemming from the ‘old world’ biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers’ prediction. Analysis of students’ affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher’s role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. PMID:26801769
Perspective: Role of structure prediction in materials discovery and design
NASA Astrophysics Data System (ADS)
Needs, Richard J.; Pickard, Chris J.
2016-05-01
Materials informatics owes much to bioinformatics and the Materials Genome Initiative has been inspired by the Human Genome Project. But there is more to bioinformatics than genomes, and the same is true for materials informatics. Here we describe the rapidly expanding role of searching for structures of materials using first-principles electronic-structure methods. Structure searching has played an important part in unraveling structures of dense hydrogen and in identifying the record-high-temperature superconducting component in hydrogen sulfide at high pressures. We suggest that first-principles structure searching has already demonstrated its ability to determine structures of a wide range of materials and that it will play a central and increasing part in materials discovery and design.
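First-principles structure searching follows a propose-relax-rank loop. The Python skeleton below shows the control flow only: relax_and_energy stands in for a DFT relaxation, and the uniform random generator is a naive placeholder for the physically constrained generators used in practice.

    import random

    def random_structure(n_atoms):
        """Naive proposal: random fractional coordinates in a unit cell."""
        return [(random.random(), random.random(), random.random())
                for _ in range(n_atoms)]

    def structure_search(relax_and_energy, n_atoms=8, n_trials=100):
        """Propose many random structures, relax each, keep the lowest-energy
        one. relax_and_energy(structure) -> (energy, relaxed_structure)."""
        return min((relax_and_energy(random_structure(n_atoms))
                    for _ in range(n_trials)),
                   key=lambda result: result[0])

    # Stand-in relaxer: returns a random 'energy' and the structure unchanged.
    energy, structure = structure_search(lambda s: (random.random(), s))
    print(energy)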
DiscoverySpace: an interactive data analysis application
Robertson, Neil; Oveisi-Fordorei, Mehrdad; Zuyderduyn, Scott D; Varhol, Richard J; Fjell, Christopher; Marra, Marco; Jones, Steven; Siddiqui, Asim
2007-01-01
DiscoverySpace is a graphical application for bioinformatics data analysis. Users can seamlessly traverse references between biological databases and draw together annotations in an intuitive tabular interface. Datasets can be compared using a suite of novel tools to aid in the identification of significant patterns. DiscoverySpace is of broad utility and its particular strength is in the analysis of serial analysis of gene expression (SAGE) data. The application is freely available online. PMID:17210078
Planning bioinformatics workflows using an expert system.
Chen, Xiaoling; Chang, Jeffrey T
2017-04-15
Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability: https://github.com/jefftc/changlab. Contact: jeffrey.t.chang@uth.tmc.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. PMID:28052928
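In the spirit of BETSY's backwards chaining, the sketch below plans a workflow by working from the requested data type back to the available inputs. The rule base (data types and tool names) is invented for illustration and is not BETSY's actual knowledge base.

    # Invented rule base: data type -> (required input types, producing tool).
    RULES = {
        "genome_index": (["genome_fasta"], "index-builder"),
        "aligned_reads": (["fastq", "genome_index"], "aligner"),
        "variant_calls": (["aligned_reads"], "variant-caller"),
    }

    def plan(goal, available, steps=None):
        """Backward chaining: recurse on missing inputs, then emit the tool,
        so steps come out in dependency order."""
        steps = [] if steps is None else steps
        if goal in available:
            return steps
        inputs, tool = RULES[goal]
        for needed in inputs:
            plan(needed, available, steps)
        steps.append(tool)
        return steps

    print(plan("variant_calls", {"fastq", "genome_fasta"}))
    # -> ['index-builder', 'aligner', 'variant-caller']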
BioStar: an online question & answer resource for the bioinformatics community
USDA-ARS's Scientific Manuscript database
Although the era of big data has produced many bioinformatics tools and databases, using them effectively often requires specialized knowledge. Many groups lack bioinformatics expertise, and frequently find that software documentation is inadequate and local colleagues may be overburdened or unfamil...
Severi, Leda; Losi, Lorena; Fonda, Sergio; Taddia, Laura; Gozzi, Gaia; Marverti, Gaetano; Magni, Fulvio; Chinello, Clizia; Stella, Martina; Sheouli, Jalid; Braicu, Elena I; Genovese, Filippo; Lauriola, Angela; Marraccini, Chiara; Gualandi, Alessandra; D'Arca, Domenico; Ferrari, Stefania; Costi, Maria P
2018-01-01
Proteomics and bioinformatics are a useful combined technology for the characterization of protein expression levels and modulation associated with the response to a drug and with its mechanism of action. The folate pathway represents an important target in anticancer drug therapy. In the present study, a discovery proteomics approach was applied to tissue samples collected from ovarian cancer patients who relapsed after first-line carboplatin-based chemotherapy and were treated with pemetrexed (PMX), a known folate pathway targeting drug. The aim of the work was to identify a proteomic profile in pre-treatment tissue that can be associated with response to PMX treatment. Statistical metrics of the experimental Mass Spectrometry (MS) data were combined with a knowledge-based approach that included bioinformatics and a literature review through the ProteinQuest™ tool, to design a protein set of reference (PSR). The PSR provides feedback for the consistency of MS proteomic data because it includes known validated proteins. A panel of 24 proteins with levels that were significantly different in pre-treatment samples of patients who responded to the therapy vs. the non-responders was identified. The differences in the identified proteins were explained for the patients with different outcomes and the known PMX targets were further validated. The protein panel herein identified is ready for further validation in retrospective clinical trials using a targeted proteomic approach. This study may have a general relevant impact on biomarker application for cancer patient therapy selection.
G2LC: Resources Autoscaling for Real Time Bioinformatics Applications in IaaS
Hu, Rongdong; Liu, Guangming; Jiang, Jingfei; Wang, Lixin
2015-01-01
Cloud computing has started to change the way bioinformatics research is carried out. Researchers who have taken advantage of this technology can process larger amounts of data and speed up scientific discovery. The variability in data volume results in variable computing requirements. Therefore, bioinformatics researchers are pursuing more reliable and efficient methods for conducting sequencing analyses. This paper proposes an automated resource provisioning method, G2LC, for bioinformatics applications in IaaS. It enables applications to output results in real time. Its main purpose is to guarantee application performance while improving resource utilization. Real sequence searching data from BLAST are used to evaluate the effectiveness of G2LC. Experimental results show that G2LC guarantees application performance while saving up to 20.14% of resources. PMID:26504488
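The decision at the heart of such autoscaling can be as simple as a threshold policy on backlog per worker. G2LC's actual policy is more elaborate; the following Python sketch, with invented parameters, only illustrates the control step.

    def scale_decision(queue_length, n_workers, target_per_worker=4):
        """Grow the pool when backlog per worker exceeds the target,
        shrink when it falls well below, otherwise hold steady."""
        load = queue_length / max(n_workers, 1)
        if load > target_per_worker:
            return n_workers + 1
        if load < target_per_worker / 2 and n_workers > 1:
            return n_workers - 1
        return n_workers

    print(scale_decision(queue_length=25, n_workers=4))  # -> 5 (scale out)
    print(scale_decision(queue_length=3, n_workers=4))   # -> 3 (scale in)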
CellLineNavigator: a workbench for cancer cell line analysis
Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas
2013-01-01
The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome-wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and searchability, a simple data interface and an intuitive querying interface were implemented. They allow the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487
Computer Programming and Biomolecular Structure Studies: A Step beyond Internet Bioinformatics
ERIC Educational Resources Information Center
Likic, Vladimir A.
2006-01-01
This article describes the experience of teaching structural bioinformatics to third year undergraduate students in a subject titled "Biomolecular Structure and Bioinformatics." Students were introduced to computer programming and used this knowledge in a practical application as an alternative to the well established Internet bioinformatics…
Evolving from bioinformatics in-the-small to bioinformatics in-the-large.
Parker, D Stott; Gorlick, Michael M; Lee, Christopher J
2003-01-01
We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small: constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples, and extract lessons from them showing how it has helped. These examples both suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.
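The make analogy can be made concrete: a dependency map plus a depth-first traversal that builds prerequisites first is the essence of managing dependencies in-the-large. A minimal Python sketch with invented target names:

    # Invented targets: each maps to the prerequisites it depends on.
    DEPS = {
        "annotations.txt": ["genes.gff", "genome.fa"],
        "genes.gff": ["genome.fa"],
        "genome.fa": [],
    }

    def build(target, built=None):
        """Depth-first traversal: build prerequisites before the target,
        and build each target at most once (as make does)."""
        built = set() if built is None else built
        if target in built:
            return built
        for prereq in DEPS.get(target, []):
            build(prereq, built)
        print("building", target)  # a real system would run the recipe here
        built.add(target)
        return built

    build("annotations.txt")
    # building genome.fa / building genes.gff / building annotations.txt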
Bioinformatics core competencies for undergraduate life sciences education.
Wilson Sayres, Melissa A; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G; Smith, Todd M; Triplett, Eric W; Williams, Jason J; Dinsdale, Elizabeth; Morgan, William R; Burnette, James M; Donovan, Samuel S; Drew, Jennifer C; Elgin, Sarah C R; Fowlks, Edison R; Galindo-Gonzalez, Sebastian; Goodman, Anya L; Grandgenett, Nealy F; Goller, Carlos C; Jungck, John R; Newman, Jeffrey D; Pearson, William; Ryder, Elizabeth F; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C; Toro-Martínez, Arlín; Welch, Lonnie R; Wright, Robin; Barone, Lindsay; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C; Pauley, Mark A
2018-01-01
Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent's degree of training, time since degree earned, and/or the Carnegie Classification of the respondent's institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.
Gathering and Exploring Scientific Knowledge in Pharmacovigilance
Lopes, Pedro; Nunes, Tiago; Campos, David; Furlong, Laura Ines; Bauer-Mehren, Anna; Sanz, Ferran; Carrascosa, Maria Carmen; Mestres, Jordi; Kors, Jan; Singh, Bharat; van Mulligen, Erik; Van der Lei, Johan; Diallo, Gayo; Avillach, Paul; Ahlberg, Ernst; Boyer, Scott; Diaz, Carlos; Oliveira, José Luís
2013-01-01
Pharmacovigilance plays a key role in the healthcare domain through the assessment, monitoring and discovery of interactions amongst drugs and their effects in the human organism. However, technological advances in this field have been slowing down over the last decade due to miscellaneous legal, ethical and methodological constraints. Pharmaceutical companies started to realize that collaborative and integrative approaches boost current drug research and development processes. Hence, new strategies are required to connect researchers, datasets, biomedical knowledge and analysis algorithms, allowing them to fully exploit the true value behind state-of-the-art pharmacovigilance efforts. This manuscript introduces a new platform directed towards pharmacovigilance knowledge providers. This system, based on a service-oriented architecture, adopts a plugin-based approach to solve fundamental pharmacovigilance software challenges. With the wealth of collected clinical and pharmaceutical data, it is now possible to connect knowledge providers’ analysis and exploration algorithms with real data. As a result, new strategies allow a faster identification of high-risk interactions between marketed drugs and adverse events, and enable the automated uncovering of scientific evidence behind them. With this architecture, the pharmacovigilance field has a new platform to coordinate large-scale drug evaluation efforts in a unique ecosystem, publicly available at http://bioinformatics.ua.pt/euadr/. PMID:24349421
Nawrocki, Eric P.; Burge, Sarah W.
2013-01-01
The development of RNA bioinformatic tools began more than 30 y ago with the description of the Nussinov and Zuker dynamic programming algorithms for single sequence RNA secondary structure prediction. Since then, many tools have been developed for various RNA sequence analysis problems such as homology search, multiple sequence alignment, de novo RNA discovery, read-mapping, and many more. In this issue, we have collected a sampling of reviews and original research that demonstrate some of the many ways bioinformatics is integrated with current RNA biology research. PMID:23948768
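Since the editorial above anchors the field's history in the Nussinov algorithm, a compact sketch of that dynamic program may be useful: it maximizes the number of complementary base pairs subject to a minimum hairpin-loop length. Function and variable names are ours, not from any particular tool.

# Nussinov-style base-pair maximization for one RNA sequence.
# dp[i][j] holds the maximum number of pairs in seq[i..j].
def nussinov(seq, min_loop=3):
    n = len(seq)
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                 # j left unpaired
            for k in range(i, j - min_loop):    # pair j with some k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCCC"))  # 3: a small hairpin with a 3-pair stem

Zuker's algorithm replaces the simple pair count with a thermodynamic energy model, but the recurrence structure is the same.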
Five critical elements to ensure the precision medicine.
Chen, Chengshui; He, Mingyan; Zhu, Yichun; Shi, Lin; Wang, Xiangdong
2015-06-01
Precision medicine has emerged as a new field and therapeutic strategy; practiced at the level of the individual patient, it has produced unexpected successes and has attracted considerable professional and public attention as a new path to improving the treatment and prognosis of patients. A number of new components will appear or be discovered, among which clinical bioinformatics integrates clinical phenotypes and informatics with bioinformatics, computational science, mathematics, and systems biology. Beyond these tools, precision medicine calls for more accurate and repeatable methodologies for the identification and validation of gene discoveries. Precision medicine will bring new therapeutic strategies, drug discovery and development, and gene-oriented treatment. There is an urgent need to identify and validate disease-specific, mechanism-based, or epigenetics-dependent biomarkers to monitor precision medicine, and to develop "precision" regulations to guide its application.
Model-driven discovery of underground metabolic functions in Escherichia coli.
Guzmán, Gabriela I; Utrilla, José; Nurk, Sergey; Brunk, Elizabeth; Monk, Jonathan M; Ebrahim, Ali; Palsson, Bernhard O; Feist, Adam M
2015-01-20
Enzyme promiscuity toward substrates has been discussed in evolutionary terms as providing the flexibility to adapt to novel environments. In the present work, we describe an approach toward exploring such enzyme promiscuity in the space of a metabolic network. This approach leverages genome-scale models, which have been widely used for predicting growth phenotypes in various environments or following a genetic perturbation; however, these predictions occasionally fail. Failed predictions of gene essentiality offer an opportunity for targeting biological discovery, suggesting the presence of unknown underground pathways stemming from enzymatic cross-reactivity. We demonstrate a workflow that couples constraint-based modeling and bioinformatic tools with KO strain analysis and adaptive laboratory evolution for the purpose of predicting promiscuity at the genome scale. Three cases of genes that are incorrectly predicted as essential in Escherichia coli--aspC, argD, and gltA--are examined, and isozyme functions are uncovered for each to a different extent. Seven isozyme functions based on genetic and transcriptional evidence are suggested between the genes aspC and tyrB, argD and astC, gabT and puuE, and gltA and prpC. This study demonstrates how a targeted model-driven approach to discovery can systematically fill knowledge gaps, characterize underground metabolism, and elucidate regulatory mechanisms of adaptation in response to gene KO perturbations.
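The discovery logic above, in which failed essentiality predictions become targets for investigation, reduces in its simplest form to a set comparison between model predictions and knockout outcomes. A toy sketch (aspC, argD and gltA come from the abstract; pgk is an invented placeholder):

# Flag discovery targets: genes the model predicts to be essential
# but whose knockout strains nevertheless grow in the lab.
predicted_essential = {"aspC", "argD", "gltA", "pgk"}  # model output (pgk is invented)
viable_knockouts = {"aspC", "argD", "gltA"}            # KO strains that grew anyway

false_essential = predicted_essential & viable_knockouts
print(sorted(false_essential))  # candidates for underground pathways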
Mining semantic networks of bioinformatics e-resources from the literature
2011-01-01
Background: There have been a number of recent efforts (e.g. BioCatalogue, BioMoby) to systematically catalogue bioinformatics tools, services and datasets. These efforts rely on manual curation, making it difficult to cope with the huge influx of various electronic resources that have been provided by the bioinformatics community. We present a text mining approach that utilises the literature to automatically extract descriptions and semantically profile bioinformatics resources to make them available for resource discovery and exploration through semantic networks that contain related resources. Results: The method identifies the mentions of resources in the literature and assigns a set of co-occurring terminological entities (descriptors) to represent them. We have processed 2,691 full-text bioinformatics articles and extracted profiles of 12,452 resources containing associated descriptors with binary and tf*idf weights. Since such representations are typically sparse (on average 13.77 features per resource), we used lexical kernel metrics to identify semantically related resources via descriptor smoothing. Resources are then clustered or linked into semantic networks, providing the users (bioinformaticians, curators and service/tool crawlers) with a possibility to explore algorithms, tools, services and datasets based on their relatedness. Manual exploration of links between a set of 18 well-known bioinformatics resources suggests that the method was able to identify and group semantically related entities. Conclusions: The results have shown that the method can reconstruct interesting functional links between resources (e.g. linking data types and algorithms), in particular when tf*idf-like weights are used for profiling. This demonstrates the potential of combining literature mining and simple lexical kernel methods to model relatedness between resource descriptors in particular when there are few features, thus potentially improving the resource description, discovery and exploration process. The resource profiles are available at http://gnode1.mib.man.ac.uk/bioinf/semnets.html. PMID:21388573
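As a toy illustration of the tf*idf profiling step described above, here are three invented resources with invented descriptor lists; the published method additionally applies lexical-kernel smoothing, which is omitted here.

# Build tf*idf profiles of resources over descriptor terms, then use
# cosine similarity as a crude relatedness measure between resources.
import math
from collections import Counter

profiles = {  # hypothetical resource -> descriptor mentions
    "BLAST":   ["alignment", "sequence", "database", "sequence"],
    "Clustal": ["alignment", "sequence", "multiple"],
    "GEO":     ["expression", "database", "microarray"],
}

n = len(profiles)
df = Counter(t for terms in profiles.values() for t in set(terms))
tfidf = {
    name: {t: c * math.log(n / df[t]) for t, c in Counter(terms).items()}
    for name, terms in profiles.items()
}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

print(round(cosine(tfidf["BLAST"], tfidf["Clustal"]), 3))  # related pair
print(round(cosine(tfidf["BLAST"], tfidf["GEO"]), 3))      # lower score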
2011-01-01
The 2011 International Conference on Bioinformatics (InCoB), the annual scientific conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted in Kuala Lumpur, Malaysia, and is co-organized with the first ISCB-Asia conference of the International Society for Computational Biology (ISCB). InCoB and the sequencing of the human genome are both celebrating their tenth anniversaries, and InCoB's goalposts for the next decade, implementing standards in bioinformatics and globally distributed computational networks, will be discussed and adopted at this conference. Of the 49 manuscripts (selected from 104 submissions) accepted to BMC Genomics and BMC Bioinformatics conference supplements, 24 are featured in this issue, covering software tools, genome/proteome analysis, systems biology (networks, pathways, bioimaging) and drug discovery and design. PMID:22372736
Metagenomics of Thermophiles with a Focus on Discovery of Novel Thermozymes
DeCastro, María-Eugenia; Rodríguez-Belmonte, Esther; González-Siso, María-Isabel
2016-01-01
Microbial populations living in environments with temperatures above 50°C (thermophiles) have been widely studied, increasing our knowledge of the composition and function of these ecological communities. Since these populations express a broad number of heat-resistant enzymes (thermozymes), they also represent an important source of novel biocatalysts that can potentially be used in industrial processes. The integrated study of the whole-community DNA from an environment, known as metagenomics, coupled with the development of next generation sequencing (NGS) technologies, has allowed the generation of large amounts of data from thermophiles. In this review, we summarize the main approaches commonly utilized for assessing the taxonomic and functional diversity of thermophiles through metagenomics, including several bioinformatics tools and some metagenome-derived methods to isolate their thermozymes. PMID:27729905
Arthropods as a source of new RNA viruses.
Bichaud, L; de Lamballerie, X; Alkan, C; Izri, A; Gould, E A; Charrel, R N
2014-12-01
The discovery and development of methods for isolation, characterisation and taxonomy of viruses represents an important milestone in the study, treatment and control of virus diseases during the 20th century. Indeed, by the late 1950s, it was becoming common belief that most human and veterinary pathogenic viruses had been discovered. However, at that time, knowledge of the impact of improved commercial transportation, urbanisation and deforestation on disease emergence was in its infancy. From the late 1960s onwards, viruses such as the hepatitis viruses (A, B and C), hantaviruses, HIV, Marburg virus, Ebola virus and many others began to emerge, and it became apparent that the world was changing, at least in terms of virus epidemiology, largely due to the influence of anthropological activities. Subsequently, with the improvement of molecular biotechnologies for amplification of viral RNA, genome sequencing and proteomic analysis, the arsenal of available tools for virus discovery and genetic characterization opened up new and exciting possibilities for virological discovery. Many recently identified but "unclassified" viruses are now being allocated to existing genera or families based on whole genome sequencing, bioinformatic and phylogenetic analysis. New species, genera and families are also being created following the guidelines of the International Committee on Taxonomy of Viruses. Many of these newly discovered viruses are vectored by arthropods (arboviruses) and possess an RNA genome. This brief review will focus largely on the discovery of new arthropod-borne viruses. Copyright © 2014 Elsevier Ltd. All rights reserved.
Brown, James A L
2016-05-06
A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion, qualitative student-based module evaluation and the novelty, scientific validity and quality of written student reports. Bioinformatics is often the starting point for laboratory-based research projects, therefore high importance was placed on allowing students to individually develop and apply processes and methods of scientific research. Students led a bioinformatic inquiry-based project (within a framework of inquiry), discovering, justifying and exploring individually discovered research targets. Detailed assessable reports were produced, displaying data generated and the resources used. Mimicking research settings, undergraduates were divided into small collaborative groups, with distinctive central themes. The module was evaluated by assessing the quality and originality of the students' targets through reports, reflecting students' use and understanding of concepts and tools required to generate their data. Furthermore, evaluation of the bioinformatic module was assessed semi-quantitatively using pre- and post-module quizzes (a non-assessable activity, not contributing to their grade), which incorporated process- and content-specific questions (indicative of their use of the online tools). Qualitative assessment of the teaching intervention was performed using post-module surveys, exploring student satisfaction and other module-specific elements. Overall, a positive experience was found, as was a post-module increase in correct process-specific answers. In conclusion, an inquiry-based peer-assisted learning module increased students' engagement, practical bioinformatic skills and process-specific knowledge. © 2016 The International Union of Biochemistry and Molecular Biology, 44:304-313, 2016.
No-boundary thinking in bioinformatics research
2013-01-01
Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT). PMID:24192339
PoPLAR: Portal for Petascale Lifescience Applications and Research
2013-01-01
Background: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches--such as proteomics, genomics, metabolomics, and meta-genomics--drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. Methods: The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. Results: This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. Conclusions: This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers. PMID:23902523
Bioinformatics and Medical Informatics: Collaborations on the Road to Genomic Medicine?
Maojo, Victor; Kulikowski, Casimir A.
2003-01-01
In this report, the authors compare and contrast medical informatics (MI) and bioinformatics (BI) and provide a viewpoint on their complementarities and potential for collaboration in various subfields. The authors compare MI and BI along several dimensions, including: (1) historical development of the disciplines, (2) their scientific foundations, (3) data quality and analysis, (4) integration of knowledge and databases, (5) informatics tools to support practice, (6) informatics methods to support research (signal processing, imaging and vision, and computational modeling), (7) professional and patient continuing education, and (8) education and training. It is pointed out that, while the two disciplines differ in their histories, scientific foundations, and methodologic approaches to research in various areas, they nevertheless share methods and tools, which provides a basis for exchange of experience in their different applications. MI expertise in developing health care applications and the strength of BI in biological “discovery science” complement each other well. The new field of biomedical informatics (BMI) holds great promise for developing informatics methods that will be crucial in the development of genomic medicine. The future of BMI will be influenced strongly by whether significant advances in clinical practice and biomedical research come about from separate efforts in MI and BI, or from emerging, hybrid informatics subdisciplines at their interface. PMID:12925552
Rising Strengths Hong Kong SAR in Bioinformatics.
Chakraborty, Chiranjib; George Priya Doss, C; Zhu, Hailong; Agoramoorthy, Govindasamy
2017-06-01
Hong Kong's bioinformatics sector is attaining new heights in combination with its economic boom and the predominance of the working-age group in its population. Factors such as a knowledge-based and free-market economy have contributed towards a prominent position on the world map of bioinformatics. In this review, we consider the educational measures, landmark research activities, the achievements of bioinformatics companies, and the role of the Hong Kong government in establishing bioinformatics as a strength. However, several hurdles remain. New government policies will assist computational biologists to overcome these hurdles and further raise the profile of the field. There is a high expectation that bioinformatics in Hong Kong will be a promising area for the next generation.
Dalpé, Gratien; Joly, Yann
2014-09-01
Healthcare-related bioinformatics databases are increasingly offering the possibility to maintain, organize, and distribute DNA sequencing data. Different national and international institutions are currently hosting such databases that offer researchers website platforms where they can obtain sequencing data on which they can perform different types of analysis. Until recently, this process remained mostly one-dimensional, with most analysis concentrated on a limited amount of data. However, newer genome sequencing technology is producing a huge amount of data that current computer facilities are unable to handle. An alternative approach has been to start adopting cloud computing services for combining the information embedded in genomic and model system biology data, patient healthcare records, and clinical trials' data. In this new technological paradigm, researchers use virtual space and computing power from existing commercial or not-for-profit cloud service providers to access, store, and analyze data via different application programming interfaces. Cloud services are an alternative to the need of larger data storage; however, they raise different ethical, legal, and social issues. The purpose of this Commentary is to summarize how cloud computing can contribute to bioinformatics-based drug discovery and to highlight some of the outstanding legal, ethical, and social issues that are inherent in the use of cloud services. © 2014 Wiley Periodicals, Inc.
Tools and data services registry: a community effort to document bioinformatics resources
Ison, Jon; Rapacki, Kristoffer; Ménager, Hervé; Kalaš, Matúš; Rydza, Emil; Chmura, Piotr; Anthon, Christian; Beard, Niall; Berka, Karel; Bolser, Dan; Booth, Tim; Bretaudeau, Anthony; Brezovsky, Jan; Casadio, Rita; Cesareni, Gianni; Coppens, Frederik; Cornell, Michael; Cuccuru, Gianmauro; Davidsen, Kristian; Vedova, Gianluca Della; Dogan, Tunca; Doppelt-Azeroual, Olivia; Emery, Laura; Gasteiger, Elisabeth; Gatter, Thomas; Goldberg, Tatyana; Grosjean, Marie; Grüning, Björn; Helmer-Citterich, Manuela; Ienasescu, Hans; Ioannidis, Vassilios; Jespersen, Martin Closter; Jimenez, Rafael; Juty, Nick; Juvan, Peter; Koch, Maximilian; Laibe, Camille; Li, Jing-Woei; Licata, Luana; Mareuil, Fabien; Mičetić, Ivan; Friborg, Rune Møllegaard; Moretti, Sebastien; Morris, Chris; Möller, Steffen; Nenadic, Aleksandra; Peterson, Hedi; Profiti, Giuseppe; Rice, Peter; Romano, Paolo; Roncaglia, Paola; Saidi, Rabie; Schafferhans, Andrea; Schwämmle, Veit; Smith, Callum; Sperotto, Maria Maddalena; Stockinger, Heinz; Vařeková, Radka Svobodová; Tosatto, Silvio C.E.; de la Torre, Victor; Uva, Paolo; Via, Allegra; Yachdav, Guy; Zambelli, Federico; Vriend, Gert; Rost, Burkhard; Parkinson, Helen; Løngreen, Peter; Brunak, Søren
2016-01-01
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR—the European infrastructure for biological information—that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools. PMID:26538599
Taking Open Innovation to the Molecular Level - Strengths and Limitations.
Zdrazil, Barbara; Blomberg, Niklas; Ecker, Gerhard F
2012-08-01
The ever-growing availability of large-scale open data and its maturation is having a significant impact on industrial drug discovery, as well as on academic and non-profit research. As industry is changing to an 'open innovation' business concept, precompetitive initiatives and strong public-private partnerships including academic research cooperation partners are gaining more and more importance. The bioinformatics and cheminformatics communities are now seeking web tools that allow the integration of the large volume of life science datasets available in the public domain. Such a data exploitation tool would ideally be able to answer complex biological questions by formulating only one search query. In this short review/perspective, we outline the use of semantic web approaches for data and knowledge integration. Further, we discuss the strengths and current limitations of publicly available data retrieval tools and integrated platforms.
Kim, Jihye; Vasu, Vihas T; Mishra, Rangnath; Singleton, Katherine R; Yoo, Minjae; Leach, Sonia M; Farias-Hesson, Eveline; Mason, Robert J; Kang, Jaewoo; Ramamoorthy, Preveen; Kern, Jeffrey A; Heasley, Lynn E; Finigan, James H; Tan, Aik Choon
2014-09-01
Non-small-cell lung cancer (NSCLC) is the leading cause of cancer death in the United States. Targeted tyrosine kinase inhibitors (TKIs) directed against the epidermal growth factor receptor (EGFR) have been widely and successfully used in treating NSCLC patients with activating EGFR mutations. Unfortunately, the duration of response is short-lived, and all patients eventually relapse by acquiring resistance mechanisms. We applied an integrative systems biology approach to determine essential kinases that drive EGFR-TKI resistance in cancer cell lines. We used a series of bioinformatics methods to analyze and integrate the functional genetics screen and RNA-seq data to identify a set of kinases that are critical in survival and proliferation in these TKI-resistant lines. By connecting the essential kinases to compounds using a novel kinase connectivity map (K-Map), we identified and validated bosutinib as an effective compound that could inhibit proliferation and induce apoptosis in TKI-resistant lines. A rational combination of bosutinib and gefitinib showed additive and synergistic effects in cancer cell lines resistant to EGFR TKI alone. We have demonstrated a bioinformatics-driven discovery roadmap for drug repurposing and development in overcoming resistance in EGFR-mutant NSCLC, which could be generalized to other cancer types in the era of personalized medicine. K-Map is accessible at: http://tanlab.ucdenver.edu/kMap. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
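The connectivity-map step, linking screen-derived essential kinases to candidate compounds, can be caricatured as a target-set overlap score. The target sets and essential-kinase list below are invented placeholders, not the study's K-Map data:

# Rank compounds by how many screen-essential kinases they inhibit.
compound_targets = {  # hypothetical compound -> known kinase targets
    "bosutinib": {"SRC", "ABL1", "LYN"},
    "gefitinib": {"EGFR"},
    "drug_X":    {"MET", "ALK"},
}
essential_kinases = {"SRC", "LYN", "MET"}  # hypothetical screen hits

scores = {
    drug: len(targets & essential_kinases)
    for drug, targets in compound_targets.items()
}
for drug, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(drug, score)

The real K-Map scoring is more elaborate, but the intuition of matching a compound's target profile against an essential-kinase signature is the same.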
Castaneda, Christian; Nalley, Kip; Mannion, Ciaran; Bhattacharyya, Pritish; Blake, Patrick; Pecora, Andrew; Goy, Andre; Suh, K Stephen
2015-01-01
As research laboratories and clinics collaborate to achieve precision medicine, both communities are required to understand mandated electronic health/medical record (EHR/EMR) initiatives that will be fully implemented in all clinics in the United States by 2015. Stakeholders will need to evaluate current record keeping practices and optimize and standardize methodologies to capture nearly all information in digital format. Collaborative efforts from academic and industry sectors are crucial to achieving higher efficacy in patient care while minimizing costs. Currently existing digitized data and information are present in multiple formats and are largely unstructured. In the absence of a universally accepted management system, departments and institutions continue to generate silos of information. As a result, invaluable and newly discovered knowledge is difficult to access. To accelerate biomedical research and reduce healthcare costs, clinical and bioinformatics systems must employ common data elements to create structured annotation forms enabling laboratories and clinics to capture sharable data in real time. Conversion of these datasets to knowable information should be a routine institutionalized process. New scientific knowledge and clinical discoveries can be shared via integrated knowledge environments defined by flexible data models and extensive use of standards, ontologies, vocabularies, and thesauri. In the clinical setting, aggregated knowledge must be displayed in user-friendly formats so that physicians, non-technical laboratory personnel, nurses, data/research coordinators, and end-users can enter data, access information, and understand the output. The effort to connect astronomical numbers of data points, including '-omics'-based molecular data, individual genome sequences, experimental data, patient clinical phenotypes, and follow-up data is a monumental task. Roadblocks to this vision of integration and interoperability include ethical, legal, and logistical concerns. Ensuring data security and protection of patient rights while simultaneously facilitating standardization is paramount to maintaining public support. The capabilities of supercomputing need to be applied strategically. A standardized, methodological implementation must be applied to developed artificial intelligence systems with the ability to integrate data and information into clinically relevant knowledge. Ultimately, the integration of bioinformatics and clinical data in a clinical decision support system promises precision medicine and cost effective and personalized patient care.
Bioinformatics Goes to School—New Avenues for Teaching Contemporary Biology
Wood, Louisa; Gebhardt, Philipp
2013-01-01
Since 2010, the European Molecular Biology Laboratory's (EMBL) Heidelberg laboratory and the European Bioinformatics Institute (EMBL-EBI) have jointly run bioinformatics training courses developed specifically for secondary school science teachers within Europe and EMBL member states. These courses focus on introducing bioinformatics, databases, and data-intensive biology, allowing participants to explore resources and providing classroom-ready materials to support them in sharing this new knowledge with their students. In this article, we chart our progress made in creating and running three bioinformatics training courses, including how the course resources are received by participants and how these, and bioinformatics in general, are subsequently used in the classroom. We assess the strengths and challenges of our approach, and share what we have learned through our interactions with European science teachers. PMID:23785266
Controlling new knowledge: Genomic science, governance and the politics of bioinformatics.
Salter, Brian; Salter, Charlotte
2017-04-01
The rise of bioinformatics is a direct response to the political difficulties faced by genomics in its quest to be a new biomedical innovation, and the value of bioinformatics lies in its role as the bridge between the promise of genomics and its realization in the form of health benefits. Western scientific elites are able to use their close relationship with the state to control and facilitate the emergence of new domains compatible with the existing distribution of epistemic power - all within the embrace of public trust. The incorporation of bioinformatics as the saviour of genomics had to be integrated with the operation of two key aspects of governance in this field: the definition and ownership of the new knowledge. This was achieved mainly by the development of common standards and by the promotion of the values of communality, open access and the public ownership of data to legitimize and maintain the governance power of publicly funded genomic science. Opposition from industry advocating the private ownership of knowledge has been largely neutered through the institutions supporting the science-state concordat. However, in order for translation into health benefits to occur and public trust to be assured, genomic and clinical data have to be integrated and knowledge ownership agreed upon across the separate and distinct governance territories of scientist, clinical medicine and society. Tensions abound as science seeks ways of maintaining its control of knowledge production through the negotiation of new forms of governance with the institutions and values of clinicians and patients.
Survey of Natural Language Processing Techniques in Bioinformatics.
Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling
2015-01-01
Informatics methods such as text mining and natural language processing are frequently employed in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we consider how text mining can be used to search for biological knowledge, retrieve references, and reconstruct databases; for example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNAs. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
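As a deliberately naive illustration of the PubMed mining mentioned above, gene-disease co-occurrence can be counted over abstracts; the three abstracts below are mock strings, and real systems use curated lexicons, named-entity recognition and parsing rather than substring tests.

# Count how often a gene and a disease are mentioned together in the
# same abstract; co-occurrence is the crudest relation-mining signal.
from itertools import product

abstracts = [
    "BRCA1 mutations are associated with breast cancer risk.",
    "TP53 is frequently mutated in many cancers, including breast cancer.",
    "BRCA1 and TP53 interact in the DNA damage response.",
]
genes = ["BRCA1", "TP53"]
diseases = ["breast cancer"]

counts = {
    (g, d): sum(g in a and d in a for a in abstracts)
    for g, d in product(genes, diseases)
}
print(counts)  # each gene co-occurs once with 'breast cancer' here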
Innovative Methodology in the Discovery of Novel Drug Targets in the Free-Living Amoebae
Baig, Abdul Mannan
2018-04-25
Despite advances in drug discovery and modifications of chemotherapeutic regimens, human infections caused by free-living amoebae (FLA) have high mortality rates (~95%). The FLA that cause fatal human cerebral infections include Naegleria fowleri, Balamuthia mandrillaris and Acanthamoeba spp. Novel drug-target discovery remains the only viable option to tackle these central nervous system (CNS) infections and lower the mortality rates caused by the FLA. Of these FLA, N. fowleri causes primary amoebic meningoencephalitis (PAM), while A. castellanii and B. mandrillaris are known to cause granulomatous amoebic encephalitis (GAE). The infections caused by the FLA have been treated with drugs like rifampin, fluconazole, amphotericin B and miltefosine. Miltefosine is an anti-leishmanial agent and an experimental anti-cancer drug. With only rare instances of success, these drugs have remained unsuccessful in lowering the mortality rates of the cerebral infections caused by FLA. Recently, with the help of bioinformatic computational tools and the available genomic data of the FLA, the discovery of newer drug targets has become possible. These cellular targets are proteins that are either unique to the FLA or shared between humans and these unicellular eukaryotes. The latter group of proteins has been shown to be targeted by some FDA-approved drugs prescribed in non-infectious diseases. This review outlines the bioinformatic methodologies that can be used in the discovery of such novel drug targets, the in-vitro assays that have tested them in the past, and the translational value of such target discoveries in human diseases caused by FLA. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Discovery of 100K SNP array and its utilization in sugarcane
USDA-ARS's Scientific Manuscript database
Next generation sequencing (NGS) enables us to identify thousands of single nucleotide polymorphism (SNP) markers for genotyping and fingerprinting. However, the process requires very precise bioinformatics analysis and filtering. High throughput SNP array with predefined genomic location co...
Explorative search of distributed bio-data to answer complex biomedical questions
2014-01-01
Background: The huge amount of biomedical-molecular data increasingly produced is providing scientists with potentially valuable information. Yet, such data quantity makes it difficult to find and extract those data that are most reliable and most related to the biomedical questions to be answered, which are increasingly complex and often involve many different biomedical-molecular aspects. Such questions can be addressed only by comprehensively searching and exploring different types of data, which frequently are ordered and provided by different data sources. Search Computing has been proposed for the management and integration of ranked results from heterogeneous search services. Here, we present its novel application to the explorative search of distributed biomedical-molecular data and the integration of the search results to answer complex biomedical questions. Results: A set of available bioinformatics search services has been modelled and registered in the Search Computing framework, and a Bioinformatics Search Computing application (Bio-SeCo) using such services has been created and made publicly available at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/. It offers an integrated environment which eases search, exploration and ranking-aware combination of heterogeneous data provided by the available registered services, and supplies global results that can support answering complex multi-topic biomedical questions. Conclusions: By using Bio-SeCo, scientists can explore the very large and very heterogeneous biomedical-molecular data available. They can easily make different explorative search attempts, inspect obtained results, select the most appropriate, expand or refine them and move forward and backward in the construction of a global complex biomedical query on multiple distributed sources that could eventually find the most relevant results. Thus, it provides an extremely useful automated support for exploratory integrated bio search, which is fundamental for Life Science data driven knowledge discovery. PMID:24564278
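One way to picture the "ranking-aware combination" of results from heterogeneous services is reciprocal-rank fusion, a generic rank-aggregation scheme; we are not claiming it is the one Bio-SeCo implements, and the service names and hits below are placeholders.

# Reciprocal-rank fusion: an item's fused score is sum(1 / (k + rank))
# over every service's ranked list in which the item appears.
from collections import defaultdict

ranked_lists = {  # hypothetical service -> ranked hits
    "gene_service":    ["TP53", "EGFR", "KRAS"],
    "disease_service": ["EGFR", "TP53"],
    "pathway_service": ["KRAS", "EGFR"],
}

k = 60  # damping constant commonly used with this fusion rule
fused = defaultdict(float)
for hits in ranked_lists.values():
    for rank, item in enumerate(hits, start=1):
        fused[item] += 1.0 / (k + rank)

for item, score in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(item, round(score, 4))  # EGFR surfaces first across services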
Valleron, Alain-Jacques
2017-08-15
Automation of laboratory tests, bioinformatic analysis of biological sequences, and professional data management are used routinely in a modern university hospital-based infectious diseases institute. This dates back to at least the 1980s. However, the scientific methods of the 21st century are changing with the increased power and speed of computers, with the "big data" revolution having already happened in genomics and the environment, and eventually arriving in medical informatics. Research will be increasingly "data driven," and the powerful machine learning methods whose efficiency is demonstrated in daily life will also revolutionize medical research. A university-based institute of infectious diseases must therefore not only gather excellent computer scientists and statisticians (as in the past, and as in any medical discipline), but also fully integrate its biologists and clinicians with computer scientists, statisticians, and mathematical modelers who have a broad culture in machine learning, knowledge representation, and knowledge discovery. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.
BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements.
De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan
2015-12-01
The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller. Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
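A minimal sketch of the degenerate-alphabet matching that underlies IUPAC-word motif discovery as described above: an IUPAC word is expanded into a regular expression and checked against promoter sequences. The promoter fragments and the motif are invented, and the conservation scoring (branch length score) is omitted.

# Translate a degenerate IUPAC motif into a regular expression and
# test which species' promoters contain it.
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(word):
    return re.compile("".join(IUPAC[c] for c in word))

promoters = {  # invented orthologous promoter fragments
    "O.sativa": "TTGACGTGGCATTGACA",
    "Z.mays":   "CATTGACCTGACGTGGA",
}

motif = iupac_to_regex("TGACGTGR")  # an arbitrary degenerate example
conserved_in = [sp for sp, seq in promoters.items() if motif.search(seq)]
print(conserved_in)  # species whose promoter matches the motif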
ExSTraCS 2.0: Description and Evaluation of a Scalable Learning Classifier System.
Urbanowicz, Ryan J; Moore, Jason H
2015-09-01
Algorithmic scalability is a major concern for any machine learning strategy in this age of 'big data'. A large number of potentially predictive attributes is emblematic of problems in bioinformatics, genetic epidemiology, and many other fields. Previously, ExSTraCS was introduced as an extended Michigan-style supervised learning classifier system that combined a set of powerful heuristics to successfully tackle the challenges of classification, prediction, and knowledge discovery in complex, noisy, and heterogeneous problem domains. While Michigan-style learning classifier systems are powerful and flexible learners, they are not considered to be particularly scalable. For the first time, this paper presents a complete description of the ExSTraCS algorithm and introduces an effective strategy to dramatically improve learning classifier system scalability. ExSTraCS 2.0 addresses scalability with (1) a rule specificity limit, (2) new approaches to expert knowledge guided covering and mutation mechanisms, and (3) the implementation and utilization of the TuRF algorithm for improving the quality of expert knowledge discovery in larger datasets. Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to perform on related 200 and 2000-attribute datasets. ExSTraCS 2.0 was also able to reliably solve the 6, 11, 20, 37, 70, and 135 multiplexer problems, and did so in similar or fewer learning iterations than previously reported, with smaller finite training sets, and without using building blocks discovered from simpler multiplexer problems. Furthermore, ExSTraCS usability was made simpler through the elimination of previously critical run parameters.
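The rule specificity limit mentioned in (1) caps how many attributes a newly created rule may specify. A stripped-down "covering" operator under such a cap looks like the following; the attribute encoding is invented for illustration and this is not the ExSTraCS implementation.

# Michigan-style covering with a specificity limit: the new rule must
# match the triggering instance but may specify at most spec_limit
# attributes; unspecified positions (None) act as wildcards.
import random

def cover(instance, spec_limit):
    k = min(spec_limit, len(instance))
    specified = set(random.sample(range(len(instance)), k))
    return [v if i in specified else None for i, v in enumerate(instance)]

def matches(rule, instance):
    return all(r is None or r == v for r, v in zip(rule, instance))

random.seed(0)
x = [1, 0, 1, 1, 0, 0]             # a hypothetical 6-attribute instance
rule = cover(x, spec_limit=2)
print(rule, matches(rule, x))      # the rule always matches its parent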
Vernick, Kenneth D.
2017-01-01
Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932
Systems analysis of arrestin pathway functions.
Maudsley, Stuart; Siddiqui, Sana; Martin, Bronwen
2013-01-01
To fully appreciate the diversity and specificity of complex cellular signaling events, such as arrestin-mediated signaling from G protein-coupled receptor activation, a complex systems-level investigation currently appears to be the best option. A rational combination of transcriptomics, proteomics, and interactomics, all coherently integrated with applied next-generation bioinformatics, is vital for the future understanding of the development, translation, and expression of GPCR-mediated arrestin signaling events in physiological contexts. Through a more nuanced, systems-level appreciation of arrestin-mediated signaling, the creation of arrestin-specific molecular response "signatures" should be made simple and ultimately amenable to drug discovery processes. Arrestin-based signaling paradigms possess important aspects, such as their specific temporal kinetics and ability to strongly affect transcriptional activity, that make them an ideal test bed for the next generation of drug discovery bioinformatic approaches such as multi-parallel dose-response analysis, data texturization, and latent semantic indexing-based natural language data processing and feature extraction. Copyright © 2013 Elsevier Inc. All rights reserved.
Emerging Trends in the Discovery of Natural Product Antibacterials
Bologa, Cristian G.; Ursu, Oleg; Oprea, Tudor; Melançon, Charles E.; Tegos, George P.
2013-01-01
This article highlights current trends and advances in exploiting natural sources for the deployment of novel and potent anti-infective countermeasures. The key challenge is to therapeutically target microbial pathogens exhibiting a variety of puzzling and evolutionarily complex resistance mechanisms. Special emphasis is given to the strengths, weaknesses, and opportunities in the natural product antimicrobial drug discovery arena, and to emerging applications driven by advances in bioinformatics, chemical biology, and synthetic biology in concert with exploiting the microbial phenotype. These orchestrated efforts have identified a critical mass of lead natural antimicrobial chemical scaffolds and discovery technologies with a high probability of successful implementation against emerging microbial pathogens. PMID:23890825
Combining medical informatics and bioinformatics toward tools for personalized medicine.
Sarachan, B D; Simmons, M K; Subramanian, P; Temkin, J M
2003-01-01
Key bioinformatics and medical informatics research areas need to be identified to advance knowledge and understanding of disease risk factors and molecular disease pathology in the 21st century toward new diagnoses, prognoses, and treatments. Three high-impact informatics areas are identified: predictive medicine (to identify significant correlations within clinical data using statistical and artificial intelligence methods), along with pathway informatics and cellular simulations (that combine biological knowledge with advanced informatics to elucidate molecular disease pathology). Initial predictive models have been developed for a pilot study in Huntington's disease. An initial bioinformatics platform has been developed for the reconstruction and analysis of pathways, and work has begun on pathway simulation. A bioinformatics research program has been established at GE Global Research Center as an important technology toward next-generation medical diagnostics. We anticipate that 21st century medical research will be a combination of informatics tools with traditional biology wet lab research, and that this will translate to increased use of informatics techniques in the clinic.
Mulder, Nicola; Schwartz, Russell; Brazas, Michelle D; Brooksbank, Cath; Gaeta, Bruno; Morgan, Sarah L; Pauley, Mark A; Rosenwald, Anne; Rustici, Gabriella; Sierk, Michael; Warnow, Tandy; Welch, Lonnie
2018-02-01
Bioinformatics is recognized as part of the essential knowledge base of numerous career paths in biomedical research and healthcare. However, there is little agreement in the field over what that knowledge entails or how best to provide it. These disagreements are compounded by the wide range of populations in need of bioinformatics training, with divergent prior backgrounds and intended application areas. The Curriculum Task Force of the International Society of Computational Biology (ISCB) Education Committee has sought to provide a framework for training needs and curricula in terms of a set of bioinformatics core competencies that cut across many user personas and training programs. The initial competencies developed based on surveys of employers and training programs have since been refined through a multiyear process of community engagement. This report describes the current status of the competencies and presents a series of use cases illustrating how they are being applied in diverse training contexts. These use cases are intended to demonstrate how others can make use of the competencies and engage in the process of their continuing refinement and application. The report concludes with a consideration of remaining challenges and future plans.
González-Nilo, Fernando; Pérez-Acle, Tomás; Guínez-Molinos, Sergio; Geraldo, Daniela A; Sandoval, Claudia; Yévenes, Alejandro; Santos, Leonardo S; Laurie, V Felipe; Mendoza, Hegaly; Cachau, Raúl E
2011-01-01
After the progress made during the genomics era, bioinformatics was tasked with supporting the flow of information generated by nanobiotechnology efforts. This challenge requires adapting classical bioinformatic and computational chemistry tools to store, standardize, analyze, and visualize nanobiotechnological information. Thus, old and new bioinformatic and computational chemistry tools have been merged into a new sub-discipline: nanoinformatics. This review takes a second look at the development of this new and exciting area as seen from the perspective of the evolution of nanobiotechnology applied to the life sciences. The knowledge obtained at the nano-scale level implies answers to new questions and the development of new concepts in different fields. The rapid convergence of technologies around nanobiotechnologies has spun off collaborative networks and web platforms created for sharing and discussing the knowledge generated in nanobiotechnology. The implementation of new database schemes suitable for storage, processing and integrating physical, chemical, and biological properties of nanoparticles will be a key element in achieving the promises in this convergent field. In this work, we will review some applications of nanobiotechnology to life sciences in generating new requirements for diverse scientific fields, such as bioinformatics and computational chemistry.
Pollett, S; Leguia, M; Nelson, M I; Maljkovic Berry, I; Rutherford, G; Bausch, D G; Kasper, M; Jarman, R; Melendrez, M
2016-01-01
There is an increasing role for bioinformatic and phylogenetic analysis in tropical medicine research. However, scientists working in low- and middle-income regions may lack access to training opportunities in these methods. To help address this gap, a 5-day intensive bioinformatics workshop was offered in Lima, Peru. The syllabus is presented here for others who want to develop similar programs. To assess knowledge gained, a 20-point knowledge questionnaire was administered to the 21 participants before and after the workshop, covering sequence quality control, alignment/formatting, database retrieval, models of evolution, sequence statistics, tree building, and results interpretation. Evolution/tree-building methods represented the lowest scoring domain at baseline and after the workshop. There was a considerable median gain in total knowledge scores (increase of 30%, p<0.001), with gains as high as 55%. A 5-day workshop model was effective in improving the pathogen-applied bioinformatics knowledge of scientists working in a middle-income country setting. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
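Pre/post comparisons like the one above are typically analyzed with a paired nonparametric test. Below is a sketch using SciPy's Wilcoxon signed-rank test on simulated scores for 21 participants; the numbers are made up and are not the workshop's data.

# Paired pre/post comparison of 20-point quiz scores.
import random
from scipy.stats import wilcoxon

random.seed(1)
pre = [random.randint(5, 14) for _ in range(21)]
post = [min(20, p + random.randint(2, 8)) for p in pre]  # simulated gains

stat, p_value = wilcoxon(pre, post)          # paired, nonparametric
gains = sorted(q - p for p, q in zip(pre, post))
median_gain_pct = gains[len(gains) // 2] / 20 * 100
print(f"median gain: {median_gain_pct:.0f}% of total score, p = {p_value:.2g}")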
Controlling new knowledge: Genomic science, governance and the politics of bioinformatics
Salter, Brian; Salter, Charlotte
2017-01-01
The rise of bioinformatics is a direct response to the political difficulties faced by genomics in its quest to be a new biomedical innovation, and the value of bioinformatics lies in its role as the bridge between the promise of genomics and its realization in the form of health benefits. Western scientific elites are able to use their close relationship with the state to control and facilitate the emergence of new domains compatible with the existing distribution of epistemic power – all within the embrace of public trust. The incorporation of bioinformatics as the saviour of genomics had to be integrated with the operation of two key aspects of governance in this field: the definition and ownership of the new knowledge. This was achieved mainly by the development of common standards and by the promotion of the values of communality, open access and the public ownership of data to legitimize and maintain the governance power of publicly funded genomic science. Opposition from industry advocating the private ownership of knowledge has been largely neutered through the institutions supporting the science-state concordat. However, in order for translation into health benefits to occur and public trust to be assured, genomic and clinical data have to be integrated and knowledge ownership agreed upon across the separate and distinct governance territories of science, clinical medicine and society. Tensions abound as science seeks ways of maintaining its control of knowledge production through the negotiation of new forms of governance with the institutions and values of clinicians and patients. PMID:28056721
Raja, Kalpana; Patrick, Matthew; Gao, Yilin; Madu, Desmond; Yang, Yuyang
2017-01-01
In the past decade, the volume of “omics” data generated by the different high-throughput technologies has expanded exponentially. Managing, storing, and analyzing this big data has been a great challenge for researchers, especially when moving towards the goal of generating testable data-driven hypotheses, which has been the promise of high-throughput experimental techniques. Different bioinformatics approaches have been developed to streamline the downstream analyses by providing independent information with which to interpret and draw biological inferences. Text mining (also known as literature mining) is one of the approaches commonly used for automated generation of biological knowledge from the huge number of published articles. In this review paper, we discuss recent advances in approaches that integrate results from omics data with information generated by text mining to uncover novel biomedical information. PMID:28331849
Antibiotics and specialized metabolites from the human microbiota.
Mousa, Walaa K; Athar, Bilal; Merwin, Nishanth J; Magarvey, Nathan A
2017-11-15
Covering: 2000 to 2017. Decades of research on human microbiota have revealed much of their taxonomic diversity and established their direct link to health and disease. However, the breadth of bioactive natural products secreted by our microbial partners remains unknown. Of particular interest are antibiotics produced by our microbiota to ward off invasive pathogens. Members of the human microbiota exclusively produce evolved small molecules with selective antimicrobial activity against human pathogens. Herein, we expand upon the current knowledge concerning antibiotics derived from human microbiota and their distribution across body sites. We analyze, using our in-house chem-bioinformatic tools and natural products database, the encoded antibiotic potential of the human microbiome. This compilation of information may create a foundation for the continued exploration of this intriguing resource of chemical diversity and expose challenges and future perspectives to accelerate the discovery rate of small molecules from the human microbiota.
Privacy Preserving PCA on Distributed Bioinformatics Datasets
ERIC Educational Resources Information Center
Li, Xin
2011-01-01
In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human…
ERIC Educational Resources Information Center
Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.
2000-01-01
These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fraga, Carlos G.; Clowers, Brian H.; Moore, Ronald J.
2010-05-15
This report demonstrates the use of bioinformatic and chemometric tools on liquid chromatography mass spectrometry (LC-MS) data for the discovery of ultra-trace forensic signatures for sample matching of various stocks of the nerve-agent precursor known as methylphosphonic dichloride (dichlor). The bioinformatic tool known as XCMS was used to comprehensively search for and find candidate LC-MS peaks in a known set of dichlor samples. These candidate peaks were down-selected to a group of 34 impurity peaks. Hierarchical cluster analysis and factor analysis demonstrated the potential of these 34 impurity peaks for matching samples to their stock source. Only one pair of dichlor stocks was not differentiated from one another. An acceptable chemometric approach for sample matching was determined to be variance scaling and signal averaging of normalized duplicate impurity profiles prior to classification by k-nearest neighbors. Using this approach, all samples in a test set of dichlor samples were correctly matched to their source stock. The sample preparation and LC-MS method permitted the detection of dichlor impurities presumably in the parts-per-trillion range (w/w). The detection of a common impurity in all dichlor stocks, which were synthesized over a 14-year period and by different manufacturers, was an unexpected discovery. Our described signature-discovery approach should be useful in the development of a forensic capability to help in criminal investigations following chemical attacks.
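The classification step the report describes (variance scaling, signal averaging of duplicate profiles, then k-nearest neighbors) can be sketched as follows; the data, labels, and neighbor count are stand-in assumptions, not the report's parameters:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# rows = samples, columns = the 34 impurity peak areas (synthetic stand-in data)
rng = np.random.default_rng(0)
train_profiles = rng.random((40, 34))
train_stock = rng.integers(0, 8, size=40)     # stock-source labels

# signal-average normalized duplicate injections before classification
dup_a, dup_b = rng.random((2, 5, 34))
test_profiles = (dup_a + dup_b) / 2

scaler = StandardScaler()                     # variance scaling
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.fit_transform(train_profiles), train_stock)
print(knn.predict(scaler.transform(test_profiles)))
```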
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruebel, Oliver
2009-11-20
Knowledge discovery from large and complex collections of today's scientific datasets is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the growing number of data dimensions and data objects presents tremendous challenges for data analysis and effective data exploration methods and tools. Researchers are overwhelmed with data, and standard tools are often insufficient to enable effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities to accelerate scientific knowledge discovery from large, complex, and multivariate scientific data. The research covered in this thesis addresses these scientific challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated via applications in two distinct scientific research fields, namely developmental biology and high-energy physics. Advances in microscopy, image analysis, and embryo registration enable for the first time measurement of gene expression at cellular resolution for entire organisms. Analysis of high-dimensional spatial gene expression datasets is a challenging task. By integrating data clustering and visualization, analysis of complex, time-varying, spatial gene expression patterns and their formation becomes possible. The analysis framework has been integrated with MATLAB and the visualization system, making advanced analysis tools accessible to biologists and enabling bioinformatics researchers to directly integrate their analysis with the visualization. Laser wakefield particle accelerators (LWFAs) promise to be a new compact source of high-energy particles and radiation, with wide applications ranging from medicine to physics. To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, making analysis and knowledge discovery from complex LWFA simulation data a challenging task. To address these challenges, this thesis describes the integration of the visualization system VisIt and the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in beams of high-energy particles formed during the course of a simulation. This thesis describes novel methods for automatic detection and analysis of particle beams, enabling a more accurate and efficient data analysis process. By integrating these automated analysis methods with visualization, this research enables more accurate, efficient, and effective analysis of LWFA simulation data than previously possible.
Jiang, Wei; Yu, Weichuan
2017-02-15
In genome-wide association studies (GWASs) of common diseases/traits, we often analyze multiple GWASs with the same phenotype together to discover associated genetic variants with higher power. Since it is difficult to access data with detailed individual measurements, summary-statistics-based meta-analysis methods have become popular to jointly analyze datasets from multiple GWASs. In this paper, we propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the false discovery rate at a certain level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous datasets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods. Also, our method discovers more associations than meta-analysis methods from empirical datasets of four phenotypes. The R package is available at http://bioinformatics.ust.hk/Jlfdr.html. Contact: eeyu@ust.hk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
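For intuition only, a local false discovery rate over the joint z-scores of two studies can be illustrated as the posterior null probability pi0*f0(z)/f(z); the toy sketch below simulates data and uses a kernel density estimate for f. It is not the authors' Jlfdr estimator (their paper derives the optimal rejection region and the thresholds for FDR control), and pi0 is assumed known rather than estimated:

```python
import numpy as np
from scipy.stats import multivariate_normal, gaussian_kde

rng = np.random.default_rng(0)
n, frac_alt = 10000, 0.05
alt = rng.random(n) < frac_alt            # truly associated SNPs
z = rng.standard_normal((2, n))           # z-scores from two GWASs
z[:, alt] += 3.0                          # shared signal in both studies

f0 = multivariate_normal(mean=[0, 0]).pdf(z.T)   # joint null density
f = gaussian_kde(z)(z)                           # estimated joint marginal density
pi0 = 1 - frac_alt                               # known here; must be estimated in practice
jlfdr = np.clip(pi0 * f0 / f, 0, 1)

print((jlfdr < 0.1).sum(), "SNPs flagged at joint local fdr < 0.1")
```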
Computational biology and bioinformatics in Nigeria.
Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi
2014-04-01
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.
Bioinformatics Meets Virology: The European Virus Bioinformatics Center's Second Annual Meeting.
Ibrahim, Bashar; Arkhipova, Ksenia; Andeweg, Arno C; Posada-Céspedes, Susana; Enault, François; Gruber, Arthur; Koonin, Eugene V; Kupczok, Anne; Lemey, Philippe; McHardy, Alice C; McMahon, Dino P; Pickett, Brett E; Robertson, David L; Scheuermann, Richard H; Zhernakova, Alexandra; Zwart, Mark P; Schönhuth, Alexander; Dutilh, Bas E; Marz, Manja
2018-05-14
The Second Annual Meeting of the European Virus Bioinformatics Center (EVBC), held in Utrecht, Netherlands, focused on computational approaches in virology, with topics including (but not limited to) virus discovery, diagnostics, (meta-)genomics, modeling, epidemiology, molecular structure, evolution, and viral ecology. The goals of the Second Annual Meeting were threefold: (i) to bring together virologists and bioinformaticians from across the academic, industrial, professional, and training sectors to share best practice; (ii) to provide a meaningful and interactive scientific environment to promote discussion and collaboration between students, postdoctoral fellows, and both new and established investigators; (iii) to inspire and suggest new research directions and questions. Approximately 120 researchers from around the world attended the Second Annual Meeting of the EVBC this year, including 15 renowned international speakers. This report presents an overview of new developments and novel research findings that emerged during the meeting.
Detecting circular RNAs: bioinformatic and experimental challenges
Szabo, Linda; Salzman, Julia
2017-01-01
The pervasive expression of circular RNAs (circRNAs) is a recently discovered feature of gene expression in highly diverged eukaryotes. Numerous algorithms that are used to detect genome-wide circRNA expression from RNA sequencing (RNA-seq) data have been developed in the past few years, but there is little overlap in their predictions and no clear gold-standard method to assess the accuracy of these algorithms. We review sources of experimental and bioinformatic biases that complicate the accurate discovery of circRNAs and discuss statistical approaches to address these biases. We conclude with a discussion of the current experimental progress on the topic. PMID:27739534
Wren, Jonathan D; Dozmorov, Mikhail G; Burian, Dennis; Kaundal, Rakesh; Perkins, Andy; Perkins, Ed; Kupfer, Doris M; Springer, Gordon K
2013-01-01
The tenth annual conference of the MidSouth Computational Biology and Bioinformatics Society (MCBIOS 2013), "The 10th Anniversary in a Decade of Change: Discovery in a Sea of Data", took place at the Stoney Creek Inn & Conference Center in Columbia, Missouri on April 5-6, 2013. This year's Conference Chairs were Gordon Springer and Chi-Ren Shyu from the University of Missouri and Edward Perkins from the US Army Corps of Engineers Engineering Research and Development Center, who is also the current MCBIOS President (2012-2013). There were 151 registrants and a total of 111 abstracts (51 oral presentations and 60 poster session abstracts).
Gerstein, Mark; Greenbaum, Dov; Cheung, Kei; Miller, Perry L
2007-02-01
Computational biology and bioinformatics (CBB), terms often used interchangeably, represent a rapidly evolving biological discipline. With the clear potential for discovery and innovation, and the need to deal with the deluge of biological data, many academic institutions are committing significant resources to develop CBB research and training programs. Yale formally established an interdepartmental Ph.D. program in CBB in May 2003. This paper describes Yale's program, discussing the scope of the field, the program's goals and curriculum, as well as a number of issues that arose in implementing the program. (Further updated information is available from the program's website, www.cbb.yale.edu.)
Broad issues to consider for library involvement in bioinformatics*
Geer, Renata C.
2006-01-01
Background: The information landscape in biological and medical research has grown far beyond literature to include a wide variety of databases generated by research fields such as molecular biology and genomics. The traditional role of libraries to collect, organize, and provide access to information can expand naturally to encompass these new data domains. Methods: This paper discusses the current and potential role of libraries in bioinformatics using empirical evidence and experience from eleven years of work in user services at the National Center for Biotechnology Information. Findings: Medical and science libraries over the last decade have begun to establish educational and support programs to address the challenges users face in the effective and efficient use of a plethora of molecular biology databases and retrieval and analysis tools. As more libraries begin to establish a role in this area, the issues they face include assessment of user needs and skills, identification of existing services, development of plans for new services, recruitment and training of specialized staff, and establishment of collaborations with bioinformatics centers at their institutions. Conclusions: Increasing library involvement in bioinformatics can help address information needs of a broad range of students, researchers, and clinicians and ultimately help realize the power of bioinformatics resources in making new biological discoveries. PMID:16888662
The next generation of training for Arabidopsis researchers: bioinformatics and quantitative biology
USDA-ARS's Scientific Manuscript database
It has been more than 50 years since Arabidopsis (Arabidopsis thaliana) was first introduced as a model organism to understand basic processes in plant biology. A well-organized scientific community has used this small reference plant species to make numerous fundamental plant biology discoveries (P...
Huang, Liang-Chin; Ross, Karen E; Baffi, Timothy R; Drabkin, Harold; Kochut, Krzysztof J; Ruan, Zheng; D'Eustachio, Peter; McSkimming, Daniel; Arighi, Cecilia; Chen, Chuming; Natale, Darren A; Smith, Cynthia; Gaudet, Pascale; Newton, Alexandra C; Wu, Cathy; Kannan, Natarajan
2018-04-25
Many bioinformatics resources with unique perspectives on the protein landscape are currently available. However, generating new knowledge from these resources requires interoperable workflows that support cross-resource queries. In this study, we employ federated queries linking information from the Protein Kinase Ontology, iPTMnet, the Protein Ontology, neXtProt, and the Mouse Genome Informatics resource to identify key knowledge gaps in the functional coverage of the human kinome and to prioritize understudied kinases, cancer variants, and post-translational modifications (PTMs) for functional studies. We identify 32 functional domains enriched in cancer variants and PTMs and generate mechanistic hypotheses on overlapping variant and PTM sites by aggregating information at the residue, protein, pathway, and species level from these resources. We experimentally test the hypothesis that S768 phosphorylation in the C-helix of EGFR is inhibitory by showing that oncogenic variants altering S768 phosphorylation increase basal EGFR activity. In contrast, oncogenic variants altering conserved phosphorylation sites in the 'hydrophobic motif' of PKCβII (S660F and S660C) are loss-of-function in that they reduce kinase activity and enhance membrane translocation. Our studies provide a framework for integrative, consistent, and reproducible annotation of the cancer kinomes.
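Cross-resource queries of this kind are commonly issued in SPARQL against the resources' public endpoints. As a minimal sketch of the pattern, here is a toy query against the public UniProt SPARQL endpoint using the SPARQLWrapper library (the endpoint and predicates are real, but this is not one of the study's actual federated queries):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://sparql.uniprot.org/sparql")
endpoint.setQuery("""
    PREFIX up: <http://purl.uniprot.org/core/>
    SELECT ?protein ?mnemonic WHERE {
        ?protein a up:Protein ;
                 up:mnemonic ?mnemonic .
    } LIMIT 5
""")
endpoint.setReturnFormat(JSON)

# print the first few protein URIs and their mnemonics
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["protein"]["value"], row["mnemonic"]["value"])
```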
How rare bone diseases have informed our knowledge of complex diseases.
Johnson, Mark L
2016-01-01
Rare bone diseases, generally defined as monogenic traits with either autosomal recessive or dominant patterns of inheritance, have provided a rich database of genes and associated pathways over the past 2-3 decades. The molecular genetic dissection of these bone diseases has yielded some major surprises in terms of the causal genes and/or involved pathways. The discovery of the genes/pathways involved in diseases such as osteopetrosis, osteosclerosis, osteogenesis imperfecta, and many other rare bone diseases has accelerated our understanding of complex traits. Importantly, these discoveries have provided either direct validation for a specific gene embedded in a group of genes within an interval identified through a complex trait genome-wide association study (GWAS) or, based upon the pathway associated with a monogenic trait gene, a means to prioritize a large number of genes for functional validation studies. In some instances, GWAS have yielded candidate genes that fall within linkage intervals associated with monogenic traits and resulted in the identification of causal mutations in those rare diseases. Driving all of this discovery is a complement of technologies, such as genome sequencing, bioinformatics, and advanced statistical analysis methods, that have accelerated genetic dissection and greatly reduced its cost. Thus, rare bone disorders in partnership with GWAS have brought us to the brink of a new era of personalized genomic medicine, in which the prevention and management of complex diseases will be driven by a molecular understanding of each individual's contributing genetic risks.
Chiu, Charles Y
2015-01-01
Viral pathogen discovery is of critical importance to clinical microbiology, infectious diseases, and public health. Genomic approaches for pathogen discovery, including consensus polymerase chain reaction (PCR), microarrays, and unbiased next-generation sequencing (NGS), have the capacity to comprehensively identify novel microbes present in clinical samples. Although numerous challenges remain to be addressed, including the bioinformatics analysis and interpretation of large datasets, these technologies have been successful in rapidly identifying emerging outbreak threats, screening vaccines and other biological products for microbial contamination, and discovering novel viruses associated with both acute and chronic illnesses. Downstream studies such as genome assembly, epidemiologic screening, and a culture system or animal model of infection are necessary to establish an association of a candidate pathogen with disease. PMID:23725672
BioShaDock: a community driven bioinformatics shared Docker-based tools registry
Moreews, François; Sallou, Olivier; Ménager, Hervé; Le Bras, Yvan; Monjeaud, Cyril; Blanchet, Christophe; Collin, Olivier
2015-01-01
Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images needed. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, that enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the registry to create a new description in the Elixir registry, based on the BioShaDock entry metadata. This link will help users get more information on the tool such as its EDAM operations, input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community. PMID:26913191
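To give a flavor of how images from such a registry are consumed programmatically, here is a sketch using the docker-py client; the registry host and image path are hypothetical stand-ins, not actual BioShaDock coordinates:

```python
import docker  # the docker-py client library

client = docker.from_env()

# hypothetical image coordinates on a BioShaDock-style registry
image = client.images.pull("registry.example.org/bioshadock/samtools", tag="1.9")

# run the containerized tool and capture its output
logs = client.containers.run(image, "samtools --version", remove=True)
print(logs.decode())
```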
A Bioinformatics Facility for NASA
NASA Technical Reports Server (NTRS)
Schweighofer, Karl; Pohorille, Andrew
2006-01-01
Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases, and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney of mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.
Learning Genetics through an Authentic Research Simulation in Bioinformatics
ERIC Educational Resources Information Center
Gelbart, Hadas; Yarden, Anat
2006-01-01
Following the rationale that learning is an active process of knowledge construction as well as enculturation into a community of experts, we developed a novel web-based learning environment in bioinformatics for high-school biology majors in Israel. The learning environment enables the learners to actively participate in a guided inquiry process…
A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research
ERIC Educational Resources Information Center
Campbell, Chad E.; Nehm, Ross H.
2013-01-01
The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students' knowledge, attitudes, or skills. Although assessments are…
Bioinformatics projects supporting life-sciences learning in high schools.
Marques, Isabel; Almeida, Paulo; Alves, Renato; Dias, Maria João; Godinho, Ana; Pereira-Leal, José B
2014-01-01
The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called "Bioinformatics@school." It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools.
Deep learning in bioinformatics.
Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh
2017-09-01
In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
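As a concrete taste of the omics category reviewed here, below is a minimal convolutional network over one-hot-encoded DNA, the style of model commonly used for motif and binding-site prediction; a toy PyTorch sketch, not a model from the review:

```python
import torch
import torch.nn as nn

# toy batch: 8 DNA sequences of length 100, one-hot encoded into 4 channels (A, C, G, T)
x = torch.randint(0, 4, (8, 100))
x_onehot = nn.functional.one_hot(x, 4).float().permute(0, 2, 1)  # (batch, channels, length)

model = nn.Sequential(
    nn.Conv1d(4, 16, kernel_size=8),  # filters act like learnable sequence motifs
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),          # strongest motif match anywhere in the sequence
    nn.Flatten(),
    nn.Linear(16, 1),                 # e.g. a binding vs. non-binding logit
)
print(model(x_onehot).shape)          # torch.Size([8, 1])
```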
Enhancing knowledge discovery from cancer genomics data with Galaxy.
Albuquerque, Marco A; Grande, Bruno M; Ritch, Elie J; Pararajalingam, Prasath; Jessa, Selin; Krzywinski, Martin; Grewal, Jasleen K; Shah, Sohrab P; Boutros, Paul C; Morin, Ryan D
2017-05-01
The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker. © The Author 2017. Published by Oxford University Press.
Determining conserved metabolic biomarkers from a million database queries.
Kurczy, Michael E; Ivanisevic, Julijana; Johnson, Caroline H; Uritboonthai, Winnie; Hoang, Linh; Fang, Mingliang; Hicks, Matthew; Aldebot, Anthony; Rinehart, Duane; Mellander, Lisa J; Tautenhahn, Ralf; Patti, Gary J; Spilker, Mary E; Benton, H Paul; Siuzdak, Gary
2015-12-01
Metabolite databases provide a unique window into metabolome research, allowing the most commonly searched biomarkers to be catalogued. Omic-scale metabolite profiling, or metabolomics, is finding increased utility in biomarker discovery, largely driven by improvements in analytical technologies and concurrent developments in bioinformatics. However, the successful translation of biomarkers into clinical or biologically relevant indicators is limited. With the aim of improving the discovery of translatable metabolite biomarkers, we present search analytics for over one million METLIN metabolite database queries. The most common metabolites found in METLIN were cross-correlated against XCMS Online, the widely used cloud-based data processing and pathway analysis platform. Analysis of the METLIN and XCMS common metabolite data has two primary implications: these metabolites might indicate a conserved metabolic response to stressors, and the data may be used to gauge the relative uniqueness of potential biomarkers. METLIN can be accessed at https://metlin.scripps.edu. Contact: siuzdak@scripps.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications.
Haque, Ashraful; Engel, Jessica; Teichmann, Sarah A; Lönnberg, Tapio
2017-08-18
RNA sequencing (RNA-seq) is a genomic approach for the detection and quantitative analysis of messenger RNA molecules in a biological sample and is useful for studying cellular responses. RNA-seq has fueled much discovery and innovation in medicine over recent years. For practical reasons, the technique is usually conducted on samples comprising thousands to millions of cells. However, this has hindered direct assessment of the fundamental unit of biology: the cell. Since the first single-cell RNA-sequencing (scRNA-seq) study was published in 2009, many more have been conducted, mostly by specialist laboratories with unique skills in wet-lab single-cell genomics, bioinformatics, and computation. However, with the increasing commercial availability of scRNA-seq platforms, and the rapid ongoing maturation of bioinformatics approaches, a point has been reached where any biomedical researcher or clinician can use scRNA-seq to make exciting discoveries. In this review, we present a practical guide to help researchers design their first scRNA-seq studies, including introductory information on experimental hardware, protocol choice, quality control, data analysis and biological interpretation.
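For orientation, the backbone of a typical first scRNA-seq analysis can be sketched with the scanpy toolkit as below; the input path is hypothetical and the parameters are common defaults rather than recommendations from this review:

```python
import scanpy as sc

adata = sc.read_10x_mtx("filtered_feature_bc_matrix/")  # hypothetical Cell Ranger output
sc.pp.filter_cells(adata, min_genes=200)                # basic quality control
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.normalize_total(adata, target_sum=1e4)            # depth normalization
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.tl.pca(adata)
sc.pp.neighbors(adata)                                  # k-NN graph in PCA space
sc.tl.leiden(adata)                                     # graph-based clustering
sc.tl.umap(adata)
sc.pl.umap(adata, color="leiden")                       # 2-D embedding colored by cluster
```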
Target-Pathogen: a structural bioinformatic approach to prioritize drug targets in pathogens
Sosa, Ezequiel J; Burguener, Germán; Lanzarotti, Esteban; Radusky, Leandro; Pardo, Agustín M; Marti, Marcelo
2018-01-01
Available genomic data for pathogens have created new opportunities for drug discovery and development to fight them, including new resistant and multiresistant strains. In particular, structural data must be integrated with both gene information and experimental results. In this sense, there is a lack of an online resource that allows genome-wide data consolidation from diverse sources together with thorough bioinformatic analysis that allows easy filtering and scoring for fast target selection for drug discovery. Here, we present the Target-Pathogen database (http://target.sbg.qb.fcen.uba.ar/patho), designed and developed as an online resource that allows the integration and weighting of protein information such as function, metabolic role, off-targeting, structural properties including druggability, essentiality, and omic experiments, to facilitate the identification and prioritization of candidate drug targets in pathogens. We include in the database 10 genomes of some of the most relevant microorganisms for human health (Mycobacterium tuberculosis, Mycobacterium leprae, Klebsiella pneumoniae, Plasmodium vivax, Toxoplasma gondii, Leishmania major, Wuchereria bancrofti, Trypanosoma brucei, Shigella dysenteriae and Schistosoma mansoni) and show its applicability. New genomes can be uploaded upon request. PMID:29106651
Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard
2012-01-01
A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985
Honts, Jerry E.
2003-01-01
Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489
Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn
2009-01-01
The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented.
Bioinformatics Approaches for Fetal DNA Fraction Estimation in Noninvasive Prenatal Testing
Peng, Xianlu Laura; Jiang, Peiyong
2017-01-01
The discovery of cell-free fetal DNA molecules in the plasma of pregnant women has created a paradigm shift in noninvasive prenatal testing (NIPT). Circulating cell-free DNA in maternal plasma has been increasingly recognized as an important proxy for detecting fetal abnormalities in a noninvasive manner. A variety of approaches for NIPT using next-generation sequencing have been developed and are rapidly transforming clinical practice. In such approaches, the fetal DNA fraction is a pivotal parameter governing the overall performance and guaranteeing the proper clinical interpretation of testing results. In this review, we describe the current bioinformatics approaches developed for estimating the fetal DNA fraction and discuss their pros and cons. PMID:28230760
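One common estimator among those reviewed (applicable to male pregnancies) reads the fetal fraction off the proportion of sequencing reads aligned to chrY, interpolating between a female baseline and an adult-male reference. A sketch with hypothetical baseline values, not the paper's numbers:

```python
def fetal_fraction_chry(chry_prop, female_baseline=0.00004, male_baseline=0.0008):
    """Linear interpolation between 0% fetal (female plasma) and 100% (adult male)."""
    ff = (chry_prop - female_baseline) / (male_baseline - female_baseline)
    return max(0.0, min(1.0, ff))  # clamp to a valid fraction

# a sample whose chrY read proportion is 0.012% implies roughly a 10% fetal fraction
print(f"{fetal_fraction_chry(0.00012):.3f}")
```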
Expanding the horizons of microRNA bioinformatics.
Huntley, Rachael P; Kramarz, Barbara; Sawford, Tony; Umrao, Zara; Kalea, Anastasia Z; Acquaah, Vanessa; Martin, Maria-Jesus; Mayr, Manuel; Lovering, Ruth C
2018-06-05
MicroRNA regulation of key biological and developmental pathways is a rapidly expanding area of research, accompanied by vast amounts of experimental data. This data, however, is not widely available in bioinformatic resources, making it difficult for researchers to find and analyse microRNA-related experimental data and define further research projects. We are addressing this problem by providing two new bioinformatics datasets that contain experimentally verified functional information for mammalian microRNAs involved in cardiovascular-relevant, and other, processes. To date, our resource provides over 3,900 Gene Ontology annotations associated with almost 500 miRNAs from human, mouse and rat and over 2,200 experimentally validated miRNA:target interactions. We illustrate how this resource can be used to create miRNA-focused interaction networks with a biological context using the known biological role of miRNAs and the mRNAs they regulate, enabling discovery of associations between gene products, biological pathways and, ultimately, diseases. This data will be crucial in advancing the field of microRNA bioinformatics and will establish consistent datasets for reproducible functional analysis of microRNAs across all biological research areas. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
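A sketch of how such curated miRNA:target interactions could seed a network analysis with networkx; the pairs below are illustrative examples, not records quoted from the resource:

```python
import networkx as nx

# illustrative validated miRNA:target pairs (hypothetical subset)
pairs = [("hsa-miR-21-5p", "PTEN"), ("hsa-miR-21-5p", "PDCD4"),
         ("hsa-miR-155-5p", "SOCS1"), ("hsa-miR-1-3p", "KCNJ2")]

g = nx.Graph()
for mir, target in pairs:
    g.add_node(mir, kind="miRNA")
    g.add_node(target, kind="mRNA")
    g.add_edge(mir, target)

# genes sharing a regulating miRNA sit two steps apart in the bipartite graph
print(sorted(nx.single_source_shortest_path_length(g, "PTEN", cutoff=2)))
```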
Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sulakhe, D.; Rodriguez, A.; Wilde, M.
2008-03-01
Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual data system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper does not go into the details of the problems involved or the lessons learned in using individual Grid resources, as these have already been published in our paper on the genome analysis research environment (GNARE); it focuses primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.
Wightman, Bruce; Hark, Amy T
2012-01-01
The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this study, we deliberately integrated bioinformatics instruction at multiple course levels into an existing biology curriculum. Students in an introductory biology course, intermediate lab courses, and advanced project-oriented courses all participated in new course components designed to sequentially introduce bioinformatics skills and knowledge, as well as computational approaches that are common to many bioinformatics applications. In each course, bioinformatics learning was embedded in an existing disciplinary instructional sequence, as opposed to having a single course where all bioinformatics learning occurs. We designed direct and indirect assessment tools to follow student progress through the course sequence. Our data show significant gains in both student confidence and ability in bioinformatics during individual courses and as course level increases. Despite evidence of substantial student learning in both bioinformatics and mathematics, students were skeptical about the link between learning bioinformatics and learning mathematics. While our approach resulted in substantial learning gains, student "buy-in" and engagement might be better in longer project-based activities that demand application of skills to research problems. Nevertheless, in situations where a concentrated focus on project-oriented bioinformatics is not possible or desirable, our approach of integrating multiple smaller components into an existing curriculum provides an alternative. Copyright © 2012 Wiley Periodicals, Inc.
R-Based Software for the Integration of Pathway Data into Bioinformatic Algorithms
Kramer, Frank; Bayerlová, Michaela; Beißbarth, Tim
2014-01-01
Putting new findings into the context of available literature knowledge is one approach to dealing with the surge of high-throughput data results. Furthermore, prior knowledge can increase the performance and stability of bioinformatic algorithms, for example, methods for network reconstruction. In this review, we examine software packages for the statistical computing framework R, which enable the integration of pathway data for further bioinformatic analyses. Different approaches to integrate and visualize pathway data are identified, and the packages are stratified according to several aspects: data import strategies, the extent of available data, dependencies on external tools, integration with further analysis steps, and visualization options. A total of 12 packages integrating pathway data are reviewed in this manuscript. These are supplemented by five R-specific packages for visualization and six connector packages, which provide access to external tools. PMID:24833336
Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.
Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru
2016-09-29
Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.
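The "straightforward sequence similarity searches" mentioned here reduce, at their core, to pairwise alignment against reference sequences; a minimal sketch with Biopython's PairwiseAligner on toy sequences (not a tool discussed in the review):

```python
from Bio import Align

aligner = Align.PairwiseAligner()
aligner.mode = "local"          # Smith-Waterman-style local alignment

query = "ACGTACGTTAGC"
reference = "TTACGTACGTAAGCTT"

print(aligner.score(query, reference))              # best local alignment score
print(next(iter(aligner.align(query, reference))))  # one optimal alignment, pretty-printed
```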
[Integration of clinical and biological data in clinical practice using bioinformatics].
Coltell, Oscar; Arregui, María; Fabregat, Antonio; Portolés, Olga
2008-05-01
The aim of our work is to describe essential aspects of Medical Informatics, Bioinformatics and Biomedical Informatics that are used in biomedical research and clinical practice. These disciplines have emerged from the need to find new scientific and technical approaches to manage, store, analyze and report the data generated in clinical practice, molecular biology and other medical specialties. They can also be useful for integrating research information generated in different areas of health care. Moreover, these disciplines are interdisciplinary and integrative, two key features not shared by other areas of medical knowledge. Finally, when Bioinformatics and Biomedical Informatics approaches are applied to medical investigation and practice, a new discipline, called Clinical Bioinformatics, emerges. The latter requires a specific training program to create a new professional profile. We have not been able to find a specific training program in Clinical Bioinformatics in Spain.
Oluwagbemi, Olugbenga O; Adewumi, Adewole; Esuruoso, Abimbola
2012-01-01
Computational biology and bioinformatics are gradually gaining ground in Africa and other developing nations of the world. However, challenges to computational biology and bioinformatics education in these countries include inadequate infrastructure and a lack of readily available complementary and motivational tools to support learning as well as research. This has discouraged many promising undergraduates, postgraduates and researchers from pursuing further study in these fields. In this paper, we developed and describe MACBenAbim (Multi-platform Mobile Application for Computational Biology and Bioinformatics), a flexible, user-friendly tool to search for, define and describe the meanings of key terms in computational biology and bioinformatics, thus expanding the frontiers of knowledge of its users. The tool can also visualize results in a mobile multi-platform context. MACBenAbim is available from the authors for non-commercial purposes.
Bioinformatics Projects Supporting Life-Sciences Learning in High Schools
Marques, Isabel; Almeida, Paulo; Alves, Renato; Dias, Maria João; Godinho, Ana; Pereira-Leal, José B.
2014-01-01
The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called “Bioinformatics@school.” It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools. PMID:24465192
BioSmalltalk: a pure object system and library for bioinformatics.
Morales, Hernán F; Giovambattista, Guillermo
2013-09-15
We have developed BioSmalltalk, a new environment system for pure object-oriented bioinformatics programming. Adaptive end-user programming systems are becoming more important for discovering biological knowledge, as demonstrated by the emergence of open-source programming toolkits for bioinformatics in recent years. Our software is intended to bridge the gap between bioscientists and rapid software prototyping while preserving the possibility of scaling to whole-system biology applications. BioSmalltalk performs better in terms of execution time and memory usage than Biopython and BioPerl for some classical situations. BioSmalltalk is cross-platform and freely available (MIT license) through Google Project Hosting at http://code.google.com/p/biosmalltalk. Contact: hernan.morales@gmail.com. Supplementary data are available at Bioinformatics online.
Hidden in the Middle: Culture, Value and Reward in Bioinformatics.
Lewis, Jamie; Bartlett, Andrew; Atkinson, Paul
2016-01-01
Bioinformatics - the so-called shotgun marriage between biology and computer science - is an interdiscipline. Despite interdisciplinarity being seen as a virtue, for having the capacity to solve complex problems and foster innovation, it has the potential to place projects and people in anomalous categories. For example, valorised 'outputs' in academia are often defined and rewarded by discipline. Bioinformatics, as an interdisciplinary bricolage, incorporates experts from various disciplinary cultures with their own distinct ways of working. Perceived problems of interdisciplinarity include difficulties of making explicit knowledge that is practical, theoretical, or cognitive. But successful interdisciplinary research also depends on an understanding of disciplinary cultures and value systems, often only tacitly understood by members of the communities in question. In bioinformatics, the 'parent' disciplines have different value systems; for example, what is considered worthwhile research by computer scientists can be thought of as trivial by biologists, and vice versa. This paper concentrates on the problems of reward and recognition described by scientists working in academic bioinformatics in the United Kingdom. We highlight problems that are a consequence of its cross-cultural make-up, recognising that the mismatches in knowledge in this borderland take place not just at the practical, theoretical, or epistemological level, but also at the cultural level. The trend in big, interdisciplinary science is towards multiple authors on a single paper; in bioinformatics this has created hybrid or fractional scientists who find they are being positioned not just in-between established disciplines but also in-between as middle authors or, worse still, left off papers altogether.
Contribution of bioinformatics prediction in microRNA-based cancer therapeutics.
Banwait, Jasjit K; Bastola, Dhundy R
2015-01-01
Despite enormous efforts, cancer remains one of the most lethal diseases in the world. With the advancement of high-throughput technologies, massive amounts of cancer data can be accessed and analyzed. Bioinformatics provides a platform to assist biologists in developing minimally invasive biomarkers to detect cancer, and in designing effective personalized therapies to treat cancer patients. Still, the early diagnosis, prognosis, and treatment of cancer remain an open challenge for the research community. MicroRNAs (miRNAs) are small non-coding RNAs that serve to regulate gene expression. The discovery of deregulated miRNAs in cancer cells and tissues has led many to investigate the use of miRNAs as potential biomarkers for early detection, and as therapeutic agents to treat cancer. Here we describe advancements in computational approaches to predict miRNAs and their targets, and discuss the role of bioinformatics in studying miRNAs in the context of human cancer. Published by Elsevier B.V.
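Among the computational target prediction ideas surveyed above, the canonical approach scans 3'UTRs for matches to the reverse complement of the miRNA seed (nucleotides 2-8). A minimal Python sketch of that idea follows; the gene names and UTR sequences are invented, and real predictors add conservation, site context and free-energy scoring on top of the bare seed match.

```python
# Minimal sketch of canonical miRNA seed-match target scanning, one of the
# computational prediction ideas the review covers. Sequences are invented.

COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna):
    """Reverse complement of miRNA positions 2-8 (the 7-mer seed), as RNA."""
    seed = mirna[1:8]
    return "".join(COMP[b] for b in reversed(seed))

def find_targets(mirna, utrs):
    """Return (transcript, offset) pairs where the seed site occurs in a 3'UTR."""
    site = seed_site(mirna)
    hits = []
    for name, utr in utrs.items():
        start = utr.find(site)
        while start != -1:
            hits.append((name, start))
            start = utr.find(site, start + 1)
    return hits

mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # a let-7 family sequence
utrs = {"geneX": "AAACUACCUCAAGGAAACUACCUCA", "geneY": "GGGGGGGGGG"}
print(find_targets(mirna, utrs))   # geneX carries two CUACCUC seed sites
```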
Bioinformatics challenges for genome-wide association studies.
Moore, Jason H; Asselbergs, Folkert W; Williams, Scott M
2010-02-15
The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype-phenotype relationship that is characterized by significant heterogeneity and gene-gene and gene-environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods.
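The "one SNP at a time" paradigm criticized above can be made concrete with a toy example: each SNP's genotype counts in cases and controls are tested independently, followed by a multiple-testing correction. The counts below are fabricated, and real GWAS pipelines use regression models with covariates rather than bare contingency tests.

```python
# Sketch of the "one SNP at a time" analysis paradigm the review critiques:
# each SNP is tested independently with a case/control contingency test.
# Genotype counts below are fabricated for illustration.

from scipy.stats import chi2_contingency

# rows: cases, controls; columns: genotype counts (AA, Aa, aa) per SNP
snps = {
    "rs0001": [[220, 480, 300], [260, 500, 240]],
    "rs0002": [[250, 500, 250], [248, 502, 250]],
}

alpha = 0.05 / len(snps)  # Bonferroni correction for multiple testing
for snp, table in snps.items():
    chi2, p, dof, _ = chi2_contingency(table)
    flag = "significant" if p < alpha else "n.s."
    print(f"{snp}: chi2={chi2:.2f}, p={p:.3g} ({flag})")
```

Note what this design cannot see: any signal that only emerges from combinations of SNPs or from gene-environment context, which is precisely the gap the authors argue bioinformatics must fill.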
Cornforth, Michael N; Anur, Pavana; Wang, Nicholas; Robinson, Erin; Ray, F Andrew; Bedford, Joel S; Loucas, Bradford D; Williams, Eli S; Peto, Myron; Spellman, Paul; Kollipara, Rahul; Kittler, Ralf; Gray, Joe W; Bailey, Susan M
2018-05-11
Chromosome rearrangements are large-scale structural variants that are recognized drivers of oncogenic events in cancers of all types. Cytogenetics allows for their rapid, genome-wide detection, but does not provide gene-level resolution. Massively parallel sequencing (MPS) promises DNA sequence-level characterization of the specific breakpoints involved, but is strongly influenced by bioinformatics filters that affect detection efficiency. We sought to characterize the breakpoint junctions of chromosomal translocations and inversions in the clonal derivatives of human cells exposed to ionizing radiation. Here, we describe the first successful use of DNA paired-end analysis to locate and sequence across the breakpoint junctions of a radiation-induced reciprocal translocation. The analyses employed, with varying degrees of success, several well-known bioinformatics algorithms, a task made difficult by the involvement of repetitive DNA sequences. As for underlying mechanisms, the results of Sanger sequencing suggested that the translocation in question was likely formed via microhomology-mediated non-homologous end joining (mmNHEJ). To our knowledge, this represents the first use of MPS to characterize the breakpoint junctions of a radiation-induced chromosomal translocation in human cells. Curiously, these same approaches were unsuccessful when applied to the analysis of inversions previously identified by directional genomic hybridization (dGH). We conclude that molecular cytogenetics continues to provide critical guidance for structural variant discovery, validation and in "tuning" analysis filters to enable robust breakpoint identification at the base pair level.
Development of a Web-Enabled Informatics Platform for Manipulation of Gene Expression Data
2004-12-01
genomic platforms such as metabolomics and proteomics, and to federated databases for knowledge management. A successful SBIR Phase I completed...measurements that require sophisticated bioinformatic platforms for data archival, management, integration, and analysis if researchers are to derive...web-enabled bioinformatic platform consisting of a Laboratory Information Management System (LIMS), an Analysis Information Management System (AIMS
USDA-ARS's Scientific Manuscript database
Berry crops (members of the genera Fragaria, Ribes, Rubus, Sambucus and Vaccinium) are known hosts for more than 70 viruses and new ones are identified continually. In modern berry cultivars, viruses tend to be asymptomatic in single infections and symptoms only develop after plants accumulate m...
USDA-ARS's Scientific Manuscript database
Berry crops (members of the genera Fragaria, Ribes, Rubus, Sambucus and Vaccinium) are known hosts for more than 70 viruses and new ones are identified frequently. In modern berry cultivars, viruses tend to be asymptomatic in single infections and symptoms only develop after plants accumulate multip...
High-throughput strategies for the discovery and engineering of enzymes for biocatalysis.
Jacques, Philippe; Béchet, Max; Bigan, Muriel; Caly, Delphine; Chataigné, Gabrielle; Coutte, François; Flahaut, Christophe; Heuson, Egon; Leclère, Valérie; Lecouturier, Didier; Phalip, Vincent; Ravallec, Rozenn; Dhulster, Pascal; Froidevaux, Rénato
2017-02-01
Innovations in enzyme discovery impact a wide range of industries for which biocatalysis and biotransformations represent a great challenge, e.g., the food, polymer and chemical industries. Key tools and technologies - bioinformatics tools to guide mutant library design, molecular biology tools to create mutant libraries, microfluidics/microplates, parallel miniscale bioreactors and mass spectrometry technologies to create high-throughput screening methods, and experimental design tools for screening and optimization - enable the discovery, development and implementation of enzymes and whole cells in (bio)processes. These technological innovations are also accompanied by the development and implementation of clean and sustainable integrated processes to meet the growing needs of the chemical, pharmaceutical, environmental and biorefinery industries. This review gives an overview of the benefits of high-throughput screening approaches, from the discovery and engineering of biocatalysts to cell culture, for optimizing their production in integrated processes and their extraction/purification.
Advances in the genetic dissection of plant cell walls: tools and resources available in Miscanthus
Slavov, Gancho; Allison, Gordon; Bosch, Maurice
2013-01-01
Tropical C4 grasses from the genus Miscanthus are believed to have great potential as biomass crops. However, Miscanthus species are essentially undomesticated, and genetic, molecular and bioinformatics tools are in very early stages of development. Furthermore, similar to other crops targeted as lignocellulosic feedstocks, the efficient utilization of biomass is hampered by our limited knowledge of the structural organization of the plant cell wall and the underlying genetic components that control this organization. The Institute of Biological, Environmental and Rural Sciences (IBERS) has assembled an extensive collection of germplasm for several species of Miscanthus. In addition, an integrated, multidisciplinary research programme at IBERS aims to inform accelerated breeding for biomass productivity and composition, while also generating fundamental knowledge. Here we review recent advances with respect to the genetic characterization of the cell wall in Miscanthus. First, we present a summary of recent and on-going biochemical studies, including prospects and limitations for the development of powerful phenotyping approaches. Second, we review current knowledge about genetic variation for cell wall characteristics of Miscanthus and illustrate how phenotypic data, combined with high-density arrays of single-nucleotide polymorphisms, are being used in genome-wide association studies to generate testable hypotheses and guide biological discovery. Finally, we provide an overview of the current knowledge about the molecular biology of cell wall biosynthesis in Miscanthus and closely related grasses, discuss the key conceptual and technological bottlenecks, and outline the short-term prospects for progress in this field. PMID:23847628
Carving a niche: establishing bioinformatics collaborations
Lyon, Jennifer A.; Tennant, Michele R.; Messner, Kevin R.; Osterbur, David L.
2006-01-01
Objectives: The paper describes collaborations and partnerships developed between library bioinformatics programs and other bioinformatics-related units at four academic institutions. Methods: A call for information on bioinformatics partnerships was made via email to librarians who have participated in the National Center for Biotechnology Information's Advanced Workshop for Bioinformatics Information Specialists. Librarians from Harvard University, the University of Florida, the University of Minnesota, and Vanderbilt University responded and expressed willingness to contribute information on their institutions, programs, services, and collaborating partners. Similarities and differences in programs and collaborations were identified. Results: The four librarians have developed partnerships with other units on their campuses that can be categorized into the following areas: knowledge management, instruction, and electronic resource support. All primarily support freely accessible electronic resources, while other campus units deal with fee-based ones. These demarcations are apparent in resource provision as well as in subsequent support and instruction. Conclusions and Recommendations: Through environmental scanning and networking with colleagues, librarians who provide bioinformatics support can develop fruitful collaborations. Visibility is key to building collaborations, as is broad-based thinking in terms of potential partners. PMID:16888668
A new genome-mining tool redefines the lasso peptide biosynthetic landscape
Tietz, Jonathan I.; Schwalen, Christopher J.; Patel, Parth S.; Maxson, Tucker; Blair, Patricia M.; Tai, Hua-Chia; Zakai, Uzma I.; Mitchell, Douglas A.
2016-01-01
Ribosomally synthesized and post-translationally modified peptide (RiPP) natural products are attractive for genome-driven discovery and re-engineering, but limitations in bioinformatic methods and exponentially increasing genomic data make large-scale mining difficult. We report RODEO (Rapid ORF Description and Evaluation Online), which combines hidden Markov model-based analysis, heuristic scoring, and machine learning to identify biosynthetic gene clusters and predict RiPP precursor peptides. We initially focused on lasso peptides, which display intriguing physicochemical properties and bioactivities, but their hypervariability renders them challenging prospects for automated mining. Our approach yielded the most comprehensive mapping of lasso peptide space, revealing >1,300 compounds. We characterized the structures and bioactivities of six lasso peptides, prioritized based on predicted structural novelty, including one with an unprecedented handcuff-like topology and another with a citrulline modification exceptionally rare among bacteria. These combined insights significantly expand the knowledge of lasso peptides, and more broadly, provide a framework for future genome-mining efforts. PMID:28244986
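To make the idea of heuristic precursor scoring concrete, here is a deliberately crude Python sketch that ranks candidate core peptides by a few lasso-like features. The features and weights are invented for illustration; RODEO's actual pipeline combines hidden Markov model matches, a much richer heuristic feature set, and a trained classifier.

```python
# Toy heuristic scorer in the spirit of RODEO's precursor evaluation
# (HMM analysis and machine learning omitted; the features and weights
# here are invented for illustration only).

def score_candidate(peptide):
    """Score a candidate core peptide on a few crude lasso-like features."""
    score = 0
    if 15 <= len(peptide) <= 25:                   # plausible core length
        score += 2
    if peptide[0] in "GC":                         # N-terminal Gly/Cys common in lassos
        score += 3
    if any(res in "ED" for res in peptide[6:9]):   # ring-forming Glu/Asp near pos 7-9
        score += 3
    if "W" in peptide:                             # arbitrary extra feature
        score += 1
    return score

candidates = ["GGAGQYKEVEAGRWSDR", "MKKLLVAAALLA", "CLGIGSCNDFAGCGYAVVCFW"]
for pep in sorted(candidates, key=score_candidate, reverse=True):
    print(f"{score_candidate(pep):>2}  {pep}")
```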
Xie, Bing; Huang, Yu; Baumann, Kate; Fry, Bryan Grieg; Shi, Qiong
2017-01-01
The potential of marine natural products to become new drugs is vast; however, research is still in its infancy. The chemical and biological diversity of marine toxins is immeasurable and as such an extraordinary resource for the discovery of new drugs. With the rapid development of next-generation sequencing (NGS) and liquid chromatography–tandem mass spectrometry (LC-MS/MS), it has been much easier and faster to identify more toxins and predict their functions with bioinformatics pipelines, which pave the way for novel drug developments. Here we provide an overview of related bioinformatics pipelines that have been supported by a combination of transcriptomics and proteomics for identification and function prediction of novel marine toxins. PMID:28358320
Xie, Bing; Huang, Yu; Baumann, Kate; Fry, Bryan Grieg; Shi, Qiong
2017-03-30
The potential of marine natural products to become new drugs is vast; however, research is still in its infancy. The chemical and biological diversity of marine toxins is immeasurable and as such an extraordinary resource for the discovery of new drugs. With the rapid development of next-generation sequencing (NGS) and liquid chromatography-tandem mass spectrometry (LC-MS/MS), it has been much easier and faster to identify more toxins and predict their functions with bioinformatics pipelines, which pave the way for novel drug developments. Here we provide an overview of related bioinformatics pipelines that have been supported by a combination of transcriptomics and proteomics for identification and function prediction of novel marine toxins.
Morgnanesi, Dante; Heinrichs, Eric J; Mele, Anthony R; Wilkinson, Sean; Zhou, Suzanne; Kulp, John L
2015-11-01
Computational chemical biology, applied to research on hepatitis B virus (HBV), has two major branches: bioinformatics (statistical models) and first-principle methods (molecular physics). While bioinformatics focuses on statistical tools and biological databases, molecular physics uses mathematics and chemical theory to study the interactions of biomolecules. The three computational techniques most commonly used in HBV research are homology modeling, molecular docking, and molecular dynamics. Homology modeling is a computational simulation to predict protein structure and has been used to construct conformers of the viral polymerase (reverse transcriptase domain and RNase H domain) and the HBV X protein. Molecular docking is used to predict the most likely orientation of a ligand when it is bound to a protein, as well as to determine an energy score of the docked conformation. Molecular dynamics is a simulation that analyzes biomolecule motions and determines conformation and stability patterns. All of these modeling techniques have aided in understanding how resistance mutations affect the binding of HBV non-nucleos(t)ide reverse-transcriptase inhibitors. Finally, bioinformatics can be used to study the DNA, RNA and protein sequences of viruses, both to analyze drug resistance and to genotype the viral genomes. Overall, with these techniques and others, computational chemical biology is becoming more and more necessary in hepatitis B research. This article forms part of a symposium in Antiviral Research on "An unfinished story: from the discovery of the Australia antigen to the development of new curative therapies for hepatitis B." Copyright © 2015 Elsevier B.V. All rights reserved.
Karim, Md Rezaul; Michel, Audrey; Zappa, Achille; Baranov, Pavel; Sahay, Ratnesh; Rebholz-Schuhmann, Dietrich
2017-04-16
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community. © The Author 2017. Published by Oxford University Press.
atBioNet--an integrated network analysis tool for genomics and biomarker discovery.
Ding, Yijun; Chen, Minjun; Liu, Zhichao; Ding, Don; Ye, Yanbin; Zhang, Min; Kelly, Reagan; Guo, Li; Su, Zhenqiang; Harris, Stephen C; Qian, Feng; Ge, Weigong; Fang, Hong; Xu, Xiaowei; Tong, Weida
2012-07-20
Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user-supplied protein/gene list with interactions from its integrated PPI network. Statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrate that atBioNet can not only identify functional modules and pathways related to the studied diseases, but that this information can also be used to hypothesize novel biomarkers for future analysis. atBioNet is a free web-based network analysis tool that provides systematic insight into protein/gene interactions by examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and for biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
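The SCAN algorithm mentioned above clusters a network by structural similarity: two nodes are similar when their closed neighborhoods overlap strongly, and nodes with enough similar neighbors seed modules. The sketch below implements that similarity and core test on an invented toy edge list; the full algorithm additionally expands clusters from cores and labels hubs and outliers.

```python
# Simplified illustration of SCAN-style structural clustering as used by
# atBioNet: nodes whose closed neighborhoods overlap strongly seed modules.
# The toy PPI edge list is invented; this is not the full SCAN algorithm.

import math
from collections import defaultdict

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F"), ("E", "G"), ("F", "G")]

nbr = defaultdict(set)
for u, v in edges:
    nbr[u].add(v)
    nbr[v].add(u)

def sigma(u, v):
    """Structural similarity over closed neighborhoods (node plus neighbors)."""
    gu, gv = nbr[u] | {u}, nbr[v] | {v}
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

def cores(eps=0.7, mu=2):
    """Nodes with at least mu eps-similar neighbors seed functional modules."""
    return {u for u in nbr
            if sum(sigma(u, v) >= eps for v in nbr[u]) >= mu}

print(sorted(cores()))  # nodes inside the two dense regions of the toy graph
```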
Computational intelligence techniques in bioinformatics.
Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I
2013-12-01
Computational intelligence (CI) is a well-established paradigm, with current systems having many of the characteristics of biological computers and being capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state of the art in CI applications to bioinformatics and to motivate research in new trend-setting directions. We present an overview of CI techniques in bioinformatics and show how techniques including neural networks, restricted Boltzmann machines, deep belief networks, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines can be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function and structure prediction. We discuss representative methods to provide inspiring examples of how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented, and an extensive bibliography is included. Copyright © 2013 Elsevier Ltd. All rights reserved.
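As a concrete instance of one technique from that list, the sketch below trains a support vector machine to classify gene expression profiles. The data are synthetic stand-ins for a real expression matrix, with a handful of artificially informative genes.

```python
# Sketch of one CI technique from the review: a support vector machine
# classifying gene expression profiles. Data are synthetic stand-ins.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
X = rng.normal(size=(n_samples, n_genes))     # expression matrix (samples x genes)
y = np.repeat([0, 1], n_samples // 2)         # two phenotype classes
X[y == 1, :10] += 1.5                         # make 10 genes informative

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```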
Target-Pathogen: a structural bioinformatic approach to prioritize drug targets in pathogens.
Sosa, Ezequiel J; Burguener, Germán; Lanzarotti, Esteban; Defelipe, Lucas; Radusky, Leandro; Pardo, Agustín M; Marti, Marcelo; Turjanski, Adrián G; Fernández Do Porto, Darío
2018-01-04
Available genomic data for pathogens have created new opportunities for drug discovery and development to fight them, including new resistant and multiresistant strains. In particular, structural data must be integrated with both gene information and experimental results. In this sense, there has been a lack of an online resource that allows genome-wide data consolidation from diverse sources together with thorough bioinformatic analysis, enabling easy filtering and scoring for fast target selection for drug discovery. Here, we present the Target-Pathogen database (http://target.sbg.qb.fcen.uba.ar/patho), designed and developed as an online resource that allows the integration and weighting of protein information such as function, metabolic role, off-targeting, structural properties including druggability, essentiality and omic experiments, to facilitate the identification and prioritization of candidate drug targets in pathogens. We include in the database 10 genomes of some of the most relevant microorganisms for human health (Mycobacterium tuberculosis, Mycobacterium leprae, Klebsiella pneumoniae, Plasmodium vivax, Toxoplasma gondii, Leishmania major, Wolbachia bancrofti, Trypanosoma brucei, Shigella dysenteriae and Schistosoma mansoni) and show its applicability. New genomes can be uploaded upon request. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
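The weighting-and-prioritization step described above can be pictured as a weighted sum over per-protein annotations. The following sketch is purely illustrative: the field names, weights and values are hypothetical and do not reflect Target-Pathogen's actual scoring scheme.

```python
# Hedged sketch of multi-criteria target prioritization in the spirit of
# Target-Pathogen: per-protein annotations are combined into a weighted
# score. Field names, weights and values are all hypothetical.

proteins = [
    {"id": "P1", "druggability": 0.9, "essential": 1, "human_offtarget": 0},
    {"id": "P2", "druggability": 0.4, "essential": 1, "human_offtarget": 1},
    {"id": "P3", "druggability": 0.8, "essential": 0, "human_offtarget": 0},
]

weights = {"druggability": 2.0, "essential": 1.5, "human_offtarget": -2.0}

def priority(p):
    """Weighted sum of annotation values; higher means a better candidate."""
    return sum(w * p[field] for field, w in weights.items())

for p in sorted(proteins, key=priority, reverse=True):
    print(f"{p['id']}: score = {priority(p):.2f}")
```

The design point this illustrates is the filtering workflow the abstract emphasizes: once annotations live in one table, re-ranking under a different weighting is a one-line change.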
TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data
Colaprico, Antonio; Silva, Tiago C.; Olsen, Catharina; Garofano, Luciano; Cava, Claudia; Garolini, Davide; Sabedot, Thais S.; Malta, Tathiane M.; Pagnotta, Stefano M.; Castiglioni, Isabella; Ceccarelli, Michele; Bontempi, Gianluca; Noushmehr, Houtan
2016-01-01
The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries. PMID:26704973
DASS: efficient discovery and p-value calculation of substructures in unordered data.
Hollunder, Jens; Friedel, Maik; Beyer, Andreas; Workman, Christopher T; Wilhelm, Thomas
2007-01-01
Pattern identification in biological sequence data is one of the main objectives of bioinformatics research. However, few methods are available for detecting patterns (substructures) in unordered datasets. Data mining algorithms mainly developed outside the realm of bioinformatics have been adapted for that purpose, but typically do not determine the statistical significance of the identified patterns. Moreover, these algorithms do not exploit the often modular structure of biological data. We present the algorithm DASS (Discovery of All Significant Substructures), which first identifies all substructures in unordered data (DASS(Sub)) in a manner that is especially efficient for modular data. In addition, DASS calculates the statistical significance of the identified substructures, either for sets with at most one element of each type (DASS(P(set))) or for sets with multiple occurrences of elements (DASS(P(mset))). The power and versatility of DASS is demonstrated by four examples: combinations of protein domains in multi-domain proteins, combinations of proteins in protein complexes (protein subcomplexes), combinations of transcription factor target sites in promoter regions, and evolutionarily conserved protein interaction subnetworks. The program code and additional data are available at http://www.fli-leibniz.de/tsb/DASS
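To illustrate the flavor of significance-aware substructure discovery, the sketch below enumerates element pairs that co-occur across sets and attaches a hypergeometric p-value. It is a simplification: DASS handles substructures of arbitrary size and multisets, and its p-value calculations differ in detail. The example sets are invented.

```python
# Simplified illustration of substructure discovery in unordered data:
# enumerate element pairs that co-occur across sets and attach a
# hypergeometric p-value. DASS itself handles arbitrary substructure sizes
# and multisets; this sketch covers only pairs. Data are invented.

from itertools import combinations
from scipy.stats import hypergeom

sets = [
    {"kinase", "SH2", "SH3"},
    {"kinase", "SH2", "PDZ"},
    {"kinase", "SH2"},
    {"PDZ", "WW"},
]

N = len(sets)
counts = {}
for s in sets:
    for e in s:
        counts[e] = counts.get(e, 0) + 1

for a, b in combinations(sorted(counts), 2):
    k = sum(1 for s in sets if a in s and b in s)
    if k < 2:
        continue
    # P(overlap >= k) if a- and b-containing sets were drawn independently
    p = hypergeom.sf(k - 1, N, counts[a], counts[b])
    print(f"{{{a}, {b}}}: observed in {k} sets, p = {p:.3f}")
```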
Suh, K. Stephen; Sarojini, Sreeja; Youssif, Maher; Nalley, Kip; Milinovikj, Natasha; Elloumi, Fathi; Russell, Steven; Pecora, Andrew; Schecter, Elyssa; Goy, Andre
2013-01-01
Personalized medicine promises patient-tailored treatments that enhance patient care and decrease overall treatment costs by focusing on genetics and “-omics” data obtained from patient biospecimens and records to guide therapy choices that generate good clinical outcomes. The approach relies on diagnostic and prognostic use of novel biomarkers discovered through combinations of tissue banking, bioinformatics, and electronic medical records (EMRs). The analytical power of bioinformatic platforms combined with patient clinical data from EMRs can reveal potential biomarkers and clinical phenotypes that allow researchers to develop experimental strategies using selected patient biospecimens stored in tissue banks. For cancer, high-quality biospecimens collected at diagnosis, first relapse, and various treatment stages provide crucial resources for study designs. To enlarge biospecimen collections, patient education regarding the value of specimen donation is vital. One approach for increasing consent is to offer publically available illustrations and game-like engagements demonstrating how wider sample availability facilitates development of novel therapies. The critical value of tissue bank samples, bioinformatics, and EMR in the early stages of the biomarker discovery process for personalized medicine is often overlooked. The data obtained also require cross-disciplinary collaborations to translate experimental results into clinical practice and diagnostic and prognostic use in personalized medicine. PMID:23818899
Knowledge Discovery from Databases: An Introductory Review.
ERIC Educational Resources Information Center
Vickery, Brian
1997-01-01
Introduces new procedures being used to extract knowledge from databases and discusses rationales for developing knowledge discovery methods. Methods are described for such techniques as classification, clustering, and the detection of deviations from pre-established norms. Examines potential uses of knowledge discovery in the information field.…
Mello, Luciane V; Tregilgas, Luke; Cowley, Gwen; Gupta, Anshul; Makki, Fatima; Jhutty, Anjeet; Shanmugasundram, Achchuthan
2017-01-01
Teaching bioinformatics is a longstanding challenge for educators who need to demonstrate to students how skills developed in the classroom may be applied to real world research. This study employed an action research methodology which utilised student-staff partnership and peer-learning. It was centred on the experiences of peer-facilitators, students who had previously taken a postgraduate bioinformatics module, and had applied knowledge and skills gained from it to their own research. It aimed to demonstrate to peer-receivers, current students, how bioinformatics could be used in their own research while developing peer-facilitators' teaching and mentoring skills. This student-centred approach was well received by the peer-receivers, who claimed to have gained improved understanding of bioinformatics and its relevance to research. Equally, peer-facilitators also developed a better understanding of the subject and appreciated that the activity was a rare and invaluable opportunity to develop their teaching and mentoring skills, enhancing their employability.
Mello, Luciane V.; Tregilgas, Luke; Cowley, Gwen; Gupta, Anshul; Makki, Fatima; Jhutty, Anjeet; Shanmugasundram, Achchuthan
2017-01-01
Abstract Teaching bioinformatics is a longstanding challenge for educators who need to demonstrate to students how skills developed in the classroom may be applied to real world research. This study employed an action research methodology which utilised student–staff partnership and peer-learning. It was centred on the experiences of peer-facilitators, students who had previously taken a postgraduate bioinformatics module, and had applied knowledge and skills gained from it to their own research. It aimed to demonstrate to peer-receivers, current students, how bioinformatics could be used in their own research while developing peer-facilitators’ teaching and mentoring skills. This student-centred approach was well received by the peer-receivers, who claimed to have gained improved understanding of bioinformatics and its relevance to research. Equally, peer-facilitators also developed a better understanding of the subject and appreciated that the activity was a rare and invaluable opportunity to develop their teaching and mentoring skills, enhancing their employability. PMID:29098185
ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis
Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas
2016-01-01
Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons include dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, and nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of the community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without an advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and are reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475
Bioinformatics for Exploration
NASA Technical Reports Server (NTRS)
Johnson, Kathy A.
2006-01-01
For the purpose of this paper, bioinformatics is defined as the application of computer technology to the management of biological information. It can be thought of as the science of developing computer databases and algorithms to facilitate and expedite biological research. This is a crosscutting capability that supports nearly all human health areas ranging from computational modeling, to pharmacodynamics research projects, to decision support systems within autonomous medical care. Bioinformatics serves to increase the efficiency and effectiveness of the life sciences research program. It provides data, information, and knowledge capture which further supports management of the bioastronautics research roadmap - identifying gaps that still remain and enabling the determination of which risks have been addressed.
Current progress in Structure-Based Rational Drug Design marks a new mindset in drug discovery
Lounnas, Valère; Ritschel, Tina; Kelder, Jan; McGuire, Ross; Bywater, Robert P.; Foloppe, Nicolas
2013-01-01
The past decade has witnessed a paradigm shift in preclinical drug discovery, with structure-based drug design (SBDD) making a comeback while high-throughput screening (HTS) methods have continued to generate disappointing results. There is a deficit of information between identified hits and the many criteria that must be fulfilled in parallel to convert them into preclinical candidates that have a real chance of becoming a drug. This gap can be bridged by investigating the interactions between the ligands and their receptors. Accurate calculations of the free energy of binding are still elusive; however, progress has been made with respect to how one may deal with the versatile role of water. A corpus of knowledge combining X-ray structures, bioinformatics and molecular modeling techniques now allows drug designers to routinely produce receptor homology models of increasing quality. These models serve as a basis to establish and validate efficient rationales used to tailor and/or screen virtual libraries with enhanced chances of obtaining hits. Many case reports of successful SBDD show how synergy can be gained from the combined use of several techniques. The role of SBDD with respect to two different classes of widely investigated pharmaceutical targets, (a) protein kinases (PK) and (b) G-protein coupled receptors (GPCR), is discussed. Throughout these examples, prototypical situations covering the current possibilities and limitations of SBDD are presented. PMID:24688704
DrugQuest - a text mining workflow for drug association discovery.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis
2016-06-06
Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of biomedical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques on these fields to identify chemicals, proteins, genes, pathways and diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate why these records cluster together. Different views, such as clustered chemicals based on their textual information and tag clouds of significant terms along with the terms used for clustering, are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest.
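The clustering step can be pictured as vectorizing each record's text fields and grouping similar vectors. The sketch below uses TF-IDF and k-means on four invented mini-descriptions; DrugQuest itself operates on real DrugBank fields enriched by named-entity recognition and offers several similarity and partitional methods.

```python
# Sketch of the DrugQuest idea: cluster drug records by shared terms in
# their free-text fields. The four mini-descriptions are invented; the
# real tool works on DrugBank fields plus named-entity recognition.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

records = {
    "drug_1": "beta-adrenergic receptor antagonist lowering blood pressure",
    "drug_2": "selective beta blocker used in hypertension and angina",
    "drug_3": "broad-spectrum antibiotic inhibiting bacterial cell wall synthesis",
    "drug_4": "penicillin-class antibiotic targeting cell wall assembly",
}

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(records.values())

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for name, label in zip(records, km.labels_):
    print(f"{name}: cluster {label}")   # cardiovascular vs antibiotic records
```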
Drewes, Stephan; Straková, Petra; Drexler, Jan F; Jacob, Jens; Ulrich, Rainer G
2017-01-01
Rodents are distributed throughout the world and interact with humans in many ways. They provide vital ecosystem services, some species are useful models in biomedical research and some are kept as pet animals. However, many rodent species can have adverse effects, such as damage to crops and stored produce, and they are of health concern because of the transmission of pathogens to humans and livestock. The first rodent viruses were discovered by isolation approaches and resulted in breakthrough knowledge in immunology, molecular and cell biology, and cancer research. In addition to rodent-specific viruses, rodent-borne viruses cause a large number of zoonotic diseases. The most prominent examples are reemerging outbreaks of human hemorrhagic fever cases caused by arena- and hantaviruses. In addition, rodents are reservoirs for vector-borne pathogens, such as tick-borne encephalitis virus and Borrelia spp., and may carry human pathogenic agents, but are likely not involved in their transmission to humans. Today, next-generation sequencing, or high-throughput sequencing (HTS), is revolutionizing the speed of discovery of novel viruses, but other molecular approaches, such as generic RT-PCR/PCR and rolling circle amplification techniques, contribute significantly to this rapidly advancing process. However, the current knowledge still represents only the tip of the iceberg when the known human viruses are compared to those known for rodents, the mammalian taxon with the largest number of species. The diagnostic potential of HTS-based metagenomic approaches is illustrated by their use in the discovery and complete genome determination of novel borna- and adenoviruses as causative disease agents in squirrels. In conclusion, HTS, in combination with conventional RT-PCR/PCR-based approaches, has drastically increased our knowledge of the diversity of rodent viruses. Future improvements of the workflows used, including bioinformatics analysis, will further enhance our knowledge and preparedness in case of the emergence of novel viruses. Classical virological and additional molecular approaches are needed for genome annotation and functional characterization of the novel viruses discovered by these technologies, and for evaluation of their zoonotic potential. © 2017 Elsevier Inc. All rights reserved.
The relation between prior knowledge and students' collaborative discovery learning processes
NASA Astrophysics Data System (ADS)
Gijlers, Hannie; de Jong, Ton
2005-03-01
In this study we investigate how prior knowledge influences knowledge development during collaborative discovery learning. Fifteen dyads of students (pre-university education, 15-16 years old) worked on a discovery learning task in the physics field of kinematics. The (face-to-face) communication between students was recorded and the interaction with the environment was logged. Based on students' individual judgments of the truth-value and testability of a series of domain-specific propositions, a detailed description of the knowledge configuration for each dyad was created before they entered the learning environment. Qualitative analyses of two dialogues illustrated that prior knowledge influences the discovery learning processes, and knowledge development in a pair of students. Assessments of student and dyad definitional (domain-specific) knowledge, generic (mathematical and graph) knowledge, and generic (discovery) skills were related to the students' dialogue in different discovery learning processes. Results show that a high level of definitional prior knowledge is positively related to the proportion of communication regarding the interpretation of results. Heterogeneity with respect to generic prior knowledge was positively related to the number of utterances made in the discovery process categories hypotheses generation and experimentation. Results of the qualitative analyses indicated that collaboration between extremely heterogeneous dyads is difficult when the high achiever is not willing to scaffold information and work in the low achiever's zone of proximal development.
jORCA: easily integrating bioinformatics Web Services.
Martín-Requena, Victoria; Ríos, Javier; García, Maximiliano; Ramírez, Sergio; Trelles, Oswaldo
2010-02-15
Web services technology is becoming the option of choice to deploy bioinformatics tools that are universally available. One of the major strengths of this approach is that it supports machine-to-machine interoperability over a network. However, a weakness of this approach is that various Web Services differ in their definition and invocation protocols, as well as their communication and data formats, and this presents a barrier to service interoperability. jORCA is a desktop client aimed at facilitating seamless integration of Web Services. It does so by making a uniform representation of the different web resources, supporting scalable service discovery, and automatic composition of workflows. Usability is at the top of the jORCA agenda; thus it is a highly customizable and extensible application that accommodates a broad range of user skills, featuring double-click invocation of services in conjunction with advanced execution control, on-the-fly data standardization, extensibility of viewer plug-ins, drag-and-drop editing capabilities, plus a file-based browsing style and organization of favourite tools. The integration of bioinformatics Web Services is thus made easier, supporting a wider range of users.
An overview of bioinformatics methods for modeling biological pathways in yeast
Hou, Jie; Acharya, Lipi; Zhu, Dongxiao
2016-01-01
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein–protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed. PMID:26476430
Takakusagi, Yoichi; Takakusagi, Kaori; Sugawara, Fumio; Sakaguchi, Kengo
2018-01-01
Identification of the target proteins that directly bind a bioactive small molecule is of great interest for clarifying the mode of action of the small molecule as well as for elucidating the underlying biological phenomena at the molecular level. Of the experimental technologies available, T7 phage display allows comprehensive screening for small molecule-recognizing amino acid sequences from peptide libraries displayed on the T7 phage capsid. Here, we describe a T7 phage display strategy that is combined with a quartz-crystal microbalance (QCM) biosensor as an affinity selection platform and with bioinformatics analysis of small molecule-recognizing short peptides. This method dramatically enhances the efficacy and throughput of screening for small molecule-recognizing amino acid sequences without repeated rounds of selection. Subsequent execution of bioinformatics programs allows combinatorial and comprehensive discovery of the target proteins of small molecules, together with their binding sites, regardless of protein sample insolubility, instability, or the inaccessibility of immobilized small molecules to internally located binding sites on larger target proteins when conventional proteomics approaches are used.
Bowdin, S C; Hayeems, R Z; Monfared, N; Cohn, R D; Meyn, M S
2016-01-01
Our increasing knowledge of how genomic variants affect human health and the falling costs of whole-genome sequencing are driving the development of individualized genomic medicine. This new clinical paradigm uses knowledge of an individual's genomic variants to anticipate, diagnose and manage disease. While individualized genetic medicine offers the promise of transformative change in health care, it forces us to reconsider existing ethical, scientific and clinical paradigms. The potential benefits of pre-symptomatic identification of at-risk individuals, improved diagnostics, individualized therapy, accurate prognosis and avoidance of adverse drug reactions coexist with the potential risks of uninterpretable results, psychological harm, outmoded counseling models and increased health care costs. Here we review the challenges, opportunities and limits of integrating genomic analysis into pediatric clinical practice and describe a model for implementing individualized genomic medicine. Our multidisciplinary team of bioinformaticians, health economists, health services and policy researchers, ethicists, geneticists, genetic counselors and clinicians has designed a 'Genome Clinic' research project that addresses multiple challenges in pediatric genomic medicine--ranging from development of bioinformatics tools for the clinical assessment of genomic variants and the discovery of disease genes to health policy inquiries, assessment of clinical care models, patient preference and the ethics of consent. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Environmental Metagenomics: The Data Assembly and Data Analysis Perspectives
NASA Astrophysics Data System (ADS)
Kumar, Vinay; Maitra, S. S.; Shukla, Rohit Nandan
2015-03-01
Novel gene finding is one of the emerging fields in environmental research. In past decades, research focused mainly on the discovery of microorganisms capable of degrading a particular compound, and many methods are available in the literature for the cultivation and screening of such novel microorganisms. All of these methods are efficient for screening microbes that can be cultivated in the laboratory. Microorganisms that live in extreme conditions, such as hot springs, frozen glaciers and acid mine drainage, cannot be cultivated in the laboratory because of incomplete knowledge about their growth requirements, such as temperature and nutrients, and their mutual dependence on each other. Cultivable microbes correspond to less than 1% of the total microbes present on Earth; the remaining 99%, the uncultivated majority, stays inaccessible. Metagenomics transcends the culture requirements of microbes: DNA is extracted directly from environmental samples such as soil, seawater or acid mine drainage, followed by construction and screening of a metagenomic library. With ongoing research, a huge amount of metagenomic data is accumulating, and understanding these data is an essential step toward extracting novel genes of industrial importance. Various bioinformatics tools have been designed to analyze and annotate the data produced from metagenomes, although the bioinformatic requirements of metagenomic data analysis differ between theory and practice. This paper reviews the tools available for metagenomic data analysis, their capabilities, and their web availability.
Application of industrial scale genomics to discovery of therapeutic targets in heart failure.
Mehraban, F; Tomlinson, J E
2001-12-01
In recent years intense activity in both academic and industrial sectors has provided a wealth of information on the human genome with an associated impressive increase in the number of novel gene sequences deposited in sequence data repositories and patent applications. This genomic industrial revolution has transformed the way in which drug target discovery is now approached. In this article we discuss how various differential gene expression (DGE) technologies are being utilized for cardiovascular disease (CVD) drug target discovery. Other approaches such as sequencing cDNA from cardiovascular derived tissues and cells coupled with bioinformatic sequence analysis are used with the aim of identifying novel gene sequences that may be exploited towards target discovery. Additional leverage from gene sequence information is obtained through identification of polymorphisms that may confer disease susceptibility and/or affect drug responsiveness. Pharmacogenomic studies are described wherein gene expression-based techniques are used to evaluate drug response and/or efficacy. Industrial-scale genomics supports and addresses not only novel target gene discovery but also the burgeoning issues in pharmaceutical and clinical cardiovascular medicine relative to polymorphic gene responses.
Wagener, Johannes; Spjuth, Ola; Willighagen, Egon L; Wikberg, Jarl ES
2009-01-01
Background Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine-accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. Results We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. That XMPP cloud services are capable of asynchronous communication implies that clients do not have to poll repetitively for status, but the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. Conclusion XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics. PMID:19732427
Wagener, Johannes; Spjuth, Ola; Willighagen, Egon L; Wikberg, Jarl E S
2009-09-04
The life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine-accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability and the inability of services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) that comprises service discovery, asynchronous invocation, and definition of data types in the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repetitively for status; the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need for an external registry; 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling; and 3) input and output types defined in the service allow clients to be generated on the fly without the need for an external semantic description. These many advantages over existing technologies make XMPP a highly interesting candidate for next-generation online services in bioinformatics.
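The polling-versus-push contrast the authors draw can be sketched in a few lines; the snippet below uses Python's asyncio as a neutral stand-in for the transport. It is not the IO Data extension itself, only the asynchronous invocation pattern that XMPP enables, and all names in it are invented for illustration.

    import asyncio

    # Push-style invocation (the pattern XMPP enables): the service calls back
    # when the job is done, so the client never has to poll for status.
    async def cloud_service(job, on_done):
        await asyncio.sleep(2)           # stand-in for a long-running analysis
        on_done(f"result of {job}")      # service pushes the result to the client

    async def main():
        done = asyncio.Event()

        def on_done(result):
            print("service pushed:", result)
            done.set()

        asyncio.create_task(cloud_service("sequence analysis", on_done))
        # The client is free to do other work here; no repetitive polling loop.
        await done.wait()

    asyncio.run(main())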
Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita
2015-07-14
In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world. Copyright © 2015 Hadjithomas et al.
Bloudoff, Kristjan; Schmeing, T Martin
2017-11-01
Nonribosomal peptide synthetases (NRPSs) are remarkable macromolecular machines that produce a wide range of biologically and therapeutically relevant molecules. During synthesis, peptide elongation is performed by the condensation (C) domain, which catalyzes amide bond formation between the nascent peptide and the amino acid it adds to the chain. Since their discovery more than two decades ago, C domains have been subject to extensive biochemical, bioinformatic, mutagenic, and structural analyses. They are composed of two lobes, each with homology to chloramphenicol acetyltransferase, have two binding sites for their two peptidyl carrier protein-bound ligands, and have an active site with the conserved motif HHxxxDG located between the two lobes. This review discusses some of the important insights into the structure, catalytic mechanism, specificity, and gatekeeping functions of C domains revealed since their discovery. In addition, C domains are the archetypal members of the C domain superfamily, which includes several other members that also function as NRPS domains. The other family members can replace the C domain in NRP synthesis, can work in concert with a C domain, or can fulfill diverse and novel functions. These domains include the epimerization (E) domain, the heterocyclization (Cy) domain, the ester-bond-forming C domain, the fungal NRPS terminal C domain (CT), the β-lactam-ring-forming C domain, and the X domain. We also discuss structural and functional insights into C, E, Cy, CT and X domains, to present a holistic overview of historical and current knowledge of the C domain superfamily. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.
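The conserved HHxxxDG active-site motif mentioned above lends itself to a simple sequence scan; the sketch below uses Python's standard re module on invented candidate sequences, as a toy triage step rather than the structural analyses the review covers.

    import re

    # HHxxxDG: two histidines, any three residues, then Asp-Gly.
    MOTIF = re.compile(r"HH...DG")

    # Hypothetical candidate C-domain fragments, for illustration only.
    sequences = {
        "candidate_1": "MKTAHHVLMDGLKPEQW",
        "candidate_2": "MSSLKQAGTRHHAAADGW",
        "candidate_3": "MNAPQRSTVLL",
    }

    for name, seq in sequences.items():
        match = MOTIF.search(seq)
        if match:
            print(f"{name}: motif {match.group()} at position {match.start() + 1}")
        else:
            print(f"{name}: no HHxxxDG motif found")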
Current Advances on Virus Discovery and Diagnostic Role of Viral Metagenomics in Aquatic Organisms
Munang'andu, Hetron M.; Mugimba, Kizito K.; Byarugaba, Denis K.; Mutoloki, Stephen; Evensen, Øystein
2017-01-01
The global expansion of the aquaculture industry has brought with it a corresponding increase in novel viruses infecting different aquatic organisms. These emerging viral pathogens have proved to be a challenge to the use of traditional cell cultures and immunoassays for the identification of new viruses, especially where the novel viruses are unculturable and no antibodies exist for their identification. Viral metagenomics can identify novel viruses without prior knowledge of their genomic sequence data and may provide a solution for the study of unculturable viruses. This review provides a synopsis of the contribution of viral metagenomics to the discovery of viruses infecting different aquatic organisms, as well as its potential role in viral diagnostics. High-throughput next-generation sequencing (NGS) and the library construction used in metagenomic projects have simplified the task of generating complete viral genomes, in contrast to traditional methods that use multiple primers targeted at different segments and VPs to assemble the entire genome of a novel virus. In terms of diagnostics, studies carried out thus far show that viral metagenomics has the potential to serve as a multifaceted tool able to identify the etiological agents of single infections and co-infections, determine tissue tropism, profile the viral infections of different aquatic organisms, monitor disease prevalence epidemiologically, support evolutionary phylogenetic analyses, and study genomic diversity in quasispecies viruses. With sequencing technologies and bioinformatics analytical tools becoming cheaper and easier to use, we anticipate that metagenomics will soon become a routine tool for the discovery, study, and identification of novel pathogens, including viruses, enabling timely disease control for emerging diseases in aquaculture. PMID:28382024
Information Fusion for Natural and Man-Made Disasters
2007-01-31
…comprehensively large, and metaphysically accurate model of situations, through which specific tasks such as situation assessment, knowledge discovery, or the… "significance" is always context specific. Event discovery is a very important element of the HLF process, which can lead to knowledge discovery about… expected, given the current state of knowledge. Examples of such behavior may include discovery of a new aggregate or situation, a specific pattern of…
BIOINFORMATICS IN THE K-8 CLASSROOM: DESIGNING INNOVATIVE ACTIVITIES FOR TEACHER IMPLEMENTATION
Shuster, Michele; Claussen, Kira; Locke, Melly; Glazewski, Krista
2016-01-01
At the intersection of biology and computer science is the growing field of bioinformatics—the analysis of complex datasets of biological relevance. Despite the increasing importance of bioinformatics and associated practical applications, these are not standard topics in elementary and middle school classrooms. We report on a pilot project and its evolution to support implementation of bioinformatics-based activities in elementary and middle school classrooms. Specifically, we ultimately designed a multi-day summer teacher professional development workshop, in which teachers design innovative classroom activities. By focusing on teachers, our design leverages enhanced teacher knowledge and confidence to integrate innovative instructional materials into K-8 classrooms and contributes to capacity building in STEM instruction. PMID:27429860
Bioinformatics meets user-centred design: a perspective.
Pavelin, Katrina; Cham, Jennifer A; de Matos, Paula; Brooksbank, Cath; Cameron, Graham; Steinbeck, Christoph
2012-01-01
Designers have a saying that "the joy of an early release lasts but a short time. The bitterness of an unusable system lasts for years." It is indeed disappointing to discover that your data resources are not being used to their full potential. Not only have you invested your time, effort, and research grant on the project, but you may face costly redesigns if you want to improve the system later. This scenario would be less likely if the product was designed to provide users with exactly what they need, so that it is fit for purpose before its launch. We work at EMBL-European Bioinformatics Institute (EMBL-EBI), and we consult extensively with life science researchers to find out what they need from biological data resources. We have found that although users believe that the bioinformatics community is providing accurate and valuable data, they often find the interfaces to these resources tricky to use and navigate. We believe that if you can find out what your users want even before you create the first mock-up of a system, the final product will provide a better user experience. This would encourage more people to use the resource and they would have greater access to the data, which could ultimately lead to more scientific discoveries. In this paper, we explore the need for a user-centred design (UCD) strategy when designing bioinformatics resources and illustrate this with examples from our work at EMBL-EBI. Our aim is to introduce the reader to how selected UCD techniques may be successfully applied to software design for bioinformatics.
A New System To Support Knowledge Discovery: Telemakus.
ERIC Educational Resources Information Center
Revere, Debra; Fuller, Sherrilynne S.; Bugni, Paul F.; Martin, George M.
2003-01-01
The Telemakus System builds on the areas of concept representation, schema theory, and information visualization to enhance knowledge discovery from scientific literature. This article describes the underlying theories and an overview of a working implementation designed to enhance the knowledge discovery process through retrieval, visual and…
Zoukhri, Driss; Rawe, Ian; Singh, Mabi; Brown, Ashley; Kublin, Claire L; Dawson, Kevin; Haddon, William F; White, Earl L; Hanley, Kathleen M; Tusé, Daniel; Malyj, Wasyl; Papas, Athena
2012-03-01
The purpose of the current study was to determine whether saliva contains biomarkers that can be used as diagnostic tools for Sjögren's syndrome (SjS). Twenty-seven SjS patients and 27 age-matched healthy controls were recruited for these studies. Unstimulated glandular saliva was collected from Wharton's duct using a suction device. Two µl of saliva were processed for mass spectrometry analyses on a prOTOF 2000 matrix-assisted laser desorption/ionization orthogonal time-of-flight (MALDI O-TOF) mass spectrometer. Raw data were analyzed using bioinformatic tools to identify biomarkers. MALDI O-TOF MS analyses of saliva samples were highly reproducible, and the mass spectra generated were very rich in peptides and peptide fragments in the 750-7,500 Da range. Data analysis using bioinformatic tools resulted in several classification models being built and several biomarkers identified. One model based on 7 putative biomarkers yielded a sensitivity of 97.5%, a specificity of 97.8% and an accuracy of 97.6%. One biomarker was present only in SjS samples and was identified as a proteolytic peptide originating from the human basic salivary proline-rich protein 3 precursor. We conclude that salivary biomarkers detected by high-resolution mass spectrometry coupled with powerful bioinformatic tools offer the potential to serve as diagnostic/prognostic tools for SjS.
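For readers less familiar with how the reported sensitivity, specificity and accuracy relate to a classifier's confusion matrix, here is a minimal worked sketch; the counts are illustrative and are not the study's data.

    # Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP); accuracy = (TP+TN)/all.
    # Illustrative counts only, not the study's data.
    tp, fn = 26, 1   # patients classified as SjS vs missed
    tn, fp = 26, 1   # controls classified as healthy vs misclassified

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)

    print(f"sensitivity={sensitivity:.1%}, specificity={specificity:.1%}, "
          f"accuracy={accuracy:.1%}")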
Advances in genome-wide RNAi cellular screens: a case study using the Drosophila JAK/STAT pathway
2012-01-01
Background: Genome-scale RNA-interference (RNAi) screens are becoming ever more common gene discovery tools. However, whilst every screen identifies interacting genes, less attention has been given to how factors such as library design and post-screening bioinformatics may be affecting the data generated. Results: Here we present a new genome-wide RNAi screen of the Drosophila JAK/STAT signalling pathway undertaken in the Sheffield RNAi Screening Facility (SRSF). This screen was carried out using a second-generation, computationally optimised dsRNA library and analysed using current methods and bioinformatic tools. To examine advances in RNAi screening technology, we compare this screen to a biologically very similar screen undertaken in 2005 with a first-generation library. Both screens used the same cell line, reporters and experimental design, with the SRSF screen identifying 42 putative regulators of JAK/STAT signalling, 22 of which were verified in a secondary screen and 16 of which were verified with an independent probe design. Following reanalysis of the original screen data, comparison of the two gene lists allows us to estimate false discovery rates in the SRSF data and to assess the off-target effects (OTEs) associated with both libraries. We discuss the differences and similarities between the resulting data sets and examine the relative improvements in gene discovery protocols. Conclusions: Our work represents one of the first direct comparisons between first- and second-generation libraries and shows that modern library designs, together with methodological advances, have had a significant influence on genome-scale RNAi screens. PMID:23006893
Decoding the complex genetic causes of heart diseases using systems biology.
Djordjevic, Djordje; Deshpande, Vinita; Szczesnik, Tomasz; Yang, Andrian; Humphreys, David T; Giannoulatou, Eleni; Ho, Joshua W K
2015-03-01
The pace of disease gene discovery is still much slower than expected, even with the use of cost-effective DNA sequencing and genotyping technologies. It is increasingly clear that many inherited heart diseases have a more complex polygenic aetiology than previously thought. Understanding the role of gene-gene interactions, epigenetics, and non-coding regulatory regions is becoming increasingly critical in predicting the functional consequences of genetic mutations identified by genome-wide association studies and whole-genome or exome sequencing. A systems biology approach is now being widely employed to systematically discover genes that are involved in heart diseases in humans or relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the complex genetic causes of congenital and complex heart diseases. This review summarises state-of-the-art genomic and bioinformatics techniques that are used in accelerating the pace of disease gene discovery in heart diseases. Accompanying this review, we provide an interactive web-resource for systems biology analysis of mammalian heart development and diseases, CardiacCode ( http://CardiacCode.victorchang.edu.au/ ). CardiacCode features a dataset of over 700 pieces of manually curated genetic or molecular perturbation data, which enables the inference of a cardiac-specific GRN of 280 regulatory relationships between 33 regulator genes and 129 target genes. We believe this growing resource will fill an urgent unmet need to fully realise the true potential of predictive and personalised genomic medicine in tackling human heart disease.
Knowledge Discovery as an Aid to Organizational Creativity.
ERIC Educational Resources Information Center
Siau, Keng
2000-01-01
This article presents the concept of knowledge discovery, a process of searching for associations in large volumes of computer data, as an aid to creativity. It then discusses the various techniques in knowledge discovery. Mednick's associative theory of creative thought serves as the theoretical foundation for this research. (Contains…
2017-06-27
Final report (17-03-2017 to 15-03-2018), contract FA2386-17-1-0102: Advances in Knowledge Discovery and… Springer; Switzerland. Abstract: The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) is a leading international conference… in the areas of knowledge discovery and data mining (KDD). We had three keynote speeches, delivered by Sang Cha from Seoul National University…
Mitochondrial Calcium Transport in Trypanosomes
Docampo, Roberto; Vercesi, Anibal E.; Huang, Guozhong
2014-01-01
The biochemical peculiarities of trypanosomes were fundamental for the recent molecular identification of the long-sought channel involved in mitochondrial Ca2+ uptake, the mitochondrial Ca2+ uniporter or MCU. This discovery led to the finding of numerous regulators of the channel, which form a high molecular weight complex with MCU. Some of these regulators have been bioinformatically identified in trypanosomes, which are the first eukaryotic organisms described for which MCU is essential. In trypanosomes MCU is important for buffering cytosolic Ca2+ changes and for activation of the bioenergetics of the cells. Future work on this pathway in trypanosomes promises further insight into the biology of these fascinating eukaryotes, as well as the potential for novel target discovery. PMID:25218432
Drug discovery in the next millennium.
Ohlstein, E H; Ruffolo, R R; Elliott, J D
2000-01-01
Selection and validation of novel molecular targets have become of paramount importance in light of the plethora of new potential therapeutic drug targets that have emerged from human gene sequencing. In response to this revolution within the pharmaceutical industry, the development of high-throughput methods in both biology and chemistry has been necessitated. This review addresses these technological advances as well as several new areas that have been created by necessity to deal with this new paradigm, such as bioinformatics, cheminformatics, and functional genomics. With many of these key components of future drug discovery now in place, it is possible to map out a critical path for this process that will be used into the new millennium.
The Characterization of the Phlebotomus papatasi Transcriptome
2013-04-01
…Computational identification of novel chitinase-like proteins in the Drosophila melanogaster genome. Bioinformatics. 2004; 20(2):161–169. [PubMed: 14734306]… discovery in organisms where sequencing the whole genome is not possible (Lindlof 2003), or in addition to genome information for more accurate gene… biology of these important vectors, and generate essential data for annotation of the newly sequenced phlebotomine sand fly genomes (McDowell et al…
Leoni, Gabriele; De Poli, Andrea; Mardirossian, Mario; Gambato, Stefano; Florian, Fiorella; Venier, Paola; Wilson, Daniel N; Tossi, Alessandro; Pallavicini, Alberto; Gerdol, Marco
2017-08-22
The application of high-throughput sequencing technologies to non-model organisms has brought new opportunities for the identification of bioactive peptides from genomes and transcriptomes. From this point of view, marine invertebrates represent a potentially rich, yet largely unexplored resource for de novo discovery, owing to their adaptation to diverse challenging habitats. Bioinformatics analyses of available genomic and transcriptomic data allowed us to identify myticalins, a novel family of antimicrobial peptides (AMPs) from the mussel Mytilus galloprovincialis, and a similar family of AMPs from Modiolus spp., named modiocalins. Their coding sequence encompasses two conserved regions, an N-terminal signal peptide and a C-terminal propeptide, and a hypervariable central cationic region corresponding to the mature peptide. Myticalins are taxonomically restricted to Mytiloida and can be classified into four subfamilies. These AMPs are subject to considerable interindividual sequence variability and possibly to presence/absence variation. Functional assays performed on selected members of this family indicate remarkable tissue-specific expression (in the gills) and a broad spectrum of activity against both Gram-positive and Gram-negative bacteria. Overall, we present the first linear AMPs ever described in marine mussels and confirm the great potential of bioinformatics tools for the de novo discovery of bioactive peptides in non-model organisms.
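Since the mature myticalin region is described as cationic, a common first triage step for such AMP candidates is a crude net-charge estimate; the sketch below counts Lys/Arg as +1 and Asp/Glu as -1 (ignoring His and the termini) on a hypothetical peptide, and is not the authors' pipeline.

    # Crude net-charge estimate at neutral pH: Lys/Arg +1, Asp/Glu -1;
    # His and the peptide termini are ignored in this approximation.
    def net_charge(peptide: str) -> int:
        positives = sum(peptide.count(aa) for aa in "KR")
        negatives = sum(peptide.count(aa) for aa in "DE")
        return positives - negatives

    # Hypothetical mature-peptide sequence, for illustration only.
    mature = "GFKKLLRKGAKIIGKGVRLIK"
    print(net_charge(mature))  # strongly positive, as expected for a cationic AMP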
A bioinformatics roadmap for the human vaccines project.
Scheuermann, Richard H; Sinkovits, Robert S; Schenkelberg, Theodore; Koff, Wayne C
2017-06-01
Biomedical research has become a data intensive science in which high throughput experimentation is producing comprehensive data about biological systems at an ever-increasing pace. The Human Vaccines Project is a new public-private partnership, with the goal of accelerating development of improved vaccines and immunotherapies for global infectious diseases and cancers by decoding the human immune system. To achieve its mission, the Project is developing a Bioinformatics Hub as an open-source, multidisciplinary effort with the overarching goal of providing an enabling infrastructure to support the data processing, analysis and knowledge extraction procedures required to translate high throughput, high complexity human immunology research data into biomedical knowledge, to determine the core principles driving specific and durable protective immune responses.
A note on the false discovery rate of novel peptides in proteogenomics.
Zhang, Kun; Fu, Yan; Zeng, Wen-Feng; He, Kun; Chi, Hao; Liu, Chao; Li, Yan-Chang; Gao, Yuan; Xu, Ping; He, Si-Min
2015-10-15
Proteogenomics has been well accepted as a tool to discover novel genes. In most conventional proteogenomic studies, a global false discovery rate is used to filter out false positives when identifying credible novel peptides. However, it has been found that the actual level of false positives among novel peptides is often out of control and behaves differently for different genomes. To quantitatively model this problem, we theoretically analyze the subgroup false discovery rates of annotated and novel peptides. Our analysis shows that the annotation completeness ratio of a genome is the dominant factor influencing the subgroup FDR of novel peptides. Experimental results on two real datasets of Escherichia coli and Mycobacterium tuberculosis support our conjecture. Contact: yfu@amss.ac.cn, xupingghy@gmail.com or smhe@ict.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
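The subgroup idea can be made concrete with target-decoy counting, where the FDR of a peptide list is commonly estimated as the number of decoy hits divided by the number of target hits; computing this separately for annotated and novel peptides shows how a reassuring global estimate can hide a poor novel subgroup. All counts below are invented for illustration.

    # Target-decoy FDR estimate, globally and per subgroup (invented counts).
    def fdr(decoys: int, targets: int) -> float:
        return decoys / targets if targets else 0.0

    annotated = {"decoys": 90, "targets": 9500}
    novel     = {"decoys": 60, "targets": 500}

    global_fdr = fdr(annotated["decoys"] + novel["decoys"],
                     annotated["targets"] + novel["targets"])

    print(f"global FDR    : {global_fdr:.3f}")        # 0.015, looks well controlled
    print(f"annotated FDR : {fdr(**annotated):.3f}")  # 0.009
    print(f"novel FDR     : {fdr(**novel):.3f}")      # 0.120, far above the global figure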
Boesenbergia rotunda: From Ethnomedicine to Drug Discovery
Eng-Chong, Tan; Yean-Kee, Lee; Chin-Fei, Chee; Choon-Han, Heh; Sher-Ming, Wong; Li-Ping, Christina Thio; Gen-Teck, Foo; Khalid, Norzulaani; Abd Rahman, Noorsaadah; Karsani, Saiful Anuar; Othman, Shatrah; Othman, Rozana; Yusof, Rohana
2012-01-01
Boesenbergia rotunda is a herb from the Boesenbergia genera under the Zingiberaceae family. B. rotunda is widely found in Asian countries where it is commonly used as a food ingredient and in ethnomedicinal preparations. The popularity of its ethnomedicinal usage has drawn the attention of scientists worldwide to further investigate its medicinal properties. Advancement in drug design and discovery research has led to the development of synthetic drugs from B. rotunda metabolites via bioinformatics and medicinal chemistry studies. Furthermore, with the advent of genomics, transcriptomics, proteomics, and metabolomics, new insights on the biosynthetic pathways of B. rotunda metabolites can be elucidated, enabling researchers to predict the potential bioactive compounds responsible for the medicinal properties of the plant. The vast biological activities exhibited by the compounds obtained from B. rotunda warrant further investigation through studies such as drug discovery, polypharmacology, and drug delivery using nanotechnology. PMID:23243448
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Multiple challenges in protein MS data analysis remain: management of large-scale, complex datasets; MS peak identification and indexing; and high-dimensional differential peak analysis with false discovery rate (FDR) control based on concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS datasets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution that gives experimental biologists easy access to "cloud" computing capabilities for analyzing MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
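The FDR control referred to here is most often implemented with the Benjamini-Hochberg step-up procedure over the per-peak test P-values; the snippet below is a generic, self-contained implementation of that procedure, not the portal's own code.

    import numpy as np

    def benjamini_hochberg(pvals, alpha=0.05):
        """Boolean mask of hypotheses rejected at FDR level alpha."""
        p = np.asarray(pvals, dtype=float)
        order = np.argsort(p)
        m = len(p)
        thresholds = alpha * np.arange(1, m + 1) / m
        below = p[order] <= thresholds
        rejected = np.zeros(m, dtype=bool)
        if below.any():
            k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
            rejected[order[:k + 1]] = True    # reject all peaks up to that rank
        return rejected

    # Toy per-peak P-values from a differential analysis (illustrative only).
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.27, 0.60]))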
Discovery of a widely distributed toxin biosynthetic gene cluster
Lee, Shaun W.; Mitchell, Douglas A.; Markley, Andrew L.; Hensler, Mary E.; Gonzalez, David; Wohlrab, Aaron; Dorrestein, Pieter C.; Nizet, Victor; Dixon, Jack E.
2008-01-01
Bacteriocins represent a large family of ribosomally produced peptide antibiotics. Here we describe the discovery of a widely conserved biosynthetic gene cluster for the synthesis of thiazole and oxazole heterocycles on ribosomally produced peptides. These clusters encode a toxin precursor and all necessary proteins for toxin maturation and export. Using the toxin precursor peptide and heterocycle-forming synthetase proteins from the human pathogen Streptococcus pyogenes, we demonstrate the in vitro reconstitution of streptolysin S activity. We provide evidence that the synthetase enzymes, as predicted from our bioinformatics analysis, introduce heterocycles onto precursor peptides, thereby providing molecular insight into the chemical structure of streptolysin S. Furthermore, our studies reveal that the synthetase exhibits relaxed substrate specificity and modifies toxin precursors from both related and distant species. Given our findings, it is likely that the discovery of similar peptidic toxins will rapidly expand to existing and emerging genomes. PMID:18375757
The Relation between Prior Knowledge and Students' Collaborative Discovery Learning Processes
ERIC Educational Resources Information Center
Gijlers, Hannie; de Jong, Ton
2005-01-01
In this study we investigate how prior knowledge influences knowledge development during collaborative discovery learning. Fifteen dyads of students (pre-university education, 15-16 years old) worked on a discovery learning task in the physics field of kinematics. The (face-to-face) communication between students was recorded and the interaction…
Unsupervised learning of natural languages
Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon
2005-01-01
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics. PMID:16087885
Unsupervised learning of natural languages.
Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon
2005-08-16
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.
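ADIOS couples a statistical significance test with structured generalization, but the flavour of recursively distilling hierarchical patterns from raw sequences can be conveyed with a much simpler stand-in: the byte-pair-style merge below repeatedly replaces the most frequent adjacent symbol pair with a new composite symbol. This is an analogy only, not the ADIOS algorithm.

    from collections import Counter

    def distill(symbols, n_merges=3):
        """Repeatedly merge the most frequent adjacent pair into a new symbol."""
        seq = list(symbols)
        lexicon = {}
        for _ in range(n_merges):
            pairs = Counter(zip(seq, seq[1:]))
            if not pairs:
                break
            (a, b), _count = pairs.most_common(1)[0]
            new = f"[{a}{b}]"
            lexicon[new] = (a, b)
            merged, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                    merged.append(new)
                    i += 2
                else:
                    merged.append(seq[i])
                    i += 1
            seq = merged
        return seq, lexicon

    seq, lexicon = distill("thecatsatonthemat")
    print(seq)      # the corpus rewritten with composite symbols
    print(lexicon)  # the small hierarchy of discovered patterns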
FCDD: A Database for Fruit Crops Diseases.
Chauhan, Rupal; Jasrai, Yogesh; Pandya, Himanshu; Chaudhari, Suman; Samota, Chand Mal
2014-01-01
The Fruit Crops Diseases Database (FCDD) draws on a number of biotechnology and bioinformatics tools. The FCDD is a unique bioinformatics resource that compiles detailed information on 162 fruit crop diseases, including disease type, causal organism, images, symptoms and control. The FCDD also contains 171 phytochemicals from 25 fruits, their 2D images and their 20 possible sequences. This information has been manually extracted, and manually verified, from numerous sources, including other electronic databases, textbooks and scientific journals. FCDD is fully searchable and supports extensive text search. The main focus of the FCDD is on providing the available information on fruit crop diseases, which will help in the discovery of potential drugs from one of the most common bioresources, fruits. The database was developed using MySQL, and its interface is developed in PHP, HTML and Java. FCDD is freely available at http://www.fruitcropsdd.com/
Pretest/Posttest Plus Prompts: Tools for Research and Evaluation
ERIC Educational Resources Information Center
Herron, Sherry; Gopal, Tamilselvi
2012-01-01
We conducted a series of summer workshops on bioinformatics to increase educators' knowledge of this new field of inquiry with the assumption that their knowledge will, in turn, impact student achievement. The workshops incorporated experiential learning and self-reflection (Loucks-Horsley et al. 1998). Educators demonstrated significant increases…
Better cancer biomarker discovery through better study design.
Rundle, Andrew; Ahsan, Habibul; Vineis, Paolo
2012-12-01
High-throughput laboratory technologies coupled with sophisticated bioinformatics algorithms have tremendous potential for discovering novel biomarkers, or profiles of biomarkers, that could serve as predictors of disease risk, response to treatment or prognosis. We discuss methodological issues in wedding high-throughput approaches for biomarker discovery with the case-control study designs typically used in biomarker discovery studies, focusing especially on nested case-control designs. We review principles for nested case-control study design in relation to biomarker discovery studies and describe how the efficiency of biomarker discovery can be affected by study design choices. We develop a simulated prostate cancer cohort dataset and a series of biomarker discovery case-control studies nested within the cohort to illustrate how study design choices can influence the biomarker discovery process. Common elements of nested case-control design, incidence density sampling and matching of controls to cases, are not typically factored correctly into biomarker discovery analyses, inducing bias in the discovery process. We illustrate how incidence density sampling and matching of controls to cases reduce the apparent specificity of truly valid biomarkers 'discovered' in a nested case-control study. We also propose and demonstrate a new case-control matching protocol, which we call 'antimatching', that improves the efficiency of biomarker discovery studies. For valid, but as yet undiscovered, biomarkers, disjunctions between correctly designed epidemiologic studies and the practice of biomarker discovery reduce the likelihood that true biomarkers will be discovered and increase the false-positive discovery rate. © 2012 The Authors. European Journal of Clinical Investigation © 2012 Stichting European Society for Clinical Investigation Journal Foundation.
FALCON: a toolbox for the fast contextualization of logical networks
De Landtsheer, Sébastien; Trairatphisan, Panuwat; Lucarelli, Philippe; Sauter, Thomas
2017-01-01
Motivation: Mathematical modelling of regulatory networks allows for the discovery of knowledge at the system level. However, existing modelling tools are often computation-heavy and do not offer intuitive ways to explore the model, to test hypotheses or to interpret the results biologically. Results: We have developed a computational approach to contextualize logical models of regulatory networks with biological measurements based on a probabilistic description of rule-based interactions between the different molecules. Here, we propose a Matlab toolbox, FALCON, to automatically and efficiently build and contextualize networks, which includes a pipeline for conducting parameter analysis, knockouts and easy and fast model investigation. The contextualized models can then provide qualitative and quantitative information about the network and suggest hypotheses about biological processes. Availability and implementation: FALCON is freely available for non-commercial users on GitHub under the GPLv3 licence. The toolbox, installation instructions, full documentation and test datasets are available at https://github.com/sysbiolux/FALCON. FALCON runs under Matlab (MathWorks) and requires the Optimization Toolbox. Contact: thomas.sauter@uni.lu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28673016
FALCON: a toolbox for the fast contextualization of logical networks.
De Landtsheer, Sébastien; Trairatphisan, Panuwat; Lucarelli, Philippe; Sauter, Thomas
2017-11-01
Mathematical modelling of regulatory networks allows for the discovery of knowledge at the system level. However, existing modelling tools are often computation-heavy and do not offer intuitive ways to explore the model, to test hypotheses or to interpret the results biologically. We have developed a computational approach to contextualize logical models of regulatory networks with biological measurements based on a probabilistic description of rule-based interactions between the different molecules. Here, we propose a Matlab toolbox, FALCON, to automatically and efficiently build and contextualize networks, which includes a pipeline for conducting parameter analysis, knockouts and easy and fast model investigation. The contextualized models can then provide qualitative and quantitative information about the network and suggest hypotheses about biological processes. FALCON is freely available for non-commercial users on GitHub under the GPLv3 licence. The toolbox, installation instructions, full documentation and test datasets are available at https://github.com/sysbiolux/FALCON. FALCON runs under Matlab (MathWorks) and requires the Optimization Toolbox. Contact: thomas.sauter@uni.lu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
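FALCON's probabilistic description of rule-based interactions can be hedged-sketched as continuous logic on activation levels in [0, 1]; the AND/OR forms below (AND = a·b, OR = a + b - a·b) are a common convention assumed here for illustration, not a statement about FALCON's internals, and the toolbox itself runs under Matlab rather than Python.

    # Continuous ("probabilistic") logic gates on activation levels in [0, 1].
    # These AND/OR forms are a common convention, assumed for illustration.
    def AND(a: float, b: float) -> float:
        return a * b

    def OR(a: float, b: float) -> float:
        return a + b - a * b

    # Toy rule: target is active if (ligand AND receptor) OR stress.
    ligand, receptor, stress = 0.9, 0.8, 0.2
    target = OR(AND(ligand, receptor), stress)
    print(f"predicted target activation: {target:.2f}")  # 0.78 for these inputs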
Marine Metagenome as A Resource for Novel Enzymes.
Alma'abadi, Amani D; Gojobori, Takashi; Mineta, Katsuhiko
2015-10-01
More than 99% of identified prokaryotes, including many from the marine environment, cannot be cultured in the laboratory. This lack of culturability restricts our knowledge of microbial genetics and community ecology. Metagenomics, the culture-independent cloning of environmental DNA isolated directly from an environmental sample, has already provided a wealth of information about the uncultured microbial world. It has also facilitated the discovery of novel biocatalysts by allowing researchers to probe directly into a huge diversity of enzymes within natural microbial communities. Recent advances in these studies have generated great interest in recruiting microbial enzymes for the development of environmentally friendly industry. Although the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using high-throughput techniques. In addition, we discuss challenges in metagenomics as an important part of bioinformatics analysis of big data. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.
Transmembrane peptides as sensors of the membrane physical state
NASA Astrophysics Data System (ADS)
Piotto, Stefano; Di Biasi, Luigi; Sessa, Lucia; Concilio, Simona
2018-05-01
Cell membranes are commonly considered fundamental structures with multiple roles, such as confinement, storage of lipids, and support and control of membrane proteins. In spite of their importance, many aspects remain unclear. The number of lipid types is orders of magnitude larger than the number of amino acids, and this compositional complexity is not clearly embedded in any membrane model. A widespread hypothesis is that the large lipid palette permits the recruitment and organization of specific proteins, controlling the formation of specialized lipid domains and the lateral pressure profile of the bilayer. Unfortunately, satisfactory knowledge of lipid abundance remains utopian because of the technical difficulties in isolating defined membrane regions. More importantly, a theoretical framework in which to fit the lipidomic data is still missing. In this work, we use the amino acid sequences and frequencies of membrane proteins as bioinformatic sensors of cell bilayers. The use of an alignment-free method to find a correlation between the sequences of the transmembrane portions of membrane proteins and the membrane physical state suggested a new approach for the discovery of antimicrobial peptides.
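Alignment-free comparison of the kind invoked here is often implemented with k-mer composition vectors and a vector similarity; the sketch below compares two hypothetical transmembrane stretches by cosine similarity of their 2-mer profiles, as a generic illustration rather than the authors' specific method.

    import math
    from collections import Counter

    def kmer_profile(seq: str, k: int = 2) -> Counter:
        return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    def cosine(p: Counter, q: Counter) -> float:
        dot = sum(p[kmer] * q[kmer] for kmer in p)
        norm = math.sqrt(sum(v * v for v in p.values())) * \
               math.sqrt(sum(v * v for v in q.values()))
        return dot / norm

    # Hypothetical transmembrane stretches, for illustration only.
    tm1 = "LLVVAAILLGAVLLIV"
    tm2 = "ILVVGAILLGAVLAIV"
    print(f"2-mer cosine similarity: {cosine(kmer_profile(tm1), kmer_profile(tm2)):.2f}")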
2013-01-01
Parasitic nematodes (roundworms) of small ruminants and other livestock have major economic impacts worldwide. Despite the impact of the diseases caused by these nematodes and the discovery of new therapeutic agents (anthelmintics), there has been relatively limited progress in the development of practical molecular tools to study the epidemiology of these nematodes. Specific diagnosis underpins parasite control, and the detection and monitoring of anthelmintic resistance in livestock parasites, presently a major concern around the world. The purpose of the present article is to provide a concise account of the biology and knowledge of the epidemiology of the gastrointestinal nematodes (order Strongylida), from an Australian perspective, and to emphasize the importance of utilizing advanced molecular tools for the specific diagnosis of nematode infections for refined investigations of parasite epidemiology and drug resistance detection in combination with conventional methods. It also gives a perspective on the possibility of harnessing genetic, genomic and bioinformatic technologies to better understand parasites and control parasitic diseases. PMID:23711194
Moulos, Panagiotis; Samiotaki, Martina; Panayotou, George; Dedos, Skarlatos G.
2016-01-01
The cells of the prothoracic glands (PG) are the main site of synthesis and secretion of ecdysteroids, the biochemical products of cholesterol conversion to steroids that shape the morphogenic development of insects. Despite the availability of genome sequences from several insect species and the extensive knowledge of certain signalling pathways that underpin ecdysteroidogenesis, the spectrum of signalling molecules and ecdysteroidogenic cascades is still not fully mapped. To fill this gap and obtain a complete list of the cell membrane receptors expressed in PG cells, we used combined bioinformatic, proteomic and transcriptomic analyses and quantitative PCR to annotate and determine the expression profiles of genes identified as putative cell membrane receptors of the model insect species Bombyx mori, and subsequently enriched the repertoire of signalling pathways known to be present in its PG cells. The genome annotation dataset we report here highlights modules and pathways that may be directly involved in ecdysteroidogenesis and aims to disseminate data and assist other researchers in discovering the roles of such receptors and their ligands. PMID:27576083
Xu, Hai-Yu; Liu, Zhen-Ming; Fu, Yan; Zhang, Yan-Qiong; Yu, Jian-Jun; Guo, Fei-Fei; Tang, Shi-Huan; Lv, Chuan-Yu; Su, Jin; Cui, Ru-Yi; Yang, Hong-Jun
2017-09-01
Recently, integrative pharmacology (IP) has become a pivotal paradigm for the modernization of traditional Chinese medicine (TCM) and for combinatorial drug discovery. IP is an interdisciplinary science that establishes the in vitro and in vivo correlation between the absorption, distribution, metabolism and excretion/pharmacokinetic (ADME/PK) profiles of TCM and the molecular networks of disease by integrating knowledge across multiple disciplines and stages. In the present study, an internet-based computation platform for the IP of TCM (TCM-IP, www.tcmip.cn) was established to promote the development of this emerging discipline. TCM big data are an important resource for TCM-IP, including the Chinese Medicine Formula Database, the Chinese Medical Herbs Database, the Chemical Database of Chinese Medicine and the Target Database for Diseases and Symptoms. Meanwhile, data mining and bioinformatics approaches are critical technologies for TCM-IP, including identification of TCM constituents, ADME prediction, target prediction for TCM constituents, and network construction and analysis. Furthermore, network beautification and individualized design are employed to meet users' requirements. We firmly believe that TCM-IP is a very useful tool for identifying the active constituents of TCM and their potential molecular mechanisms of therapeutic action, and that it will be widely applied in quality evaluation, clinical repositioning, scientific discovery based on original thinking, prescription compatibility and new TCM drug development. Copyright © by the Chinese Pharmaceutical Association.
BEAM web server: a tool for structural RNA motif discovery.
Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2018-03-15
RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable of handling tens of thousands of input RNAs with a motif discovery procedure that is limited only by current secondary structure prediction accuracies. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated with a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding, which transforms each RNA secondary structure into a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling the folding and encoding of RNA sequences, giving users a choice of preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, a graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and is implemented in NodeJS and Python, with all major browsers supported. Contact: marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.
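The key trick named here, encoding a secondary structure as a plain string so that fast string algorithms apply, can be shown with a toy version; the three-letter alphabet below is invented for illustration (the real BEAR alphabet is much richer), and the motif scan is an ordinary substring search.

    # Toy structural encoding: map dot-bracket symbols to letters so that
    # ordinary string algorithms apply. (The real BEAR alphabet is richer.)
    ENCODE = {"(": "S", ")": "s", ".": "l"}  # stem-open, stem-close, loop

    def encode(dotbracket: str) -> str:
        return "".join(ENCODE[c] for c in dotbracket)

    structures = {
        "rna1": "((((....))))..",
        "rna2": "..((((....))))",
    }

    motif = "SSllllss"  # hypothetical structural motif: a hairpin loop region
    for name, db in structures.items():
        enc = encode(db)
        pos = enc.find(motif)
        print(f"{name}: encoded={enc} motif_at={pos}")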
The Topology Prediction of Membrane Proteins: A Web-Based Tutorial.
Kandemir-Cavas, Cagin; Cavas, Levent; Alyuruk, Hakan
2018-06-01
There is a great need for educational materials that transfer current bioinformatics knowledge to undergraduate students in bioscience departments. In this study, we aimed to prepare an example in silico laboratory tutorial on the topology prediction of membrane proteins using bioinformatics tools. This laboratory tutorial is prepared for biochemistry lessons in bioscience departments (biology, chemistry, biochemistry, molecular biology and genetics, and faculties of medicine). The tutorial is intended for students who have not yet taken a bioinformatics course or who have already taken an introductory one. It is based on step-by-step explanations with illustrations, and it can be applied under the supervision of an instructor during lessons or used as a self-study guide by students. In the tutorial, membrane-spanning regions and α-helices of membrane proteins are predicted with internet-based bioinformatics tools. The results obtained from these tools show that the algorithms and parameters used affect the accuracy of prediction. The importance of this laboratory tutorial lies in the fact that it provides an introduction to bioinformatics while demonstrating an in silico laboratory application to students in the natural sciences. The presented educational material is easily applicable in any department with an internet connection, and it offers students in biochemistry laboratories an alternative to classical laboratory experiments.
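A classic, self-contained version of the exercise such a tutorial teaches is a Kyte-Doolittle hydropathy scan: average the residue hydropathy values in a sliding window and flag windows above a cutoff as putative membrane-spanning segments (a 19-residue window with a cutoff around 1.6 is the traditional rule of thumb). The sequence below is invented; the hydropathy values are the published Kyte-Doolittle scale.

    # Kyte-Doolittle hydropathy scale.
    KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
          "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
          "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
          "Y": -1.3, "V": 4.2}

    def tm_windows(seq: str, window: int = 19, cutoff: float = 1.6):
        """Yield (1-based start, mean hydropathy) for windows above the cutoff."""
        for i in range(len(seq) - window + 1):
            mean = sum(KD[aa] for aa in seq[i:i + window]) / window
            if mean >= cutoff:
                yield i + 1, mean

    # Invented sequence: a hydrophobic stretch flanked by polar regions.
    seq = "MKRNDSQE" + "LLVALLIVVAGLLIVALLV" + "QENDKRST"
    for start, mean in tm_windows(seq):
        print(f"putative TM window starting at {start}, mean KD = {mean:.2f}")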
SemaTyP: a knowledge graph based literature mining method for drug discovery.
Sang, Shengtian; Yang, Zhihao; Wang, Lei; Liu, Xiaoxia; Lin, Hongfei; Wang, Jian
2018-05-30
Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are currently the two main drug discovery methods, and both have successfully discovered a series of drugs. However, the development of new drugs is still an extremely time-consuming and expensive process. The biomedical literature contains important clues for the identification of potential treatments and could support experts in biomedicine on their way towards new discoveries. Here, we propose a biomedical knowledge graph-based drug discovery method called SemaTyP, which discovers candidate drugs for diseases by mining published biomedical literature. We first construct a biomedical knowledge graph with the relations extracted from biomedical abstracts; then a logistic regression model is trained on the semantic types of the paths of known drug therapies in the biomedical knowledge graph; finally, the learned model is used to discover drug therapies for new diseases. The experimental results show that our method can not only effectively discover new drug therapies for new diseases but also provide the potential mechanism of action of the candidate drugs. In this paper we propose a novel knowledge graph-based literature mining method for drug discovery, which could be a supplementary method to current drug discovery approaches.
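A hedged sketch of the core modelling step: each drug-disease path in the knowledge graph is reduced to a feature vector of the semantic relation types along it, and a logistic regression is trained on paths of known therapies. The relation names, paths and labels below are invented for illustration; scikit-learn supplies the vectorizer and classifier.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    # Each path is summarized by counts of the semantic relation types along it.
    # Paths, relation names and labels are invented (1 = known drug therapy).
    paths = [
        {"TREATS": 1, "INHIBITS": 1},
        {"INHIBITS": 1, "ASSOCIATED_WITH": 1},
        {"COEXISTS_WITH": 2},
        {"ASSOCIATED_WITH": 1, "COEXISTS_WITH": 1},
    ]
    labels = [1, 1, 0, 0]

    vec = DictVectorizer()
    X = vec.fit_transform(paths)
    model = LogisticRegression().fit(X, labels)

    # Score a new candidate drug-disease path by its semantic-type profile.
    candidate = {"TREATS": 1, "INHIBITS": 1}
    print(model.predict_proba(vec.transform([candidate]))[0, 1])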
An overview of bioinformatics methods for modeling biological pathways in yeast.
Hou, Jie; Acharya, Lipi; Zhu, Dongxiao; Cheng, Jianlin
2016-03-01
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, the identification of protein-protein interactions and the reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast Saccharomyces cerevisiae. In particular, the discovery of biological pathways in yeast has become an important forefront of systems biology, which aims to understand the interactions among molecules within a cell that lead to certain cellular processes in response to a specific environment. While existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into the computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start by reviewing the research on biological pathways, followed by a discussion of key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Application of bioinformatics in chronobiology research.
Lopes, Robson da Silva; Resende, Nathalia Maria; Honorio-França, Adenilda Cristina; França, Eduardo Luzía
2013-01-01
Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through "omics" projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research.
Apparently low reproducibility of true differential expression discoveries in microarray studies.
Zhang, Min; Yao, Chen; Guo, Zheng; Zou, Jinfeng; Zhang, Lin; Xiao, Hui; Wang, Dong; Yang, Da; Gong, Xue; Zhu, Jing; Li, Yanhui; Li, Xia
2008-09-15
Differentially expressed gene (DEG) lists detected from different microarray studies of the same disease are often highly inconsistent. Even in technical replicate tests using identical samples, DEG detection still shows very low reproducibility. It is often believed that current small microarray studies will largely introduce false discoveries. Based on a statistical model, we show that even in technical replicate tests using identical samples, it is highly likely that the selected DEG lists will be very inconsistent in the presence of small measurement variations. Therefore, the apparently low reproducibility of DEG detection in current technical replicate tests does not indicate low quality of microarray technology. We also demonstrate that the heterogeneous biological variation existing in real cancer data will further reduce the overall reproducibility of DEG detection. Nevertheless, in small subsamples from both simulated and real data, the actual false discovery rate (FDR) for each DEG list tends to be low, suggesting that each separately determined list may comprise mostly true DEGs. Rather than simply counting the overlaps of the discovery lists from different studies of a complex disease, novel metrics are needed for evaluating the reproducibility of discoveries characterized by correlated molecular changes. Supplementary information: Supplementary data are available at Bioinformatics online.
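The paper's central observation, that small measurement noise alone makes top-k DEG lists from technical replicates overlap poorly even when each list is mostly correct, can be reproduced in a few lines of simulation. Every number below (gene counts, effect sizes, noise level, k) is an invented toy setting.

    import numpy as np

    rng = np.random.default_rng(0)
    n_genes, k = 10_000, 200

    # True effects: most genes null, the first 1,000 modestly differential.
    effect = np.zeros(n_genes)
    effect[:1_000] = rng.normal(0.5, 0.2, 1_000)

    def top_k(noise_sd=0.3):
        """One technical replicate: observed effect = truth + measurement noise."""
        observed = effect + rng.normal(0, noise_sd, n_genes)
        return set(np.argsort(-np.abs(observed))[:k])

    list_a, list_b = top_k(), top_k()
    print(f"overlap of two replicate top-{k} lists: {len(list_a & list_b) / k:.0%}")
    print(f"true positives in one list: {sum(g < 1_000 for g in list_a) / k:.0%}")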
A New Student Performance Analysing System Using Knowledge Discovery in Higher Educational Databases
ERIC Educational Resources Information Center
Guruler, Huseyin; Istanbullu, Ayhan; Karahasan, Mehmet
2010-01-01
Knowledge discovery is a wide ranged process including data mining, which is used to find out meaningful and useful patterns in large amounts of data. In order to explore the factors having impact on the success of university students, knowledge discovery software, called MUSKUP, has been developed and tested on student data. In this system a…
Opportunities at the Intersection of Bioinformatics and Health Informatics
Miller, Perry L.
2000-01-01
This paper provides a “viewpoint discussion” based on a presentation made to the 2000 Symposium of the American College of Medical Informatics. It discusses potential opportunities for researchers in health informatics to become involved in the rapidly growing field of bioinformatics, using the activities of the Yale Center for Medical Informatics as a case study. One set of opportunities occurs where bioinformatics research itself intersects with the clinical world. Examples include the correlations between individual genetic variation with clinical risk factors, disease presentation, and differential response to treatment; and the implications of including genetic test results in the patient record, which raises clinical decision support issues as well as legal and ethical issues. A second set of opportunities occurs where bioinformatics research can benefit from the technologic expertise and approaches that informaticians have used extensively in the clinical arena. Examples include database organization and knowledge representation, data mining, and modeling and simulation. Microarray technology is discussed as a specific potential area for collaboration. Related questions concern how best to establish collaborations with bioscientists so that the interests and needs of both sets of researchers can be met in a synergistic fashion, and the most appropriate home for bioinformatics in an academic medical center. PMID:10984461
Knowledge Discovery in Databases.
ERIC Educational Resources Information Center
Norton, M. Jay
1999-01-01
Knowledge discovery in databases (KDD) revolves around the investigation and creation of knowledge, processes, algorithms, and mechanisms for retrieving knowledge from data collections. The article is an introductory overview of KDD. The rationale and environment of its development and applications are discussed. Issues related to database design…
Efficient detection of differentially methylated regions using DiMmeR.
Almeida, Diogo; Skov, Ida; Silva, Artur; Vandin, Fabio; Tan, Qihua; Röttger, Richard; Baumbach, Jan
2017-02-15
Epigenome-wide association studies (EWAS) generate large epidemiological datasets. They aim to detect differentially methylated DNA regions that are likely to influence transcriptional gene activity and, thus, the regulation of metabolic processes. By far the most widely used technology is the Illumina Methylation BeadChip, which measures the methylation levels of 450,000 (or 850,000) cytosines in the CpG dinucleotide context in a set of patients compared to a control group. Many bioinformatics tools exist for raw data analysis. However, most of them require some knowledge of the programming language R, have no user interface, and do not offer all the steps needed to guide users from raw data all the way down to statistically significant differentially methylated regions (DMRs) and the associated genes. Here, we present DiMmeR (Discovery of Multiple Differentially Methylated Regions), the first free standalone software with a user-friendly graphical user interface (GUI) that interactively guides scientists through the whole EWAS data analysis workflow. It offers parallelized statistical methods for efficiently identifying DMRs in both Illumina 450K and 850K EPIC chip data. DiMmeR computes empirical P-values through randomization tests, even for big datasets of hundreds of patients and thousands of permutations, within a few minutes on a standard desktop PC. It is independent of any third-party libraries, computes regression coefficients, P-values and empirical P-values, and corrects for multiple testing. DiMmeR is publicly available at http://dimmer.compbio.sdu.dk . diogoma@bmb.sdu.dk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
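The abstract does not detail DiMmeR's randomization procedure, so the following Python sketch shows only the generic idea of an empirical P-value by label permutation at a single CpG site; all names and parameters are illustrative, not DiMmeR's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_p(beta_values, is_case, n_perm=1000):
    """Label-permutation P-value for a mean methylation difference at one CpG."""
    obs = abs(beta_values[is_case].mean() - beta_values[~is_case].mean())
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(is_case)          # shuffle case/control labels
        stat = abs(beta_values[perm].mean() - beta_values[~perm].mean())
        hits += stat >= obs
    return (hits + 1) / (n_perm + 1)             # add-one avoids zero P-values

# Toy beta-values: 40 cases with slightly higher methylation than 40 controls.
beta = np.r_[rng.normal(0.60, 0.05, 40), rng.normal(0.50, 0.05, 40)]
labels = np.r_[np.ones(40, bool), np.zeros(40, bool)]
print(empirical_p(beta, labels))
```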
Grafström, Roland C; Nymark, Penny; Hongisto, Vesa; Spjuth, Ola; Ceder, Rebecca; Willighagen, Egon; Hardy, Barry; Kaski, Samuel; Kohonen, Pekka
2015-11-01
This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing), to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serves to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data to information relevant to human health and environmental safety. 2015 FRAME.
Automatically exposing OpenLifeData via SADI semantic Web Services.
González, Alejandro Rodríguez; Callahan, Alison; Cruz-Toledo, José; Garcia, Adrian; Egaña Aranguren, Mikel; Dumontier, Michel; Wilkinson, Mark D
2014-01-01
Two distinct trends are emerging with respect to how data is shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by moving towards the harmonization of data syntax, descriptive vocabularies, and identifiers, as well as providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions. Constructing queries that traverse semantically rich Linked Data requires substantial expertise, yet traditional RESTful or SOAP Web Services cannot adequately describe the content of a SPARQL endpoint. We propose that content-driven Semantic Web Services can enable facile discovery of Linked Data, independent of their location. We use a well-curated Linked Dataset (OpenLifeData) and utilize its descriptive metadata to automatically configure a series of more than 22,000 Semantic Web Services that expose all of its content via the SADI set of design principles. The OpenLifeData SADI services are discoverable via queries to the SHARE registry and easy to integrate into new or existing bioinformatics workflows and analytical pipelines. We demonstrate the utility of this system through comparison of Web Service-mediated data access with traditional SPARQL, and note that this approach not only simplifies data retrieval, but simultaneously provides protection against resource-intensive queries. We show, through a variety of different clients and examples of varying complexity, that data from the myriad OpenLifeData services can be recovered without any need for prior knowledge of the content or structure of the SPARQL endpoints. We also demonstrate that, via clients such as SHARE, the complexity of federated SPARQL queries is dramatically reduced.
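For contrast, this is the kind of raw SPARQL-protocol request that the SADI services are designed to hide from the user; the endpoint URL and example URI below are placeholders and may not resolve:

```python
import requests

endpoint = "http://sparql.openlifedata.org/sparql"   # hypothetical endpoint
query = """
SELECT ?p ?o
WHERE { <http://bio2rdf.org/drugbank:DB00001> ?p ?o }
LIMIT 5
"""
resp = requests.get(endpoint, params={"query": query},
                    headers={"Accept": "application/sparql-results+json"},
                    timeout=30)
for row in resp.json()["results"]["bindings"]:       # standard SPARQL JSON results
    print(row["p"]["value"], "->", row["o"]["value"])
```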
Integrated web visualizations for protein-protein interaction databases.
Jeanquartier, Fleur; Jean-Quartier, Claire; Holzinger, Andreas
2015-06-16
Understanding living systems is crucial for curing diseases. To achieve this we must understand biological networks based on protein-protein interactions. Bioinformatics has produced a great number of databases and tools that support analysts in exploring protein-protein interactions at an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill gaps to complete the picture of biochemical processes. Numerous large databases of protein-protein interactions are used to gain insight into questions of systems biology, and many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization and tested them against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality, as well as in the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via the web; the supplementary table can be accessed at http://tinyurl.com/PPI-DB-Comparison-2015. Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interactions. The study results underline the necessity for further enhancement of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactivity features and visualization maturity.
de Andrade, Roberto R S; Vaslin, Maite F S
2014-03-07
Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping the vast amount of information obtained presents a bioinformatics challenge. To bypass the need for command-line skills and basic bioinformatics knowledge, we developed mapping software with a graphical interface for the assembly of viral genomes from small RNA datasets obtained by NGS. SearchSmallRNA was developed in Java (version 7) using the NetBeans IDE 7.1. The program also allows analysis of the viral small interfering RNA (vsRNA) profile, providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. The program compares each sequenced read in a library against a chosen reference genome. Reads with a Hamming distance smaller than or equal to an allowed number of mismatches are selected as positives and used to assemble a long nucleotide genome sequence. To validate the software, separate analyses using NGS datasets obtained from HIV and two plant viruses were used to reconstruct whole viral genomes. SearchSmallRNA was able to reconstruct viral genomes from small RNA NGS datasets with a high degree of reliability, making it a valuable tool for virus sequencing and discovery. It is accessible and free to all research communities and has the advantage of an easy-to-use graphical interface. SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/. PMID:24607237
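The mapping step described above amounts to a mismatch-capped Hamming-distance scan. A minimal Python sketch of that idea (illustrative only; SearchSmallRNA itself is written in Java and will differ in detail):

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def map_read(read, genome, max_mm=2):
    """Slide the read along the reference; keep positions within the mismatch cap."""
    return [i for i in range(len(genome) - len(read) + 1)
            if hamming(read, genome[i:i + len(read)]) <= max_mm]

genome = "ACGTACGTTGCAACGT"
print(map_read("ACGA", genome, max_mm=1))   # -> [0, 4, 12]
```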
Big Data Application in Biomedical Research and Health Care: A Literature Review
Luo, Jake; Wu, Min; Gopukumar, Deepika; Zhao, Yiqing
2016-01-01
Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data application in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate the research of new genome-wide association studies of diseases, and with clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now more rapidly integrated with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. In this paper, we review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care. PMID:26843812
Accessing and integrating data and knowledge for biomedical research.
Burgun, A; Bodenreider, O
2008-01-01
To review the issues that have arisen with the advent of translational research in terms of integration of data and knowledge, and to survey current efforts to address these issues. Using examples from the biomedical literature, we identified new trends in biomedical research and their impact on bioinformatics. We analyzed the requirements for effective knowledge repositories and studied issues in the integration of biomedical knowledge. New diagnostic and therapeutic approaches based on gene expression patterns have brought about new issues in the statistical analysis of data, and new workflows are needed to support translational research. Interoperable data repositories based on standard annotations, infrastructures and services are needed to support the pooling and meta-analysis of data, as well as their comparison to earlier experiments. High-quality, integrated ontologies and knowledge bases serve as a source of prior knowledge used in combination with traditional data mining techniques and contribute to the development of more effective data analysis strategies. As biomedical research evolves from traditional clinical and biological investigations towards omics sciences and translational research, specific needs have emerged, including integrating data collected in research studies with patient clinical data, linking omics knowledge with medical knowledge, modeling the molecular basis of diseases, and developing tools that support in-depth analysis of research data. As such, translational research illustrates the need to bridge the gap between bioinformatics and medical informatics, and opens new avenues for biomedical informatics research.
Using the iPlant collaborative discovery environment.
Oliver, Shannon L; Lenards, Andrew J; Barthelson, Roger A; Merchant, Nirav; McKay, Sheldon J
2013-06-01
The iPlant Collaborative is an academic consortium whose mission is to develop an informatics and social infrastructure to address the "grand challenges" in plant biology. Its cyberinfrastructure supports the computational needs of the research community and facilitates solving major challenges in plant science. The Discovery Environment provides a powerful and rich graphical interface to the iPlant Collaborative cyberinfrastructure by creating an accessible virtual workbench that enables all levels of expertise, ranging from students to traditional biology researchers and computational experts, to explore, analyze, and share their data. By providing access to iPlant's robust data-management system and high-performance computing resources, the Discovery Environment also creates a unified space in which researchers can access scalable tools. Researchers can use available Applications (Apps) to execute analyses on their data, as well as customize or integrate their own tools to better meet the specific needs of their research. These Apps can also be used in workflows that automate more complicated analyses. This module describes how to use the main features of the Discovery Environment, using bioinformatics workflows for high-throughput sequence data as examples. © 2013 by John Wiley & Sons, Inc.
Genome-wide expression profiling in pediatric septic shock
Wong, Hector R.
2013-01-01
For nearly a decade, our research group has had the privilege of developing and mining a multi-center, microarray-based, genome-wide expression database of critically ill children (≤ 10 years of age) with septic shock. Using bioinformatic and systems biology approaches, the expression data generated through this discovery-oriented, exploratory approach have been leveraged for a variety of objectives, which will be reviewed. Fundamental observations include widespread repression of gene programs corresponding to the adaptive immune system, and biologically significant differential patterns of gene expression across developmental age groups. The data have also identified gene expression-based subclasses of pediatric septic shock having clinically relevant phenotypic differences, and they have been leveraged for the discovery of novel therapeutic targets and for the discovery and development of novel stratification and diagnostic biomarkers. Almost a decade of genome-wide expression profiling in pediatric septic shock is now demonstrating tangible results. The studies have progressed from an initial discovery-oriented and exploratory phase to a new phase where the data are being translated and applied to address several areas of clinical need. PMID:23329198
Antanaviciute, Agne; Watson, Christopher M; Harrison, Sally M; Lascelles, Carolina; Crinnion, Laura; Markham, Alexander F; Bonthron, David T; Carr, Ian M
2015-12-01
Exome sequencing has become a de facto standard method for Mendelian disease gene discovery in recent years, yet identifying disease-causing mutations among thousands of candidate variants remains a non-trivial task. Here we describe a new variant prioritization tool, OVA (ontology variant analysis), in which user-provided phenotypic information is exploited to infer deeper biological context. OVA combines a knowledge-based approach with a variant-filtering framework. It reduces the number of candidate variants by considering genotype and predicted effect on protein sequence, and scores the remainder on biological relevance to the query phenotype. We take advantage of several ontologies in order to bridge knowledge across multiple biomedical domains and facilitate computational analysis of annotations pertaining to genes, diseases, phenotypes, tissues and pathways. In this way, OVA combines information regarding molecular and physical phenotypes and integrates both human and model organism data to effectively prioritize variants. By assessing performance on both known and novel disease mutations, we show that OVA performs biologically meaningful candidate variant prioritization and can be more accurate than another recently published candidate variant prioritization tool. OVA is freely accessible at http://dna2.leeds.ac.uk:8080/OVA/index.jsp. Supplementary data are available at Bioinformatics online. umaan@leeds.ac.uk. © The Author 2015. Published by Oxford University Press.
Manning, Timmy; Sleator, Roy D; Walsh, Paul
2014-01-01
Artificial neural networks (ANNs) are a class of powerful machine learning models for classification and function approximation which have analogs in nature. An ANN learns to map stimuli to responses through repeated evaluation of exemplars of the mapping. This learning approach results in networks which are recognized for their noise tolerance and ability to generalize meaningful responses for novel stimuli. It is these properties of ANNs which make them appealing for applications to bioinformatics problems where interpretation of data may not always be obvious, and where the domain knowledge required for deductive techniques is incomplete or can cause a combinatorial explosion of rules. In this paper, we provide an introduction to artificial neural network theory and review some interesting recent applications to bioinformatics problems.
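As a toy illustration of "learning a mapping through repeated evaluation of exemplars", here is a minimal two-layer network trained by backpropagation on XOR; this is a standard textbook example, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)          # XOR: not linearly separable

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)     # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)     # 4 hidden -> 1 output
sig = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10_000):                            # repeated exemplar evaluation
    h = sig(X @ W1 + b1)
    out = sig(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)            # squared-error gradient
    d_h = d_out @ W2.T * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round(2))   # typically approaches [0, 1, 1, 0], depending on the seed
```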
Advances in Omics and Bioinformatics Tools for Systems Analyses of Plant Functions
Mochida, Keiichi; Shinozaki, Kazuo
2011-01-01
Omics and bioinformatics are essential to understanding the molecular systems that underlie various plant functions. Recent game-changing sequencing technologies have revitalized sequencing approaches in genomics and have produced opportunities for various emerging analytical applications. Driven by technological advances, several new omics layers such as the interactome, epigenome and hormonome have emerged. Furthermore, in several plant species, the development of omics resources has progressed to address particular biological properties of individual species. Integration of knowledge from omics-based research is an emerging issue as researchers seek to identify significance, gain biological insights and promote translational research. From these perspectives, we provide this review of the emerging aspects of plant systems research based on omics and bioinformatics analyses together with their associated resources and technological advances. PMID:22156726
Design and Development of ChemInfoCloud: An Integrated Cloud Enabled Platform for Virtual Screening.
Karthikeyan, Muthukumarasamy; Pandit, Deepak; Bhavasar, Arvind; Vyas, Renu
2015-01-01
The power of cloud computing and distributed computing has been harnessed to handle the vast and heterogeneous data that must be processed in any virtual screening protocol. A cloud computing platform, ChemInfoCloud, was built and integrated with several chemoinformatics and bioinformatics tools. The robust engine performs the core chemoinformatics tasks of lead generation, lead optimisation and property prediction in a fast and efficient manner. It also provides bioinformatics functionality, including sequence alignment, active site pose prediction and protein-ligand docking. Text mining, NMR chemical shift (1H, 13C) prediction and reaction fingerprint generation modules for efficient lead discovery are also implemented in this platform. We have developed an integrated problem-solving cloud environment for virtual screening studies that also provides workflow management, better usability and interaction with end users, using container-based virtualization (OpenVZ).
Exploring Wound-Healing Genomic Machinery with a Network-Based Approach
Vitali, Francesca; Marini, Simone; Balli, Martina; Grosemans, Hanne; Sampaolesi, Maurilio; Lussier, Yves A.; Cusella De Angelis, Maria Gabriella; Bellazzi, Riccardo
2017-01-01
The molecular mechanisms underlying tissue regeneration and wound healing are still poorly understood despite their importance. In this paper we develop a bioinformatics approach, combining biology and network theory, to drive experiments for better understanding the genetic underpinnings of wound healing mechanisms and for selecting potential drug targets. We start by selecting literature-relevant genes in murine wound healing, and inferring from them a Protein-Protein Interaction (PPI) network. Then, we analyze the network to rank wound healing-related genes according to their topological properties. Lastly, we perform a procedure for in-silico simulation of a treatment action in a biological pathway. The findings obtained by applying the developed pipeline, including gene expression analysis, confirm how a network-based bioinformatics method is able to prioritize candidate genes for in vitro analysis, thus speeding up the understanding of molecular mechanisms and supporting the discovery of potential drug targets. PMID:28635674
Zhang, Ying; Wang, Xi; Cui, Dan; Zhu, Jun
2016-12-01
Human whole saliva is a vital body fluid for studying the physiology and pathology of the oral cavity. As a powerful technique for biomarker discovery, MS-based proteomic strategies have been introduced for saliva analysis and have identified hundreds of proteins and N-glycosylation sites. However, there is still a lack of quantitative analysis, which is necessary for biomarker screening and biological research. In this study, we establish an integrated workflow combining stable isotope dimethyl labeling, HILIC enrichment, and high-resolution MS for quantification of both the global proteome and the N-glycoproteome of human saliva from oral ulcer patients. With the help of advanced bioinformatics, we comprehensively studied oral ulcers at both the protein and glycoprotein scales. Bioinformatics analyses revealed that starch digestion and protein degradation activities are inhibited while the immune response is promoted in oral ulcer saliva. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Pineda, Sandy S; Chaumeil, Pierre-Alain; Kunert, Anne; Kaas, Quentin; Thang, Mike W C; Le, Lien; Nuhn, Michael; Herzig, Volker; Saez, Natalie J; Cristofori-Armstrong, Ben; Anangi, Raveendra; Senff, Sebastian; Gorse, Dominique; King, Glenn F
2018-03-15
ArachnoServer is a manually curated database that consolidates information on the sequence, structure, function and pharmacology of spider-venom toxins. Although spider venoms are complex chemical arsenals, the primary constituents are small disulfide-bridged peptides that target neuronal ion channels and receptors. Due to their high potency and selectivity, these peptides have been developed as pharmacological tools, bioinsecticides and drug leads. A new version of ArachnoServer (v3.0) has been developed that includes a bioinformatics pipeline for automated detection and analysis of peptide toxin transcripts in assembled venom-gland transcriptomes. ArachnoServer v3.0 was updated with the latest sequence, structure and functional data, the search-by-mass feature has been enhanced, and toxin cards provide additional information about each mature toxin. http://arachnoserver.org. support@arachnoserver.org. Supplementary data are available at Bioinformatics online.
Chondrocyte channel transcriptomics
Lewis, Rebecca; May, Hannah; Mobasheri, Ali; Barrett-Jolley, Richard
2013-01-01
To date, a range of ion channels have been identified in chondrocytes using a number of different techniques, predominantly electrophysiological and/or biomolecular; each of these has its advantages and disadvantages. Here we aim to compare and contrast the data available from biophysical and microarray experiments. This letter analyses recent transcriptomics datasets from chondrocytes, accessible from the European Bioinformatics Institute (EBI). We discuss whether such bioinformatic analysis of microarray datasets can potentially accelerate identification and discovery of ion channels in chondrocytes. The ion channels which appear most frequently across these microarray datasets are discussed, along with their possible functions. We discuss whether functional or protein data exist which support the microarray data. A microarray experiment comparing gene expression in osteoarthritis and healthy cartilage is also discussed and we verify the differential expression of 2 of these genes, namely the genes encoding large calcium-activated potassium (BK) and aquaporin channels. PMID:23995703
Niche metabolism in parasitic protozoa
Ginger, Michael L
2005-01-01
Complete or partial genome sequences have recently become available for several medically and evolutionarily important parasitic protozoa. Through the application of bioinformatics complete metabolic repertoires for these parasites can be predicted. For experimentally intractable parasites insight provided by metabolic maps generated in silico has been startling. At its more extreme end, such bioinformatics reckoning facilitated the discovery in some parasites of mitochondria remodelled beyond previous recognition, and the identification of a non-photosynthetic chloroplast relic in malarial parasites. However, for experimentally tractable parasites, mapping of the general metabolic terrain is only a first step in understanding how the parasite modulates its streamlined, yet still often puzzlingly complex, metabolism in order to complete life cycles within host, vector, or environment. This review provides a comparative overview and discussion of metabolic strategies used by several different parasitic protozoa in order to subvert and survive host defences, and illustrates how genomic data contribute to the elucidation of parasite metabolism. PMID:16553311
A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research
Campbell, Chad E.; Nehm, Ross H.
2013-01-01
The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students’ knowledge, attitudes, or skills. Although assessments are necessary tools for answering this question, their outputs are dependent on their quality. Our study 1) reviews the central importance of reliability and construct validity evidence in the development and evaluation of science assessments and 2) examines the extent to which published assessments in genomics and bioinformatics education (GBE) have been developed using such evidence. We identified 95 GBE articles (out of 226) that contained claims of knowledge increases, affective changes, or skill acquisition. We found that 1) the purpose of most of these studies was to assess summative learning gains associated with curricular change at the undergraduate level, and 2) a minority (<10%) of studies provided any reliability or validity evidence, and only one study out of the 95 sampled mentioned both validity and reliability. Our findings raise concerns about the quality of evidence derived from these instruments. We end with recommendations for improving assessment quality in GBE. PMID:24006400
A Knowledge Discovery framework for Planetary Defense
NASA Astrophysics Data System (ADS)
Jiang, Y.; Yang, C. P.; Li, Y.; Yu, M.; Bambacus, M.; Seery, B.; Barbee, B.
2016-12-01
Planetary Defense, a project funded by NASA Goddard and the NSF, is a multi-faceted effort focused on the mitigation of Near Earth Object (NEO) threats to our planet. Currently, information concerning NEOs is dispersed among different organizations and scientists, leading to the lack of a coherent body of information that can be used for efficient NEO mitigation. In this paper, a planetary defense knowledge discovery engine is proposed to better assist the development and integration of a NEO response system. Specifically, we have implemented an organized information framework by two means: 1) the development of a semantic knowledge base, which provides a structure for relevant information and is built with web crawling and natural language processing techniques, allowing us to collect and store the most relevant structured information on a regular basis; and 2) the development of a knowledge discovery engine, which allows for the efficient retrieval of information from our knowledge base. The knowledge discovery engine has been built on top of Elasticsearch, an open source full-text search engine, together with cutting-edge machine learning ranking and recommendation algorithms. This proposed framework is expected to advance knowledge discovery and innovation in the planetary science domain.
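A sketch of the kind of full-text query such an Elasticsearch-backed engine serves; the index and field names here are hypothetical, but the request body follows the standard Elasticsearch _search API:

```python
import requests

query = {
    "query": {"match": {"abstract": "near earth object deflection"}},
    "size": 5,
}
resp = requests.post("http://localhost:9200/neo-docs/_search",  # hypothetical index
                     json=query, timeout=10)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```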
Translational Research 2.0: a framework for accelerating collaborative discovery.
Asakiewicz, Chris
2014-05-01
The World Wide Web has revolutionized the conduct of global, cross-disciplinary research. In the life sciences, interdisciplinary approaches to problem solving and collaboration are becoming increasingly important in facilitating knowledge discovery and integration. Web 2.0 technologies promise to have a profound impact: enabling reproducibility, aiding in discovery, and accelerating and transforming medical and healthcare research across the healthcare ecosystem. However, knowledge integration and discovery require a consistent foundation upon which to operate. Such a foundation should be capable of addressing some of the critical issues associated with how research is conducted within the ecosystem today and how it should be conducted in the future. This article discusses a framework for enhancing collaborative knowledge discovery across the medical and healthcare research ecosystem, one that could serve as a foundation upon which ecosystem stakeholders can enhance the way data, information and knowledge are created, shared and used to accelerate the translation of knowledge from one area of the ecosystem to another.
Kearse, Matthew; Moir, Richard; Wilson, Amy; Stones-Havas, Steven; Cheung, Matthew; Sturrock, Shane; Buxton, Simon; Cooper, Alex; Markowitz, Sidney; Duran, Chris; Thierer, Tobias; Ashton, Bruce; Meintjes, Peter; Drummond, Alexei
2012-01-01
Summary: The two main functions of bioinformatics are the organization and analysis of biological data using computational resources. Geneious Basic has been designed to be an easy-to-use and flexible desktop software application framework for the organization and analysis of biological data, with a focus on molecular sequences and related data types. It integrates numerous industry-standard discovery analysis tools, with interactive visualizations to generate publication-ready images. One key contribution to researchers in the life sciences is the Geneious public application programming interface (API) that affords the ability to leverage the existing framework of the Geneious Basic software platform for virtually unlimited extension and customization. The result is an increase in the speed and quality of development of computational tools for the life sciences, due to the functionality and graphical user interface available to the developer through the public API. Geneious Basic represents an ideal platform for the bioinformatics community to leverage existing components and to integrate their own specific requirements for the discovery, analysis and visualization of biological data. Availability and implementation: Binaries and public API freely available for download at http://www.geneious.com/basic, implemented in Java and supported on Linux, Apple OSX and MS Windows. The software is also available from the Bio-Linux package repository at http://nebc.nerc.ac.uk/news/geneiousonbl. Contact: peter@biomatters.com PMID:22543367
Microbial bioinformatics for food safety and production
Alkema, Wynand; Boekhorst, Jos; Wels, Michiel
2016-01-01
In the production of fermented foods, microbes play an important role. Optimization of fermentation processes or starter culture production traditionally was a trial-and-error approach inspired by expert knowledge of the fermentation process. Current developments in high-throughput ‘omics’ technologies allow developing more rational approaches to improve fermentation processes both from the food functionality as well as from the food safety perspective. Here, the authors thematically review typical bioinformatics techniques and approaches to improve various aspects of the microbial production of fermented food products and food safety. PMID:26082168
Assessment of composite motif discovery methods.
Klepper, Kjetil; Sandve, Geir K; Abul, Osman; Johansen, Jostein; Drablos, Finn
2008-02-26
Computational discovery of regulatory elements is an important area of bioinformatics research, and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery: discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so-called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted for composite motif discovery. We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked both to predict the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices in one dataset to test the response of programs to varying levels of noise. Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets, and no single method performed consistently better than the rest in all situations. The variation in performance on individual datasets also shows that the new benchmark datasets represent a suitable variety of challenges to most methods for module discovery.
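Scanning a sequence with a provided position weight matrix reduces to a sliding log-odds score. A toy Python sketch (matrix values, background model and threshold are illustrative, not from the benchmark):

```python
import math

# Toy PWM: one dict of base probabilities per motif position.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
bg = 0.25  # uniform background

def score(window):
    return sum(math.log2(pwm[i][b] / bg) for i, b in enumerate(window))

def scan(seq, threshold=2.0):
    w = len(pwm)
    return [(i, round(score(seq[i:i + w]), 2))
            for i in range(len(seq) - w + 1)
            if score(seq[i:i + w]) >= threshold]

print(scan("TTAGCAGCTT"))   # -> [(2, 4.46), (5, 4.46)]
```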
Knowledge Discovery from Biomedical Ontologies in Cross Domains
Shen, Feichen; Lee, Yugyung
2016-01-01
In recent years, there has been increasing demand for the sharing and integration of medical data in biomedical research. Improving a health care system requires supporting data integration through semantically interoperable systems and practices. Semantic interoperability is difficult to achieve in these systems because the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and support semantic interoperability between different ontologies. To this end, we focus on the discovery of semantic patterns describing associations of relations in a heterogeneous information network representing the different types of objects and relationships in multiple biological ontologies, and on the creation of a topic hierarchy through analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner and then to conduct cross-domain knowledge discovery from the multiple biological ontologies; they thus make a substantial contribution to knowledge discovery across multiple ontologies. We have demonstrated cross-domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF, compared it with a cross-domain query processing approach, namely SLAP, and confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies. PMID:27548262
Knowledge discovery with classification rules in a cardiovascular dataset.
Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan
2005-12-01
In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes that should reveal the presence of specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge in the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of the induced rules, which drives the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.
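AREX's evolutionary induction is not reproduced here; as a hedged stand-in, the sketch below shows the same kind of output (human-readable classification rules for expert review) using a greedy decision tree on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in for a cardiovascular dataset: 200 synthetic patients, 6 attributes.
X, y = make_classification(n_samples=200, n_features=6,
                           n_informative=3, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each root-to-leaf path reads as one candidate rule a medical expert could assess.
print(export_text(tree, feature_names=[f"attr_{i}" for i in range(6)]))
```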
Progress in Biomedical Knowledge Discovery: A 25-year Retrospective
Sacchi, L; Holmes, J H
2016-01-01
Objectives: We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992 and now, 25 years later, mainly focused on supervised learning. Methods: We performed a rigorous systematic search of PubMed and used latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery in and between time periods, and to compare these trends. We restricted each result set using a bracket of five years previous, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set to articles published between 2011 and 2015. This was to reflect the literature available at the target dates of 1992 and 2015 to researchers and others at those times. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated. Results: A total of 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery and the refinement and application of machine learning approaches that were nascent or unknown in 1992. Conclusions: Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains. Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources and the evolution of new algorithmic approaches to knowledge discovery, and we consider, from legal, social, and political perspectives, possible explanations for the growth of the field. Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods now being developed for the discovery of new knowledge in biomedical data. PMID:27488403
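A minimal sketch of theme identification with latent Dirichlet allocation in the spirit of the review's method, using a toy corpus and scikit-learn in place of the authors' actual pipeline:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Three toy "abstracts"; a real corpus would hold thousands of PubMed records.
docs = [
    "knowledge discovery in clinical databases",
    "pattern recognition for gene expression data mining",
    "automated pattern discovery in electronic health records",
]
vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Report the top words of each inferred theme.
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = weights.argsort()[-4:][::-1]
    print(f"topic {k}:", [terms[i] for i in top])
```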
Communication in Collaborative Discovery Learning
ERIC Educational Resources Information Center
Saab, Nadira; van Joolingen, Wouter R.; van Hout-Wolters, Bernadette H. A. M.
2005-01-01
Background: Constructivist approaches to learning focus on learning environments in which students have the opportunity to construct knowledge themselves, and negotiate this knowledge with others. "Discovery learning" and "collaborative learning" are examples of learning contexts that cater for knowledge construction processes. We introduce a…
Practice-Based Knowledge Discovery for Comparative Effectiveness Research: An Organizing Framework
Lucero, Robert J.; Bakken, Suzanne
2014-01-01
Electronic health information systems can increase the ability of health-care organizations to investigate the effects of clinical interventions. The authors present an organizing framework that integrates outcomes and informatics research paradigms to guide knowledge discovery in electronic clinical databases. They illustrate its application using the example of hospital acquired pressure ulcers (HAPU). The Knowledge Discovery through Informatics for Comparative Effectiveness Research (KDI-CER) framework was conceived as a heuristic to conceptualize study designs and address potential methodological limitations imposed by using a single research perspective. Advances in informatics research can play a complementary role in advancing the field of outcomes research including CER. The KDI-CER framework can be used to facilitate knowledge discovery from routinely collected electronic clinical data. PMID:25278645
Methods for Discovery of Novel Cellulosomal Cellulases Using Genomics and Biochemical Tools.
Ben-David, Yonit; Dassa, Bareket; Bensoussan, Lizi; Bayer, Edward A; Moraïs, Sarah
2018-01-01
Cell wall degradation by cellulases is extensively explored owing to its potential contribution to biofuel production. The cellulosome is an extracellular multienzyme complex that can degrade the plant cell wall very efficiently, and cellulosomal enzymes are therefore of great interest. The cellulosomal cellulases are defined as enzymes that contain a dockerin module, which can interact with a cohesin module contained in multiple copies in a noncatalytic protein, termed scaffoldin. The assembly of the cellulosomal cellulases into the cellulosomal complex occurs via specific protein-protein interactions. Cellulosome systems have been described initially only in several anaerobic cellulolytic bacteria. However, owing to ongoing genome sequencing and metagenomic projects, the discovery of novel cellulosome-producing bacteria and the description of their cellulosomal genes have dramatically increased in the recent years. In this chapter, methods for discovery of novel cellulosomal cellulases from a DNA sequence by bioinformatics and biochemical tools are described. Their biochemical characterization is also described, including both the enzymatic activity of the putative cellulases and their assembly into mature designer cellulosomes.
Something old, something new: revisiting natural products in antibiotic drug discovery.
Wright, Gerard D
2014-03-01
Antibiotic discovery is in crisis. Despite a growing need for new drugs resulting from the increasing number of multi-antibiotic-resistant pathogens, there have been only a handful of new antibiotics approved for clinical use in the past 2 decades. Faced with scientific, economic, and regulatory challenges, the pharmaceutical sector seems unable to respond to what has been called an "apocalyptic" threat. Natural products produced by bacteria and fungi are genetically encoded products of natural selection that have been the mainstay sources of the antibiotics in current clinical use. The pharmaceutical industry has largely abandoned these compounds in favor of large libraries of synthetic molecules because of difficulties in identifying new natural product antibiotic scaffolds. Advances in next-generation genome sequencing, bioinformatics, and analytical chemistry are combining to overcome these barriers to natural product discovery. Coupled with new strategies in antibiotic discovery, including inhibition of resistance, novel drug combinations, and new targets, natural products are poised for a renaissance to address what is a pressing health care crisis.
Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.
2015-01-01
Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308
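As a conceptual illustration of what "pattern-based genome mining" entails here, the sketch below (plain Python; strain and cluster names are illustrative placeholders, and this is not the authors' pipeline) pairs a molecular family with the biosynthetic gene cluster whose presence/absence pattern across strains it best matches.

```python
# Toy pattern-based genome mining: a molecular family detected by mass-spec
# networking is paired with the gene cluster whose strain-level
# presence/absence pattern matches it most closely.

def jaccard(a, b):
    """Jaccard similarity of two presence/absence sets of strain IDs."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Strains in which a molecular family was observed (illustrative data),
# and strains whose genomes carry each candidate gene cluster.
molecular_family = {"strain_01", "strain_07", "strain_12"}
gene_clusters = {
    "NRPS40": {"strain_01", "strain_07", "strain_12"},
    "PKS12":  {"strain_01", "strain_22"},
}

best = max(gene_clusters, key=lambda g: jaccard(molecular_family, gene_clusters[g]))
print(best, jaccard(molecular_family, gene_clusters[best]))
```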
Soleilhac, Emmanuelle; Nadon, Robert; Lafanechere, Laurence
2010-02-01
Screening compounds with cell-based assays and microscopy image-based analysis is an approach currently favored for drug discovery. Because of its high information yield, the strategy is called high-content screening (HCS). This review covers the application of HCS in drug discovery and in basic research on potential new pathways that can be targeted for disease treatment. HCS faces several challenges, however, including the extraction of pertinent information from the massive amount of data generated from images. Several proposed approaches to HCS data acquisition and analysis are reviewed. Different solutions from the fields of mathematics, bioinformatics and biotechnology are presented. Potential applications and limits of these recent technical developments are also discussed. HCS is a multidisciplinary and multistep approach for understanding the effects of compounds on biological processes at the cellular level. Reliable results depend on the quality of the overall process and require strong interdisciplinary collaborations.
Malin, Bradley; Carley, Kathleen
2007-01-01
The goal of this research is to learn how the editorial staffs of bioinformatics and medical informatics journals provide support for cross-community exposure. Models such as co-citation and co-author analysis measure the relationships between researchers; but they do not capture how environments that support knowledge transfer across communities are organized. In this paper, we propose a social network analysis model to study how editorial boards integrate researchers from disparate communities. We evaluate our model by building relational networks based on the editorial boards of approximately 40 journals that serve as research outlets in medical informatics and bioinformatics. We track the evolution of editorial relationships through a longitudinal investigation over the years 2000 through 2005. Our findings suggest that there are research journals that support the collocation of editorial board members from the bioinformatics and medical informatics communities. Network centrality metrics indicate that editorial board members are located in the intersection of the communities and that the number of individuals in the intersection is growing with time. Social network analysis methods provide insight into the relationships between the medical informatics and bioinformatics communities. The number of editorial board members facilitating the publication intersection of the communities has grown, but the intersection remains dependent on a small group of individuals and fragile.
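For readers unfamiliar with the network measures involved, the following sketch (hypothetical journal and editor names) computes betweenness centrality on a tiny board-membership graph with the networkx library; this is the kind of centrality metric the study uses to locate editors in the intersection of the two communities, not the authors' own code.

```python
# Bipartite journal--editor graph: an editor on boards in both communities
# should bridge them and score highest on betweenness centrality.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("bioinf_journal_A", "editor_1"), ("bioinf_journal_A", "editor_2"),
    ("medinf_journal_B", "editor_2"), ("medinf_journal_B", "editor_3"),
])

centrality = nx.betweenness_centrality(G)
print(max(centrality, key=centrality.get))  # expected: editor_2
```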
Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; ...
2015-07-14
In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG’s comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC’s focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG’s extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world.
May, Jody Christopher; Gant-Branum, Randi Lee; McLean, John Allen
2016-06-01
Systems-wide molecular phenomics is rapidly expanding through technological advances in instrumentation and bioinformatics. Strategies such as structural mass spectrometry, which utilizes size and shape measurements with molecular weight, serve to characterize the sum of molecular expression in biological contexts, where broad-scale measurements are made that are interpreted through big data statistical techniques to reveal underlying patterns corresponding to phenotype. The data density, data dimensionality, data projection, and data interrogation are all critical aspects of these approaches to turn data into salient information. Untargeted molecular phenomics is already having a dramatic impact in discovery science from drug discovery to synthetic biology. It is evident that these emerging techniques will integrate closely in broad efforts aimed at precision medicine. Copyright © 2016 Elsevier Ltd. All rights reserved.
Lowering industry firewalls: pre-competitive informatics initiatives in drug discovery.
Barnes, Michael R; Harland, Lee; Foord, Steven M; Hall, Matthew D; Dix, Ian; Thomas, Scott; Williams-Jones, Bryn I; Brouwer, Cory R
2009-09-01
Pharmaceutical research and development is facing substantial challenges that have prompted the industry to shift funding from early- to late-stage projects. Among the effects is a major change in the attitude of many companies to their internal bioinformatics resources: the focus has moved from the vigorous pursuit of intellectual property towards exploration of pre-competitive cross-industry collaborations and engagement with the public domain. High-quality, open and accessible data are the foundation of pre-competitive research, and strong public-private partnerships have considerable potential to enhance public data resources, which would benefit everyone engaged in drug discovery. In this article, we discuss the background to these changes and propose new areas of collaboration in computational biology and chemistry between the public domain and the pharmaceutical industry.
In the loop: promoter–enhancer interactions and bioinformatics
Mora, Antonio; Sandve, Geir Kjetil; Gabrielsen, Odd Stokke
2016-01-01
Enhancer–promoter regulation is a fundamental mechanism underlying differential transcriptional regulation. Spatial chromatin organization brings remote enhancers into contact with target promoters in cis to regulate gene expression. There is considerable evidence for promoter–enhancer interactions (PEIs). In recent years, genome-wide analyses have identified signatures and mapped novel enhancers; however, being able to precisely identify their target gene(s) requires massive biological and bioinformatics efforts. In this review, we give a short overview of the chromatin landscape and transcriptional regulation. We discuss some key concepts and problems related to chromatin interaction detection technologies, and emerging knowledge from genome-wide chromatin interaction data sets. Then, we critically review different types of bioinformatics analysis methods and tools related to representation and visualization of PEI data, raw data processing and PEI prediction. Lastly, we provide specific examples of how PEIs have been used to elucidate a functional role of non-coding single-nucleotide polymorphisms. The topic is at the forefront of epigenetic research, and by highlighting some future bioinformatics challenges in the field, this review provides a comprehensive background for future PEI studies. PMID:26586731
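One common PEI-prediction heuristic covered by such reviews is correlating an enhancer's activity signal with a candidate target gene's expression across samples. A minimal sketch of that idea, with made-up data and an assumed correlation threshold:

```python
# Correlate enhancer activity (e.g., a histone-mark signal) with gene
# expression across samples; keep high-correlation pairs as putative PEIs.
import numpy as np

rng = np.random.default_rng(0)
enhancer_activity = rng.random(20)                      # 20 samples
gene_expression = 0.8 * enhancer_activity + 0.2 * rng.random(20)

r = np.corrcoef(enhancer_activity, gene_expression)[0, 1]
print(f"Pearson r = {r:.2f}; candidate PEI: {r > 0.7}")  # threshold assumed
```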
p3d--Python module for structural bioinformatics.
Fufezan, Christian; Specht, Michael
2009-08-21
High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge-based approaches. The development of such tools requires a robust interface for easy access to the structural data. For this, the Python scripting language is a natural choice, since its philosophy emphasizes understandable source code. p3d is an object-oriented Python module that adds a simple yet powerful interface to the Python interpreter for processing and analysing three-dimensional protein structure files (PDB files). p3d's strength arises from the combination of (a) very fast spatial access to the structural data through a binary space partitioning (BSP) tree, (b) set theory, and (c) functions that combine (a) and (b) and accept human-readable search queries rather than complex computer syntax. Together, these factors facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures, making p3d a convenient basis for quickly building structural bioinformatics tools in Python.
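The speed claim rests on spatial partitioning: instead of checking every atom pair, candidate neighbors are looked up in a pre-built spatial index. The sketch below illustrates that idea generically with a hash grid (not p3d's BSP tree, and not p3d's actual API):

```python
# Generic spatial index over atom coordinates: bucket atoms into cubic
# cells, then fetch neighbor candidates from the 27 surrounding cells.
from collections import defaultdict

def build_grid(atoms, cell=4.0):
    grid = defaultdict(list)
    for atom in atoms:                      # atom = (x, y, z)
        key = tuple(int(c // cell) for c in atom)
        grid[key].append(atom)
    return grid

def neighbors(grid, point, cell=4.0):
    """Atoms near `point` (candidates for an exact distance check)."""
    cx, cy, cz = (int(c // cell) for c in point)
    return [a for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
            for a in grid.get((cx + dx, cy + dy, cz + dz), [])]

grid = build_grid([(1.0, 2.0, 3.0), (10.0, 10.0, 10.0)])
print(neighbors(grid, (0.5, 2.5, 3.5)))     # only the nearby atom returns
```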
Crowdsourcing for bioinformatics
Good, Benjamin M.; Su, Andrew I.
2013-01-01
Motivation: Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Results: Here, we provide a framework for understanding and applying several different types of crowdsourcing. The framework considers two broad classes: systems for solving large-volume ‘microtasks’ and systems for solving high-difficulty ‘megatasks’. Within these classes, we discuss system types, including volunteer labor, games with a purpose, microtask markets and open innovation contests. We illustrate each system type with successful examples in bioinformatics and conclude with a guide for matching problems to crowdsourcing solutions that highlights the positives and negatives of different approaches. Contact: bgood@scripps.edu PMID:23782614
Accessing and Integrating Data and Knowledge for Biomedical Research
Burgun, A.; Bodenreider, O.
2008-01-01
Objectives: To review the issues that have arisen with the advent of translational research in terms of integration of data and knowledge, and to survey current efforts to address these issues. Methods: Using examples from the biomedical literature, we identified new trends in biomedical research and their impact on bioinformatics. We analyzed the requirements for effective knowledge repositories and studied issues in the integration of biomedical knowledge. Results: New diagnostic and therapeutic approaches based on gene expression patterns have brought about new issues in the statistical analysis of data, and new workflows are needed to support translational research. Interoperable data repositories based on standard annotations, infrastructures and services are needed to support the pooling and meta-analysis of data, as well as their comparison to earlier experiments. High-quality, integrated ontologies and knowledge bases serve as a source of prior knowledge used in combination with traditional data mining techniques and contribute to the development of more effective data analysis strategies. Conclusion: As biomedical research evolves from traditional clinical and biological investigations towards omics sciences and translational research, specific needs have emerged, including integrating data collected in research studies with patient clinical data, linking omics knowledge with medical knowledge, modeling the molecular basis of diseases, and developing tools that support in-depth analysis of research data. As such, translational research illustrates the need to bridge the gap between bioinformatics and medical informatics, and opens new avenues for biomedical informatics research. PMID:18660883
Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Švedas, Vytas
2014-01-01
The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure–function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. PMID:24852248
MOWServ: a web client for integration of bioinformatic resources
Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J.; Claros, M. Gonzalo; Trelles, Oswaldo
2010-01-01
The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user’s tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/. PMID:20525794
An integrative computational approach for prioritization of genomic variants
Dubchak, Inna; Balasubramanian, Sandhya; Wang, Sheng; ...
2014-12-15
An essential step in the discovery of molecular mechanisms contributing to disease phenotypes, and in efficient experimental planning, is the development of weighted hypotheses that estimate the functional effects of sequence variants discovered by high-throughput genomics. With the increasing specialization of bioinformatics resources, creating analytical workflows that seamlessly integrate data and bioinformatics tools developed by multiple groups becomes inevitable. Here we present a case study of the use of a distributed analytical environment integrating four complementary specialized resources, namely the Lynx platform, VISTA RViewer, the Developmental Brain Disorders Database (DBDB), and the RaptorX server, for the identification of high-confidence candidate genes contributing to the pathogenesis of spina bifida. The analysis resulted in the prediction and validation of deleterious mutations in the SLC19A placental transporter in mothers of the affected children that cause narrowing of the outlet channel and therefore lead to a reduced folate permeation rate. The described approach also enabled correct identification of several genes previously shown to contribute to the pathogenesis of spina bifida, and suggested additional genes for experimental validation. This study demonstrates that the seamless integration of bioinformatics resources enables fast and efficient prioritization and characterization of genomic factors and molecular networks contributing to the phenotypes of interest.
A bioinformatics expert system linking functional data to anatomical outcomes in limb regeneration
Lobo, Daniel; Feldman, Erica B.; Shah, Michelle; Malone, Taylor J.
2014-01-01
Amphibians and molting arthropods have the remarkable capacity to regenerate amputated limbs, as described by an extensive literature of experimental cuts, amputations, grafts, and molecular techniques. Despite a rich history of experimental effort, no comprehensive mechanistic model exists that can account for the pattern regulation observed in these experiments. While bioinformatics algorithms have revolutionized the study of signaling pathways, no such tools have heretofore been available to assist scientists in formulating testable models of large-scale morphogenesis that match published data in the limb regeneration field. Major barriers preventing an algorithmic approach are the lack of formal descriptions for experimental regenerative information and of a repository to centralize storage and mining of functional data on limb regeneration. Establishing a new bioinformatics of shape would significantly accelerate the discovery of key insights into the mechanisms that implement complex regeneration. Here, we describe a novel mathematical ontology for limb regeneration to unambiguously encode phenotype, manipulation, and experiment data. Based on this formalism, we present the first centralized formal database of published limb regeneration experiments together with a user-friendly expert system tool to facilitate its access and mining. These resources are freely available for the community and will assist both human biologists and artificial intelligence systems to discover testable, mechanistic models of limb regeneration. PMID:25729585
Big data for big questions: it is time for data analysts to act
Moscato, Pablo
2015-01-01
Pablo Moscato speaks to Francesca Lake, Managing Editor. Australian Research Council Future Fellow Prof. Pablo Moscato was born in 1964 in La Plata, Argentina. He obtained his B.Sc. in Physics at the University of La Plata and defended his PhD at UNICAMP, Brazil. While at the California Institute of Technology Concurrent Computation Program he developed, in collaboration with Michael Norman, the first application of a methodology later called ‘memetic algorithms’, which is now widely used internationally. He is the founding co-director of the Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-based Medicine (CIBM) (2006–present) and the founding director of the Newcastle Bioinformatics Initiative (2002–2006) of The University of Newcastle (Australia). He is also Chief Investigator of the Australian Research Council Centre in Bioinformatics. He is one of Australia's most cited computer scientists. Over the past 7 years, he has introduced a unifying hallmark of cancer progression based on the changes of information theory quantifiers, and developed a novel mathematical model and an associated solution procedure based on combinatorial optimization techniques to identify drug combinations for cancer therapeutics. In addition, he has identified proteomic signatures to predict the clinical symptoms of Alzheimer's disease, among other ‘firsts’. He is a member of the Editorial Board of Future Science OA. PMID:28031895
Serial analysis of gene expression in a rat lung model of asthma.
Yin, Lei-Miao; Jiang, Gong-Hao; Wang, Yu; Wang, Yan; Liu, Yan-Yan; Jin, Wei-Rong; Zhang, Zen; Xu, Yu-Dong; Yang, Yong-Qing
2008-11-01
The pathogenesis and molecular mechanism underlying asthma remain undetermined. The purpose of this study was to identify genes and pathways involved in the early airway response (EAR) phase of asthma by using serial analysis of gene expression (SAGE). Two SAGE tag libraries of lung tissues derived from a rat model of asthma and controls were generated. Bioinformatic analyses were carried out using the Database for Annotation, Visualization and Integrated Discovery Functional Annotation Tool, Gene Ontology (GO) TreeMachine and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. A total of 26 552 SAGE tags of asthmatic rat lung were obtained, of which 12 221 were unique tags. Of the unique tags, 55.5% were matched with known genes. By comparison of the two libraries, 186 differentially expressed tags (P < 0.05) were identified, of which 103 were upregulated and 83 were downregulated. Using the bioinformatic tools these genes were classified into 23 functional groups, 15 KEGG pathways and 37 enriched GO categories. The bioinformatic analyses of gene distribution, enriched categories and the involvement of specific pathways in the SAGE libraries have provided information on regulatory networks of the EAR phase of asthma. Analyses of the regulated genes of interest may inform new hypotheses, increase our understanding of the disease and provide a foundation for future research.
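One common way such tag-level library comparisons are made is a Fisher's exact test on a single tag's count versus library size in the two conditions. A hedged sketch (the asthma library total is taken from the abstract; the other counts are invented):

```python
# Significance test for one SAGE tag between an asthma and a control library.
from scipy.stats import fisher_exact

asthma_tag, asthma_total = 42, 26552      # library total from the abstract
control_tag, control_total = 12, 26000    # illustrative control library

table = [[asthma_tag, asthma_total - asthma_tag],
         [control_tag, control_total - control_tag]]
odds_ratio, p = fisher_exact(table)
print(f"odds ratio {odds_ratio:.2f}, p = {p:.2e}")  # flag if p < 0.05
```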
MOWServ: a web client for integration of bioinformatic resources.
Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J; Claros, M Gonzalo; Trelles, Oswaldo
2010-07-01
The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user's tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/.
1994-09-30
relational versus object-oriented DBMS, knowledge discovery, data models, metadata, data filtering, clustering techniques, and synthetic data. A secondary...The first was the investigation of AI/ES applications (knowledge discovery, data mining, and clustering). Here CAST collaborated with Dr. Fred Petry...knowledge discovery system based on clustering techniques; implemented an on-line data browser to the DBMS; completed preliminary efforts to apply object
Drug target inference through pathway analysis of genomics data
Ma, Haisu; Zhao, Hongyu
2013-01-01
Statistical modeling coupled with bioinformatics is commonly used for drug discovery. Although there exist many approaches for single-target-based drug design and target inference, recent years have seen a paradigm shift to system-level pharmacological research. Pathway analysis of genomics data represents one promising direction for computational inference of drug targets. This article aims at providing a comprehensive review of the evolving issues in this field, covering methodological developments, their pros and cons, as well as future research directions. PMID:23369829
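The core calculation behind many of the pathway-analysis approaches such reviews cover is over-representation of a gene hit list in a pathway's gene set, assessed with the hypergeometric distribution. A minimal sketch with illustrative numbers:

```python
# Hypergeometric enrichment: is the overlap between a hit list and a
# pathway larger than expected by chance?
from scipy.stats import hypergeom

N = 20000   # genes in the background
K = 150     # genes annotated to the pathway
n = 300     # differentially expressed genes
k = 12      # overlap between the two

# P(X >= k) is the survival function evaluated at k - 1.
p = hypergeom.sf(k - 1, N, K, n)
print(f"enrichment p-value = {p:.3e}")
```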
ReGaTE: Registration of Galaxy Tools in Elixir
Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé
2017-01-01
Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. Findings: We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. Conclusions: ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE. PMID:28402416
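A minimal sketch of the extraction step ReGaTE automates, using the BioBlend API the abstract names; the server URL and API key are placeholders, and ReGaTE's metadata enrichment and bio.tools push are omitted:

```python
# List tool metadata from a Galaxy server via BioBlend.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")
for tool in gi.tools.get_tools():          # one dict per installed tool
    print(tool["id"], tool.get("description", ""))
```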
Girardi, Dominic; Küng, Josef; Kleiser, Raimund; Sonnberger, Michael; Csillag, Doris; Trenkler, Johannes; Holzinger, Andreas
2016-09-01
Established process models for knowledge discovery find the domain-expert in a customer-like and supervising role. In the field of biomedical research, it is necessary to move the domain-experts into the center of this process with far-reaching consequences for both their research output and the process itself. In this paper, we revise the established process models for knowledge discovery and propose a new process model for domain-expert-driven interactive knowledge discovery. Furthermore, we present a research infrastructure which is adapted to this new process model and demonstrate how the domain-expert can be deeply integrated even into the highly complex data-mining process and data-exploration tasks. We evaluated this approach in the medical domain for the case of cerebral aneurysms research.
NEIBank: Genomics and bioinformatics resources for vision research
Peterson, Katherine; Gao, James; Buchoff, Patee; Jaworski, Cynthia; Bowes-Rickman, Catherine; Ebright, Jessica N.; Hauser, Michael A.; Hoover, David
2008-01-01
NEIBank is an integrated resource for genomics and bioinformatics in vision research. It includes expressed sequence tag (EST) data and sequence-verified cDNA clones for multiple eye tissues of several species, web-based access to human eye-specific SAGE data through EyeSAGE, and comprehensive, annotated databases of known human eye disease genes and candidate disease gene loci. All expression- and disease-related data are integrated in EyeBrowse, an eye-centric genome browser. NEIBank provides a comprehensive overview of current knowledge of the transcriptional repertoires of eye tissues and their relation to pathology. PMID:18648525
Nagamani, S; Gaur, A S; Tanneeru, K; Muneeswaran, G; Madugula, S S; Consortium, Mpds; Druzhilovskiy, D; Poroikov, V V; Sastry, G N
2017-11-01
Molecular property diagnostic suite (MPDS) is a Galaxy-based open source drug discovery and development platform. MPDS web portals are designed for several diseases, such as tuberculosis, diabetes mellitus, and other metabolic disorders, specifically aimed to evaluate and estimate the drug-likeness of a given molecule. MPDS consists of three modules, namely data libraries, data processing, and data analysis tools which are configured and interconnected to assist drug discovery for specific diseases. The data library module encompasses vast information on chemical space, wherein the MPDS compound library comprises 110.31 million unique molecules generated from public domain databases. Every molecule is assigned with a unique ID and card, which provides complete information for the molecule. Some of the modules in the MPDS are specific to the diseases, while others are non-specific. Importantly, a suitably altered protocol can be effectively generated for another disease-specific MPDS web portal by modifying some of the modules. Thus, the MPDS suite of web portals shows great promise to emerge as disease-specific portals of great value, integrating chemoinformatics, bioinformatics, molecular modelling, and structure- and analogue-based drug discovery approaches.
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, design of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.
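For reference, the classical Good-Turing estimator of the discovery probability, the quantity the paper relates to its Bayesian nonparametric counterparts, can be stated compactly:

```latex
% With n observations in which m_l species appear exactly l times, the
% probability that observation n+1 belongs to a species already seen
% l times is estimated by
\check{p}_l \;=\; (l+1)\,\frac{m_{l+1}}{n}, \qquad l = 0, 1, 2, \ldots
% In particular, the discovery probability (a new species at the next
% draw) is \check{p}_0 = m_1 / n, the fraction of singletons.
```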
Knowledge Discovery in Textual Documentation: Qualitative and Quantitative Analyses.
ERIC Educational Resources Information Center
Loh, Stanley; De Oliveira, Jose Palazzo M.; Gastal, Fabio Leite
2001-01-01
Presents an application of knowledge discovery in texts (KDT) concerning medical records of a psychiatric hospital. The approach helps physicians to extract knowledge about patients and diseases that may be used for epidemiological studies, for training professionals, and to support physicians to diagnose and evaluate diseases. (Author/AEF)
The functional therapeutic chemical classification system.
Croset, Samuel; Overington, John P; Rebholz-Schuhmann, Dietrich
2014-03-15
Drug repositioning is the discovery of new indications for compounds that have already been approved and used in a clinical setting. Recently, some computational approaches have been suggested to unveil new opportunities in a systematic fashion, by taking into consideration gene expression signatures or chemical features for instance. We present here a novel method based on knowledge integration using semantic technologies, to capture the functional role of approved chemical compounds. In order to computationally generate repositioning hypotheses, we used the Web Ontology Language to formally define the semantics of over 20 000 terms with axioms to correctly denote various modes of action (MoA). Based on an integration of public data, we have automatically assigned over a thousand approved drugs into these MoA categories. The resulting new resource is called the Functional Therapeutic Chemical Classification System and was further evaluated against the content of the traditional Anatomical Therapeutic Chemical Classification System. We illustrate how the new classification can be used to generate drug repurposing hypotheses, using Alzheimer's disease as a use-case. https://www.ebi.ac.uk/chembl/ftc; https://github.com/loopasam/ftc. croset@ebi.ac.uk Supplementary data are available at Bioinformatics online.
Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi
2014-01-01
With the remarkable increase of genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of the big sequence data. Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on one map. By modifying the conventional SOM, we previously developed Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, depending solely on oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes, and then the compositions in the human and mouse genomes, in order to investigate an efficient method for detecting differences between the closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, called a "genome signature," as well as regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
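To make the batch-learning idea concrete, the toy sketch below trains a 1-D SOM on trinucleotide frequency vectors of random sequences; all parameters (map size, neighborhood width, iteration count) are illustrative, and this is not the published BLSOM code.

```python
# Toy batch-learning SOM on oligonucleotide (k-mer) frequency vectors.
import numpy as np
from itertools import product

def kmer_freqs(seq, k=3):
    """Overlapping k-mer frequency vector over all 4**k possible k-mers."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {m: i for i, m in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        counts[index[seq[i:i + k]]] += 1
    return counts / max(counts.sum(), 1.0)

rng = np.random.default_rng(1)
seqs = ["".join(rng.choice(list("ACGT"), size=300)) for _ in range(50)]
data = np.array([kmer_freqs(s) for s in seqs])

n_nodes, sigma = 10, 1.5
nodes = data[rng.choice(len(data), n_nodes, replace=False)]  # init nodes
for _ in range(20):
    # Batch step: find each vector's best-matching unit (BMU), then update
    # every node as a Gaussian-neighborhood-weighted mean of the data.
    bmu = np.argmin(((data[:, None, :] - nodes[None]) ** 2).sum(-1), axis=1)
    h = np.exp(-(np.arange(n_nodes)[:, None] - bmu[None, :]) ** 2
               / (2 * sigma ** 2))
    nodes = (h @ data) / h.sum(axis=1, keepdims=True)
```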
Alzheimer's disease in the omics era.
Sancesario, Giulia M; Bernardini, Sergio
2018-06-18
Recent progress in high-throughput technologies has led to a new scenario in investigating pathologies, named the "Omics era", which combines the opportunity to collect large amounts of data and information at the molecular and protein levels with the development of novel computational and statistical tools able to analyze and filter such data. Subsequently, advances in genotyping arrays, next generation sequencing, mass spectrometry technology, and bioinformatics allowed for the simultaneous large-scale study of thousands of genes (genomics), epigenetic factors (epigenomics), RNA (transcriptomics), metabolites (metabolomics) and proteins (proteomics), with the possibility of integrating multiple types of omics data ("multi-omics"). All of these technological innovations have modified the approach to the study of complex diseases, such as Alzheimer's Disease (AD), thus representing a promising tool to investigate the relationship between several molecular pathways in AD as well as other pathologies. This review focuses on the current knowledge on the pathology of AD, the recent findings from Omics sciences, and the challenge of the use of Big Data. We then focus on future perspectives for Omics sciences, such as the discovery of novel diagnostic biomarkers or drugs. Copyright © 2018. Published by Elsevier Inc.
Becht, Etienne; Simoni, Yannick; Coustan-Smith, Elaine; Maximilien, Evrard; Cheng, Yang; Ng, Lai Guan; Campana, Dario; Newell, Evan
2018-06-21
Recent flow and mass cytometers generate datasets of 20 to 40 dimensions for up to a million single cells. From these, many tools facilitate the discovery of new cell populations associated with diseases or physiology. These new cell populations require the identification of new gating strategies, but gating strategies become exponentially more difficult to optimize as dimensionality increases. To facilitate this step, we developed Hypergate, an algorithm which, given a cell population of interest, identifies a gating strategy optimized for high yield and purity. Hypergate achieves higher yield and purity than human experts, Support Vector Machines and Random Forests on public datasets. We use it to revisit some established gating strategies for the identification of innate lymphoid cells, which identifies concise and efficient strategies that allow gating these cells with fewer parameters but higher yield and purity than the current standards. For phenotypic description, Hypergate's outputs are consistent with the field's knowledge and sparser than those from a competing method. Hypergate is implemented in R and available on CRAN. The source code is published at http://github.com/ebecht/hypergate under an Open Source Initiative-compliant licence. Supplementary data are available at Bioinformatics online.
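The yield/purity objective such a gate optimizer scores can be illustrated in a few lines. This sketch (Python rather than the package's R, with synthetic data and an arbitrary rectangular gate; not Hypergate's implementation) evaluates one candidate gate on two markers:

```python
# Score a candidate rectangular gate for a population of interest.
import numpy as np

def gate_score(x, y, target, x_min, y_min):
    """Yield = fraction of target cells captured by the gate;
    purity = fraction of gated cells that are target."""
    gated = (x >= x_min) & (y >= y_min)
    if not gated.any():
        return 0.0, 0.0
    captured = (gated & target).sum()
    return captured / max(target.sum(), 1), captured / gated.sum()

rng = np.random.default_rng(0)
x, y = rng.normal(size=1000), rng.normal(size=1000)   # two markers
target = (x + y) > 1.5                                # toy population
print(gate_score(x, y, target, 0.5, 0.5))             # (yield, purity)
```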
S-MART, a software toolbox to aid RNA-Seq data analysis.
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. But the analysis of the millions of sequences generated is often beyond the expertise of wet labs, which have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by the whole biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour for most queries, even for Gb of data. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci.
S-MART, A Software Toolbox to Aid RNA-seq Data Analysis
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. But the analysis of the millions of sequences generated is often beyond the expertise of wet labs, which have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by the whole biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour for most queries, even for Gb of data. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci. PMID:21998740
A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.
Mazandu, Gaston K; Chimusa, Emile R; Mbiyavanga, Mamana; Mulder, Nicola J
2016-02-01
Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Linux using Python under free software (GNU General Public Licence). gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
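For context, the information-content (IC) based measures that A-DaGO-Fun integrates share a common core; the best-known member, Resnik similarity, can be stated as:

```latex
% With p(t) the relative annotation frequency of GO term t,
\mathrm{IC}(t) = -\log p(t), \qquad
\mathrm{sim}_{\text{Resnik}}(t_1, t_2) = \mathrm{IC}\!\left(\mathrm{MICA}(t_1, t_2)\right),
% where MICA(t_1, t_2) is the most informative (highest-IC) common
% ancestor of t_1 and t_2 in the GO graph.
```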
Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad
Undergraduate life sciences education needs an overhaul, as clearly described in the National Research Council of the National Academies publication BIO 2010: Transforming Undergraduate Education for Future Research Biologists. Among BIO 2010's top recommendations is the need to involve students in working with real data and tools that reflect the nature of life sciences research in the 21st century. Education research studies support the importance of utilizing primary literature, designing and implementing experiments, and analyzing results in the context of a bona fide scientific question in cultivating the analytical skills necessary to become a scientist. Incorporating these basic scientific methodologies in undergraduate education leads to increased undergraduate and post-graduate retention in the sciences. Toward this end, many undergraduate teaching organizations offer training and suggestions for faculty to update and improve their teaching approaches to help students learn as scientists, through design and discovery (e.g., Council of Undergraduate Research [www.cur.org] and Project Kaleidoscope [www.pkal.org]). With the advent of genome sequencing and bioinformatics, many scientists now formulate biological questions and interpret research results in the context of genomic information. Just as the use of bioinformatic tools and databases changed the way scientists investigate problems, it must change how scientists teach to create new opportunities for students to gain experiences reflecting the influence of genomics, proteomics, and bioinformatics on modern life sciences research. Educators have responded by incorporating bioinformatics into diverse life science curricula. While these published exercises in, and guidelines for, bioinformatics curricula are helpful and inspirational, faculty new to the area of bioinformatics inevitably need training in the theoretical underpinnings of the algorithms. Moreover, effectively integrating bioinformatics into courses or independent research projects requires infrastructure for organizing and assessing student work. Here, we present a new platform for faculty to keep current with the rapidly changing field of bioinformatics, the Integrated Microbial Genomes Annotation Collaboration Toolkit (IMG-ACT). It was developed by instructors from both research-intensive and predominately undergraduate institutions in collaboration with the Department of Energy-Joint Genome Institute (DOE-JGI) as a means to innovate and update undergraduate education and faculty development. The IMG-ACT program provides a cadre of tools, including access to a clearinghouse of genome sequences, bioinformatics databases, data storage, instructor course management, and student notebooks for organizing the results of their bioinformatic investigations. In the process, IMG-ACT makes it feasible to provide undergraduate research opportunities to a greater number and diversity of students, in contrast to the traditional mentor-to-student apprenticeship model for undergraduate research, which can be too expensive and time-consuming to provide for every undergraduate. The IMG-ACT serves as the hub for the network of faculty and students that use the system for microbial genome analysis. Open access of the IMG-ACT infrastructure to participating schools ensures that all types of higher education institutions can utilize it.
With the infrastructure in place, faculty can focus their efforts on the pedagogy of bioinformatics, involvement of students in research, and use of this tool for their own research agenda. What the original faculty members of the IMG-ACT development team present here is an overview of how the IMG-ACT program has affected our development in terms of teaching and research with the hopes that it will inspire more faculty to get involved.
Bioinformatics Knowledge Map for Analysis of Beta-Catenin Function in Cancer
Arighi, Cecilia N.; Wu, Cathy H.
2015-01-01
Given the wealth of bioinformatics resources and the growing complexity of biological information, it is valuable to integrate data from disparate sources to gain insight into the role of genes/proteins in health and disease. We have developed a bioinformatics framework that combines literature mining with information from biomedical ontologies and curated databases to create knowledge “maps” of genes/proteins of interest. We applied this approach to the study of beta-catenin, a cell adhesion molecule and transcriptional regulator implicated in cancer. The knowledge map includes post-translational modifications (PTMs), protein-protein interactions, disease-associated mutations, and transcription factors co-activated by beta-catenin and their targets and captures the major processes in which beta-catenin is known to participate. Using the map, we generated testable hypotheses about beta-catenin biology in normal and cancer cells. By focusing on proteins participating in multiple relation types, we identified proteins that may participate in feedback loops regulating beta-catenin transcriptional activity. By combining multiple network relations with PTM proteoform-specific functional information, we proposed a mechanism to explain the observation that the cyclin dependent kinase CDK5 positively regulates beta-catenin co-activator activity. Finally, by overlaying cancer-associated mutation data with sequence features, we observed mutation patterns in several beta-catenin PTM sites and PTM enzyme binding sites that varied by tissue type, suggesting multiple mechanisms by which beta-catenin mutations can contribute to cancer. The approach described, which captures rich information for molecular species from genes and proteins to PTM proteoforms, is extensible to other proteins and their involvement in disease. PMID:26509276
Chimusa, Emile R; Mbiyavanga, Mamana; Masilela, Velaphi; Kumuthini, Judit
2015-11-01
A shortage of practical skills and relevant expertise is possibly the primary obstacle to social upliftment and sustainable development in Africa. The "omics" fields, especially genomics, are increasingly dependent on the effective interpretation of large and complex sets of data. Despite abundant natural resources and population sizes comparable with many first-world countries from which talent could be drawn, countries in Africa still lag far behind the rest of the world in terms of specialized skills development. Moreover, there are serious concerns about disparities between countries within the continent. The multidisciplinary nature of the bioinformatics field, coupled with rare and depleting expertise, is a critical problem for the advancement of bioinformatics in Africa. We propose a formalized matchmaking system, which is aimed at reversing this trend, by introducing the Knowledge Transfer Programme (KTP). Instead of individual researchers travelling to other labs to learn, researchers with desirable skills are invited to join African research groups for six weeks to six months. Visiting researchers or trainers will pass on their expertise to multiple people simultaneously in their local environments, thus increasing the efficiency of knowledge transference. In return, visiting researchers have the opportunity to develop professional contacts, gain industry work experience, work with novel datasets, and strengthen and support their ongoing research. The KTP develops a network with a centralized hub through which groups and individuals are put into contact with one another and exchanges are facilitated by connecting both parties with potential funding sources. This is part of the PLOS Computational Biology Education collection.
Chen, Qianting; Dai, Congling; Zhang, Qianjun; Du, Juan; Li, Wen
2016-10-01
To evaluate the prediction performance of five bioinformatics software tools (SIFT, PolyPhen2, MutationTaster, Provean, MutationAssessor). From our own database of genetic mutations collected over the past five years, the Chinese literature database, the Human Gene Mutation Database, and dbSNP, 121 missense mutations confirmed by functional studies and 121 missense mutations suspected to be pathogenic by pedigree analysis were used as the positive gold standard, while 242 missense mutations with minor allele frequency (MAF)>5% in dominant hereditary diseases were used as the negative gold standard. The selected mutations were predicted with the five tools. Based on the results, the performance of the five tools was evaluated for sensitivity, specificity, positive predictive value, false positive rate, negative predictive value, false negative rate, false discovery rate, accuracy, and the receiver operating characteristic curve (ROC). In terms of sensitivity, negative predictive value and false negative rate, the rank was MutationTaster, PolyPhen2, Provean, SIFT, and MutationAssessor. For specificity and false positive rate, the rank was MutationTaster, Provean, MutationAssessor, SIFT, and PolyPhen2. For positive predictive value and false discovery rate, the rank was MutationTaster, Provean, MutationAssessor, PolyPhen2, and SIFT. For area under the ROC curve (AUC) and accuracy, the rank was MutationTaster, Provean, PolyPhen2, MutationAssessor, and SIFT. The prediction performance of the software may differ when different parameters are used. Among the five tools, MutationTaster showed the best prediction performance.
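The metrics used to rank the predictors all derive from a 2x2 confusion matrix against the gold standards. A minimal sketch with invented counts:

```python
# Evaluation metrics from a confusion matrix of pathogenic/neutral calls.
tp, fn = 110, 11   # pathogenic mutations called deleterious / benign
tn, fp = 230, 12   # neutral mutations called benign / deleterious

sensitivity = tp / (tp + fn)               # true positive rate
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                       # positive predictive value
npv = tn / (tn + fn)                       # negative predictive value
fdr = fp / (tp + fp)                       # false discovery rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(sensitivity, specificity, ppv, npv, fdr, accuracy)
```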
LAILAPS: the plant science search engine.
Esch, Maria; Chen, Jinbo; Colmsee, Christian; Klapperstück, Matthias; Grafahrend-Belau, Eva; Scholz, Uwe; Lange, Matthias
2015-01-01
With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous and distributed databases, and therefore can be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use-case focusing on candidate genes in barley. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
Yang, Jack Y; Yang, Mary Qu; Arabnia, Hamid R; Deng, Youping
2008-09-16
Supported by National Science Foundation (NSF), International Society of Intelligent Biological Medicine (ISIBM), International Journal of Computational Biology and Drug Design and International Journal of Functional Informatics and Personalized Medicine, IEEE 7th Bioinformatics and Bioengineering attracted more than 600 papers and 500 researchers and medical doctors. It was the only synergistic inter/multidisciplinary IEEE conference with 24 Keynote Lectures, 7 Tutorials, 5 Cutting-Edge Research Workshops and 32 Scientific Sessions including 11 Special Research Interest Sessions that were designed dynamically at Harvard in response to the current research trends and advances. The committee was very grateful for the IEEE Plenary Keynote Lectures given by: Dr. A. Keith Dunker (Indiana), Dr. Jun Liu (Harvard), Dr. Brian Athey (Michigan), Dr. Mark Borodovsky (Georgia Tech and President of ISIBM), Dr. Hamid Arabnia (Georgia and Vice-President of ISIBM), Dr. Ruzena Bajcsy (Berkeley and Member of United States National Academy of Engineering and Member of United States Institute of Medicine of the National Academies), Dr. Mary Yang (United States National Institutes of Health and Oak Ridge, DOE), Dr. Chih-Ming Ho (UCLA and Member of United States National Academy of Engineering and Academician of Academia Sinica), Dr. Andy Baxevanis (United States National Institutes of Health), Dr. Arif Ghafoor (Purdue), Dr. John Quackenbush (Harvard), Dr. Eric Jakobsson (UIUC), Dr. Vladimir Uversky (Indiana), Dr. Laura Elnitski (United States National Institutes of Health) and other world-class scientific leaders. The Harvard meeting was a large academic event fully sponsored by IEEE, both financially and academically. After a rigorous peer-review process, the committee selected 27 high-quality research papers from 600 submissions. The committee is grateful for contributions from keynote speakers Dr. Russ Altman (IEEE BIBM conference keynote lecturer on combining simulation and machine learning to recognize function in 4D), Dr. Mary Qu Yang (IEEE BIBM workshop keynote lecturer on new initiatives of detecting microscopic disease using machine learning and molecular biology, http://ieeexplore.ieee.org/servlet/opac?punumber=4425386) and Dr. Jack Y. Yang (IEEE BIBM workshop keynote lecturer on data mining and knowledge discovery in translational medicine) from the first IEEE Computer Society BioInformatics and BioMedicine (IEEE BIBM) international conference and workshops, November 2-4, 2007, Silicon Valley, California, USA.
Barrero, Roberto A; Napier, Kathryn R; Cunnington, James; Liefting, Lia; Keenan, Sandi; Frampton, Rebekah A; Szabo, Tamas; Bulman, Simon; Hunter, Adam; Ward, Lisa; Whattam, Mark; Bellgard, Matthew I
2017-01-11
Detecting and preventing the entry of exotic viruses and viroids at the border is critical for protecting plant industries and trade worldwide. Existing post-entry quarantine screening protocols rely on time-consuming biological indicators and/or molecular assays that require knowledge of the infecting viral pathogens. Plants have developed the ability to recognise and respond to viral infections through Dicer-like enzymes that cleave viral sequences into specific small RNA products. Many studies have reported the use of a broad range of small RNAs encompassing the product sizes of several Dicer enzymes involved in distinct biological pathways. Here we optimise the assembly of viral sequences by using specific small RNA subsets. We sequenced the small RNA fractions of 21 plants held at quarantine glasshouse facilities in Australia and New Zealand. Benchmarking of several de novo assembly tools showed that SPAdes with a k-mer of 19 produced the best assembly outcomes. We also found that de novo assembly using 21-25 nt small RNAs can result in chimeric assemblies of viral and plant host sequences. Such non-specific assemblies can be resolved by using 21-22 nt or 24 nt small RNA subsets. Among the 21 selected samples, we identified contigs with sequence similarity to 18 viruses and 3 viroids in 13 samples. Most of the viruses were assembled using only 21-22 nt long virus-derived siRNAs (viRNAs), except for one Citrus endogenous pararetrovirus that was more efficiently assembled using 24 nt long viRNAs. All three viroids found in this study were fully assembled using either 21-22 nt or 24 nt viRNAs. Optimised analysis workflows were customised within the Yabi web-based analytical environment. We present a fully automated viral surveillance and diagnosis web-based bioinformatics toolkit that provides a flexible, user-friendly, robust and scalable interface for the discovery and diagnosis of viral pathogens. We have implemented an automated viral surveillance and diagnosis (VSD) bioinformatics toolkit that produces improved virus and viroid sequence assemblies. The VSD toolkit provides several optimised and reusable workflows applicable to distinct viral pathogens. We envisage that this resource will facilitate the surveillance and diagnosis of viral pathogens in plants, insects and invertebrates.
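The key preprocessing step here, selecting a size-specific small RNA subset before assembly, is simple to express in code. Below is a minimal sketch that keeps only 21-22 nt reads from a FASTQ file; the file names are placeholders, and the actual VSD workflows run inside Yabi rather than as a standalone script.

```python
# Minimal sketch of the small-RNA subsetting step described above: keep only
# 21-22 nt reads before de novo assembly. File names are illustrative.

def filter_fastq_by_length(in_path, out_path, min_len=21, max_len=22):
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]  # FASTQ: 4 lines/read
            if not record[0]:
                break
            seq = record[1].strip()
            if min_len <= len(seq) <= max_len:
                fout.writelines(record)

# The 21-22 nt subset would then be assembled, e.g. with SPAdes at k=19.
filter_fastq_by_length("sample.fastq", "sample.21-22nt.fastq")
```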
Villoutreix, B O
2016-07-01
Bioinformatics and chemoinformatics approaches contribute to the discovery of novel targets, chemical probes, hits, leads and medicinal drugs. A vast repertoire of computational methods has been reported over the years, and in this review I briefly introduce some concepts and approaches, namely the analysis of potential therapeutic target binding pockets, the preparation of compound collections, and virtual screening. An example of application is provided for two proteins acting in the blood coagulation system. Overall, in silico methods have been shown to improve R&D productivity in both academic settings and the private sector when they are integrated in a rational manner with experimental approaches. However, integration of tools and pluridisciplinarity are seldom achieved. Efforts should be made in this direction, as pluridisciplinarity and a true acknowledgment of all the contributing actors along the value chain could enhance innovation and reduce skyrocketing costs. Copyright © 2016 Académie Nationale de Pharmacie. Published by Elsevier Masson SAS. All rights reserved.
A vision for collaborative training infrastructure for bioinformatics.
Williams, Jason J; Teal, Tracy K
2017-01-01
In biology, a missing link connecting data generation and data-driven discovery is the training that prepares researchers to effectively manage and analyze data. National and international cyberinfrastructure along with evolving private sector resources place biologists and students within reach of the tools needed for data-intensive biology, but training is still required to make effective use of them. In this concept paper, we review a number of opportunities and challenges that can inform the creation of a national bioinformatics training infrastructure capable of servicing the large number of emerging and existing life scientists. While college curricula are slower to adapt, grassroots startup-spirited organizations, such as Software and Data Carpentry, have made impressive inroads in training on the best practices of software use, development, and data analysis. Given the transformative potential of biology and medicine as full-fledged data sciences, more support is needed to organize, amplify, and assess these efforts and their impacts. © 2016 New York Academy of Sciences.
Malaria vaccines: high-throughput tools for antigens discovery with potential for their development
Céspedes, Nora; Vallejo, Andrés; Arévalo-Herrera, Myriam
2013-01-01
Malaria is a disease caused by parasites of the genus Plasmodium, which are transmitted by Anopheles mosquitoes, and it represents a great socio-economic burden worldwide. Plasmodium vivax is the second most prevalent malaria species worldwide and the most prevalent in Latin America and other regions of the planet. Vaccines are currently considered a cost-effective strategy for controlling transmissible diseases and could complement other malaria control measures; however, the chemical and immunological complexity of the parasite has hindered the development of effective vaccines. The recent availability of several Plasmodium genomes, together with bioinformatic tools, is allowing the selection of large numbers of proteins and analysis of their immune potential. Herein, we review recently developed strategies for the discovery of novel antigens with potential for malaria vaccine development. PMID:24892459
Cis-encoded non-coding antisense RNAs in streptococci and other low GC Gram (+) bacterial pathogens
Cho, Kyu Hong; Kim, Jeong-Ho
2015-01-01
Owing to recent advances in bioinformatics and high-throughput sequencing technology, the discovery of regulatory non-coding RNAs in bacteria has increased greatly. Following this trend, many studies have intensively searched for trans-acting small non-coding RNAs in streptococci, especially in the important human pathogens group A and group B streptococci. However, studies of cis-encoded non-coding antisense RNAs in streptococci have been scarce. A recent study shows that antisense RNAs are involved in virulence gene regulation in the group B streptococcus S. agalactiae, suggesting that antisense RNAs could have important roles in the pathogenesis of streptococcal pathogens. In this review, we describe recent discoveries of chromosomal cis-encoded antisense RNAs in streptococcal pathogens and other low-GC Gram (+) bacteria to provide a guide for future studies. PMID:25859258
Building Faculty Capacity through the Learning Sciences
ERIC Educational Resources Information Center
Moy, Elizabeth; O'Sullivan, Gerard; Terlecki, Melissa; Jernstedt, Christian
2014-01-01
Discoveries in the learning sciences (especially in neuroscience) have yielded a rich and growing body of knowledge about how students learn, yet this knowledge is only half of the story. The other half is "know how," i.e. the application of this knowledge. For faculty members, that means applying the discoveries of the learning sciences…
ISMB 2016 offers outstanding science, networking, and celebration
Fogg, Christiana
2016-01-01
The annual international conference on Intelligent Systems for Molecular Biology (ISMB) is the major meeting of the International Society for Computational Biology (ISCB). Over the past 23 years the ISMB conference has grown to become the world's largest bioinformatics/computational biology conference, and ISMB 2016 will be the year's most important computational biology event globally. The conference provides a multidisciplinary forum for disseminating the latest developments in bioinformatics/computational biology, bringing together scientists from computer science, molecular biology, mathematics, statistics and related fields. Its principal focus is the development and application of advanced computational methods for biological problems. ISMB 2016 offers the strongest scientific program and the broadest scope of any international bioinformatics/computational biology conference. Building on past successes, the conference is designed to cater to a variety of disciplines within the bioinformatics/computational biology community. ISMB 2016 takes place July 8-12 at the Swan and Dolphin Hotel in Orlando, Florida, United States. For two days preceding the conference, additional opportunities, including Satellite Meetings, the Student Council Symposium, and a selection of Special Interest Group Meetings and Applied Knowledge Exchange Sessions (AKES), are offered to enable registered participants to learn more about the latest methods and tools within specialty research areas. PMID:27347392
Application of Mechanistic Toxicology Data to Ecological Risk Assessments
The ongoing evolution of knowledge and tools in the areas of molecular biology, bioinformatics, and systems biology holds significant promise for reducing uncertainties associated with ecological risk assessment. As our understanding of the mechanistic basis of responses of organ...
Hypergravity Facilities: Extending Knowledge Over the Continuum of Gravity
NASA Technical Reports Server (NTRS)
Souza, Kenneth A.
1999-01-01
Historical perspectives, reasons for gravitational research, key questions regarding centrifuges, particular centrifuge discussions, vestibular research facilities, the hypergravity facility for cell culture, the human research facility, as well as the center for bioinformatics are all topics discussed in viewgraph form.
The center for causal discovery of biomedical knowledge from big data
Bahar, Ivet; Becich, Michael J; Benos, Panayiotis V; Berg, Jeremy; Espino, Jeremy U; Glymour, Clark; Jacobson, Rebecca Crowley; Kienholz, Michelle; Lee, Adrian V; Lu, Xinghua; Scheines, Richard
2015-01-01
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers. PMID:26138794
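To illustrate the constraint-based idea behind the causal Bayesian network algorithms the Center works on, here is a minimal sketch of one conditional-independence test: an edge between two variables is dropped if they become independent given a conditioning variable, judged by partial correlation. This is a generic textbook illustration under simplifying (linear-Gaussian) assumptions, not the Center's actual software.

```python
import numpy as np
from scipy import stats

# Minimal sketch of the constraint-based idea behind causal Bayesian network
# discovery: drop the edge X - Y if X and Y are independent given Z, judged
# here by a partial-correlation test. One illustrative test, not the BD2K
# Center's actual algorithms.

def partial_corr_independent(x, y, z, alpha=0.05):
    # Regress z out of both variables, then test the residual correlation.
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    r, p = stats.pearsonr(rx, ry)
    return p > alpha  # True -> independent given z -> remove edge x - y

rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + 0.1 * rng.normal(size=500)   # x caused by z
y = z + 0.1 * rng.normal(size=500)   # y caused by z
print(partial_corr_independent(x, y, z))  # True: z separates x and y
```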
ReGaTE: Registration of Galaxy Tools in Elixir.
Doppelt-Azeroual, Olivia; Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé
2017-06-01
Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE . © The Author 2017. Published by Oxford University Press.
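Since the abstract names BioBlend as the extraction mechanism, the first step of the pipeline is easy to sketch. The snippet below uses BioBlend's real `GalaxyInstance` client to pull tool metadata from a Galaxy server; the server URL and API key are placeholders, and the selection of fields to forward is illustrative rather than ReGaTE's exact bio.tools mapping.

```python
from bioblend.galaxy import GalaxyInstance

# Minimal sketch of the first ReGaTE step: pull tool metadata from a Galaxy
# server with BioBlend. URL and key are placeholders; the chosen fields are
# illustrative, not ReGaTE's exact bio.tools mapping.

gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")

for tool in gi.tools.get_tools():
    record = {
        "name": tool.get("name"),
        "id": tool.get("id"),
        "description": tool.get("description"),
        "version": tool.get("version"),
    }
    print(record)  # ReGaTE would enrich this and push it to bio.tools
```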
The Effect of Rules and Discovery in the Retention and Retrieval of Braille Inkprint Letter Pairs.
ERIC Educational Resources Information Center
Nagengast, Daniel L.; And Others
The effects of rule knowledge were investigated using Braille inkprint pairs. Both recognition and recall were studied in three groups of subjects: rule knowledge, rule discovery, and no rule. Two hypotheses were tested: (1) that the group exposed to the rule would score better than would a discovery group and a control group; and (2) that all…
Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.
Niu, Zhenxing; Hua, Gang; Wang, Le; Gao, Xinbo
Unsupervised object discovery and localization aims to discover dominant object classes and localize all object instances in a given image collection without any supervision. Previous work has attempted to tackle this problem with vanilla topic models, such as latent Dirichlet allocation (LDA). However, those methods exploit no prior knowledge about the given image collection to facilitate object discovery, and the topic models they use suffer from the topic coherence issue: some inferred topics lack clear meaning, which limits the final performance of object discovery. In this paper, prior knowledge in the form of so-called must-links is mined from Web images on the Internet. Furthermore, a novel knowledge-based topic model, called LDA with mixture of Dirichlet trees, is proposed to incorporate the must-links into topic modeling for object discovery. In particular, to better handle the polysemy of visual words, the must-link is re-defined so that one must-link constrains only one or some topics instead of all topics, which leads to significantly improved topic coherence. Moreover, the must-links are built and grouped with respect to specific object classes; the must-links in our approach are therefore semantic-specific, which allows discriminative prior knowledge from Web images to be exploited more efficiently. Extensive experiments on several data sets validated the efficiency of the proposed approach. Our method significantly improves topic coherence and outperforms unsupervised methods for object discovery and localization. In addition, compared with discriminative methods, the naturally existing object classes in the given image collection can be subtly discovered, which makes our approach well suited for realistic applications of unsupervised object discovery.
Shi, Zheng; Yu, Tian; Sun, Rong; Wang, Shan; Chen, Xiao-Qian; Cheng, Li-Jia; Liu, Rong
2016-01-01
Human epidermal growth factor receptor-2 (HER2) is a transmembrane receptor-like protein, and aberrant HER2 signaling is implicated in many human cancers, such as ovarian, gastric, and prostate cancer, and most notably breast cancer. Moreover, it has been in the spotlight in recent years as a promising new target for breast cancer therapy. Since virtual screening has become an integral part of the drug discovery process, it is of great significance to identify novel HER2 inhibitors by structure-based virtual screening. In this study, we carried out a series of bioinformatics approaches, such as virtual screening and molecular dynamics (MD) simulations, to identify HER2 inhibitors among Food and Drug Administration (FDA)-approved small-molecule drugs as potential "new use" drugs. Molecular docking identified the top 10 potential drugs, which showed a spectrum of affinities for HER2. Moreover, MD simulations suggested that ZINC08214629 (Nonoxynol-9) and ZINC03830276 (Benzonatate) might exert potential inhibitory effects against HER2-targeted anti-breast cancer therapeutics. Together, our findings may provide a successful application of virtual screening in the lead discovery process, and suggest that the discovered small molecules could be effective HER2 inhibitor candidates for further study. In summary, a series of bioinformatics approaches, including virtual screening and molecular dynamics (MD) simulations, were used to identify human epidermal growth factor receptor-2 (HER2) inhibitors; molecular docking recognized the top 10 candidate compounds, which showed a spectrum of affinities for HER2, and MD simulations identified ZINC08214629 (Nonoxynol-9) and ZINC03830276 (Benzonatate) among them as potential "new use" drugs against HER2-targeted anti-breast cancer therapeutics. Abbreviations used: HER2: Human epidermal growth factor receptor-2, FDA: Food and Drug Administration, PDB: Protein Data Bank, RMSDs: Root mean square deviations, SPC: Single point charge, PME: Particle mesh Ewald, NVT: Constant volume, NPT: Constant pressure, RMSF: Root-mean-square fluctuation.
Wilkinson, Mark D; Vandervalk, Benjamin; McCarthy, Luke
2011-10-24
The complexity and inter-related nature of biological data poses a difficult challenge for data and tool integration. There has been a proliferation of interoperability standards and projects over the past decade, none of which has been widely adopted by the bioinformatics community. Recent attempts have focused on the use of semantics to assist integration, and Semantic Web technologies are being welcomed by this community. SADI - Semantic Automated Discovery and Integration - is a lightweight set of fully standards-compliant Semantic Web service design patterns that simplify the publication of services of the type commonly found in bioinformatics and other scientific domains. Using Semantic Web technologies at every level of the Web services "stack", SADI services consume and produce instances of OWL Classes following a small number of very straightforward best-practices. In addition, we provide codebases that support these best-practices, and plug-in tools to popular developer and client software that dramatically simplify deployment of services by providers, and the discovery and utilization of those services by their consumers. SADI Services are fully compliant with, and utilize only foundational Web standards; are simple to create and maintain for service providers; and can be discovered and utilized in a very intuitive way by biologist end-users. In addition, the SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows. We show that, when resources are exposed through SADI, data compliant with a given ontological model can be automatically gathered, or generated, from these distributed, non-coordinating resources - a behaviour we have not observed in any other Semantic system. Finally, we show that, using SADI, data dynamically generated from Web services can be explored in a manner very similar to data housed in static triple-stores, thus facilitating the intersection of Web services and Semantic Web technologies.
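To give a flavor of the Semantic Web machinery that SADI services build on, here is a minimal sketch of querying a SPARQL endpoint with the SPARQLWrapper library. The endpoint URL, the example ontology namespace, and the `interactsWith` predicate are all placeholders, not actual SADI service identifiers; a SADI client would discover and invoke services rather than query a fixed endpoint.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Minimal sketch of the kind of Semantic Web query that sits underneath
# frameworks like SADI: fetch typed instances from a SPARQL endpoint.
# Endpoint URL and ontology terms are placeholders.

sparql = SPARQLWrapper("https://sparql.example.org/endpoint")
sparql.setQuery("""
    PREFIX ex: <http://example.org/ontology#>
    SELECT ?protein ?interactor
    WHERE {
        ?protein a ex:Protein ;
                 ex:interactsWith ?interactor .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["protein"]["value"], "->", binding["interactor"]["value"])
```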
The re-emerging role of microbial natural products in antibiotic discovery.
Genilloud, Olga
2014-07-01
New classes of antibacterial compounds are urgently needed to respond to the high frequency of resistance to all major classes of known antibiotics. Microbial natural products have for decades been one of the most successful sources of drugs for treating infectious diseases, but today the emerging unmet clinical need poses completely new challenges to the discovery of novel candidates with the properties desired for development as antibiotics. While natural product discovery programs have been gradually abandoned by big pharma, smaller biotechnology companies and research organizations are taking over the lead in the discovery of novel antibacterials. Recent years have seen new approaches and technologies being developed and integrated in a multidisciplinary effort to further exploit microbial resources and their biosynthetic potential as an untapped source of novel molecules. New strategies to isolate novel species thought to be uncultivable, together with synthetic biology approaches ranging from genome mining of microbial strains for cryptic biosynthetic pathways to their heterologous expression, have been emerging in combination with high-throughput sequencing platforms, integrated bioinformatic analysis, and on-site analytical detection and dereplication tools for novel compounds. These innovative approaches are defining a completely new framework that is setting the bases for the future discovery of novel chemical scaffolds and should foster renewed interest in the identification of novel classes of natural product antibiotics from the microbial world.
Bioinformatics by Example: From Sequence to Target
NASA Astrophysics Data System (ADS)
Kossida, Sophia; Tahri, Nadia; Daizadeh, Iraj
2002-12-01
With the completion of the human genome, and the imminent completion of other large-scale sequencing and structure-determination projects, computer-assisted bioscience is poised to become the new paradigm for conducting basic and applied research. The arrival of these additional bioinformatics tools stirs great anxiety among experimental researchers (as well as pedagogues), who are now faced with acquiring wider and deeper knowledge across differing disciplines (biology, chemistry, physics, mathematics, and computer science). This review targets individuals who are interested in using computational methods in their teaching or research. By working through a real-life, multicomponent, target-based pharmaceutical example, the reader will experience this fascinating new discipline.
Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Svedas, Vytas
2014-07-01
The new web server pocketZebra applies bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and to select the particular positions in the structure that determine the selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of subfamily-specific positions in diverse protein families. The pocketZebra web server has multiple input modes to meet the needs of users with different levels of experience in bioinformatics. The server provides on-site visualization of the results, as well as an off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure-function relationships and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or be implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
MAPI: towards the integrated exploitation of bioinformatics Web Services.
Ramirez, Sergio; Karlsson, Johan; Trelles, Oswaldo
2011-10-27
Bioinformatics is commonly presented as a well-assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools and their dispersion and heterogeneity complicate the integrated exploitation of this data-processing capacity. To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the functionality needed for a uniform representation of Web Service metadata descriptors, including the management and invocation protocols of the services they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed by the client have to be installed, and that the module functionality can be extended without re-writing the software client. The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation, with advanced features such as workflow composition and asynchronous service calls to multiple types of Web Services, including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).
FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry.
Palmer, Andrew; Phapale, Prasad; Chernyavsky, Ilya; Lavigne, Regis; Fay, Dominik; Tarasov, Artem; Kovalev, Vitaly; Fuchser, Jens; Nikolenko, Sergey; Pineau, Charles; Becker, Michael; Alexandrov, Theodore
2017-01-01
High-mass-resolution imaging mass spectrometry promises to localize hundreds of metabolites in tissues, cell cultures, and agar plates with cellular resolution, but it is hampered by the lack of bioinformatics tools for automated metabolite identification. We report pySM, a framework for false discovery rate (FDR)-controlled metabolite annotation at the level of the molecular sum formula, for high-mass-resolution imaging mass spectrometry (https://github.com/alexandrovteam/pySM). We introduce a metabolite-signal match score and a target-decoy FDR estimate for spatial metabolomics.
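The target-decoy FDR idea the abstract names is compact enough to sketch directly: at a given score threshold, the FDR is estimated as the ratio of decoy hits to target hits passing the threshold. The scores below are toy values and the scoring function is left abstract; this is the generic target-decoy recipe, not pySM's specific metabolite-signal match score.

```python
# Minimal sketch of a target-decoy FDR estimate like the one pySM applies to
# metabolite annotations: at threshold t, FDR ~ (#decoy hits >= t) /
# (#target hits >= t). Scores below are toy values, not real annotations.

def fdr_at_threshold(target_scores, decoy_scores, t):
    n_target = sum(s >= t for s in target_scores)
    n_decoy = sum(s >= t for s in decoy_scores)
    return (n_decoy / n_target) if n_target else 0.0

def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.1):
    # Lowest score threshold whose estimated FDR stays within the budget.
    chosen = None
    for t in sorted(set(target_scores), reverse=True):
        if fdr_at_threshold(target_scores, decoy_scores, t) <= max_fdr:
            chosen = t
    return chosen

targets = [0.95, 0.91, 0.80, 0.72, 0.55, 0.40]  # plausible sum formulas
decoys = [0.60, 0.45, 0.30, 0.22, 0.15, 0.10]   # implausible decoy formulas
print(threshold_for_fdr(targets, decoys, max_fdr=0.2))  # -> 0.55
```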
NASA Astrophysics Data System (ADS)
Koseki, Jun; Matsui, Hidetoshi; Konno, Masamitsu; Nishida, Naohiro; Kawamoto, Koichi; Kano, Yoshihiro; Mori, Masaki; Doki, Yuichiro; Ishii, Hideshi
2016-02-01
Bioinformatics and computational modelling are expected to offer innovative approaches in human medical science. In the present study, we performed computational analyses and made predictions using transcriptome and metabolome datasets obtained from fluorescence-based visualisations of chemotherapy-resistant cancer stem cells (CSCs) in the human oesophagus. This approach revealed an uncharacterized role for the ornithine metabolic pathway in the survival of chemotherapy-resistant CSCs. The present study fastens this rationale for further characterisation that may lead to the discovery of innovative drugs against robust CSCs.
Liu, Zhandong; Zheng, W Jim; Allen, Genevera I; Liu, Yin; Ruan, Jianhua; Zhao, Zhongming
2017-10-03
The 2016 International Conference on Intelligent Biology and Medicine (ICIBM 2016) was held on December 8-10, 2016 in Houston, Texas, USA. ICIBM included eight scientific sessions, four tutorials, one poster session, four highlighted talks and four keynotes that covered topics on 3D genomics structural analysis, next generation sequencing (NGS) analysis, computational drug discovery, medical informatics, cancer genomics, and systems biology. Here, we present a summary of the nine research articles selected from ICIBM 2016 program for publishing in BMC Bioinformatics.
Neptune: a bioinformatics tool for rapid discovery of genomic variation in bacterial populations
Marinier, Eric; Zaheer, Rahat; Berry, Chrystal; Weedmark, Kelly A.; Domaratzki, Michael; Mabon, Philip; Knox, Natalie C.; Reimer, Aleisha R.; Graham, Morag R.; Chui, Linda; Patterson-Fortin, Laura; Zhang, Jian; Pagotto, Franco; Farber, Jeff; Mahony, Jim; Seyer, Karine; Bekal, Sadjia; Tremblay, Cécile; Isaac-Renton, Judy; Prystajecky, Natalie; Chen, Jessica; Slade, Peter
2017-01-01
The ready availability of vast amounts of genomic sequence data has created the need to rethink comparative genomics algorithms using ‘big data’ approaches. Neptune is an efficient system for rapidly locating differentially abundant genomic content in bacterial populations using an exact k-mer matching strategy, while accommodating k-mer mismatches. Neptune’s loci discovery process identifies sequences that are sufficiently common to a group of target sequences and sufficiently absent from non-targets using probabilistic models. Neptune uses parallel computing to efficiently identify and extract these loci from draft genome assemblies without requiring multiple sequence alignments or other computationally expensive comparative sequence analyses. Tests on simulated and real datasets showed that Neptune rapidly identifies regions that are both sensitive and specific. We demonstrate that this system can identify trait-specific loci from different bacterial lineages. Neptune is broadly applicable for comparative bacterial analyses, yet will particularly benefit pathogenomic applications, owing to efficient and sensitive discovery of differentially abundant genomic loci. The software is available for download at: http://github.com/phac-nml/neptune. PMID:29048594
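The core of the exact k-mer matching strategy can be illustrated with plain set operations: collect k-mers shared by the target genomes and subtract those seen in non-targets. The sketch below uses toy sequences and omits Neptune's probabilistic scoring, mismatch tolerance, and parallelization.

```python
# Minimal sketch of the exact k-mer matching idea behind Neptune's loci
# discovery: k-mers common to targets and absent from non-targets are
# candidate signature loci. Toy inputs; Neptune's probabilistic scoring
# and mismatch handling are omitted.

def kmers(seq, k=7):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

targets = ["ACGTACGTTAGGCTAACGT", "CCGTACGTTAGGCTAACTT"]
non_targets = ["ACGTACGTCCCCCTAACGT"]

shared_by_targets = set.intersection(*(kmers(s) for s in targets))
seen_in_non_targets = set.union(*(kmers(s) for s in non_targets))

# Candidate signature k-mers: common to targets, absent from non-targets.
signature = shared_by_targets - seen_in_non_targets
print(sorted(signature))
```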
Jimenez, Connie R; Piersma, Sander; Pham, Thang V
2007-12-01
Proteomics aims to create a link between genomic information, biological function and disease through global studies of protein expression, modification and protein-protein interactions. Recent advances in key proteomics tools, such as mass spectrometry (MS) and (bio)informatics, provide tremendous opportunities for biomarker-related clinical applications. In this review, we focus on two complementary MS-based approaches with high potential for the discovery of biomarker patterns and low-abundance candidate biomarkers in biofluids: high-throughput matrix-assisted laser desorption/ionization time-of-flight mass spectrometry-based methods for peptidome profiling, and label-free liquid chromatography-based methods coupled to MS for in-depth profiling of biofluids with a focus on subproteomes, including the low-molecular-weight proteome, the carrier-bound proteome and the N-linked glycoproteome. The two approaches differ in their aims, throughput and sensitivity. We discuss recent progress and challenges in the analysis of plasma/serum and proximal fluids using these strategies and highlight the potential of liquid chromatography-MS-based proteomics of cancer cell and tumor secretomes for the discovery of candidate blood-based biomarkers. Strategies for candidate validation are also described.
Concept Formation in Scientific Knowledge Discovery from a Constructivist View
NASA Astrophysics Data System (ADS)
Peng, Wei; Gero, John S.
The central goal of scientific knowledge discovery is to learn cause-effect relationships among natural phenomena presented as variables and the consequences of their interactions. Scientific knowledge is normally expressed as scientific taxonomies and qualitative and quantitative laws [1]. This type of knowledge represents intrinsic regularities of the observed phenomena that can be used to explain and predict their behaviors. It is a generalization that is abstracted and externalized from a set of contexts and applicable to a broader scope. Scientific knowledge is a type of third-person knowledge, i.e., knowledge that is independent of a specific enquirer. Artificial intelligence approaches, particularly the data mining algorithms used to identify meaningful patterns in large data sets, aim to facilitate the knowledge discovery process [2]. A broad spectrum of algorithms has been developed to address classification, associative learning, and clustering problems. However, their linkages to the people who use them have not been adequately explored. Issues relating to supporting the interpretation of patterns, applying prior knowledge to the data mining process and addressing user interactions remain challenges for building knowledge discovery tools [3]. As a consequence, scientists rely on their experience to formulate problems, evaluate hypotheses, reason about untraceable factors and derive new problems. This type of knowledge, which they develop over their careers, is called “first-person” knowledge. The formation of scientific knowledge (third-person knowledge) is highly influenced by the enquirer’s first-person knowledge construct, which is a result of his or her interactions with the environment. There have been attempts to craft automatic knowledge discovery tools, but these systems are limited in their capability to handle the dynamics of personal experience. There are now trends in developing approaches that assist scientists in applying their expertise to model formation, simulation, and prediction in various domains [4], [5]. On the other hand, first-person knowledge becomes third-person theory only if it proves general by evidence and is acknowledged by a scientific community. Researchers have started to focus on building interactive cooperation platforms [1] to accommodate different views in the knowledge discovery process. There are some fundamental questions in relation to scientific knowledge development. What are the major components of knowledge construction, and how do people construct their knowledge? How is this personal construct assimilated and accommodated into a scientific paradigm? How can one design a computational system to facilitate these processes? This chapter does not attempt to answer all these questions but serves as a basis to foster thinking along this line. A brief literature review of how people develop their knowledge is carried out through a constructivist view. A hydrological modeling scenario is presented to elucidate the approach.
COEUS: "semantic web in a box" for biomedical applications.
Lopes, Pedro; Oliveira, José Luís
2012-12-17
As the "omics" revolution unfolds, the growth in data quantity and diversity is bringing about the need for pioneering bioinformatics software, capable of significantly improving the research workflow. To cope with these computer science demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain. The latter's complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited. COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a "semantic web in a box" approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer. The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/.
Illuminate Knowledge Elements in Geoscience Literature
NASA Astrophysics Data System (ADS)
Ma, X.; Zheng, J. G.; Wang, H.; Fox, P. A.
2015-12-01
Numerous dark data are hidden in the geoscience literature, and their efficient retrieval and reuse would greatly benefit today's geoscience research. Among data rescue efforts, a topic of interest is illuminating the knowledge framework, i.e. the entities and relationships, embedded in documents. Entity recognition and linking have received extensive attention in news and social media analysis, as well as in bioinformatics; in the domain of geoscience, however, such work is limited. We will present our work on using knowledge bases on the Web, such as ontologies and vocabularies, to facilitate entity recognition and linking in the geoscience literature. The work deploys an unsupervised collective inference approach [1] to link entity mentions in unstructured texts to a knowledge base, leveraging the meaningful information and structures in ontologies and vocabularies for similarity computation and entity ranking. Our work is still at the initial stage towards the detection of knowledge frameworks in literature, and we have been collecting geoscience ontologies and vocabularies in order to build a comprehensive geoscience knowledge base [2]. We hope the work will initiate new ideas and collaborations on dark data rescue, as well as on the synthesis of data and knowledge from the geoscience literature. References: 1. Zheng, J., Howsmon, D., Zhang, B., Hahn, J., McGuinness, D.L., Hendler, J., and Ji, H. 2014. Entity linking for biomedical literature. In Proceedings of the ACM 8th International Workshop on Data and Text Mining in Bioinformatics, Shanghai, China. 2. Ma, X., Zheng, J., 2015. Linking geoscience entity mentions to the Web of Data. ESIP 2015 Summer Meeting, Pacific Grove, CA.
Knowledge Discovery and Data Mining: An Overview
NASA Technical Reports Server (NTRS)
Fayyad, U.
1995-01-01
The process of knowledge discovery and data mining is the process of information extraction from very large databases. Its importance is described along with several techniques and considerations for selecting the most appropriate technique for extracting information from a particular data set.
12 CFR 263.53 - Discovery depositions.
Code of Federal Regulations, 2014 CFR
2014-01-01
... 12 Banks and Banking 4 2014-01-01 2014-01-01 false Discovery depositions. 263.53 Section 263.53... Discovery depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts...
12 CFR 263.53 - Discovery depositions.
Code of Federal Regulations, 2012 CFR
2012-01-01
... 12 Banks and Banking 4 2012-01-01 2012-01-01 false Discovery depositions. 263.53 Section 263.53... Discovery depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts...
A Scientific Software Product Line for the Bioinformatics domain.
Costa, Gabriella Castro B; Braga, Regina; David, José Maria N; Campos, Fernanda
2015-08-01
Most specialized users (scientists) who use bioinformatics applications do not have suitable training in software development. A Software Product Line (SPL) employs the concept of reuse: it is defined as a set of systems developed from a common set of base artifacts. In some contexts, such as bioinformatics applications, it is advantageous to develop a collection of related software products using the SPL approach. If software products are similar enough, it becomes possible to predict their commonalities and differences and then reuse the common features to support the development of new applications in the bioinformatics area. This paper presents the PL-Science approach, which combines SPL and ontologies to assist scientists in defining a scientific experiment and specifying a workflow that encompasses the bioinformatics applications of a given experiment. The paper also focuses on the use of ontologies to enable the use of Software Product Lines in biological domains. In the context of this paper, a Scientific Software Product Line (SSPL) differs from a Software Product Line in that an SSPL uses an abstract scientific workflow model. This workflow is defined according to a scientific domain, and using this abstract workflow model the products (scientific applications/algorithms) are instantiated. By using ontology as a knowledge representation model, we can express domain restrictions and add semantic aspects that facilitate the selection and organization of bioinformatics workflows in a Scientific Software Product Line. Ontologies enable not only the expression of formal restrictions but also inferences over these restrictions, given that a scientific domain needs a formal specification. This paper presents the development of the PL-Science approach, encompassing a methodology and an infrastructure, together with an evaluation based on case studies in bioinformatics conducted at two renowned research institutions in Brazil. Copyright © 2015 Elsevier Inc. All rights reserved.
A rapid and accurate approach for prediction of interactomes from co-elution data (PrInCE).
Stacey, R Greg; Skinnider, Michael A; Scott, Nichollas E; Foster, Leonard J
2017-10-23
An organism's protein interactome, or complete network of protein-protein interactions, defines the protein complexes that drive cellular processes. Techniques for studying protein complexes have traditionally applied targeted strategies such as yeast two-hybrid or affinity purification-mass spectrometry to assess protein interactions. However, given the vast number of protein complexes, more scalable methods are necessary to accelerate interaction discovery and to construct whole interactomes. We recently developed a complementary technique based on the use of protein correlation profiling (PCP) and stable isotope labeling by amino acids in cell culture (SILAC) to assess chromatographic co-elution as evidence of interacting proteins. Importantly, PCP-SILAC is also capable of measuring protein interactions simultaneously under multiple biological conditions, allowing the detection of treatment-specific changes to an interactome. Given the uniqueness and high dimensionality of co-elution data, new tools are needed to compare protein elution profiles, control false discovery rates, and construct an accurate interactome. Here we describe a freely available bioinformatics pipeline, PrInCE, for the analysis of co-elution data. PrInCE is a modular, open-source library that is computationally inexpensive, able to use labeled and label-free data, and capable of detecting tens of thousands of protein-protein interactions. Using a machine learning approach, PrInCE offers greatly reduced run time, more predicted interactions at the same stringency, prediction of protein complexes, and greater ease of use than previous bioinformatics tools for co-elution data. PrInCE is implemented in Matlab (version R2017a). Source code and standalone executable programs for Windows and Mac OSX are available at https://github.com/fosterlab/PrInCE , where usage instructions can be found. An example dataset and output are also provided for testing purposes. PrInCE is the first fast and easy-to-use data analysis pipeline that predicts interactomes and protein complexes from co-elution data. PrInCE allows researchers without bioinformatics expertise to analyze high-throughput co-elution datasets.
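The basic co-elution signal underlying this kind of pipeline is easy to illustrate: proteins whose elution profiles across chromatographic fractions are highly correlated are candidate interaction partners. The profiles below are toy data, and a single Pearson correlation is only one of several features PrInCE's classifier actually combines.

```python
import numpy as np

# Minimal sketch of the core co-elution signal used by pipelines like PrInCE:
# highly correlated chromatographic elution profiles suggest candidate
# interaction partners. Toy profiles; the real classifier combines several
# such features with machine learning and FDR control.

def profile_similarity(a, b):
    # Pearson correlation between two elution profiles (one value/fraction).
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.corrcoef(a, b)[0, 1]

fractions = {
    "protA": [0, 2, 10, 25, 12, 3, 0, 0],
    "protB": [0, 1, 9, 27, 11, 2, 0, 0],   # co-elutes with protA
    "protC": [8, 20, 6, 1, 0, 0, 2, 14],   # different elution behaviour
}

for p, q in [("protA", "protB"), ("protA", "protC")]:
    print(p, q, round(profile_similarity(fractions[p], fractions[q]), 3))
```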
Functional Bowel Disorders: A Roadmap to Guide the Next Generation of Research.
Chang, Lin; Di Lorenzo, Carlo; Farrugia, Gianrico; Hamilton, Frank A; Mawe, Gary M; Pasricha, Pankaj J; Wiley, John W
2018-02-01
In June 2016, the National Institutes of Health hosted a workshop on functional bowel disorders (FBDs), particularly irritable bowel syndrome, with the objective of elucidating gaps in current knowledge and recommending strategies to address these gaps. The workshop aimed to provide a roadmap to help strategically guide research efforts during the next decade. Attendees were a diverse group of internationally recognized leaders in basic and clinical FBD research. This document summarizes the results of their deliberations, including the following general conclusions and recommendations. First, the high prevalence, economic burden, and impact on quality of life associated with FBDs necessitate an urgent need for improved understanding of FBDs. Second, preclinical discoveries are at a point that they can be realistically translated into novel diagnostic tests and treatments. Third, FBDs are broadly accepted as bidirectional disorders of the brain-gut axis, differentially affecting individuals throughout life. Research must integrate each component of the brain-gut axis and the influence of biological sex, early-life stressors, and genetic and epigenetic factors in individual patients. Fourth, research priorities to improve diagnostic and management paradigms include enhancement of the provider-patient relationship, longitudinal studies to identify risk and protective factors of FBDs, identification of biomarkers and endophenotypes in symptom severity and treatment response, and incorporation of emerging "-omics" discoveries. These paradigms can be applied by well-trained clinicians who are familiar with multimodal treatments. Fifth, essential components of a successful program will include the generation of a large, validated, broadly accessible database that is rigorously phenotyped; a parallel, linkable biorepository; dedicated resources to support peer-reviewed, hypothesis-driven research; access to dedicated bioinformatics expertise; and oversight by funding agencies to review priorities, progress, and potential synergies with relevant stakeholders. Copyright © 2018 AGA Institute. Published by Elsevier Inc. All rights reserved.
Freischmidt, Axel; Müller, Kathrin; Zondler, Lisa; Weydt, Patrick; Volk, Alexander E; Božič, Anže Lošdorfer; Walter, Michael; Bonin, Michael; Mayer, Benjamin; von Arnim, Christine A F; Otto, Markus; Dieterich, Christoph; Holzmann, Karlheinz; Andersen, Peter M; Ludolph, Albert C; Danzer, Karin M; Weishaupt, Jochen H
2014-11-01
Knowledge about the nature of pathomolecular alterations preceding onset of symptoms in amyotrophic lateral sclerosis is largely lacking. Such knowledge could not only pave the way for the discovery of valuable therapeutic targets but might also govern future concepts of pre-manifest disease-modifying treatments. MicroRNAs are central regulators of transcriptome plasticity and participate in pathogenic cascades and/or mirror cellular adaptation to insults. We obtained comprehensive expression profiles of microRNAs in the serum of patients with familial amyotrophic lateral sclerosis, asymptomatic mutation carriers and healthy control subjects. We observed a strikingly homogenous microRNA profile in patients with familial amyotrophic lateral sclerosis that was largely independent of the underlying disease gene. Moreover, we identified 24 significantly downregulated microRNAs in pre-manifest amyotrophic lateral sclerosis mutation carriers two decades or more before the estimated time window of disease onset; 91.7% of the microRNAs downregulated in mutation carriers overlapped with those in the patients with familial amyotrophic lateral sclerosis. Bioinformatic analysis revealed a consensus sequence motif present in the vast majority of downregulated microRNAs identified in this study. Our data thus suggest specific common denominators regarding molecular pathogenesis of different amyotrophic lateral sclerosis genes. We describe the earliest pathomolecular alterations in amyotrophic lateral sclerosis mutation carriers known to date, which provide a basis for the discovery of novel therapeutic targets and strongly argue for studies evaluating presymptomatic disease-modifying treatment in amyotrophic lateral sclerosis. © The Author (2014). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
de la Calle, Guillermo; García-Remesal, Miguel; Chiesa, Stefano; de la Iglesia, Diana; Maojo, Victor
2009-10-07
The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities. We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted. These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at http://edelman.dia.fi.upm.es/biri/.
Robust enzyme design: bioinformatic tools for improved protein stability.
Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas
2015-03-01
The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Cheng, Lijun; Schneider, Bryan P
2016-01-01
Background Cancer has been extensively characterized on the basis of genomics. Integrating genetic information about cancers with data on how those cancers respond to target-based therapy can help optimize cancer treatment. Objective The increasing use of sequencing technology in cancer research and clinical practice has enormously advanced our understanding of cancer mechanisms, and cancer precision medicine is becoming a reality. Although off-label drug usage is a common practice in treating cancer, it suffers from the lack of a knowledge base for proper cancer drug selection. This need has become even more apparent considering the influx of genomics data. Methods In this paper, a personalized medicine knowledge base is constructed by integrating various cancer drugs, drug-target databases, and knowledge sources for the proper selection of cancer drugs and their targets. Based on the knowledge base, a bioinformatics approach for cancer drug selection in precision medicine is developed. It integrates personal molecular profile data, including copy number variation, mutation, and gene expression. Results By analyzing data from the 85 triple negative breast cancer (TNBC) patients in The Cancer Genome Atlas, we show that 71.7% of the TNBC patients have FDA-approved drug targets, and 51.7% of the patients have more than one drug target. Sixty-five drug targets are identified as TNBC treatment targets and 85 candidate drugs are recommended. Many existing TNBC candidate targets, such as Poly (ADP-Ribose) Polymerase 1 (PARP1), Cell Division Protein Kinase 6 (CDK6), and epidermal growth factor receptor, were identified. On the other hand, we found some additional targets not yet fully investigated in TNBC, such as Gamma-Glutamyl Hydrolase (GGH), Thymidylate Synthetase (TYMS), Protein Tyrosine Kinase 6 (PTK6), Topoisomerase (DNA) I, Mitochondrial (TOP1MT), and Smoothened, Frizzled Class Receptor (SMO). Our target and drug selection strategy is also fully supported by the drug screening data on TNBC cell lines in the Cancer Cell Line Encyclopedia. Conclusions The proposed bioinformatics approach lays a foundation for cancer precision medicine. It supplies a much-needed knowledge base for off-label cancer drug usage in clinics. PMID:27107440
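A minimal sketch of the kind of profile-to-drug matching such a knowledge base enables (the gene, drug, and alteration entries below are invented toy data, not the paper's actual database):

```python
# Toy knowledge base: actionable gene -> drugs targeting it (hypothetical entries).
drug_targets = {
    "PARP1": ["olaparib"],
    "CDK6":  ["palbociclib"],
    "EGFR":  ["erlotinib"],
}

# Toy patient molecular profile: gene -> observed alteration.
patient_profile = {
    "PARP1": "amplification",
    "TP53":  "mutation",        # no targeted drug in this toy knowledge base
    "CDK6":  "overexpression",
}

def recommend(profile, knowledge_base):
    """Return drugs whose targets are altered in the patient's profile."""
    hits = {}
    for gene, alteration in profile.items():
        for drug in knowledge_base.get(gene, []):
            hits.setdefault(drug, []).append(f"{gene} ({alteration})")
    return hits

for drug, evidence in recommend(patient_profile, drug_targets).items():
    print(drug, "<-", "; ".join(evidence))
```

A production system would additionally weight evidence levels, drug approval status, and alteration type before ranking candidates.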
ERIC Educational Resources Information Center
Benoit, Gerald
2002-01-01
Discusses data mining (DM) and knowledge discovery in databases (KDD), taking the view that KDD is the larger view of the entire process, with DM emphasizing the cleaning, warehousing, mining, and visualization of knowledge discovery in databases. Highlights include algorithms; users; the Internet; text mining; and information extraction.…
Hsiao, Yu-Yun; Tsai, Wen-Chieh; Kuoh, Chang-Sheng; Huang, Tian-Hsiang; Wang, Hei-Chia; Wu, Tian-Shung; Leu, Yann-Lii; Chen, Wen-Huei; Chen, Hong-Hwa
2006-07-13
Floral scent is one of the important strategies for ensuring fertilization and for determining seed or fruit set. Research on plant scents has been hampered mainly by the invisibility of this character, its dynamic nature, and the complex mixtures of components that are present in very small quantities. Most progress in scent research, as in other areas of plant biology, has come from the use of molecular and biochemical techniques. Although volatile components have been identified in several orchid species, the biosynthetic pathways of orchid flower fragrance are far from understood. We investigated how flower fragrance is generated in certain Phalaenopsis orchids by determining the chemical components of the floral scent, identifying floral expressed-sequence-tags (ESTs), and deducing the pathways of floral scent biosynthesis in Phalaenopsis bellina by bioinformatics analysis. The main chemical components in the P. bellina flower were shown by gas chromatography-mass spectrometry to be monoterpenoids, benzenoids and phenylpropanoids. The set of floral scent producing enzymes in the biosynthetic pathway from glyceraldehyde-3-phosphate (G3P) to geraniol and linalool were recognized through data mining of the P. bellina floral EST database (dbEST). Transcripts preferentially expressed in P. bellina were distinguished by comparing the scented floral dbEST to that of a scentless species, P. equestris, and included those encoding lipoxygenase, epimerase, diacylglycerol kinase and geranyl diphosphate synthase. In addition, EST filtering results showed that transcripts encoding signal transduction components, Myb transcription factors and methyltransferase, in addition to those for scent biosynthesis, were detected by in silico hybridization of the P. bellina unigene database against those of the scentless species, rice and Arabidopsis. Altogether, we pinpointed 66% of the biosynthetic steps from G3P to geraniol, linalool and their derivatives. This systems biology program combined chemical analysis, genomics and bioinformatics to elucidate the scent biosynthesis pathway and identify the relevant genes. It integrates the forward and reverse genetic approaches to knowledge discovery by which researchers can study non-model plants.
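As a crude stand-in for the BLAST-based "in silico hybridization" described above, the following hedged sketch keeps unigenes from a scented species that have no close counterpart in a scentless species, using k-mer Jaccard similarity as a toy proxy for sequence alignment (sequences and the 0.5 cutoff are invented):

```python
def kmers(seq, k=6):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_jaccard(query, subjects, k=6):
    """Max k-mer Jaccard similarity of a query against any subject sequence."""
    q = kmers(query, k)
    return max((len(q & kmers(s, k)) / len(q | kmers(s, k)) for s in subjects),
               default=0.0)

# Toy unigene sets (invented sequences, far shorter than real ESTs).
scented = {"sc1": "ATGGCGTTACGGATCCGATTACGGAT",
           "sc2": "TTGACCGGTTAACCGGTTAACCGGTA"}
scentless = ["ATGGCGTTACGGATCCGATTACGGAT"]  # matches sc1 only

# Keep unigenes with no close counterpart in the scentless species.
preferential = {uid for uid, seq in scented.items()
                if best_jaccard(seq, scentless) < 0.5}
print(preferential)  # {'sc2'}
```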
Yocgo, Rosita E; Geza, Ephifania; Chimusa, Emile R; Mazandu, Gaston K
2017-11-23
Advances in forward and reverse genetic techniques have enabled the discovery and identification of several plant defence genes based on quantifiable disease phenotypes in mutant populations. Existing models for testing the effect of gene inactivation or genes causing these phenotypes do not take into account eventual uncertainty of these datasets and potential noise inherent in the biological experiment used, which may mask downstream analysis and limit the use of these datasets. Moreover, elucidating biological mechanisms driving the induced disease resistance and influencing these observable disease phenotypes has never been systematically tackled, eliciting the need for an efficient model to characterize completely the gene target under consideration. We developed a post-gene silencing bioinformatics (post-GSB) protocol which accounts for potential biases related to the disease phenotype datasets in assessing the contribution of the gene target to the plant defence response. The post-GSB protocol uses Gene Ontology semantic similarity and pathway dataset to generate enriched process regulatory network based on the functional degeneracy of the plant proteome to help understand the induced plant defence response. We applied this protocol to investigate the effect of the NPR1 gene silencing to changes in Arabidopsis thaliana plants following Pseudomonas syringae pathovar tomato strain DC3000 infection. Results indicated that the presence of a functionally active NPR1 reduced the plant's susceptibility to the infection, with about 99% of variability in Pseudomonas spore growth between npr1 mutant and wild-type samples. Moreover, the post-GSB protocol has revealed the coordinate action of target-associated genes and pathways through an enriched process regulatory network, summarizing the potential target-based induced disease resistance mechanism. This protocol can improve the characterization of the gene target and, potentially, elucidate induced defence response by more effectively utilizing available phenotype information and plant proteome functional knowledge.
Discovery of antimicrobial lipodepsipeptides produced by a Serratia sp. within mosquito microbiomes.
Ganley, Jack; Carr, Gavin; Ioerger, Thomas; Sacchettini, James; Clardy, Jon; Derbyshire, Emily
2018-04-26
The Anopheles mosquito that harbors the Plasmodium parasite contains a microbiota that can influence both the vector and parasite. In recent years, insect-associated microbes have highlighted the untapped potential of exploiting interspecies interactions to discover bioactive compounds. In this study, we report the discovery of nonribosomal lipodepsipeptides that are produced by a Serratia sp. within the midgut and salivary glands of A. stephensi mosquitoes. The lipodepsipeptides, stephensiolides A-K, have antibiotic activity and facilitate bacterial surface motility. Bioinformatic analyses indicate that the stephensiolides are ubiquitous in nature and are likely important for Serratia spp. colonization within mosquitoes, humans, and other ecological niches. Our results demonstrate the usefulness of probing insect-microbiome interactions, enhance our understanding of the chemical ecology within Anopheles mosquitoes, and provide a secondary metabolite scaffold to further investigate this complex relationship. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
DOE Office of Scientific and Technical Information (OSTI.GOV)
With the flood of finished and draft whole-genome microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.
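The algorithm is only summarized above; as a hedged illustration of the general k-mer idea (not the cited tool's actual implementation), the sketch below calls a candidate SNP between two genomes whenever two k-mers share identical flanking bases but differ at the central base:

```python
from collections import defaultdict

def central_snp_candidates(genome_a, genome_b, k=7):
    """Yield k-mer pairs with identical flanks but a different central base.

    k must be odd; the concatenated left+right flanks index each k-mer.
    A simplification of k-mer-based SNP discovery, not the cited tool.
    """
    assert k % 2 == 1
    mid = k // 2

    def index(genome):
        table = defaultdict(set)
        for i in range(len(genome) - k + 1):
            kmer = genome[i:i + k]
            table[kmer[:mid] + kmer[mid + 1:]].add(kmer[mid])
        return table

    ia, ib = index(genome_a), index(genome_b)
    for flanks in ia.keys() & ib.keys():
        for a in ia[flanks]:
            for b in ib[flanks]:
                if a != b:
                    yield flanks[:mid], a, b, flanks[mid:]

ga = "ACGTTGCAAGGTACGT"
gb = "ACGTTGCTAGGTACGT"  # single substitution relative to ga
for left, a, b, right in central_snp_candidates(ga, gb):
    print(f"{left}[{a}->{b}]{right}")
```

Hashing k-mers rather than aligning genomes is what lets this style of method skip assembly and scale to many gigabases.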
Nikolouli, Katerina; Mossialos, Dimitris
2012-08-01
Non-ribosomal peptide synthetases (NRPS) and type-I polyketide synthases (PKS-I) are multimodular enzymes involved in biosynthesis of oligopeptide and polyketide secondary metabolites produced by microorganisms such as bacteria and fungi. New findings regarding the mechanisms underlying NRPS and PKS-I evolution illustrate how microorganisms expand their metabolic potential. During the last decade rapid development of bioinformatics tools as well as improved sequencing and annotation of microbial genomes led to discovery of novel bioactive compounds synthesized by NRPS and PKS-I through genome-mining. Taking advantage of these technological developments metagenomics is a fast growing research field which directly studies microbial genomes or specific gene groups and their products. Discovery of novel bioactive compounds synthesized by NRPS and PKS-I will certainly be accelerated through metagenomics, allowing the exploitation of so far untapped microbial resources in biotechnology and medicine.
BioCIDER: a Contextualisation InDEx for biological Resources discovery
Horro, Carlos; Cook, Martin; Attwood, Teresa K.; Brazas, Michelle D.; Hancock, John M.; Palagi, Patricia; Corpas, Manuel; Jimenez, Rafael
2017-01-01
Summary The vast, uncoordinated proliferation of bioinformatics resources (databases, software tools, training materials etc.) makes it difficult for users to find them. To facilitate their discovery, various services are being developed to collect such resources into registries. We have developed BioCIDER, which, rather like online shopping ‘recommendations’, provides a contextualization index to help identify biological resources relevant to the content of the sites in which it is embedded. Availability and Implementation BioCIDER (www.biocider.org) is an open-source platform. Documentation is available online (https://goo.gl/Klc51G), and source code is freely available via GitHub (https://github.com/BioCIDER). The BioJS widget that enables websites to embed contextualization is available from the BioJS registry (http://biojs.io/). All code is released under an MIT licence. Contact carlos.horro@earlham.ac.uk or rafael.jimenez@elixir-europe.org or manuel@repositive.io PMID:28407033
Gibert, Karina; García-Rudolph, Alejandro; Curcoll, Lluïsa; Soler, Dolors; Pla, Laura; Tormos, José María
2009-01-01
In this paper, an integral Knowledge Discovery Methodology, named Clustering based on rules by States, which incorporates artificial intelligence (AI) and statistical methods as well as interpretation-oriented tools, is used for extracting knowledge patterns about the evolution over time of the Quality of Life (QoL) of patients with Spinal Cord Injury. The methodology incorporates the interaction with experts as a crucial element with the clustering methodology to guarantee usefulness of the results. Four typical patterns are discovered by taking into account prior expert knowledge. Several hypotheses are elaborated about the reasons for psychological distress or decreases in QoL of patients over time. The knowledge discovery from data (KDD) approach turns out, once again, to be a suitable formal framework for handling multidimensional complexity of the health domains.
How Can We Use Bioinformatics to Predict Which Agents Will Cause Birth Defects?
The availability of genomic sequences from a growing number of human and model organisms has provided an explosion of data, information, and knowledge regarding biological systems and disease processes. High-throughput technologies such as DNA and protein microarray biochips are ...
Kwon, Yeondae; Natori, Yukikazu
2017-01-01
The proportion of the elderly population in most countries worldwide is increasing dramatically, and social interest in the fields of health, longevity, and anti-aging has been increasing as well. However, the basic research results obtained from a reductionist approach in biology and a bioinformatic approach in genome science have limited usefulness for generating case-by-case insights into future health, longevity, and anti-aging research. We propose a new approach that combines our literature-mining technique with bioinformatics, which leads to a better perspective on research trends by providing an expanded knowledge base to work from. We demonstrate that our approach provides useful information that deepens insight into future trends, differs from data obtained by conventional means, and is already paving the way for a new field of aging-related research based on literature mining. One compelling example of this is how our new approach can be a useful tool in drug repositioning. PMID:28817730
Trujillano, Daniel; Bullich, Gemma; Ossowski, Stephan; Ballarín, José; Torra, Roser; Estivill, Xavier; Ars, Elisabet
2014-09-01
Molecular diagnostics of autosomal dominant polycystic kidney disease (ADPKD) relies on mutation screening of PKD1 and PKD2, which is complicated by extensive allelic heterogeneity and the presence of six highly homologous sequences of PKD1. To date, specific sequencing of PKD1 requires laborious long-range amplifications. The high cost and long turnaround time of PKD1 and PKD2 mutation analysis using conventional techniques limits its widespread application in clinical settings. We performed targeted next-generation sequencing (NGS) of PKD1 and PKD2. Pooled barcoded DNA patient libraries were enriched by in-solution hybridization with PKD1 and PKD2 capture probes. Bioinformatics analysis was performed using an in-house developed pipeline. We validated the assay in a cohort of 36 patients with previously known PKD1 and PKD2 mutations and five control individuals. Then, we used the same assay and bioinformatics analysis in a discovery cohort of 12 uncharacterized patients. We detected 35 out of 36 known definitely, highly likely, and likely pathogenic mutations in the validation cohort, including two large deletions. In the discovery cohort, we detected 11 different pathogenic mutations in 10 out of 12 patients. This study demonstrates that laborious long-range PCRs of the repeated PKD1 region can be avoided by in-solution enrichment of PKD1 and PKD2 and NGS. This strategy significantly reduces the cost and time for simultaneous PKD1 and PKD2 sequence analysis, facilitating routine genetic diagnostics of ADPKD.
Palmier, Mark O.; Fulcher, Yan G.; Bhaskaran, Rajagopalan; Duong, Vinh Q.; Fields, Gregg B.; Van Doren, Steven R.
2010-01-01
The catalytic domain of metalloelastase (matrix metalloproteinase-12 or MMP-12) is unique among MMPs in exerting high proteolytic activity upon fibrils that resist hydrolysis, especially elastin from lungs afflicted with chronic obstructive pulmonary disease or arteries with aneurysms. How does the MMP-12 catalytic domain achieve this specificity? NMR interface mapping suggests that α-elastin species cover the primed subsites, a strip across the β-sheet from β-strand IV to the II–III loop, and a broad bowl from helix A to helix C. The many contacts may account for the comparatively high affinity, as well as embedding of MMP-12 in damaged elastin fibrils in vivo. We developed a strategy called BINDSIght, for bioinformatics and NMR discovery of specificity of interactions, to evaluate MMP-12 specificity without a structure of a complex. BINDSIght integration of the interface mapping with other ambiguous information from sequences guided choice mutations in binding regions nearer the active site. Single substitutions at each of ten locations impair specific activity toward solubilized elastin. Five of them impair release of peptides from intact elastin fibrils. Eight lesions also impair specific activity toward triple helices from collagen IV or V. Eight sites map to the “primed” side in the III–IV, V–B, and S1′ specificity loops. Two map to the “unprimed” side in the IV–V and B–C loops. The ten key residues circumscribe the catalytic cleft, form an exosite, and are distinctive features available for targeting by new diagnostics or therapeutics. PMID:20663866
Chapter 16: text mining for translational bioinformatics.
Cohen, K Bretonnel; Hunter, Lawrence E
2013-04-01
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Resource Discovery within the Networked "Hybrid" Library.
ERIC Educational Resources Information Center
Leigh, Sally-Anne
This paper focuses on the development, adoption, and integration of resource discovery, knowledge management, and/or knowledge sharing interfaces such as interactive portals, and the use of the library's World Wide Web presence to increase the availability and usability of information services. The introduction addresses changes in library…
A biological compression model and its applications.
Cao, Minh Duc; Dix, Trevor I; Allison, Lloyd
2011-01-01
A biological compression model, the expert model, is presented which is superior to existing compression algorithms in both compression performance and speed. The model is able to compress whole eukaryotic genomes. Most importantly, the model provides a framework for knowledge discovery from biological data: it can be used for repeat element discovery, sequence alignment and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences where conventional knowledge discovery tools often fail.
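The expert model itself is not reproduced here; as a generic illustration of how compression can drive sequence comparison, the sketch below computes the normalized compression distance (NCD) with the standard-library zlib compressor, a well-known compression-based similarity measure (a stand-in, not the authors' model):

```python
import random
import zlib

def clen(s: bytes) -> int:
    """Compressed length in bytes (zlib at maximum effort)."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar, near 1 for unrelated."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"ACGT" * 200
b = b"ACGT" * 190 + b"TTTTGGGGCCCC" * 4              # shares most content with a
r = bytes(random.Random(0).choices(b"ACGT", k=800))  # unrelated random sequence

print(f"related:   {ncd(a, b):.3f}")
print(f"unrelated: {ncd(a, r):.3f}")
```

Pairwise NCDs computed this way can be fed directly into standard tree-building methods, which is the link between compression and phylogenetic analysis that the abstract describes.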
Holzinger, Andreas; Zupan, Mario
2013-06-13
Professionals in the biomedical domain are confronted with an increasing mass of data. Developing methods to assist professional end users in the field of Knowledge Discovery to identify, extract, visualize and understand useful information from these huge amounts of data is a huge challenge. However, there are so many diverse methods and methodologies available, that for biomedical researchers who are inexperienced in the use of even relatively popular knowledge discovery methods, it can be very difficult to select the most appropriate method for their particular research problem. A web application, called KNODWAT (KNOwledge Discovery With Advanced Techniques) has been developed, using Java on Spring framework 3.1. and following a user-centered approach. The software runs on Java 1.6 and above and requires a web server such as Apache Tomcat and a database server such as the MySQL Server. For frontend functionality and styling, Twitter Bootstrap was used as well as jQuery for interactive user interface operations. The framework presented is user-centric, highly extensible and flexible. Since it enables methods for testing using existing data to assess suitability and performance, it is especially suitable for inexperienced biomedical researchers, new to the field of knowledge discovery and data mining. For testing purposes two algorithms, CART and C4.5 were implemented using the WEKA data mining framework.
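KNODWAT wraps WEKA's Java implementations of CART and C4.5; purely as an illustration of the same test-on-existing-data workflow in Python, the hedged sketch below fits a CART decision tree with scikit-learn and reports held-out accuracy (assuming scikit-learn is installed; this is not KNODWAT's actual stack):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A small biomedical benchmark dataset stands in for a researcher's own data.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# CART with a depth cap to keep the tree interpretable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {tree.score(X_te, y_te):.3f}")
```

This kind of quick suitability check on familiar data is exactly what the framework aims to make accessible to researchers without data-mining experience.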
Brusniak, Mi-Youn; Bodenmiller, Bernd; Campbell, David; Cooke, Kelly; Eddes, James; Garbutt, Andrew; Lau, Hollis; Letarte, Simon; Mueller, Lukas N; Sharma, Vagisha; Vitek, Olga; Zhang, Ning; Aebersold, Ruedi; Watts, Julian D
2008-01-01
Background Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics. Results We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, and statistical algorithms, originally developed for microarray data analyses, appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling. Conclusion The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field. PMID:19087345
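Corra adapts microarray-style statistics to LC-MS peptide features; as a generic illustration of testing many features while controlling the false-discovery rate, this sketch applies a per-feature two-sample t-test with Benjamini-Hochberg correction (toy data; not Corra's actual statistical machinery):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_features, n_samples = 200, 8

# Toy LC-MS feature intensities for two groups; first 20 features truly shifted.
case = rng.normal(0.0, 1.0, (n_features, n_samples))
ctrl = rng.normal(0.0, 1.0, (n_features, n_samples))
case[:20] += 2.0

pvals = stats.ttest_ind(case, ctrl, axis=1).pvalue

def benjamini_hochberg(p, alpha=0.05):
    """Indices of discoveries under BH FDR control at level alpha."""
    p = np.asarray(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresh
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    return order[:k]

hits = benjamini_hochberg(pvals)
print(f"{len(hits)} features pass BH at 5% FDR")
```

The features surviving FDR control are the ones a Corra-style workflow would queue for targeted MS/MS identification.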
Form-Focused Discovery Activities in English Classes
ERIC Educational Resources Information Center
Ogeyik, Muhlise Cosgun
2011-01-01
Form-focused discovery activities allow language learners to grasp various aspects of a target language by contributing implicit knowledge by using discovered explicit knowledge. Moreover, such activities can assist learners to perceive and discover the features of their language input. In foreign language teaching environments, they can be used…
Natural product discovery: past, present, and future.
Katz, Leonard; Baltz, Richard H
2016-03-01
Microorganisms have provided abundant sources of natural products which have been developed as commercial products for human medicine, animal health, and plant crop protection. In the early years of natural product discovery from microorganisms (The Golden Age), new antibiotics were found with relative ease from low-throughput fermentation and whole cell screening methods. Later, molecular genetic and medicinal chemistry approaches were applied to modify and improve the activities of important chemical scaffolds, and more sophisticated screening methods were directed at target disease states. In the 1990s, the pharmaceutical industry moved to high-throughput screening of synthetic chemical libraries against many potential therapeutic targets, including new targets identified from the human genome sequencing project, largely to the exclusion of natural products, and discovery rates dropped dramatically. Nonetheless, natural products continued to provide key scaffolds for drug development. In the current millennium, it was discovered from genome sequencing that microbes with large genomes have the capacity to produce about ten times as many secondary metabolites as was previously recognized. Indeed, the most gifted actinomycetes have the capacity to produce around 30-50 secondary metabolites. With the precipitous drop in cost for genome sequencing, it is now feasible to sequence thousands of actinomycete genomes to identify the "biosynthetic dark matter" as sources for the discovery of new and novel secondary metabolites. Advances in bioinformatics, mass spectrometry, proteomics, transcriptomics, metabolomics and gene expression are driving the new field of microbial genome mining for applications in natural product discovery and development.
Taking Stock of the Drosophila Research Ecosystem
Bilder, David; Irvine, Kenneth D.
2017-01-01
With a century-old history of fundamental discoveries, the fruit fly has long been a favored experimental organism for a wide range of scientific inquiries. But Drosophila is not a “legacy” model organism; technical and intellectual innovations continue to revitalize fly research and drive advances in our understanding of conserved mechanisms of animal biology. Here, we provide an overview of this “ecosystem” and discuss how to address emerging challenges to ensure its continued productivity. Drosophila researchers are fortunate to have a sophisticated and ever-growing toolkit for the analysis of gene function. Access to these tools depends upon continued support for both physical and informational resources. Uncertainty regarding stable support for bioinformatic databases is a particular concern, at a time when there is the need to make the vast knowledge of functional biology provided by this model animal accessible to scientists studying other organisms. Communication and advocacy efforts will promote appreciation of the value of the fly in delivering biomedically important insights. Well-tended traditions of large-scale tool development, open sharing of reagents, and community engagement provide a strong basis for coordinated and proactive initiatives to improve the fly research ecosystem. Overall, there has never been a better time to be a fly pusher. PMID:28684603
Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi
2013-11-20
With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition on one map. By modifying the conventional SOM, we developed batch-learning SOM (BLSOM), which allowed classification of sequence fragments (e.g., 1 kb) according to phylotypes, solely depending on oligonucleotide composition. Metagenomics studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in life sciences. BLSOM is most suitable for phylogenetic assignment of metagenomic sequences, because fragmental sequences can be clustered according to phylotypes, solely depending on oligonucleotide composition. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species, and by mapping metagenomic sequences on these large-scale BLSOMs, we can predict phylotypes of individual metagenomic sequences, revealing a microbial community structure of uncultured microorganisms, including viruses. BLSOM has shown that influenza viruses isolated from humans and birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes of virus sequences and for surveilling potentially hazardous strains when introduced into humans from non-human sources.
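BLSOM modifies the SOM update rule for batch learning and order-independence; the hedged sketch below shows only the general flavor of the approach, clustering tetranucleotide frequency vectors with a plain online SOM in NumPy (it is not the authors' BLSOM algorithm, and the "genomes" are invented composition-biased toy fragments):

```python
import numpy as np
from itertools import product

BASES = "ACGT"
KMERS = ["".join(p) for p in product(BASES, repeat=4)]  # 256 tetranucleotides
IDX = {k: i for i, k in enumerate(KMERS)}

def tetra_freq(seq):
    """Normalized tetranucleotide frequency vector of a sequence fragment."""
    v = np.zeros(len(KMERS))
    for i in range(len(seq) - 3):
        v[IDX[seq[i:i + 4]]] += 1
    return v / max(v.sum(), 1)

def train_som(data, grid=(6, 6), epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal online SOM; returns grid weights of shape (gx, gy, dim)."""
    rng = np.random.default_rng(seed)
    gx, gy = grid
    w = rng.random((gx, gy, data.shape[1]))
    coords = np.dstack(np.meshgrid(np.arange(gx), np.arange(gy), indexing="ij"))
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        sigma = sigma0 * (1 - epoch / epochs) + 0.5
        for x in rng.permutation(data):
            bmu = np.unravel_index(np.argmin(((w - x) ** 2).sum(-1)), (gx, gy))
            d2 = ((coords - np.array(bmu)) ** 2).sum(-1)
            h = np.exp(-d2 / (2 * sigma ** 2))[..., None]  # neighborhood kernel
            w += lr * h * (x - w)
    return w

# Toy fragments from two "genomes" with different composition bias.
rng = np.random.default_rng(1)
frags = (["".join(rng.choice(list("AACGT"), 500)) for _ in range(20)]
         + ["".join(rng.choice(list("ACGGT"), 500)) for _ in range(20)])
X = np.array([tetra_freq(f) for f in frags])
som = train_som(X)
for x in (X[0], X[-1]):  # fragments from the two sources map to different nodes
    print(np.unravel_index(np.argmin(((som - x) ** 2).sum(-1)), som.shape[:2]))
```

Mapping an unknown metagenomic fragment to its best-matching node, then reading off the phylotype of the known genomes occupying that region, is the phylogenetic-assignment step the abstract describes.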
BioCatalogue: a universal catalogue of web services for the life sciences
Bhagat, Jiten; Tanoh, Franck; Nzuobontane, Eric; Laurent, Thomas; Orlowski, Jerzy; Roos, Marco; Wolstencroft, Katy; Aleksejevs, Sergejs; Stevens, Robert; Pettifer, Steve; Lopez, Rodrigo; Goble, Carole A.
2010-01-01
The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides a common interface for registering, browsing and annotating Web Services to the Life Science community. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable ‘Web 2.0’-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community. PMID:20484378
Antimicrobial resistance surveillance in the genomic age.
McArthur, Andrew G; Tsang, Kara K
2017-01-01
The loss of effective antimicrobials is reducing our ability to protect the global population from infectious disease. However, the field of antibiotic drug discovery and the public health monitoring of antimicrobial resistance (AMR) is beginning to exploit the power of genome and metagenome sequencing. The creation of novel AMR bioinformatics tools and databases and their continued development will advance our understanding of the molecular mechanisms and threat severity of antibiotic resistance, while simultaneously improving our ability to accurately predict and screen for antibiotic resistance genes within environmental, agricultural, and clinical settings. To do so, efforts must be focused toward exploiting the advancements of genome sequencing and information technology. Currently, AMR bioinformatics software and databases reflect different scopes and functions, each with its own strengths and weaknesses. A review of the available tools reveals common approaches and reference data but also reveals gaps in our curated data, models, algorithms, and data-sharing tools that must be addressed to conquer the limitations and areas of unmet need within the AMR research field before DNA sequencing can be fully exploited for AMR surveillance and improved clinical outcomes. © 2016 New York Academy of Sciences.
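As a hedged illustration of the screening task described above (predicting resistance genes in assembled sequence), the sketch below flags contigs that share long exact k-mers with reference resistance genes; the miniature reference entries are for illustration only and real AMR tools use curated databases and alignment, not this toy matching:

```python
# Toy AMR screening: flag contigs sharing long exact k-mers with reference genes.
AMR_GENES = {  # miniature illustrative entries, not a curated AMR database
    "blaTEM-like": "ATGAGTATTCAACATTTCCGTGTCGCC",
    "aph3-like":   "ATGGCTAAAATGAGAATATCACCGGAA",
}

def kmer_set(seq, k=15):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def screen(contig, refs, k=15):
    """Report reference genes sharing at least one exact k-mer with the contig."""
    ck = kmer_set(contig, k)
    return [name for name, seq in refs.items() if ck & kmer_set(seq, k)]

contig = "GGGATGAGTATTCAACATTTCCGTGTCGCCTTT"  # contains a blaTEM-like fragment
print(screen(contig, AMR_GENES))  # ['blaTEM-like']
```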
Bioinformatics analysis of transcriptome dynamics during growth in angus cattle longissimus muscle.
Moisá, Sonia J; Shike, Daniel W; Graugnard, Daniel E; Rodriguez-Zas, Sandra L; Everts, Robin E; Lewin, Harris A; Faulkner, Dan B; Berger, Larry L; Loor, Juan J
2013-01-01
Transcriptome dynamics in the longissimus muscle (LM) of young Angus cattle were evaluated at 0, 60, 120, and 220 days from early-weaning. Bioinformatic analysis was performed using the dynamic impact approach (DIA) by means of the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Database for Annotation, Visualization and Integrated Discovery (DAVID) databases. Between 0 and 120 days (growing phase) most of the highly-impacted pathways (e.g., ascorbate and aldarate metabolism, drug metabolism, cytochrome P450 and retinol metabolism) were inhibited. The phase between 120 and 220 days (finishing phase) was characterized by the most striking differences, with 3,784 differentially expressed genes (DEGs). Analysis of those DEGs revealed that the most impacted KEGG canonical pathway was glycosylphosphatidylinositol (GPI)-anchor biosynthesis, which was inhibited. Furthermore, inhibition of calpastatin and activation of tyrosine aminotransferase ubiquitination at 220 days promotes proteasomal degradation, while the concurrent activation of ribosomal proteins promotes protein synthesis. Therefore, the balance of these processes likely results in a steady-state of protein turnover during the finishing phase. Results underscore the importance of transcriptome dynamics in LM during growth.
Stevens, David Cole; Conway, Kyle R.; Pearce, Nelson; Villegas-Peñaranda, Luis Roberto; Garza, Anthony G.; Boddy, Christopher N.
2013-01-01
Background Heterologous expression of bacterial biosynthetic gene clusters is currently an indispensable tool for characterizing biosynthetic pathways. Development of an effective, general heterologous expression system that can be applied to bioprospecting from metagenomic DNA will enable the discovery of a wealth of new natural products. Methodology We have developed a new Escherichia coli-based heterologous expression system for polyketide biosynthetic gene clusters. We have demonstrated the over-expression of the alternative sigma factor σ54 directly and positively regulates heterologous expression of the oxytetracycline biosynthetic gene cluster in E. coli. Bioinformatics analysis indicates that σ54 promoters are present in nearly 70% of polyketide and non-ribosomal peptide biosynthetic pathways. Conclusions We have demonstrated a new mechanism for heterologous expression of the oxytetracycline polyketide biosynthetic pathway, where high-level pleiotropic sigma factors from the heterologous host directly and positively regulate transcription of the non-native biosynthetic gene cluster. Our bioinformatics analysis is consistent with the hypothesis that heterologous expression mediated by the alternative sigma factor σ54 may be a viable method for the production of additional polyketide products. PMID:23724102
Effects-based monitoring (EBM) has been employed as a complement to chemical monitoring to help address knowledge gaps between chemical occurrence and biological effects. We have piloted several pathway-based approaches to EBM, that utilize modern bioinformatic and high throughpu...
75 FR 66766 - NIAID Blue Ribbon Panel Meeting on Adjuvant Discovery and Development
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-29
..., identifies gaps in knowledge and capabilities, and defines NIAID's goals for the continued discovery... DEPARTMENT OF HEALTH AND HUMAN SERVICES NIAID Blue Ribbon Panel Meeting on Adjuvant Discovery and... agenda for the discovery, development and clinical evaluation of adjuvants for use with preventive...
12 CFR 263.53 - Discovery depositions.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 12 Banks and Banking 3 2011-01-01 2011-01-01 false Discovery depositions. 263.53 Section 263.53... depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts material to the...
12 CFR 19.170 - Discovery depositions.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 12 Banks and Banking 1 2010-01-01 2010-01-01 false Discovery depositions. 19.170 Section 19.170... PROCEDURE Discovery Depositions and Subpoenas § 19.170 Discovery depositions. (a) General rule. In any... deposition of an expert, or of a person, including another party, who has direct knowledge of matters that...
12 CFR 19.170 - Discovery depositions.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 12 Banks and Banking 1 2011-01-01 2011-01-01 false Discovery depositions. 19.170 Section 19.170... PROCEDURE Discovery Depositions and Subpoenas § 19.170 Discovery depositions. (a) General rule. In any... deposition of an expert, or of a person, including another party, who has direct knowledge of matters that...
12 CFR 263.53 - Discovery depositions.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 12 Banks and Banking 3 2010-01-01 2010-01-01 false Discovery depositions. 263.53 Section 263.53... depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts material to the...
BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation
2011-01-01
We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be. PMID:21696594
Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz
2009-08-25
Bioinformatics often leverages recent advances in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms.
Expanding Role of Data Science and Bioinformatics in Drug Discovery and Development.
Fingert, Howard J
2018-01-01
Numerous barriers have been identified which detract from successful applications of clinical trial data and platforms. Despite the challenges, opportunities are growing to advance compliance, quality, and practical applications through top-down establishment of guiding principles, coupled with bottom-up approaches to promote data science competencies among data producers. Recent examples of successful applications include modern treatments for hematologic malignancies, developed with support from public-private partnerships, guiding principles for data-sharing, standards for protocol designs and data management, digital technologies, and quality analytics. © 2017 American Society for Clinical Pharmacology and Therapeutics.
Watson-Haigh, Nathan S; Shang, Catherine A; Haimel, Matthias; Kostadima, Myrto; Loos, Remco; Deshpande, Nandan; Duesing, Konsta; Li, Xi; McGrath, Annette; McWilliam, Sean; Michnowicz, Simon; Moolhuijzen, Paula; Quenette, Steve; Revote, Jerico Nico De Leon; Tyagi, Sonika; Schneider, Maria V
2013-09-01
The widespread adoption of high-throughput next-generation sequencing (NGS) technology among the Australian life science research community is highlighting an urgent need to up-skill biologists in tools required for handling and analysing their NGS data. There is currently a shortage of cutting-edge bioinformatics training courses in Australia as a consequence of a scarcity of skilled trainers with time and funding to develop and deliver training courses. To address this, a consortium of Australian research organizations, including Bioplatforms Australia, the Commonwealth Scientific and Industrial Research Organisation and the Australian Bioinformatics Network, have been collaborating with the EMBL-EBI training team. A group of Australian bioinformaticians attended the train-the-trainer workshop to improve training skills in developing and delivering bioinformatics workshop curriculum. A 2-day NGS workshop was jointly developed to provide hands-on knowledge and understanding of typical NGS data analysis workflows. The road show-style workshop was successfully delivered at five geographically distant venues in Australia using the newly established Australian NeCTAR Research Cloud. We highlight the challenges we had to overcome at different stages from design to delivery, including the establishment of an Australian bioinformatics training network and the computing infrastructure and resource development. A virtual machine image, workshop materials and scripts for configuring a machine with workshop contents have all been made available under a Creative Commons Attribution 3.0 Unported License. This means participants continue to have convenient access to an environment with which they had become familiar, and bioinformatics trainers are able to access and reuse these resources.
Medical knowledge discovery and management.
Prior, Fred
2009-05-01
Although the volume of medical information is growing rapidly, the ability to convert these data into "actionable insights" and new medical knowledge lags far behind. The first step in the knowledge discovery process is data management and integration, which logically can be accomplished through the application of data warehouse technologies. A key insight that arises from efforts in biosurveillance and the global scope of military medicine is that information must be integrated over both time (longitudinal health records) and space (spatial localization of health-related events). Once data are compiled and integrated, it is essential to encode the semantics and relationships among data elements through the use of ontologies and semantic web technologies to convert data into knowledge. Medical images form a special class of health-related information. Traditionally, knowledge has been extracted from images by human observation and encoded via controlled terminologies. This approach is rapidly being replaced by quantitative analyses that more reliably support knowledge extraction. The goals of knowledge discovery are the improvement of both the timeliness and the accuracy of medical decision making, and the identification of new procedures and therapies.
Newton, Mandi S; Scott-Findlay, Shannon
2007-01-01
Background In the past 15 years, knowledge translation in healthcare has emerged as a multifaceted and complex agenda. Theoretical and polemical discussions, the development of a science to study and measure the effects of translating research evidence into healthcare, and the role of key stakeholders including academe, healthcare decision-makers, the public, and government funding bodies have brought scholarly, organizational, social, and political dimensions to the agenda. Objective This paper discusses the current knowledge translation agenda in Canadian healthcare and how elements of this agenda shape the discovery and translation of health knowledge. Discussion The current knowledge translation agenda in Canadian healthcare involves the influence of values, priorities, and people: stakes that greatly shape the discovery of research knowledge and whether or not it is instituted in healthcare delivery. As this agenda continues to take shape and direction, ensuring that it is accountable for its influences is essential and should be at the forefront of concern for the Canadian public and healthcare community. This transparency will allow for scrutiny, debate, and improvements in health knowledge discovery and health services delivery. PMID:17916256
Concept of operations for knowledge discovery from Big Data across enterprise data warehouses
NASA Astrophysics Data System (ADS)
Sukumar, Sreenivas R.; Olama, Mohammed M.; McNair, Allen W.; Nutaro, James J.
2013-05-01
The success of data-driven business in government, science, and private industry is driving the need for seamless integration of intra- and inter-enterprise data sources to extract knowledge nuggets, in the form of correlations, trends, patterns and behaviors, previously not discovered due to the physical and logical separation of datasets. Today, as the volume, velocity, variety and complexity of enterprise data keep increasing, the next generation of analysts is facing several challenges in the knowledge extraction process. To address these challenges, data-driven organizations that rely on the success of their analysts have to make investment decisions for sustainable data/information systems and knowledge discovery. Options that organizations are considering include newer storage/analysis architectures, better analysis machines, redesigned analysis algorithms, collaborative knowledge management tools, and query builders, among many others. In this paper, we present a concept of operations for enabling knowledge discovery that data-driven organizations can leverage in making their investment decisions. We base our recommendations on the experience gained from integrating multi-agency enterprise data warehouses at the Oak Ridge National Laboratory to design the foundation of future knowledge-nurturing data-system architectures.
The center for causal discovery of biomedical knowledge from big data.
Cooper, Gregory F; Bahar, Ivet; Becich, Michael J; Benos, Panayiotis V; Berg, Jeremy; Espino, Jeremy U; Glymour, Clark; Jacobson, Rebecca Crowley; Kienholz, Michelle; Lee, Adrian V; Lu, Xinghua; Scheines, Richard
2015-11-01
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
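The constraint-based algorithms mentioned above prune a fully connected graph by testing conditional independence between variables. The sketch below illustrates that core step on simulated data using a Fisher-z partial-correlation test; it is a didactic toy, not the Center's actual software, and the variables are invented:

```python
import numpy as np
from itertools import combinations
from scipy.stats import norm

def partial_corr(data, i, j, cond):
    """Correlation of columns i and j after regressing out the conditioning set."""
    if not cond:
        return np.corrcoef(data[:, i], data[:, j])[0, 1]
    Z = np.column_stack([np.ones(data.shape[0]), data[:, cond]])
    ri = data[:, i] - Z @ np.linalg.lstsq(Z, data[:, i], rcond=None)[0]
    rj = data[:, j] - Z @ np.linalg.lstsq(Z, data[:, j], rcond=None)[0]
    return np.corrcoef(ri, rj)[0, 1]

def independent(data, i, j, cond, alpha=0.05):
    """Fisher z-test: True if the partial correlation is plausibly zero."""
    r = partial_corr(data, i, j, cond)
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(data.shape[0] - len(cond) - 3)
    return 2 * norm.sf(abs(z)) > alpha

# Simulate a chain X -> Y -> Z: X and Z are dependent, but independent given Y.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 2.0 * x + rng.normal(size=2000)
z = -1.5 * y + rng.normal(size=2000)
data = np.column_stack([x, y, z])

# Skeleton search: start fully connected, drop an edge when some conditioning
# set renders its endpoints independent.
edges = set(combinations(range(3), 2))
for i, j in combinations(range(3), 2):
    others = [k for k in range(3) if k not in (i, j)]
    for cond in [[]] + [[k] for k in others]:
        if independent(data, i, j, cond):
            edges.discard((i, j))
            break
print(sorted(edges))  # expect [(0, 1), (1, 2)]: the X-Z edge is pruned
```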
NASA Astrophysics Data System (ADS)
McGovern, Mary Francis
Non-formal environmental education provides students the opportunity to learn in ways that would not be possible in a traditional classroom setting. Outdoor learning allows students to make connections to their environment and helps to foster an appreciation for nature. This type of education can be interdisciplinary: students develop skills not only in science, but also in mathematics, social studies, technology, and critical thinking. This case study focuses on a non-formal marine education program, the South Carolina Department of Natural Resources' (SCDNR) Discovery vessel-based program. The Discovery curriculum was evaluated to determine its impact on student knowledge about, and attitude toward, the estuary. Students from two South Carolina coastal counties who attended the boat program during fall 2014 were asked to complete a brief survey before, immediately after, and two weeks following the program. The results of this study indicate that both student knowledge and attitude improved significantly after completion of the Discovery vessel-based program. Knowledge and attitude scores demonstrated a positive correlation.
Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying
2011-08-01
The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and their reproducibility across independent datasets is an essential requirement for assessing their potential clinical relevance. Small datasets and the multiplicity of putative biomarker sets may explain the lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed a lack of classification reproducibility on independent datasets. Partial overlaps exist between our putative sets of biomarkers and those of the primary studies. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.
Human Disease Insight: An integrated knowledge-based platform for disease-gene-drug information.
Tasleem, Munazzah; Ishrat, Romana; Islam, Asimul; Ahmad, Faizan; Hassan, Md Imtaiyaz
2016-01-01
The scope of the Human Disease Insight (HDI) database is not limited to researchers or physicians as it also provides basic information to non-professionals and creates disease awareness, thereby reducing the chances of patient suffering due to ignorance. HDI is a knowledge-based resource providing information on human diseases to both scientists and the general public. Here, our mission is to provide a comprehensive human disease database containing most of the available useful information, with extensive cross-referencing. HDI is a knowledge management system that acts as a central hub to access information about human diseases and associated drugs and genes. In addition, HDI contains well-classified bioinformatics tools with helpful descriptions. These integrated bioinformatics tools enable researchers to annotate disease-specific genes and perform protein analysis, search for biomarkers and identify potential vaccine candidates. Eventually, these tools will facilitate the analysis of disease-associated data. The HDI provides two types of search capabilities and includes provisions for downloading, uploading and searching disease/gene/drug-related information. The logistical design of the HDI allows for regular updating. The database is designed to work best with Mozilla Firefox and Google Chrome and is freely accessible at http://humandiseaseinsight.com. Copyright © 2015 King Saud Bin Abdulaziz University for Health Sciences. Published by Elsevier Ltd. All rights reserved.
Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael; Schneider, Michel; Bansal, Parit; Bridge, Alan J; Poux, Sylvain; Bougueleret, Lydie; Xenarios, Ioannis
2016-01-01
The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available, expertly and manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced in the frame of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. The high-level annotations provided by UniProtKB/Swiss-Prot are widely used to predict the annotation of newly available proteins through automatic pipelines. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We also present some of the tools and databases that are linked to each entry.
NETTAB 2012 on "Integrated Bio-Search"
2014-01-01
The NETTAB 2012 workshop, held in Como on November 14-16, 2012, was devoted to "Integrated Bio-Search", that is, to technologies, methods, architectures, systems and applications for searching, retrieving, integrating and analyzing data, information, and knowledge with the aim of answering complex bio-medical-molecular questions: some of the most challenging issues in bioinformatics today. It brought together about 80 researchers working in the fields of Bioinformatics, Computational Biology, Biology, Computer Science and Engineering. More than 50 scientific contributions, including keynote and tutorial talks, oral communications, posters and software demonstrations, were presented at the workshop. This preface provides a brief overview of the workshop and shortly introduces the peer-reviewed manuscripts that were accepted for publication in this Supplement. PMID:24564635
Biologically inspired intelligent decision making
Manning, Timmy; Sleator, Roy D; Walsh, Paul
2014-01-01
Artificial neural networks (ANNs) are a class of powerful machine learning models for classification and function approximation which have analogs in nature. An ANN learns to map stimuli to responses through repeated evaluation of exemplars of the mapping. This learning approach results in networks that are recognized for their noise tolerance and their ability to generalize meaningful responses for novel stimuli. It is these properties of ANNs that make them appealing for bioinformatics problems where the interpretation of data may not always be obvious, and where the domain knowledge required for deductive techniques is incomplete or can cause a combinatorial explosion of rules. In this paper, we provide an introduction to artificial neural network theory and review some interesting recent applications to bioinformatics problems. PMID:24335433
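The stimulus-to-response learning described here can be shown in a few lines: a tiny two-layer network trained by gradient descent on the XOR mapping, a classic function no single-layer model can represent. This is a didactic sketch, not a bioinformatics-scale model:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # stimuli
y = np.array([[0], [1], [1], [0]], dtype=float)              # target responses

W1, b1 = rng.normal(0.0, 1.0, (2, 4)), np.zeros(4)  # hidden layer, 4 units
W2, b2 = rng.normal(0.0, 1.0, (4, 1)), np.zeros(1)  # output layer
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)               # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)    # backprop of squared-error loss
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ d_out; b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h;   b1 -= d_h.sum(axis=0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```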
Using Next-Generation Sequencing to Explore Genetics and Race in the High School Classroom
Yang, Xinmiao; Hartman, Mark R.; Harrington, Kristin T.; Etson, Candice M.; Fierman, Matthew B.; Slonim, Donna K.; Walt, David R.
2017-01-01
With the development of new sequencing and bioinformatics technologies, concepts relating to personal genomics play an increasingly important role in our society. To promote interest and understanding of sequencing and bioinformatics in the high school classroom, we developed and implemented a laboratory-based teaching module called “The Genetics of Race.” This module uses the topic of race to engage students with sequencing and genetics. In the experimental portion of this module, students isolate their own mitochondrial DNA using standard biotechnology techniques and collect next-generation sequencing data to determine which of their classmates are most and least genetically similar to themselves. We evaluated the efficacy of this module by administering a pretest/posttest evaluation to measure student knowledge related to sequencing and bioinformatics, and we also conducted a survey at the conclusion of the module to assess student attitudes. Upon completion of our Genetics of Race module, students demonstrated significant learning gains, with lower-performing students obtaining the highest gains, and developed more positive attitudes toward scientific research. PMID:28408407
G-DOC Plus - an integrative bioinformatics platform for precision medicine.
Bhuvaneshwar, Krithika; Belouali, Anas; Singh, Varun; Johnson, Robert M; Song, Lei; Alaoui, Adil; Harris, Michael A; Clarke, Robert; Weiner, Louis M; Gusev, Yuriy; Madhavan, Subha
2016-04-30
G-DOC Plus is a data integration and bioinformatics platform that uses cloud computing and other advanced computational tools to handle a variety of biomedical big data, including gene expression arrays, NGS data and medical images, so that they can be analyzed in the full context of other omics and clinical information. G-DOC Plus currently holds data from over 10,000 patients selected from private and public resources including Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and the recently added datasets from the REpository for Molecular BRAin Neoplasia DaTa (REMBRANDT), caArray studies of lung and colon cancer, ImmPort and the 1000 Genomes data sets. The system allows researchers to explore clinical-omic data one sample at a time, as a cohort of samples, or at the level of a population, providing the user with a comprehensive view of the data. G-DOC Plus tools have been leveraged in cancer and non-cancer studies for hypothesis generation and validation, for biomarker discovery and multi-omics analysis, to explore somatic mutations and cancer MRI images, as well as for training and graduate education in bioinformatics, data and computational sciences. Several of these use cases are described in this paper to demonstrate its multifaceted usability. G-DOC Plus can be used to support a variety of user groups in multiple domains to enable hypothesis generation for precision medicine research. The long-term vision of G-DOC Plus is to extend this translational bioinformatics platform to stay current with emerging omics technologies and analysis methods, continuing to support novel hypothesis generation, analysis and validation for integrative biomedical research. By integrating several aspects of the disease and exposing various data elements, such as outpatient lab workup, pathology, radiology, current treatments, molecular signatures and expected outcomes over a web interface, G-DOC Plus will continue to strengthen precision medicine research. G-DOC Plus is available at: https://gdoc.georgetown.edu.
Walther, Stefanie; Tietze, Manfred; Czerny, Claus-Peter; König, Sven; Diesterbeck, Ulrike S
2016-01-01
We have developed a new bioinformatics framework for the analysis of rearranged bovine heavy chain immunoglobulin (Ig) variable regions by combining and refining widely used alignment algorithms. This bioinformatics framework allowed us to investigate alignments of heavy chain framework regions (FRHs) and the separate alignments of FRHs and heavy chain complementarity determining regions (CDRHs) to determine their germline origin in the four cattle breeds Aubrac, German Black Pied, German Simmental, and Holstein Friesian. It is now also possible to specifically analyze Ig heavy chains possessing exceptionally long CDR3Hs. In order to gain more insight into breed-specific differences in Ig combinatorial diversity, somatic hypermutations and putative gene conversions of IgG, we compared the dominantly transcribed variable (IGHV), diversity (IGHD), and joining (IGHJ) segments and their recombination in the four cattle breeds. The analysis revealed the use of 15 different IGHV segments, 21 IGHD segments, and two IGHJ segments with significantly different transcription levels within the breeds. Furthermore, there are preferred rearrangements within the three groups of CDR3H lengths. In the sequences of group 2 (CDR3H lengths (L) of 11-47 amino acid residues (aa)) a higher number of recombinations was observed than in sequences of groups 1 (L≤10 aa) and 3 (L≥48 aa). The combinatorial diversity of germline IGHV, IGHD, and IGHJ segments revealed 162 rearrangements that differed significantly. The few preferentially rearranged gene segments within group 3 CDR3H regions may indicate specialized antibodies, because this length is unique in cattle. The most important finding of this study, enabled by the bioinformatics framework, is the discovery of strong evidence for gene conversion as a rare event, using pseudogenes fulfilling all definitions of this particular diversification mechanism. PMID:27828971
Bringing Web 2.0 to bioinformatics.
Zhang, Zhang; Cheung, Kei-Hoi; Townsend, Jeffrey P
2009-01-01
Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.
Maréchal, Eric
2008-09-01
Chemogenomics is the study of the interaction of functional biological systems with exogenous small molecules, or in a broader sense the study of the intersection of biological and chemical spaces. Chemogenomics requires expertise in biology, chemistry and computational sciences (bioinformatics, cheminformatics, large-scale statistics and machine learning methods), but it is more than the simple apposition of each of these disciplines. Biological entities interacting with small molecules can be isolated proteins or more elaborate systems, from single cells to complete organisms. The biological space is therefore analyzed at various postgenomic levels (genomic, transcriptomic, proteomic or any phenotypic level). The space of small molecules is partially real, corresponding to commercial and academic collections of compounds, and partially virtual, corresponding to the chemical space that could possibly be synthesized. Synthetic chemistry has developed novel strategies allowing a physical exploration of this universe of possibilities. A major challenge of cheminformatics is to chart the virtual space of small molecules using realistic biological constraints (bioavailability, druggability, structural biological information). Chemogenomics is a descendant of conventional pharmaceutical approaches, since it involves the screening of chemolibraries for their effect on biological targets, and it benefits from advances in the corresponding enabling technologies and the introduction of new biological markers. Screening was originally motivated by the rigorous discovery of new drugs, neglecting and throwing away any molecule that failed to meet the standards required for a therapeutic treatment. It is now the basis for the discovery of small molecules that might or might not be directly used as drugs, but which have an immense potential for basic research, as probes to explore an increasing number of biological phenomena. Concerns about the environmental impact of the chemical industry open new fields of research for chemogenomics.
Voros, Szilard; Maurovich-Horvat, Pal; Marvasty, Idean B; Bansal, Aruna T; Barnes, Michael R; Vazquez, Gustavo; Murray, Sarah S; Voros, Viktor; Merkely, Bela; Brown, Bradley O; Warnick, G Russell
2014-01-01
Complex biological networks of atherosclerosis are largely unknown. The main objective of the Genetic Loci and the Burden of Atherosclerotic Lesions study is to assemble comprehensive biological networks of atherosclerosis using advanced cardiovascular imaging for phenotyping and a panomic approach to identify the underlying genomic, proteomic, metabolomic, and lipidomic underpinnings, analyzed by systems biology-driven bioinformatics. By design, this is a hypothesis-free, unbiased discovery study collecting a large number of biologically related factors to examine biological associations between genomic, proteomic, metabolomic, lipidomic, and phenotypic factors of atherosclerosis. The Genetic Loci and the Burden of Atherosclerotic Lesions study (NCT01738828) is a prospective, multicenter, international observational study of atherosclerotic coronary artery disease. Approximately 7500 patients are enrolled and undergo non-contrast-enhanced coronary calcium scanning by CT for the detection and quantification of coronary artery calcium, as well as coronary artery CT angiography for the detection and quantification of plaque, stenosis, and overall coronary artery disease burden. In addition, patients undergo whole-genome sequencing, DNA methylation profiling, whole blood-based transcriptome sequencing, unbiased proteomics based on mass spectrometry, as well as metabolomics and lipidomics on a mass spectrometry platform. The study is analyzed in 3 subsequent phases, and each phase consists of a discovery cohort and an independent validation cohort. For the primary analysis, the primary phenotype will be the presence of any atherosclerotic plaque, as detected by cardiac CT. Additional phenotypic analyses will include per-patient maximal luminal stenosis, defined as 50% and 70% diameter stenosis. Single-omic and multi-omic associations will be examined for each phenotype; putative biomarkers will be assessed for association, calibration, discrimination, and reclassification. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Tiny giants of gene regulation: experimental strategies for microRNA functional studies
Steinkraus, Bruno R.; Toegel, Markus
2016-01-01
The discovery over two decades ago of short regulatory microRNAs (miRNAs) has led to the inception of a vast biomedical research field dedicated to understanding these powerful orchestrators of gene expression. Here we aim to provide a comprehensive overview of the methods and techniques underpinning the experimental pipeline employed for exploratory miRNA studies in animals. Some of the greatest challenges in this field have been uncovering the identity of miRNA–target interactions and deciphering their significance with regard to particular physiological or pathological processes. These endeavors relied almost exclusively on the development of powerful research tools encompassing novel bioinformatics pipelines, high‐throughput target identification platforms, and functional target validation methodologies. Thus, in an unparalleled manner, the biomedical technology revolution unceasingly enhanced and refined our ability to dissect miRNA regulatory networks and understand their roles in vivo in the context of cells and organisms. Recurring motifs of target recognition have led to the creation of a large number of multifactorial bioinformatics analysis platforms, which have proved instrumental in guiding experimental miRNA studies. Subsequently, the need for discovery of miRNA–target binding events in vivo drove the emergence of a slew of high‐throughput multiplex strategies, which now provide a viable prospect for elucidating genome‐wide miRNA–target binding maps in a variety of cell types and tissues. Finally, deciphering the functional relevance of miRNA post‐transcriptional gene silencing under physiological conditions, prompted the evolution of a host of technologies enabling systemic manipulation of miRNA homeostasis as well as high‐precision interference with their direct, endogenous targets. WIREs Dev Biol 2016, 5:311–362. doi: 10.1002/wdev.223 For further resources related to this article, please visit the WIREs website. PMID:26950183
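Most of the bioinformatics target-prediction platforms mentioned above start from the recurring motif of target recognition: Watson-Crick pairing between miRNA positions 2-8 (the seed) and a site in a 3'UTR. A toy scanner for 7mer-m8 seed matches, using a let-7-family miRNA for illustration and an invented UTR fragment:

```python
COMP = str.maketrans("AUGC", "UACG")  # RNA complement table

def seed_sites(mirna, utr):
    """Return 0-based positions of 7mer-m8 seed matches in a 3'UTR (RNA alphabet)."""
    seed = mirna[1:8]                  # miRNA positions 2-8
    site = seed.translate(COMP)[::-1]  # reverse complement = the target site
    return [i for i in range(len(utr) - 6) if utr[i:i + 7] == site]

mirna = "UGAGGUAGUAGGUUGUAUAGUU"       # let-7 family sequence, for illustration
utr = "AAACUAUACAACCUACUACCUCAGG"      # invented 3'UTR fragment
print(seed_sites(mirna, utr))          # positions where the seed pairs perfectly
```

Real pipelines layer conservation filters, site-context scores and pairing thermodynamics on top of this matching step.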
Chen, Jinyun; Wu, Xifeng; Huang, Yujing; Chen, Wei; Brand, Randall E.; Killary, Ann M.; Sen, Subrata; Frazier, Marsha L.
2016-01-01
Biomarkers for the early detection of pancreatic cancer (PC) are urgently needed. Our purpose was to identify a panel of genetic variants that, combined, can predict increased risk for early-onset PC and thereby identify individuals who should begin screening at an early age. Previously, using a functional genomic approach, we identified genes that were aberrantly expressed in early pathways to PC tumorigenesis. We now report the discovery of single nucleotide polymorphisms (SNPs) in these genes associated with early age at diagnosis of PC, using a two-phase study design. In silico and bioinformatics tools were used to examine the functional relevance of the identified SNPs. Eight SNPs were consistently associated with age at diagnosis in the discovery phase, the validation phase and the pooled analysis. Further analysis of the joint effects of these 8 SNPs showed that, compared to participants carrying none of these unfavorable genotypes (median age at PC diagnosis 70 years), those carrying 1–2, 3–4, or 5 or more unfavorable genotypes had median ages at diagnosis of 64, 63, and 62 years, respectively (P = 3.0E–04). A gene-dosage effect was observed, with age at diagnosis inversely related to the number of unfavorable genotypes (Ptrend = 1.0E–04). Using bioinformatics tools, we found that all 8 SNPs were predicted to play functional roles in the disruption of transcription factor and/or enhancer binding sites, and most of them were expression quantitative trait loci (eQTL) of the target genes. The panel of genetic markers identified may serve as susceptibility markers for earlier PC diagnosis. PMID:27486767
Genomes2Drugs: Identifies Target Proteins and Lead Drugs from Proteome Data
Toomey, David; Hoppe, Heinrich C.; Brennan, Marian P.; Nolan, Kevin B.; Chubb, Anthony J.
2009-01-01
Background Genome sequencing and bioinformatics have provided the full hypothetical proteome of many pathogenic organisms. Advances in microarray and mass spectrometry have also yielded large output datasets of possible target proteins/genes. However, the challenge remains to identify new targets for drug discovery from this wealth of information. Further analysis requires bioinformatics and/or molecular biology tools to validate the findings. This is time-consuming and expensive, and could fail to yield novel drugs if protein purification and crystallography are impossible. To pre-empt this, a researcher may want to rapidly filter the output datasets for proteins that show good homology to proteins that have already been structurally characterised or proteins that are already targets for known drugs. Critically, researchers developing novel antibiotics need to select out the proteins that show close homology to any human protein, as future inhibitors are likely to cross-react with the host protein, causing off-target toxicity effects later in clinical trials. Methodology/Principal Findings To solve many of these issues, we have developed a free online resource called Genomes2Drugs which ranks sequences to identify proteins that are (i) homologous to previously crystallized proteins or (ii) targets of known drugs, but are (iii) not homologous to human proteins. When tested using the Plasmodium falciparum malarial genome, the program correctly enriched the ranked list of proteins with known drug target proteins. Conclusions/Significance Genomes2Drugs rapidly identifies proteins that are likely to succeed in drug discovery pipelines. This free online resource helps in the identification of potential drug targets. Importantly, the program further highlights proteins that are likely to be inhibited by FDA-approved drugs. These drugs can then be rapidly moved into Phase IV clinical studies under 'change-of-application' patents. PMID:19593435
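The ranking logic described here can be approximated as a filter over homology-search results: keep proteins with a strong hit against solved structures or known drug targets, and drop anything that resembles a human protein. The identifiers, e-values and cutoffs below are all invented for illustration; a real run would use BLAST output against PDB, DrugBank and the human proteome:

```python
# Hypothetical best BLAST e-values per pathogen protein (invented numbers).
hits = {
    "PF3D7_0417200": {"pdb": 1e-40, "drugbank": 1e-35, "human": 0.5},
    "PF3D7_1343700": {"pdb": 1e-12, "drugbank": 1e-60, "human": 1e-30},
    "PF3D7_0810800": {"pdb": 2.0,   "drugbank": 1e-12, "human": 3.0},
}

def rank_targets(hits, homology_cut=1e-10, human_cut=1e-5):
    """Rank candidates by best structural/drug-target e-value, excluding human-like proteins."""
    kept = [
        (name, min(e["pdb"], e["drugbank"]))
        for name, e in hits.items()
        if min(e["pdb"], e["drugbank"]) < homology_cut  # tractable or druggable
        and e["human"] > human_cut                      # unlikely host cross-reactivity
    ]
    return sorted(kept, key=lambda pair: pair[1])       # smallest e-value first

for name, best in rank_targets(hits):
    print(f"{name}\tbest e-value {best:.0e}")
```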
ERIC Educational Resources Information Center
Tsantis, Linda; Castellani, John
2001-01-01
This article explores how knowledge-discovery applications can empower educators with the information they need to provide anticipatory guidance for teaching and learning, forecast school and district needs, and find critical markers for making the best program decisions for children and youth with disabilities. Data mining for schools is…
ERIC Educational Resources Information Center
Molina, Otilia Alejandro; Ratté, Sylvie
2017-01-01
This research introduces a method to construct a unified representation of teachers' and students' perspectives based on the actionable knowledge discovery (AKD) and delivery framework. The representation is constructed using two models: one obtained from student evaluations and the other obtained from teachers' reflections about their teaching…
ERIC Educational Resources Information Center
Taft, Laritza M.
2010-01-01
In its report "To Err is Human", The Institute of Medicine recommended the implementation of internal and external voluntary and mandatory automatic reporting systems to increase detection of adverse events. Knowledge Discovery in Databases (KDD) allows the detection of patterns and trends that would be hidden or less detectable if analyzed by…
Knowledge Discovery Process: Case Study of RNAV Adherence of Radar Track Data
NASA Technical Reports Server (NTRS)
Matthews, Bryan
2018-01-01
This talk is an introduction to the knowledge discovery process, beginning with: identifying the problem, choosing data sources, matching the appropriate machine learning tools, and reviewing the results. The overview will be given in the context of an ongoing study that is assessing RNAV adherence of commercial aircraft in the national airspace.
Cheng, Lijun; Schneider, Bryan P; Li, Lang
2016-07-01
Cancer has been extensively characterized on the basis of genomics. The integration of genetic information about cancers with data on how the cancers respond to target-based therapy can help to optimize cancer treatment. The increasing usage of sequencing technology in cancer research and clinical practice has enormously advanced our understanding of cancer mechanisms. Cancer precision medicine is becoming a reality. Although off-label drug usage is a common practice in treating cancer, it suffers from the lack of a knowledge base for proper cancer drug selection. This eminent need has become even more apparent considering the upcoming genomics data. In this paper, a personalized medicine knowledge base is constructed by integrating various cancer drugs, drug-target databases, and knowledge sources for proper cancer drugs and their target selection. Based on the knowledge base, a bioinformatics approach for cancer drug selection in precision medicine is developed. It integrates personal molecular profile data, including copy number variation, mutation, and gene expression. By analyzing data from 85 triple-negative breast cancer (TNBC) patients in The Cancer Genome Atlas, we have shown that 71.7% of the TNBC patients have FDA-approved drug targets, and 51.7% of the patients have more than one drug target. Sixty-five drug targets are identified as TNBC treatment targets and 85 candidate drugs are recommended. Many existing TNBC candidate targets, such as Poly (ADP-Ribose) Polymerase 1 (PARP1), Cell Division Protein Kinase 6 (CDK6), epidermal growth factor receptor, etc., were identified. On the other hand, we found some additional targets that are not yet fully investigated in TNBC, such as Gamma-Glutamyl Hydrolase (GGH), Thymidylate Synthetase (TYMS), Protein Tyrosine Kinase 6 (PTK6), Topoisomerase (DNA) I, Mitochondrial (TOP1MT), and Smoothened, Frizzled Class Receptor (SMO). Our target and drug selection strategy is also fully supported by the drug screening data on TNBC cell lines in the Cancer Cell Line Encyclopedia. The proposed bioinformatics approach lays a foundation for cancer precision medicine. It supplies a much-needed knowledge base for off-label cancer drug usage in clinics. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
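At its core, the drug-selection step is a lookup of a patient's altered genes against a table of FDA-approved drug-target relationships. A minimal sketch with illustrative gene-drug pairs; this is not the paper's actual knowledge base and not a treatment recommendation:

```python
# Illustrative drug-target table; a real system would load curated databases.
drug_targets = {
    "PARP1": ["olaparib"],
    "CDK6": ["palbociclib"],
    "EGFR": ["erlotinib", "cetuximab"],
    "TYMS": ["capecitabine"],
}

# Hypothetical patient molecular profile across data types.
patient = {
    "mutations": ["TP53", "PARP1"],
    "copy_number_gains": ["CDK6", "MYC"],
    "overexpressed": ["TYMS"],
}

# Pool all altered genes, then keep those with an approved targeting drug.
altered = {gene for genes in patient.values() for gene in genes}
actionable = {gene: drug_targets[gene] for gene in sorted(altered) if gene in drug_targets}
print(actionable)  # {'CDK6': ['palbociclib'], 'PARP1': ['olaparib'], 'TYMS': ['capecitabine']}
```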
ERIC Educational Resources Information Center
Harmon, Glynn
2013-01-01
The term discovery applies herein to the successful outcome of inquiry in which a significant personal, professional or scholarly breakthrough or insight occurs, and which is individually or socially acknowledged as a key contribution to knowledge. Since discoveries culminate at fixed points in time, discoveries can serve as an outcome metric for…
Jiang, Guoqian; Wang, Chen; Zhu, Qian; Chute, Christopher G
2013-01-01
Knowledge-driven text mining is becoming an important research area for identifying pharmacogenomics target genes. However, few such studies have focused on the pharmacogenomics targets of adverse drug events (ADEs). The objective of the present study is to build a framework for knowledge integration and discovery that aims to support pharmacogenomics target prediction for ADEs. We integrate a semantically annotated literature corpus, Semantic MEDLINE, with a semantically coded ADE knowledge base known as ADEpedia using a Semantic Web-based framework. We developed a knowledge discovery approach combining a network analysis of a protein-protein interaction (PPI) network with a gene functional classification approach. We performed a case study of drug-induced long QT syndrome to demonstrate the usefulness of the framework in predicting potential pharmacogenomics targets of ADEs.
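The PPI network analysis step can be sketched with networkx: build a small interaction graph around long-QT-related genes and rank nodes by degree centrality to nominate candidate targets. The edges below are invented for illustration; a real analysis would load interactions from a resource such as STRING or BioGRID:

```python
import networkx as nx

# Toy PPI neighbourhood around long-QT-associated ion-channel genes.
ppi = nx.Graph([
    ("KCNH2", "KCNE1"), ("KCNH2", "SCN5A"), ("KCNH2", "CALM1"),
    ("KCNQ1", "KCNE1"), ("KCNQ1", "KCNH2"), ("SCN5A", "CALM1"),
])

# Rank proteins by degree centrality as a crude prioritization signal.
for gene, score in sorted(nx.degree_centrality(ppi).items(), key=lambda kv: -kv[1]):
    print(f"{gene}\t{score:.2f}")  # KCNH2 (hERG) should rank highest here
```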
'Big Data' Collaboration: Exploring, Recording and Sharing Enterprise Knowledge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R; Ferrell, Regina Kay
2013-01-01
As data sources and data size proliferate, knowledge discovery from "Big Data" is starting to pose several challenges. In this paper, we address a specific challenge in the practice of enterprise knowledge management: extracting actionable nuggets from diverse data sources of seemingly related information. In particular, we address the challenge of archiving knowledge gained through collaboration, dissemination and visualization as part of the data analysis, inference and decision-making lifecycle. We motivate the implementation of an enterprise data-discovery and knowledge recorder tool, called SEEKER, based on a real-world case study. We demonstrate SEEKER capturing schema and data-element relationships, tracking the data elements of value based on the queries and the analytical artifacts that are being created by analysts as they use the data. We show how the tool serves as a digital record of institutional domain knowledge and as documentation for the evolution of data elements, queries and schemas over time. As a knowledge management service, a tool like SEEKER saves enterprise resources and time by avoiding analytic silos, expediting the process of multi-source data integration and intelligently documenting discoveries from fellow analysts.
Trends in life science grid: from computing grid to knowledge grid.
Konagaya, Akihiko
2006-12-18
Grid computing has great potential to become a standard cyberinfrastructure for the life sciences, which often require high-performance computing and large-scale data handling exceeding the capacity of a single institution. This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have matured enough to solve high-throughput real-world life science problems. Data grid technologies are strong candidates for realizing a "resourceome" for bioinformatics. Knowledge grids should be designed not only for sharing explicit knowledge on computers but also for community formation for sharing tacit knowledge within a community. Extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid not only as sharable computing resources, but also as a time and place in which people work together, create knowledge, and share knowledge and experiences in a community. PMID:17254294
Comprehensive decision tree models in bioinformatics.
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
2012-01-01
Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models, where knowledge extraction and explanation of the reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on the visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from the usually more complex models built using the default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes, which are very common in bioinformatics. PMID:22479449
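The spirit of this size-constrained tuning is easy to reproduce programmatically, for example in scikit-learn: bound the tree's dimensions directly, rather than optimizing a performance measure, and compare against a default tree. A sketch on a bundled binary-class dataset; this is not the authors' actual tool:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # a bioinformatics-flavoured binary task

trees = {
    "default": DecisionTreeClassifier(random_state=0),
    # Constrain only the tree's dimensions, mirroring the visual-tuning idea.
    "size-constrained": DecisionTreeClassifier(max_depth=3, max_leaf_nodes=8, random_state=0),
}

for name, tree in trees.items():
    acc = cross_val_score(tree, X, y, cv=10).mean()
    print(f"{name}: mean 10-fold CV accuracy {acc:.3f}")
```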
Mak, Wai Shun; Tran, Stephen; Marcheschi, Ryan; Bertolani, Steve; Thompson, James; Baker, David; Liao, James C; Siegel, Justin B
2015-11-24
The ability to biosynthetically produce chemicals beyond what is commonly found in Nature requires the discovery of novel enzyme function. Here we utilize two approaches to discover enzymes that enable specific production of longer-chain (C5-C8) alcohols from sugar. The first approach combines bioinformatics and molecular modelling to mine sequence databases, resulting in a diverse panel of enzymes capable of catalysing the targeted reaction. The median catalytic efficiency of the computationally selected enzymes is 75-fold greater than that of a panel of naively selected homologues. This integrative genomic mining approach establishes a unique avenue for enzyme function discovery in the rapidly expanding sequence databases. The second approach uses computational enzyme design to reprogramme specificity. Both approaches result in enzymes with >100-fold increases in specificity for the targeted reaction. When enzymes from either approach are integrated in vivo, longer-chain alcohol production increases over 10-fold and represents >95% of the total alcohol products.
Strategies for identifying new prions in yeast
MacLea, Kyle S
2011-01-01
The unexpected discovery of two prions, [URE3] and [PSI+], in Saccharomyces cerevisiae led to questions about how many other proteins could undergo similar prion-based structural conversions. However, [URE3] and [PSI+] were discovered by serendipity in genetic screens. Cataloging the full range of prions in yeast or in other organisms will therefore require more systematic search methods. Taking advantage of some of the unique features of prions, various researchers have developed bioinformatic and experimental methods for identifying novel prion proteins. These methods have generated long lists of prion candidates. The systematic testing of some of these prion candidates has led to notable successes; however, even in yeast, where rapid growth rate and ease of genetic manipulation aid in testing for prion activity, such candidate testing is laborious. Development of better methods to winnow the field of prion candidates will greatly aid in the discovery of new prions, both in yeast and in other organisms, and help us to better understand the role of prions in biology. PMID:22052351
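Many of the bioinformatic screens alluded to above exploit the compositional bias of known yeast prion domains, which are unusually rich in glutamine (Q) and asparagine (N). A toy sliding-window scanner for Q/N-rich stretches; the window size, threshold and test sequence are all invented for illustration, not taken from any published screen:

```python
def qn_rich_windows(seq, window=40, threshold=0.45):
    """Yield (start, Q+N fraction) for windows whose Q/N content exceeds the threshold."""
    for i in range(len(seq) - window + 1):
        frac = sum(aa in "QN" for aa in seq[i:i + window]) / window
        if frac >= threshold:
            yield i, frac

# Synthetic protein: an N-terminal Q/N-rich stretch followed by a globular-like tail.
toy_protein = "MSD" + "QNQQNYQQNG" * 6 + "AVLIMFWPKRESTGHD" * 4
hits = list(qn_rich_windows(toy_protein))
if hits:
    start, frac = max(hits, key=lambda h: h[1])
    print(f"{len(hits)} candidate windows; best starts at {start} (Q/N fraction {frac:.2f})")
```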
Integration, Networking, and Global Biobanking in the Age of New Biology.
Karimi-Busheri, Feridoun; Rasouli-Nia, Aghdass
2015-01-01
Scientific revolution is changing the world forever. Many new disciplines and fields have emerged with unlimited possibilities and opportunities. Biobanking is one of many that is benefiting from revolutionary milestones in human genome, post-genomic, and computer and bioinformatics discoveries. The storage, management, and analysis of massive clinical and biological data sets cannot be achieved without a global collaboration and networking. At the same time, biobanking is facing many significant challenges that need to be addressed and solved including dealing with an ever increasing complexity of sample storage and retrieval, data management and integration, and establishing common platforms in a global context. The overall picture of the biobanking of the future, however, is promising. Many population-based biobanks have been formed, and more are under development. It is certain that amazing discoveries will emerge from this large-scale method of preserving and accessing human samples. Signs of a healthy collaboration between industry, academy, and government are encouraging.
Cloud Infrastructures for In Silico Drug Discovery: Economic and Practical Aspects
Clematis, Andrea; Quarati, Alfonso; Cesini, Daniele; Milanesi, Luciano; Merelli, Ivan
2013-01-01
Cloud computing opens new perspectives for small-medium biotechnology laboratories that need to perform bioinformatics analysis in a flexible and effective way. This seems particularly true for hybrid clouds that couple the scalability offered by general-purpose public clouds with the greater control and ad hoc customizations supplied by the private ones. A hybrid cloud broker, acting as an intermediary between users and public providers, can support customers in the selection of the most suitable offers, optionally adding the provisioning of dedicated services with higher levels of quality. This paper analyses some economic and practical aspects of exploiting cloud computing in a real research scenario for the in silico drug discovery in terms of requirements, costs, and computational load based on the number of expected users. In particular, our work is aimed at supporting both the researchers and the cloud broker delivering an IaaS cloud infrastructure for biotechnology laboratories exposing different levels of nonfunctional requirements. PMID:24106693
SSRPrimer and SSR Taxonomy Tree: Biome SSR discovery.
Jewell, Erica; Robinson, Andrew; Savage, David; Erwin, Tim; Love, Christopher G; Lim, Geraldine A C; Li, Xi; Batley, Jacqueline; Spangenberg, German C; Edwards, David
2006-07-01
Simple sequence repeat (SSR) molecular genetic markers have become important tools for a broad range of applications such as genome mapping and genetic diversity studies. SSRs are readily identified within DNA sequence data, and PCR primers can be designed for their amplification. These PCR primers frequently cross-amplify within related species. We report a web-based tool, SSRPrimer, that integrates SPUTNIK, an SSR repeat finder, with Primer3, a primer design program, within one pipeline. On submission of multiple FASTA-formatted sequences, the script screens each sequence for SSRs using SPUTNIK. Results are then parsed to Primer3 for locus-specific primer design. We have applied this tool for the discovery of SSRs within the complete GenBank database, and have designed PCR amplification primers for over 13 million SSRs. The SSR Taxonomy Tree server provides web-based searching and browsing of species and taxa for the visualisation and download of these SSR amplification primers. These tools are available at http://bioinformatics.pbcbasc.latrobe.edu.au/ssrdiscovery.html.
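The repeat-finding half of this pipeline (the SPUTNIK step) can be mimicked with a short regular-expression scan for perfect 1-6 bp motifs spanning at least 12 bp; the primer-design half (Primer3) is not reproduced here. Thresholds and the test sequence are illustrative only:

```python
import re

# Zero-width lookahead so repeat runs starting at later positions are still seen.
SSR_RE = re.compile(r"(?=(([ACGT]{1,6}?)\2{2,}))")

def find_ssrs(seq, min_len=12):
    """Return (position, motif, copy number) for perfect SSRs of total length >= min_len."""
    hits, last_end = [], 0
    for m in SSR_RE.finditer(seq.upper()):
        run, motif = m.group(1), m.group(2)
        if m.start() >= last_end and len(run) >= min_len and len(set(motif)) > 1:
            hits.append((m.start(), motif, len(run) // len(motif)))
            last_end = m.start() + len(run)   # skip hits nested inside this run
    return hits

seq = "GGATCACACACACACACAGTTTAGCTAGCTAGCTAGCAAT"
for pos, motif, copies in find_ssrs(seq):
    print(f"pos {pos}: ({motif})x{copies}")   # expect a (CA) and an (AGCT) repeat
```

Note the `len(set(motif)) > 1` guard, which discards homopolymer runs that are usually treated separately from true SSRs.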
Sea Anemones: Quiet Achievers in the Field of Peptide Toxins
Pavasovic, Ana
2018-01-01
Sea anemones have been understudied as a source of peptide and protein toxins, with relatively few examined as a source of new pharmacological tools or therapeutic leads. This is surprising given the success of some anemone peptides that have been tested, such as the potassium channel blocker from Stichodactyla helianthus known as ShK. An analogue of this peptide, ShK-186, which is now known as dalazatide, has successfully completed Phase 1 clinical trials and is about to enter Phase 2 trials for the treatment of autoimmune diseases. One of the impediments to the exploitation of sea anemone toxins in the pharmaceutical industry has been the difficulty associated with their high-throughput discovery and isolation. Recent developments in multiple ‘omic’ technologies, including genomics, transcriptomics and proteomics, coupled with advanced bioinformatics, have opened the way for large-scale discovery of novel sea anemone toxins from a range of species. Many of these toxins will be useful pharmacological tools and some will hopefully prove to be valuable therapeutic leads. PMID:29316700
Cheng, Liang; Hu, Yang; Sun, Jie; Zhou, Meng; Jiang, Qinghua
2018-06-01
DincRNA aims to provide a comprehensive web-based bioinformatics toolkit to elucidate the entangled relationships among diseases and non-coding RNAs (ncRNAs) from the perspective of disease similarity. Quantitative ways to illustrate the relationships of pair-wise diseases depend on their molecular mechanisms and on the structure of the directed acyclic graph of Disease Ontology (DO). Corresponding methods for calculating the similarity of pair-wise diseases involve Resnik's, Lin's, Wang's, PSB and SemFunSim methods. Recently, disease similarity was validated as suitable for calculating the functional similarities of ncRNAs and for prioritizing ncRNA-disease pairs, and it has been widely applied for predicting ncRNA function because of the limited biological knowledge about these RNAs from wet-lab experiments. For this purpose, a large number of algorithms and pieces of prior knowledge need to be integrated, e.g. the 'pair-wise best, pairs-average' (PBPA) and 'pair-wise all, pairs-maximum' (PAPM) methods for calculating functional similarities of ncRNAs, and the random walk with restart (RWR) method for prioritizing ncRNA-disease pairs. To facilitate the exploration of disease associations and ncRNA function, DincRNA implements all of the above eight algorithms based on DO and disease-related genes. Currently, it provides functions to query disease similarity scores, miRNA and lncRNA functional similarity scores, and the prioritization scores of lncRNA-disease and miRNA-disease pairs. Availability: http://bio-annotation.cn:18080/DincRNAClient/. Contact: biofomeng@hotmail.com or qhjiang@hit.edu.cn. Supplementary data are available at Bioinformatics online.
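Of the similarity measures listed, Resnik's is the simplest to sketch: the similarity of two terms is the information content (IC) of their most informative common ancestor in the ontology's DAG. The miniature ontology and annotation counts below are invented, not drawn from DO:

```python
import math

# Toy DO-like DAG: child -> list of parents.
parents = {
    "cardiomyopathy": ["heart disease"],
    "arrhythmia": ["heart disease"],
    "heart disease": ["disease"],
    "asthma": ["disease"],
    "disease": [],
}
# Number of disease-gene annotations subsumed by each term (invented counts).
annotations = {"disease": 100, "heart disease": 20, "cardiomyopathy": 8,
               "arrhythmia": 5, "asthma": 12}

def ancestors(term):
    """The term itself plus all terms reachable through parent links."""
    out = {term}
    for p in parents[term]:
        out |= ancestors(p)
    return out

def ic(term):
    """Information content: -log of the term's annotation frequency."""
    return -math.log(annotations[term] / annotations["disease"])

def resnik(t1, t2):
    """IC of the most informative common ancestor."""
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))

print(f"{resnik('cardiomyopathy', 'arrhythmia'):.2f}")  # IC of 'heart disease'
print(f"{resnik('cardiomyopathy', 'asthma'):.2f}")      # root only: 0.00
```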
NASA Technical Reports Server (NTRS)
Tilton, James C.; Cook, Diane J.
2008-01-01
Under a project recently selected for funding by NASA's Science Mission Directorate under the Applied Information Systems Research (AISR) program, Tilton and Cook will integrate the Subdue graph-based knowledge discovery system, developed at the University of Texas Arlington and Washington State University, with the image segmentation hierarchies produced by the RHSEG software, developed at NASA GSFC, and will perform pilot demonstration studies of data analysis, mining and knowledge discovery on NASA data. Subdue is a method for discovering substructures in structural databases. It is designed for general-purpose automated discovery, concept learning, and hierarchical clustering, with or without domain knowledge, and was developed by Cook and her colleague, Lawrence B. Holder. For Subdue to be effective in finding patterns in imagery data, the data must be abstracted up from the pixel domain. An appropriate abstraction of imagery data is a segmentation hierarchy: a set of several segmentations of the same image at different levels of detail, in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. The RHSEG program, a recursive approximation to a hierarchical segmentation approach (HSEG), can produce segmentation hierarchies quickly and effectively for a wide variety of images; RHSEG and HSEG were developed at NASA GSFC by Tilton. In this presentation we provide background on the RHSEG and Subdue technologies and present a preliminary analysis of how RHSEG and Subdue may be combined to enhance image data analysis, mining and knowledge discovery.
Predicting future discoveries from current scientific literature.
Petrič, Ingrid; Cestnik, Bojan
2014-01-01
Knowledge discovery in biomedicine is a time-consuming process that stretches from basic research, through preclinical testing, towards possible clinical applications. Groundbreaking biomedical research that generates highly inventive discoveries often requires the crossing of conceptual boundaries. We demonstrate the ability of a creative literature mining method to advance valuable new discoveries based on rare ideas from existing literature. When emerging ideas from the scientific literature are put together as fragments of knowledge in a systematic way, they may lead to original, sometimes surprising, research findings. If enough scientific evidence has already been published for the association of such findings, they can be considered scientific hypotheses. In this chapter, we describe a method for the computer-aided generation of such hypotheses from the existing scientific literature. Our literature-based discovery of NF-kappaB and its possible connections to autism was recently confirmed by the scientific community, supporting the ability of our literature mining methodology to accelerate future discoveries.
Bioenergy Knowledge Discovery Framework Fact Sheet
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
The Bioenergy Knowledge Discovery Framework (KDF) supports the development of a sustainable bioenergy industry by providing access to a variety of data sets, publications, and collaboration and mapping tools that support bioenergy research, analysis, and decision making. In the KDF, users can search for information, contribute data, and use the tools and map interface to synthesize, analyze, and visualize information in a spatially integrated manner.
Teachers' Journal Club: Bridging between the Dynamics of Biological Discoveries and Biology Teachers
ERIC Educational Resources Information Center
Brill, Gilat; Falk, Hedda; Yarden, Anat
2003-01-01
Since biology is one of the most dynamic research fields within the natural sciences, the gap between the accumulated knowledge in biology and the knowledge that is taught in schools increases rapidly with time. Our long-term objective is to develop means to bridge between the dynamics of biological discoveries and the biology teachers and…
Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji
2016-01-01
Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy-to-comprehend output format. Database URL: http://targetmine.mizuguchilab.org. © The Author(s) 2016. Published by Oxford University Press. PMID:26989145
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDermott, Jason E.; Wang, Jing; Mitchell, Hugh D.
2013-01-01
The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for both purely statistical and expert knowledge-based approaches and would benefit from improved integration of the two. Areas covered: In this review we present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered. We then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to biomarker discovery and characterization are key to future success in the biomarker field. We describe our recommendations of possible approaches to this problem, including metrics for the evaluation of biomarkers.
Computational functional genomics-based approaches in analgesic drug discovery and repurposing.
Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn
2018-06-01
Persistent pain is a major healthcare problem affecting a fifth of adults worldwide with still limited treatment options. The search for new analgesics increasingly includes the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between the genome and the phenotype. Its use in drug discovery and repurposing for analgesic indications has so far been performed using knowledge discovery in gene function and drug target-related databases; next-generation sequencing; and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing and highlight the potential of computational functional genomics in this field including a demonstration of the workflow using a novel R library 'dbtORA'.
Ontology- and graph-based similarity assessment in biological networks.
Wang, Haiying; Zheng, Huiru; Azuaje, Francisco
2010-10-15
A standard systems-based approach to biomarker and drug target discovery consists of placing putative biomarkers in the context of a network of biological interactions, followed by different 'guilt-by-association' analyses, typically based on network structural features. Here we report an alternative approach in which the networks are analyzed in a 'semantic similarity' space, with the similarity information extracted from ontology-based functional annotations. We present SimTrek, a Cytoscape plugin for ontology-based similarity assessment in biological networks. http://rosalind.infj.ulst.ac.uk/SimTrek.html francisco.azuaje@crp-sante.lu Supplementary data are available at Bioinformatics online.
Karaboga, D; Aslan, S
2016-04-27
The great majority of biological sequences share significant similarity with other sequences as a result of evolutionary processes, and identifying these sequence similarities is one of the most challenging problems in bioinformatics. In this paper, we present a discrete artificial bee colony (ABC) algorithm, which is inspired by the intelligent foraging behavior of real honey bees, for the detection of highly conserved residue patterns or motifs within sequences. Experimental studies on three different data sets showed that the proposed discrete model, by adhering to the fundamental scheme of the ABC algorithm, produced competitive or better results than other metaheuristic motif discovery techniques.
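As a rough illustration of the scheme, the sketch below applies the employed-bee, onlooker-bee and scout-bee phases to a toy motif search. The consensus-score fitness, the neighbourhood move and all parameters are assumptions chosen for illustration, not the paper's exact discrete model.

```python
# Illustrative discrete ABC for motif discovery (toy data, assumed parameters).
import random

SEQS = ["ATGCGGTACGTT", "CCGGTACGATAG", "TTTGGTACGCAA"]
L, N_BEES, LIMIT, ITERS = 6, 10, 5, 200

def random_solution():
    # one motif start position per sequence
    return [random.randrange(len(s) - L + 1) for s in SEQS]

def fitness(sol):
    # consensus score: count of the most frequent base at each motif column
    cols = zip(*(s[p:p + L] for s, p in zip(SEQS, sol)))
    return sum(max(col.count(b) for b in "ACGT") for col in cols)

def neighbour(sol):
    new = sol[:]
    i = random.randrange(len(SEQS))
    new[i] = random.randrange(len(SEQS[i]) - L + 1)
    return new

food = [random_solution() for _ in range(N_BEES)]
trials = [0] * N_BEES
for _ in range(ITERS):
    for i in range(N_BEES):              # employed bees: local search
        cand = neighbour(food[i])
        if fitness(cand) > fitness(food[i]):
            food[i], trials[i] = cand, 0
        else:
            trials[i] += 1
    fits = [fitness(s) for s in food]
    for _ in range(N_BEES):              # onlookers: fitness-proportional choice
        i = random.choices(range(N_BEES), weights=fits)[0]
        cand = neighbour(food[i])
        f = fitness(cand)
        if f > fits[i]:
            food[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1
    for i in range(N_BEES):              # scouts: abandon exhausted sources
        if trials[i] > LIMIT:
            food[i], trials[i] = random_solution(), 0

best = max(food, key=fitness)
print([s[p:p + L] for s, p in zip(SEQS, best)], fitness(best))
```

On this toy input the search typically converges on the shared GGTACG motif, whose perfect consensus scores 18.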
AMPA: an automated web server for prediction of protein antimicrobial regions.
Torrent, Marc; Di Tommaso, Paolo; Pulido, David; Nogués, M Victòria; Notredame, Cedric; Boix, Ester; Andreu, David
2012-01-01
AMPA is a web application for assessing the antimicrobial domains of proteins, with a focus on the design of new antimicrobial drugs. The application provides fast discovery of antimicrobial patterns in proteins that can be used to develop new peptide-based drugs against pathogens. Results are shown in a user-friendly graphical interface and can be downloaded as raw data for later examination. AMPA is freely available on the web at http://tcoffee.crg.cat/apps/ampa. The source code is also available on the web. marc.torrent@upf.edu; david.andreu@upf.edu Supplementary data are available at Bioinformatics online.
Endocrinology Meets Metabolomics: Achievements, Pitfalls, and Challenges.
Tokarz, Janina; Haid, Mark; Cecil, Alexander; Prehn, Cornelia; Artati, Anna; Möller, Gabriele; Adamski, Jerzy
2017-10-01
The metabolome, although very dynamic, is sufficiently stable to provide specific quantitative traits related to health and disease. Metabolomics requires balanced use of state-of-the-art study design, chemical analytics, biostatistics, and bioinformatics to deliver meaningful answers to contemporary questions in human disease research. The technology is now frequently employed for biomarker discovery and for elucidating the mechanisms underlying endocrine-related diseases. Metabolomics has also enriched genome-wide association studies (GWAS) in this area by providing functional data. The contributions of rare genetic variants to metabolome variance and to the human phenotype have been underestimated until now. Copyright © 2017 Elsevier Ltd. All rights reserved.
The digital language of amino acids.
Kurić, L
2007-11-01
The subject of this paper is a digital approach to the investigation of the biochemical basis of genetic processes. The digital mechanisms of nucleic acid and protein biosynthesis, the evolution of biomacromolecules and, especially, the biochemical evolution of genetic language are analyzed by applying cybernetic methods, information theory and systems theory, respectively. This paper reports the discovery of new methods for developing new technologies in genetics, based on programs, cybernetics, and informational systems and laws. The results of the practical application of this new technology could be useful in bioinformatics, genetics, biochemistry, medicine and other natural sciences.
ERIC Educational Resources Information Center
Gelbart, Hadas; Brill, Gilat; Yarden, Anat
2009-01-01
Providing learners with opportunities to engage in activities similar to those carried out by scientists was addressed in a web-based research simulation in genetics developed for high school biology students. The research simulation enables learners to apply their genetics knowledge while giving them an opportunity to participate in an authentic…
Genome re-annotation: a wiki solution?
Salzberg, Steven L
2007-01-01
The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software. Unfortunately, annotation is rarely if ever updated and resources to support routine reannotation are scarce. Wiki software, which would allow many scientists to edit each genome's annotation, offers one possible solution. PMID:17274839
Lan, D; Hu, Y D; Zhu, Q; Li, D Y; Liu, Y P
2015-07-28
The direction of production for indigenous chicken breeds is currently unknown; this knowledge gap, combined with the development of chicken genome-wide association studies, led us to investigate differences at specific loci between broiler and layer chickens using bioinformatic methods. In addition, we analyzed the distribution of the seven identified loci in four Chinese indigenous chicken breeds, Caoke chicken, Jiuyuan chicken, Sichuan mountain chicken, and Tibetan chicken, using direct DNA sequencing, and analyzed the resulting data bioinformatically. Based on the results, we suggest that Caoke chicken could be developed for meat production, while Jiuyuan chicken could be developed for egg production. As Sichuan mountain chicken and Tibetan chicken exhibited large polymorphisms, these breeds could be improved by changing their living environment.
Endodontic Microbiology and Pathobiology: Current State of Knowledge.
Fouad, Ashraf F
2017-01-01
Newer research tools and basic science knowledge base have allowed the exploration of endodontic diseases in the pulp and periapical tissues in novel ways. The use of next generation sequencing, bioinformatics analyses, genome-wide association studies, to name just a few of these innovations, has allowed the identification of hundreds of microorganisms and of host response factors. This review addresses recent advances in endodontic microbiology and the host response and discusses the potential for future innovations in this area. Copyright © 2016 Elsevier Inc. All rights reserved.
Knowledge Discovery from Posts in Online Health Communities Using Unified Medical Language System.
Chen, Donghua; Zhang, Runtong; Liu, Kecheng; Hou, Lei
2018-06-19
Patient-reported posts in Online Health Communities (OHCs) contain a wealth of valuable information that can help establish knowledge-based online support for patients. However, utilizing these reports to improve online patient services in the absence of appropriate medical and healthcare expert knowledge is difficult. Thus, we propose a comprehensive knowledge discovery method based on the Unified Medical Language System for the analysis of narrative posts in OHCs. First, we propose a domain-knowledge support framework for OHCs to provide a basis for post analysis. Second, we develop a Knowledge-Involved Topic Modeling (KI-TM) method to extract and expand explicit knowledge within the text. We propose four metrics, namely, explicit knowledge rate, latent knowledge rate, knowledge correlation rate, and perplexity, for the evaluation of the KI-TM method. Our experimental results indicate that our proposed method outperforms existing methods in terms of providing knowledge support. Our method enhances knowledge support for online patients and can help develop intelligent OHCs in the future.
Boonen, Kurt; Landuyt, Bart; Baggerman, Geert; Husson, Steven J; Huybrechts, Jurgen; Schoofs, Liliane
2008-02-01
MS is currently one of the most important analytical techniques in biological and medical research. ESI and MALDI launched the field of MS into biology. The performance of mass spectrometers increased tremendously over the past decades. Other technological advances increased the analytical power of biological MS even more. First, the advent of the genome projects allowed an automated analysis of mass spectrometric data. Second, improved separation techniques, like nanoscale HPLC, are essential for MS analysis of biomolecules. The recent progress in bioinformatics is the third factor that accelerated the biochemical analysis of macromolecules. The first part of this review will introduce the basics of these techniques. The field that integrates all these techniques to identify endogenous peptides is called peptidomics and will be discussed in the last section. This integrated approach aims at identifying all the present peptides in a cell, organ or organism (the peptidome). Today, peptidomics is used by several fields of research. Special emphasis will be given to the identification of neuropeptides, a class of short proteins that fulfil several important intercellular signalling functions in every animal. MS imaging techniques and biomarker discovery will also be discussed briefly.
Systemic bioinformatics analysis of skeletal muscle gene expression profiles of sepsis
Yang, Fang; Wang, Yumei
2018-01-01
Sepsis is a type of systemic inflammatory response syndrome with high morbidity and mortality. Skeletal muscle dysfunction is one of the major complications of sepsis that may also influence the outcome of sepsis. The aim of the present study was to explore and identify potential mechanisms and therapeutic targets of sepsis. Systemic bioinformatics analysis of skeletal muscle gene expression profiles from the Gene Expression Omnibus was performed. Differentially expressed genes (DEGs) in samples from patients with sepsis and control samples were screened out using the limma package. Differential co-expression and coregulation (DCE and DCR, respectively) analysis was performed based on the Differential Co-expression Analysis package to identify differences in gene co-expression and coregulation patterns between the control and sepsis groups. Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways of DEGs were identified using the Database for Annotation, Visualization and Integrated Discovery, and inflammatory, cancer and skeletal muscle development-associated biological processes and pathways were identified. DCE and DCR analysis revealed several potential therapeutic targets for sepsis, including genes and transcription factors. The results of the present study may provide a basis for the development of novel therapeutic targets and treatment methods for sepsis. PMID:29805480
Microarray gene expression profiling analysis combined with bioinformatics in multiple sclerosis.
Liu, Mingyuan; Hou, Xiaojun; Zhang, Ping; Hao, Yong; Yang, Yiting; Wu, Xiongfeng; Zhu, Desheng; Guan, Yangtai
2013-05-01
Multiple sclerosis (MS) is the most prevalent demyelinating disease and the principal cause of neurological disability in young adults. Recent microarray gene expression profiling studies have identified several genetic variants contributing to the complex pathogenesis of MS; however, expressional and functional studies are still required to further understand its molecular mechanism. The present study aimed to analyze the molecular mechanism of MS using microarray analysis combined with bioinformatics techniques. We downloaded the gene expression profile of MS from the Gene Expression Omnibus (GEO) and analysed the microarray data using an R package for differentially coexpressed genes (DCGs) and links, together with the Database for Annotation, Visualization and Integrated Discovery. The regulatory impact factor (RIF) algorithm was used to measure the impact of transcription factors. A total of 1,297 DCGs between MS patients and healthy controls were identified. Functional annotation indicated that these DCGs were associated with immune and neurological functions. Furthermore, the RIF results suggested that IKZF1, BACH1, CEBPB, EGR1 and FOS may play central regulatory roles in controlling gene expression in the pathogenesis of MS. Our findings confirm the presence of multiple molecular alterations in MS and indicate the possibility of identifying prognostic factors associated with MS pathogenesis.
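The core idea behind differential coexpression can be illustrated in a few lines: score each gene pair by how much its correlation changes between patient and control groups. This is a generic sketch on synthetic data, not the specific R package used in the study.

```python
# Generic differential-coexpression score on synthetic expression matrices.
import numpy as np

rng = np.random.default_rng(0)
controls = rng.normal(size=(20, 5))   # 20 samples x 5 genes (toy data)
patients = rng.normal(size=(18, 5))

def diff_coexpr(expr_a, expr_b, i, j):
    r_a = np.corrcoef(expr_a[:, i], expr_a[:, j])[0, 1]
    r_b = np.corrcoef(expr_b[:, i], expr_b[:, j])[0, 1]
    return abs(r_a - r_b)             # large values suggest a rewired link

scores = {(i, j): diff_coexpr(controls, patients, i, j)
          for i in range(5) for j in range(i + 1, 5)}
print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])
```

Pairs with the largest correlation change are the candidate differentially coexpressed links; dedicated packages add significance testing and multiple-comparison control on top of this.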
Learning in the context of distribution drift
2017-05-09
published in the leading data mining journal, Data Mining and Knowledge Discovery (Webb et al., 2016). We have shown that the previous qualitative… [Figure 7: Architecture for learning from streaming data in the context of variable or unknown drift; components: learner, low-bias learner, aggregated classifier.] …Learning limited dependence Bayesian classifiers, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD
Xiang, Yang; Lu, Kewei; James, Stephen L.; Borlawsky, Tara B.; Huang, Kun; Payne, Philip R.O.
2011-01-01
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous works have shown that knowledge constructs comprised of transitively-associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, and the corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications. PMID:22154838
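The flavor of a k-neighborhood labeling scheme can be conveyed with a toy sketch: precompute, for each concept, the hop distances to everything within k hops, so that a path of length up to 2k between two concepts can be found by intersecting their labels. The graph and indexing details below are illustrative assumptions, not the published kDLS algorithm.

```python
# Toy k-neighborhood labeling over a tiny concept graph (assumed data).
from collections import deque

graph = {"aspirin": {"COX1"}, "COX1": {"aspirin", "inflammation"},
         "inflammation": {"COX1", "arthritis"}, "arthritis": {"inflammation"}}

def k_neighborhood(graph, source, k):
    """BFS out to k hops, recording the hop distance to each reached node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

K = 2
labels = {v: k_neighborhood(graph, v, K) for v in graph}

def shortest_within_2k(a, b):
    # length of the best path of at most 2K hops, found via label intersection
    shared = labels[a].keys() & labels[b].keys()
    return min((labels[a][w] + labels[b][w] for w in shared), default=None)

print(shortest_within_2k("aspirin", "arthritis"))  # 3, via COX1 and inflammation
```

The point of such an index is that path queries touch only the two (small) label sets rather than traversing the full multi-million-concept graph.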
DOE Office of Scientific and Technical Information (OSTI.GOV)
Olama, Mohammed M; Nutaro, James J; Sukumar, Sreenivas R
2013-01-01
The success of data-driven business in government, science, and private industry is driving the need for seamless integration of intra- and inter-enterprise data sources to extract knowledge nuggets in the form of correlations, trends, patterns and behaviors previously not discovered due to physical and logical separation of datasets. Today, as the volume, velocity, variety and complexity of enterprise data keep increasing, the next generation of analysts faces several challenges in the knowledge extraction process. Towards addressing these challenges, data-driven organizations that rely on the success of their analysts have to make investment decisions for sustainable data/information systems and knowledge discovery. Options that organizations are considering include newer storage/analysis architectures, better analysis machines, redesigned analysis algorithms, collaborative knowledge management tools, and query builders, amongst many others. In this paper, we present a concept of operations for enabling knowledge discovery that data-driven organizations can leverage towards making their investment decisions. We base our recommendations on the experience gained from integrating multi-agency enterprise data warehouses at the Oak Ridge National Laboratory to design the foundation of future knowledge-nurturing data-system architectures.
Knowledge Discovery and Data Mining in Iran's Climatic Researches
NASA Astrophysics Data System (ADS)
Karimi, Mostafa
2013-04-01
Advances in measurement technology and data collection are making databases ever larger, and large databases require powerful tools for data analysis. The iterative process of acquiring knowledge from information obtained through data processing takes place, in various forms, in all scientific fields; however, when data volumes are large, traditional methods cannot cope with many of the problems that arise. In recent years, the use of databases has expanded in various scientific fields, and in climatology atmospheric databases especially so. In addition, the increasing amount of data generated by climate models poses a challenge for the extraction of hidden patterns and knowledge. The approach taken to this problem in recent years uses the process of knowledge discovery and data mining techniques, drawing on concepts from machine learning, artificial intelligence and expert systems. Data mining is an analytical process for mining massive volumes of data; its ultimate goal is access to information and, finally, knowledge. Climatology is a science that uses varied and massive data, and the goal of climate data mining is to derive information from varied and massive atmospheric and non-atmospheric data. In effect, knowledge discovery performs these activities in a logical, predetermined and almost automatic process. The goal of this research is to survey the use of knowledge discovery and data mining techniques in Iranian climate research. To achieve this goal, we carried out a content (descriptive) analysis and classified the studies by method and issue. The results show that in Iranian climatic research, clustering (mostly k-means and Ward's methods) is the most commonly applied technique, and that precipitation and atmospheric circulation patterns are the most commonly studied issues. Although several studies in geography and climate have used statistical techniques such as clustering and pattern extraction, given the distinction between statistics and data mining one cannot yet say that Iranian climate studies make full use of data mining and knowledge discovery techniques. It is therefore necessary to adopt the KDD approach and data mining techniques in climatic studies, in particular for interpreting climate modeling results.
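For illustration, the kind of k-means clustering most often reported in these studies can be run in a few lines; the station-by-variable matrix below is synthetic, and the setup does not reproduce any particular study.

```python
# k-means on a toy station-by-variable climate matrix (synthetic data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# 30 stations x 4 variables (e.g. mean precipitation, temperature, ...)
X = np.vstack([rng.normal(0, 1, (15, 4)), rng.normal(5, 1, (15, 4))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster membership for each station
print(km.cluster_centers_)  # centroid climate profile of each cluster
```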
Knowledge Retrieval Solutions.
ERIC Educational Resources Information Center
Khan, Kamran
1998-01-01
Excalibur RetrievalWare offers true knowledge retrieval solutions. Its fundamental technologies, Adaptive Pattern Recognition Processing and Semantic Networks, have capabilities for knowledge discovery and knowledge management of full-text, structured and visual information. The software delivers a combination of accuracy, extensibility,…
Knowledge extraction from evolving spiking neural networks with rank order population coding.
Soltic, Snjezana; Kasabov, Nikola
2010-12-01
This paper demonstrates how knowledge can be extracted from evolving spiking neural networks with rank order population coding. Knowledge discovery is a very important feature of intelligent systems. Yet, a disproportionately small amount of research is centered on the issue of knowledge extraction from spiking neural networks, which are considered to be the third generation of artificial neural networks. The lack of knowledge representation compatibility is becoming a major detriment to end users of these networks. We show that high-level knowledge can be obtained from evolving spiking neural networks. More specifically, we propose a method for fuzzy rule extraction from an evolving spiking network with rank order population coding. The proposed method was used for knowledge discovery on two benchmark taste recognition problems where the knowledge learnt by an evolving spiking neural network was extracted in the form of zero-order Takagi-Sugeno fuzzy IF-THEN rules.
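To show what such extracted knowledge looks like, the sketch below evaluates two zero-order Takagi-Sugeno IF-THEN rules in a toy taste-scoring setting: each rule maps fuzzy memberships of the inputs to a constant consequent, combined by weighted averaging. The membership functions and consequents are invented; the paper derives its rules from a trained evolving spiking network.

```python
# Zero-order Takagi-Sugeno rule evaluation (toy rules, assumed memberships).
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# IF sweetness is HIGH AND acidity is LOW THEN taste_score = 0.9, and vice versa
rules = [
    (lambda s, a: min(tri(s, 0.5, 1.0, 1.5), tri(a, -0.5, 0.0, 0.5)), 0.9),
    (lambda s, a: min(tri(s, -0.5, 0.0, 0.5), tri(a, 0.5, 1.0, 1.5)), 0.1),
]

def infer(sweetness, acidity):
    weights = [antecedent(sweetness, acidity) for antecedent, _ in rules]
    if sum(weights) == 0:
        return None   # no rule fires
    return sum(w * c for w, (_, c) in zip(weights, rules)) / sum(weights)

print(infer(0.9, 0.1))  # mostly fires rule 1 -> close to 0.9
```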
Flood AI: An Intelligent Systems for Discovery and Communication of Disaster Knowledge
NASA Astrophysics Data System (ADS)
Demir, I.; Sermet, M. Y.
2017-12-01
Communities are not immune from extreme events or natural disasters that can lead to large-scale consequences for the nation and the public. Improving resilience to better prepare for, plan for, recover from, and adapt to disasters is critical to reducing the impacts of extreme events. The National Research Council (NRC) report discusses how to increase resilience to extreme events through a vision of a resilient nation in the year 2030. The report highlights the importance of data and information, identifies gaps and knowledge challenges that need to be addressed, and suggests that every individual should have access to risk and vulnerability information to make their communities more resilient. This project presents an intelligent system for flooding, Flood AI, designed to improve societal preparedness by providing a knowledge engine that uses voice recognition, artificial intelligence, and natural language processing based on a generalized ontology for disasters with a primary focus on flooding. The knowledge engine utilizes the flood ontology and concepts to connect user input to relevant knowledge discovery channels on flooding through a data acquisition and processing framework built on environmental observations, forecast models, and knowledge bases. Communication channels of the framework include web-based systems, agent-based chat bots, smartphone applications, automated web workflows, and smart home devices, opening knowledge discovery for flooding to many unique use cases.
Jiang, Wei; Yu, Weichuan
2017-01-01
In genome-wide association studies, we normally discover associations between genetic variants and diseases/traits in primary studies and validate the findings in replication studies. We consider the associations identified in both primary and replication studies to be true findings. An important question under this two-stage setting is how to determine significance levels in both studies. In traditional methods, the significance levels of the primary and replication studies are determined separately. We argue that this separate determination strategy reduces the power of the overall two-stage study, and we therefore propose a novel method to determine significance levels jointly. Our method is a reanalysis method that needs summary statistics from both studies. We find the most powerful significance levels while controlling the false discovery rate in the two-stage study. To enjoy the power improvement from the joint determination method, single nucleotide polymorphisms need to be selected for replication at a less stringent significance level. This is common practice in studies designed for discovery purposes; we suggest the practice is also suitable in studies with a validation purpose, in order to identify more true findings. Simulation experiments show that our method can provide more power than traditional methods and that the false discovery rate is well controlled. Empirical experiments on datasets of five diseases/traits demonstrate that our method can help identify more associations. The R package is available at: http://bioinformatics.ust.hk/RFdr.html.
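As background for the power comparison, the baseline idea of controlling the false discovery rate can be sketched with the standard Benjamini-Hochberg step-up procedure. The authors' joint two-stage determination is more elaborate (see their R package above); this is only the textbook building block.

```python
# Standard Benjamini-Hochberg step-up procedure (generic sketch).
def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:   # step-up threshold q * rank / m
            k_max = rank
    return sorted(order[:k_max])

pvals = [1e-6, 0.004, 0.019, 0.03, 0.2, 0.7]
print(benjamini_hochberg(pvals, q=0.05))  # rejects the four smallest p-values
```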
IndeCut evaluates performance of network motif discovery algorithms.
Ansariola, Mitra; Megraw, Molly; Koslicki, David
2018-05-01
Genomic networks represent a complex map of molecular interactions which are descriptive of the biological processes occurring in living cells. Identifying the small over-represented circuitry patterns in these networks helps generate hypotheses about the functional basis of such complex processes. Network motif discovery is a systematic way of achieving this goal. However, a reliable network motif discovery outcome requires generating random background networks which are the result of a uniform and independent graph sampling method. To date, there has been no method to numerically evaluate whether any network motif discovery algorithm performs as intended on realistically sized datasets; thus, it was not possible to assess the validity of resulting network motifs. In this work, we present IndeCut, the first method to date that characterizes network motif finding algorithm performance in terms of uniform sampling on realistically sized networks. We demonstrate that it is critical to use IndeCut prior to running any network motif finder for two reasons. First, IndeCut indicates the number of samples needed for a tool to produce an outcome that is both reproducible and accurate. Second, IndeCut allows users to choose the tool that generates samples in the most independent fashion for their network of interest among many available options. The open source software package is available at https://github.com/megrawlab/IndeCut. megrawm@science.oregonstate.edu or david.koslicki@math.oregonstate.edu. Supplementary data are available at Bioinformatics online.
Duncan, Dean F; Kum, Hye-Chung; Weigensberg, Elizabeth Caplick; Flair, Kimberly A; Stewart, C Joy
2008-11-01
Proper management and implementation of an effective child welfare agency requires the constant use of information about the experiences and outcomes of children involved in the system, emphasizing the need for comprehensive, timely, and accurate data. In the past 20 years, there have been many advances in technology that can maximize the potential of administrative data to promote better evaluation and management in the field of child welfare. Specifically, this article discusses the use of knowledge discovery and data mining (KDD), which makes it possible to create longitudinal data files from administrative data sources, extract valuable knowledge, and make the information available via a user-friendly public Web site. This article demonstrates a successful project in North Carolina where knowledge discovery and data mining technology was used to develop a comprehensive set of child welfare outcomes available through a public Web site to facilitate information sharing of child welfare data to improve policy and practice.
Architectural Organization of the Metabolic Regulatory Enzyme Ghrelin O-Acyltransferase
Taylor, Martin S.; Ruch, Travis R.; Hsiao, Po-Yuan; Hwang, Yousang; Zhang, Pingfeng; Dai, Lixin; Huang, Cheng Ran Lisa; Berndsen, Christopher E.; Kim, Min-Sik; Pandey, Akhilesh; Wolberger, Cynthia; Marmorstein, Ronen; Machamer, Carolyn; Boeke, Jef D.; Cole, Philip A.
2013-01-01
Ghrelin O-acyltransferase (GOAT) is a polytopic integral membrane protein required for activation of ghrelin, a secreted metabolism-regulating peptide hormone. Although GOAT is a potential therapeutic target for the treatment of obesity and diabetes and plays a key role in other physiologic processes, little is known about its structure or mechanism. GOAT is a member of the membrane-bound O-acyltransferase (MBOAT) family, a group of polytopic integral membrane proteins involved in lipid-biosynthetic and lipid-signaling reactions from prokaryotes to humans. Here we use phylogeny and a variety of bioinformatic tools to predict the topology of GOAT. Using selective permeabilization indirect immunofluorescence microscopy in combination with glycosylation shift immunoblotting, we demonstrate that GOAT contains 11 transmembrane helices and one reentrant loop. Development of the V5Glyc tag, a novel, small, and sensitive dual topology reporter, facilitated these experiments. The MBOAT family invariant residue His-338 is in the ER lumen, consistent with other family members, but conserved Asn-307 is cytosolic, making it unlikely that both are involved in catalysis. Photocross-linking of synthetic ghrelin analogs and inhibitors demonstrates binding to the C-terminal region of GOAT, consistent with a role of His-338 in the active site. This knowledge of GOAT architecture is important for a deeper understanding of the mechanism of GOAT and other MBOATs and could ultimately advance the discovery of selective inhibitors for these enzymes. PMID:24045953
A benchmark study of scoring methods for non-coding mutations.
Drubay, Damien; Gautheret, Daniel; Michiels, Stefan
2018-05-15
Detailed knowledge of coding sequences has led to different candidate models for pathogenic variant prioritization. Several deleteriousness scores have been proposed for the non-coding part of the genome, but no large-scale comparison has been carried out to date to assess their performance. We compared the leading scoring tools (CADD, FATHMM-MKL, Funseq2 and GWAVA) and some recent competitors (DANN, SNP and SOM scores) for their ability to discriminate assumed pathogenic variants from assumed benign variants (using the ClinVar, COSMIC and 1000 Genomes Project databases). Using the ClinVar benchmark, CADD was the best tool for detecting the pathogenic variants that are mainly located in protein-coding gene regions. Using the COSMIC benchmark, FATHMM-MKL, GWAVA and SOMliver outperformed the other tools for pathogenic variants that are typically located in lincRNAs, pseudogenes and other parts of the non-coding genome. However, all tools had low precision, which could potentially be improved by future non-coding genome feature discoveries. These results may have been influenced by the presence of potential benign variants in the COSMIC database. The development of a gold standard as consistent as ClinVar for these regions will be necessary to confirm our tool ranking. The Snakemake, C++ and R codes are freely available from https://github.com/Oncostat/BenchmarkNCVTools and supported on Linux. damien.drubay@gustaveroussy.fr or stefan.michiels@gustaveroussy.fr. Supplementary data are available at Bioinformatics online.
Liu, Jun-Jun; Shamoun, Simon Francis; Leal, Isabel; Kowbel, Robert; Sumampong, Grace; Zamany, Arezoo
2018-05-01
Characterizing the genes involved in the differentiation of pathogen species, and of isolates with differing virulence traits, provides valuable information for controlling tree diseases and meeting the challenges of sustainable forest health and phytosanitary trade issues. Lack of genetic knowledge and genomic resources hinders novel gene discovery, molecular mechanism studies and the development of diagnostic tools in the management of forest pathogens. Here, we report on transcriptome profiling of Heterobasidion occidentale isolates with contrasting virulence levels. Comparative transcriptomic analysis identified orthologous groups exclusive to H. occidentale and its isolates, revealing biological processes involved in the differentiation of isolates. Further bioinformatics analyses identified an H. occidentale secretome, CYPome and other candidate effectors, from which genes with species- and isolate-specific expression were characterized. A large proportion of differentially expressed genes were revealed to have putative activities as cell wall modification enzymes and transcription factors, suggesting potential roles in virulence and fungal pathogenesis. Next, large numbers of simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were detected, including more than 14 000 interisolate non-synonymous SNPs. These polymorphic loci and species/isolate-specific genes may contribute to virulence variation and provide ideal DNA markers for the development of diagnostic tools and the investigation of genetic diversity. © 2018 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.
A Knowledge Discovery from POS Data using State Space Models
NASA Astrophysics Data System (ADS)
Sato, Tadahiko; Higuchi, Tomoyuki
The number of competing brands changes with each new product's entry. New product introduction is endemic among consumer packaged goods firms and is an integral component of their marketing strategy. As a new product's entry affects markets, there is a pressing need to develop market response models that can adapt to such changes. In this paper, we develop a dynamic model that captures the underlying evolution of the buying behavior associated with the new product. This extends an application of the dynamic linear model, used in a number of time series analyses, by allowing the observed dimension to change at some point in time. Our model copes with two problems that dynamic environments entail: changes in parameters over time and changes in the observed dimension. We formulate the model within the framework of a state space model and estimate it using a modified Kalman filter and fixed-interval smoother. We find that a new product's entry (1) decreases brand differentiation for existing brands, as indicated by a decreasing difference between cross-price elasticities; (2) decreases commodity power for existing brands, as indicated by a decreasing trend; and (3) decreases the effect of discounts for existing brands, as indicated by a decrease in the magnitude of own-brand price elasticities. The proposed framework is directly applicable to other fields in which the observed dimension might change, such as economics and bioinformatics.
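A minimal sketch of the state-space machinery described above: a random-walk (local-level) state filtered with the standard Kalman recursions, where the observation vector is allowed to grow when a new brand enters. The dimensions, noise levels and data are illustrative; the paper's modified filter and fixed-interval smoother are richer.

```python
# Local-level Kalman filter with a time-varying observation dimension (sketch).
import numpy as np

def kalman_step(x, P, y, H, Q, R):
    """One predict/update cycle for x_t = x_{t-1} + w_t, y_t = H x_t + v_t."""
    x_pred = x                        # random-walk state transition
    P_pred = P + Q
    S = H @ P_pred @ H.T + R          # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

n_state = 3                           # latent levels for up to 3 brands
x, P = np.zeros(n_state), np.eye(n_state)
Q = 0.01 * np.eye(n_state)
observations = [np.array([1.0, 2.0]), np.array([1.1, 2.1]),
                np.array([1.2, 1.9, 0.5])]   # brand 3 enters at t = 2
for y in observations:
    k = len(y)                        # observed dimension at time t
    H = np.eye(n_state)[:k]           # observe only the first k brands
    R = 0.1 * np.eye(k)
    x, P = kalman_step(x, P, y, H, Q, R)
print(x)
```

Growing the observation matrix H while keeping the state fixed is one simple way to handle a changing observed dimension; the uncertainty of the yet-unobserved brand's level simply remains at its prior until data arrive.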
Developing integrated crop knowledge networks to advance candidate gene discovery.
Hassani-Pak, Keywan; Castellote, Martin; Esch, Maria; Hindle, Matthew; Lysenko, Artem; Taubert, Jan; Rawlings, Christopher
2016-12-01
The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpin traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer to having the basic information, at the gene level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and, with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.
2014-01-01
Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context. First, the relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large-scale international research infrastructures and public-private partnerships in order to address the complex challenges of data-intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered. In the current data- and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. In particular, the apparent lack of incentives for already overwhelmed researchers appears to limit the sharing of information and knowledge with other scientists. We also point out how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. An open business model for bioinformatics, which appears able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is finally discussed. PMID:24564249
Hsiao, Yu-Yun; Tsai, Wen-Chieh; Kuoh, Chang-Sheng; Huang, Tian-Hsiang; Wang, Hei-Chia; Wu, Tian-Shung; Leu, Yann-Lii; Chen, Wen-Huei; Chen, Hong-Hwa
2006-01-01
Background: Floral scent is one of the important strategies for ensuring fertilization and for determining seed or fruit set. Research on plant scents has been hampered mainly by the invisibility of this character, its dynamic nature, and the complex mixtures of components that are present in very small quantities. Most progress in scent research, as in other areas of plant biology, has come from the use of molecular and biochemical techniques. Although volatile components have been identified in several orchid species, the biosynthetic pathways of orchid flower fragrance are far from understood. We investigated how flower fragrance is generated in certain Phalaenopsis orchids by determining the chemical components of the floral scent, identifying floral expressed sequence tags (ESTs), and deducing the pathways of floral scent biosynthesis in Phalaenopsis bellina by bioinformatics analysis. Results: The main chemical components in the P. bellina flower were shown by gas chromatography-mass spectrometry to be monoterpenoids, benzenoids and phenylpropanoids. The set of floral-scent-producing enzymes in the biosynthetic pathway from glyceraldehyde-3-phosphate (G3P) to geraniol and linalool were recognized through data mining of the P. bellina floral EST database (dbEST). Transcripts preferentially expressed in P. bellina were distinguished by comparing the scented floral dbEST to that of a scentless species, P. equestris, and included those encoding lipoxygenase, epimerase, diacylglycerol kinase and geranyl diphosphate synthase. In addition, EST filtering showed that transcripts encoding signal transduction components, Myb transcription factors and methyltransferase, in addition to those for scent biosynthesis, were detected by in silico hybridization of the P. bellina unigene database against those of the scentless species, rice and Arabidopsis. Altogether, we pinpointed 66% of the biosynthetic steps from G3P to geraniol, linalool and their derivatives. Conclusion: This systems biology program combined chemical analysis, genomics and bioinformatics to elucidate the scent biosynthesis pathway and identify the relevant genes. It integrates forward and reverse genetic approaches to knowledge discovery, by which researchers can study non-model plants. PMID:16836766
On the Growth of Scientific Knowledge: Yeast Biology as a Case Study
He, Xionglei; Zhang, Jianzhi
2009-01-01
The tempo and mode of human knowledge expansion is an enduring yet poorly understood topic. Through a temporal network analysis of three decades of discoveries of protein interactions and genetic interactions in baker's yeast, we show that the growth of scientific knowledge is exponential over time and that important subjects tend to be studied earlier. However, expansions of different domains of knowledge are highly heterogeneous and episodic such that the temporal turnover of knowledge hubs is much greater than expected by chance. Familiar subjects are preferentially studied over new subjects, leading to a reduced pace of innovation. While research is increasingly done in teams, the number of discoveries per researcher is greater in smaller teams. These findings reveal collective human behaviors in scientific research and help design better strategies in future knowledge exploration. PMID:19300476
Long Non-coding RNAs and Their Biological Roles in Plants
Liu, Xue; Hao, Lili; Li, Dayong; Zhu, Lihuang; Hu, Songnian
2015-01-01
With the development of genomics and bioinformatics, especially the extensive application of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, lncRNAs have been considered important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in a few species, our current knowledge of their biological functions is still limited. Here, we summarize recent studies on their identification, characteristics, classification, bioinformatics, resources, and current exploration of their biological functions in plants. PMID:25936895
Lötsch, Jörn; Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred
2017-01-01
Genes causally involved in human insensitivity to pain provide a unique molecular source for studying the pathophysiology of pain and the development of novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic targets in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, these processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified that shares important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems suitable for drug discovery, identifying a narrow choice of repurposing candidates and demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence. PMID:28848388
Applying Knowledge Discovery in Databases in Public Health Data Set: Challenges and Concerns
Volrathongchia, Kanittha
2003-01-01
In attempting to apply Knowledge Discovery in Databases (KDD) to generate a predictive model from a health care dataset that is currently available to the public, the first step is to pre-process the data to overcome the challenges of missing data, redundant observations, and records containing inaccurate data. This study will demonstrate how to use simple pre-processing methods to improve the quality of input data. PMID:14728545
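A minimal sketch of the pre-processing steps named in the abstract (missing data, redundant observations, inaccurate records), using pandas on a hypothetical table; the column names and plausibility bounds are assumptions, not from the study.

```python
import pandas as pd

# Hypothetical public-health records with duplicates, a missing value,
# and an implausible age.
df = pd.DataFrame({
    "age": [34, 34, None, 212, 45],
    "bmi": [22.1, 22.1, 27.4, 24.0, None],
})

df = df.drop_duplicates()                               # redundant observations
df = df[df["age"].between(0, 120) | df["age"].isna()]   # drop inaccurate records
df["age"] = df["age"].fillna(df["age"].median())        # impute missing data
df["bmi"] = df["bmi"].fillna(df["bmi"].median())
print(df)
```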
Big, Deep, and Smart Data in Scanning Probe Microscopy
Kalinin, Sergei V.; Strelcov, Evgheni; Belianinov, Alex; ...
2016-09-27
Scanning probe microscopy (SPM) techniques open the door to nanoscience and nanotechnology by enabling imaging and manipulation of the structure and functionality of matter on nanometer and atomic scales. We analyze the SPM discovery process in terms of the information flow from the tip-surface junction to knowledge adoption by the scientific community. Furthermore, we discuss the challenges and opportunities offered by merging SPM with advanced data mining, visual analytics, and knowledge discovery technologies.
Exploiting Early Intent Recognition for Competitive Advantage
2009-01-01
basketball [Bhandari et al., 1997; Jug et al., 2003], and RoboCup soccer simulations [Riley and Veloso, 2000; 2002; Kuhlmann et al., 2006] and non...actions (e.g. before, after, around). Jug et al. [2003] used a similar framework for offline basketball game analysis. More recently, Hess et al...and K. Ramanujam. Advanced Scout: Data mining and knowledge discovery in NBA data. Data Mining and Knowledge Discovery, 1(1):121–125, 1997. [Chang
ERIC Educational Resources Information Center
Fyfe, Emily R.; DeCaro, Marci S.; Rittle-Johnson, Bethany
2013-01-01
An emerging consensus suggests that guided discovery, which combines discovery and instruction, is a more effective educational approach than either one in isolation. The goal of this study was to examine two specific forms of guided discovery, testing whether conceptual instruction should precede or follow exploratory problem solving. In both…
ERIC Educational Resources Information Center
Liu, Chen-Chung; Don, Ping-Hsing; Chung, Chen-Wei; Lin, Shao-Jun; Chen, Gwo-Dong; Liu, Baw-Jhiune
2010-01-01
While Web discovery is usually undertaken as a solitary activity, Web co-discovery may transform Web learning activities from the isolated individual search process into interactive and collaborative knowledge exploration. Recent studies have proposed Web co-search environments on a single computer, supported by multiple one-to-one technologies.…
Knowledge Management in Higher Education: A Knowledge Repository Approach
ERIC Educational Resources Information Center
Wedman, John; Wang, Feng-Kwei
2005-01-01
One might expect higher education, where the discovery and dissemination of new and useful knowledge is vital, to be among the first to implement knowledge management practices. Surprisingly, higher education has been slow to implement knowledge management practices (Townley, 2003). This article describes an ongoing research and development effort…
Crowdsourcing knowledge discovery and innovations in medicine.
Celi, Leo Anthony; Ippolito, Andrea; Montgomery, Robert A; Moses, Christopher; Stone, David J
2014-09-19
Clinicians face difficult treatment decisions in contexts that are not well addressed by available evidence as formulated based on research. The digitization of medicine provides an opportunity for clinicians to collaborate with researchers and data scientists on solutions to previously ambiguous and seemingly insolvable questions. But these groups tend to work in isolated environments, and do not communicate or interact effectively. Clinicians are typically buried in the weeds and exigencies of daily practice such that they do not recognize or act on ways to improve knowledge discovery. Researchers may not be able to identify the gaps in clinical knowledge. For data scientists, the main challenge is discerning what is relevant in a domain that is both unfamiliar and complex. Each type of domain expert can contribute skills unavailable to the other groups. "Health hackathons" and "data marathons", in which diverse participants work together, can leverage the current ready availability of digital data to discover new knowledge. Utilizing the complementary skills and expertise of these talented, but functionally divided groups, innovations are formulated at the systems level. As a result, the knowledge discovery process is simultaneously democratized and improved, real problems are solved, cross-disciplinary collaboration is supported, and innovations are enabled. PMID:25239002
Empirical study using network of semantically related associations in bridging the knowledge gap.
Abedi, Vida; Yeasin, Mohammed; Zand, Ramin
2014-11-27
Data overload has created a new set of challenges in finding meaningful and relevant information with minimal cognitive effort. However, designing robust and scalable knowledge discovery systems remains a challenge. Recent innovations in (biological) literature-mining tools have opened new avenues for understanding the confluence of diseases, genes, risk factors, and biological processes, and for bridging the gap between massive amounts of scientific data and harvested useful knowledge. In this paper, we highlight findings obtained with a text-analytics tool called ARIANA (Adaptive Robust and Integrative Analysis for finding Novel Associations). An empirical study using ARIANA reveals knowledge-discovery instances that illustrate the efficacy of such a tool. For example, ARIANA can capture the connection between the drug hexamethonium and the pulmonary inflammation and fibrosis that caused the tragic death of a healthy volunteer in a 2001 Johns Hopkins asthma study, even though the abstract of that study was not part of the semantic model. An integrated system such as ARIANA could assist the human expert in exploratory literature search by surfacing hidden associations, promoting data reuse and knowledge discovery, and stimulating interdisciplinary projects by connecting information across disciplines.
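ARIANA's internals are not reproduced here; the sketch below only shows the general idea behind literature-based association mining, scoring term pairs by pointwise mutual information over a toy corpus. All abstracts and terms are invented for illustration.

```python
import math

# Toy corpus of abstracts (placeholder text, not real publications)
abstracts = [
    "hexamethonium induced pulmonary inflammation in the airway",
    "hexamethonium linked to pulmonary fibrosis in a volunteer",
    "pulmonary fibrosis and inflammation share risk factors",
    "aspirin reduces fever and pain",
]
docs = [set(a.split()) for a in abstracts]
n = len(docs)

def pmi(term_a: str, term_b: str) -> float:
    """Pointwise mutual information of two terms over document co-occurrence."""
    pa = sum(term_a in d for d in docs) / n
    pb = sum(term_b in d for d in docs) / n
    pab = sum(term_a in d and term_b in d for d in docs) / n
    return math.log(pab / (pa * pb)) if pab else float("-inf")

print(pmi("hexamethonium", "pulmonary"))  # positive => terms are associated
```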
The Importance of Biological Databases in Biological Discovery.
Baxevanis, Andreas D; Bateman, Alex
2015-06-19
Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.
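For readers who want programmatic access to one of the resources listed above, here is a brief example of fetching a GenBank record through NCBI's Entrez interface via Biopython. It requires network access and an installed Biopython; the accession is just an example.

```python
from Bio import Entrez, SeqIO

Entrez.email = "your.name@example.org"  # NCBI requires a contact address

# Fetch one GenBank record (example accession: human TP53 mRNA)
handle = Entrez.efetch(db="nucleotide", id="NM_000546",
                       rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(record.id, record.description, len(record.seq))
```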
von Heijne, Gunnar
2018-01-01
My scientific career has taken me from chemistry, via theoretical physics and bioinformatics, to molecular biology and even structural biology. Along the way, serendipity led me to work on problems such as the identification of signal peptides that direct protein trafficking, membrane protein biogenesis, and cotranslational protein folding. I've had some great collaborations that came about because of a stray conversation or from following up on an interesting paper. And I've had the good fortune to be asked to sit on the Nobel Committee for Chemistry, where I am constantly reminded of the amazing pace and often intricate history of scientific discovery. Could I have planned this? No way! I just went with the flow … PMID:29523692
Halophiles and their enzymes: negativity put to good use.
DasSarma, Shiladitya; DasSarma, Priya
2015-06-01
Halophilic microorganisms possess stable enzymes that function in very high salinity, an extreme condition that leads to denaturation, aggregation, and precipitation of most other proteins. Genomic and structural analyses have established that the enzymes of halophilic Archaea and many halophilic Bacteria are negatively charged, owing to an excess of acidic over basic residues and to altered hydrophobicity, features that enhance solubility and promote function under low-water-activity conditions. Here, we provide an update on recent bioinformatic analyses of predicted halophilic proteomes as well as experimental molecular studies of individual halophilic enzymes. Recent efforts on the discovery and utilization of halophiles and their enzymes for biotechnology, including biofuel applications, are also considered. Copyright © 2015 Elsevier Ltd. All rights reserved.
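A small sketch of the sequence signature described above, assuming plain one-letter protein strings: the fractional excess of acidic (D, E) over basic (K, R, H) residues, a crude proxy for the negative charge of halophilic enzymes. The example fragments are invented.

```python
def acidic_excess(seq: str) -> float:
    """Fraction of acidic minus fraction of basic residues in a protein."""
    seq = seq.upper()
    acidic = sum(seq.count(r) for r in "DE")
    basic = sum(seq.count(r) for r in "KRH")
    return (acidic - basic) / len(seq)

# Hypothetical fragments: a halophilic-like and a mesophilic-like sequence
print(acidic_excess("MDEEDAEDLKEVDEDSAEE"))  # strongly positive excess
print(acidic_excess("MKKRLVAIKAHGKLNRSTK"))  # negative excess
```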
Biopharma business models in Canada.
March-Chordà, I; Yagüe-Perales, R M
2011-08-01
This article provides new insights into the different strategy paths, or business models, currently implemented by Canadian biopharma companies. Using a case-study methodology, seven biopharma companies spanning three business models were analyzed, yielding results in the following areas: activity, business model, and strategy; management and human resources; and R&D, technology, and innovation strategy. The three business models represented were: model 1 (conventional biotech oriented toward new drug development, radical innovation, and the search for discoveries); model 2 (development of a technology platform, usually in proteomics and bioinformatics); and model 3 (incremental innovation, with shorter and less risky development timelines). Copyright © 2011 Elsevier Ltd. All rights reserved.