Sample records for knowledge discovery extracting

  1. Knowledge Discovery and Data Mining: An Overview

    NASA Technical Reports Server (NTRS)

    Fayyad, U.

    1995-01-01

    The process of knowledge discovery and data mining is the process of information extraction from very large databases. Its importance is described along with several techniques and considerations for selecting the most appropriate technique for extracting information from a particular data set.

  2. Knowledge Discovery from Databases: An Introductory Review.

    ERIC Educational Resources Information Center

    Vickery, Brian

    1997-01-01

    Introduces new procedures being used to extract knowledge from databases and discusses rationales for developing knowledge discovery methods. Methods are described for such techniques as classification, clustering, and the detection of deviations from pre-established norms. Examines potential uses of knowledge discovery in the information field.…

  3. Knowledge extraction from evolving spiking neural networks with rank order population coding.

    PubMed

    Soltic, Snjezana; Kasabov, Nikola

    2010-12-01

    This paper demonstrates how knowledge can be extracted from evolving spiking neural networks with rank order population coding. Knowledge discovery is a very important feature of intelligent systems. Yet, a disproportionately small amount of research is centered on the issue of knowledge extraction from spiking neural networks, which are considered to be the third generation of artificial neural networks. The lack of knowledge representation compatibility is becoming a major detriment to end users of these networks. We show that high-level knowledge can be obtained from evolving spiking neural networks. More specifically, we propose a method for fuzzy rule extraction from an evolving spiking network with rank order population coding. The proposed method was used for knowledge discovery on two benchmark taste recognition problems where the knowledge learnt by an evolving spiking neural network was extracted in the form of zero-order Takagi-Sugeno fuzzy IF-THEN rules.
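
For context, a zero-order Takagi-Sugeno rule base maps fuzzy antecedents to constant consequents, combined by a firing-strength-weighted average. A minimal sketch of evaluating such a rule base follows; the membership functions, rule constants, and input names are invented for illustration, not the rules the paper extracts:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Each rule: IF input1 is A AND input2 is B THEN output = constant (zero-order TS)
rules = [
    ({"sweet": (0.0, 0.2, 0.5), "sour": (0.0, 0.1, 0.4)}, 0.9),
    ({"sweet": (0.3, 0.6, 1.0), "sour": (0.2, 0.5, 0.9)}, 0.2),
]

def infer(inputs, rules):
    num = den = 0.0
    for antecedent, consequent in rules:
        # Firing strength: AND interpreted as the minimum of the memberships
        w = min(tri(inputs[name], *mf) for name, mf in antecedent.items())
        num += w * consequent
        den += w
    # Defuzzified output: weighted average of the rule constants
    return num / den if den else 0.0

print(infer({"sweet": 0.25, "sour": 0.15}, rules))
```
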

  4. Knowledge Discovery in Textual Documentation: Qualitative and Quantitative Analyses.

    ERIC Educational Resources Information Center

    Loh, Stanley; De Oliveira, Jose Palazzo M.; Gastal, Fabio Leite

    2001-01-01

    Presents an application of knowledge discovery in texts (KDT) concerning medical records of a psychiatric hospital. The approach helps physicians to extract knowledge about patients and diseases that may be used for epidemiological studies, for training professionals, and to support physicians to diagnose and evaluate diseases. (Author/AEF)

  5. Data Mining.

    ERIC Educational Resources Information Center

    Benoit, Gerald

    2002-01-01

    Discusses data mining (DM) and knowledge discovery in databases (KDD), taking the view that KDD is the larger view of the entire process, with DM emphasizing the cleaning, warehousing, mining, and visualization of knowledge discovery in databases. Highlights include algorithms; users; the Internet; text mining; and information extraction.…

  6. Medical knowledge discovery and management.

    PubMed

    Prior, Fred

    2009-05-01

    Although the volume of medical information is growing rapidly, the ability to rapidly convert this data into "actionable insights" and new medical knowledge is lagging far behind. The first step in the knowledge discovery process is data management and integration, which logically can be accomplished through the application of data warehouse technologies. A key insight that arises from efforts in biosurveillance and the global scope of military medicine is that information must be integrated over both time (longitudinal health records) and space (spatial localization of health-related events). Once data are compiled and integrated it is essential to encode the semantics and relationships among data elements through the use of ontologies and semantic web technologies to convert data into knowledge. Medical images form a special class of health-related information. Traditionally knowledge has been extracted from images by human observation and encoded via controlled terminologies. This approach is rapidly being replaced by quantitative analyses that more reliably support knowledge extraction. The goals of knowledge discovery are the improvement of both the timeliness and accuracy of medical decision making and the identification of new procedures and therapies.

  7. Concept of operations for knowledge discovery from Big Data across enterprise data warehouses

    NASA Astrophysics Data System (ADS)

    Sukumar, Sreenivas R.; Olama, Mohammed M.; McNair, Allen W.; Nutaro, James J.

    2013-05-01

    The success of data-driven business in government, science, and private industry is driving the need for seamless integration of intra- and inter-enterprise data sources to extract knowledge nuggets in the form of correlations, trends, patterns and behaviors previously not discovered due to physical and logical separation of datasets. Today, as the volume, velocity, variety and complexity of enterprise data keep increasing, next-generation analysts face several challenges in the knowledge extraction process. To address these challenges, data-driven organizations that rely on the success of their analysts have to make investment decisions for sustainable data/information systems and knowledge discovery. Options that organizations are considering include newer storage/analysis architectures, better analysis machines, redesigned analysis algorithms, collaborative knowledge management tools, and query builders, amongst many others. In this paper, we present a concept of operations for enabling knowledge discovery that data-driven organizations can leverage when making their investment decisions. We base our recommendations on the experience gained from integrating multi-agency enterprise data warehouses at the Oak Ridge National Laboratory to design the foundation of future knowledge-nurturing data-system architectures.

  8. Concept of Operations for Collaboration and Discovery from Big Data Across Enterprise Data Warehouses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Olama, Mohammed M; Nutaro, James J; Sukumar, Sreenivas R

    2013-01-01

    The success of data-driven business in government, science, and private industry is driving the need for seamless integration of intra- and inter-enterprise data sources to extract knowledge nuggets in the form of correlations, trends, patterns and behaviors previously not discovered due to physical and logical separation of datasets. Today, as the volume, velocity, variety and complexity of enterprise data keep increasing, next-generation analysts face several challenges in the knowledge extraction process. To address these challenges, data-driven organizations that rely on the success of their analysts have to make investment decisions for sustainable data/information systems and knowledge discovery. Options that organizations are considering include newer storage/analysis architectures, better analysis machines, redesigned analysis algorithms, collaborative knowledge management tools, and query builders, amongst many others. In this paper, we present a concept of operations for enabling knowledge discovery that data-driven organizations can leverage when making their investment decisions. We base our recommendations on the experience gained from integrating multi-agency enterprise data warehouses at the Oak Ridge National Laboratory to design the foundation of future knowledge-nurturing data-system architectures.

  9. SemaTyP: a knowledge graph based literature mining method for drug discovery.

    PubMed

    Sang, Shengtian; Yang, Zhihao; Wang, Lei; Liu, Xiaoxia; Lin, Hongfei; Wang, Jian

    2018-05-30

    Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are currently the two main drug discovery methods, and they have successfully discovered a series of drugs. However, development of new drugs is still an extremely time-consuming and expensive process. Biomedical literature contains important clues for the identification of potential treatments and could support experts in biomedicine on their way towards new discoveries. Here, we propose a biomedical knowledge graph-based drug discovery method called SemaTyP, which discovers candidate drugs for diseases by mining published biomedical literature. We first construct a biomedical knowledge graph with the relations extracted from biomedical abstracts; then a logistic regression model is trained by learning the semantic types of paths of known drug therapies in the biomedical knowledge graph; finally, the learned model is used to discover drug therapies for new diseases. The experimental results show that our method can not only effectively discover new drug therapies for new diseases but also provide the potential mechanism of action of the candidate drugs. In this paper we propose a novel knowledge graph-based literature mining method for drug discovery, which could complement current drug discovery methods.
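
The path-scoring step can be illustrated in miniature: represent a drug-to-disease path by the semantic types it traverses, then score it with a logistic model. All type names, weights, and the example path below are invented stand-ins, not SemaTyP's trained model:

```python
import math

def path_features(path, vocab):
    """Bag-of-semantic-types vector for one path through the knowledge graph."""
    v = [0.0] * len(vocab)
    for t in path:
        if t in vocab:
            v[vocab[t]] += 1.0
    return v

# Hypothetical semantic-type vocabulary and model parameters
vocab = {"Drug": 0, "TREATS": 1, "Gene": 2, "INTERACTS": 3, "Disease": 4}
weights = [0.4, 1.2, 0.3, 0.8, 0.5]   # stands in for learned coefficients
bias = -2.0

def score(path):
    z = bias + sum(w * x for w, x in zip(weights, path_features(path, vocab)))
    return 1.0 / (1.0 + math.exp(-z))  # probability the path supports a therapy

p = ["Drug", "INTERACTS", "Gene", "TREATS", "Disease"]  # a hypothetical path
print(round(score(p), 3))
```
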

  10. PKDE4J: Entity and relation extraction for public knowledge discovery.

    PubMed

    Song, Min; Kim, Won Chul; Lee, Dahee; Heo, Go Eun; Kang, Keun Young

    2015-10-01

    Due to an enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with the Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrate its competitive performance by evaluating it on many corpora and found that it surpasses existing systems with average F-measures of 85% for entity extraction and 81% for relation extraction.
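
Dictionary-based entity extraction of the kind PKDE4J integrates can be sketched as a longest-match lookup against a term dictionary; the dictionary below is invented for illustration and is not PKDE4J's actual resource:

```python
# Toy term dictionary mapping surface forms to entity types (invented)
entity_dict = {
    "breast cancer": "Disease",
    "tamoxifen": "Drug",
    "estrogen receptor": "Protein",
}

def extract_entities(text):
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        # Prefer the longest dictionary term starting at token i
        for j in range(len(tokens), i, -1):
            term = " ".join(tokens[i:j])
            if term in entity_dict:
                found.append((term, entity_dict[term]))
                i = j
                break
        else:
            i += 1
    return found

print(extract_entities("Tamoxifen targets the estrogen receptor in breast cancer"))
```
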

  11. Knowledge discovery about quality of life changes of spinal cord injury patients: clustering based on rules by states.

    PubMed

    Gibert, Karina; García-Rudolph, Alejandro; Curcoll, Lluïsa; Soler, Dolors; Pla, Laura; Tormos, José María

    2009-01-01

    In this paper, an integral Knowledge Discovery Methodology, named Clustering based on rules by States, which incorporates artificial intelligence (AI) and statistical methods as well as interpretation-oriented tools, is used for extracting knowledge patterns about the evolution over time of the Quality of Life (QoL) of patients with Spinal Cord Injury. The methodology incorporates the interaction with experts as a crucial element with the clustering methodology to guarantee usefulness of the results. Four typical patterns are discovered by taking into account prior expert knowledge. Several hypotheses are elaborated about the reasons for psychological distress or decreases in QoL of patients over time. The knowledge discovery from data (KDD) approach turns out, once again, to be a suitable formal framework for handling multidimensional complexity of the health domains.

  12. Towards building a disease-phenotype knowledge base: extracting disease-manifestation relationship from literature

    PubMed Central

    Xu, Rong; Li, Li; Wang, QuanQiu

    2013-01-01

    Motivation: Systems approaches to studying phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repurposing. Currently, systematic study of disease phenotypic relationships on a phenome-wide scale is limited because large-scale machine-understandable disease–phenotype relationship knowledge bases are often unavailable. Here, we present an automatic approach to extract disease–manifestation (D-M) pairs (one specific type of disease–phenotype relationship) from the wide body of published biomedical literature. Data and Methods: Our method leverages external knowledge and limits the amount of human effort required. For the text corpus, we used 119 085 682 MEDLINE sentences (21 354 075 citations). First, we used D-M pairs from existing biomedical ontologies as prior knowledge to automatically discover D-M–specific syntactic patterns. We then extracted additional pairs from MEDLINE using the learned patterns. Finally, we analysed correlations between disease manifestations and disease-associated genes and drugs to demonstrate the potential of this newly created knowledge base in disease gene discovery and drug repurposing. Results: In total, we extracted 121 359 unique D-M pairs with a high precision of 0.924. Among the extracted pairs, 120 419 (99.2%) have not been captured in existing structured knowledge sources. We have shown that disease manifestations correlate positively with both disease-associated genes and drug treatments. Conclusions: The main contribution of our study is the creation of a large-scale and accurate D-M phenotype relationship knowledge base. This unique knowledge base, when combined with existing phenotypic, genetic and proteomic datasets, can have profound implications in our deeper understanding of disease etiology and in rapid drug repurposing. Availability: http://nlp.case.edu/public/data/DMPatternUMLS/ Contact: rxx@case.edu PMID:23828786
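
The learned-pattern extraction step can be sketched with simple lexical patterns: known D-M pairs seed patterns, which are then applied to unseen sentences. The patterns and sentence below are illustrative only, not the ones the authors learned from MEDLINE:

```python
import re

# Two hypothetical lexical patterns of the kind seeded by known D-M pairs
patterns = [
    re.compile(r"(?P<d>[A-Za-z' ]+?) is characterized by (?P<m>[a-z ]+)"),
    re.compile(r"patients with (?P<d>[A-Za-z' ]+?) present with (?P<m>[a-z ]+)"),
]

def extract_dm(sentence):
    """Return (disease, manifestation) pairs matched by any pattern."""
    pairs = []
    for pat in patterns:
        for m in pat.finditer(sentence):
            pairs.append((m.group("d").strip(), m.group("m").strip()))
    return pairs

text = "Parkinson's disease is characterized by resting tremor"
print(extract_dm(text))
```
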

  13. Text mining patents for biomedical knowledge.

    PubMed

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery.

  14. A bioinformatics knowledge discovery in text application for grid computing

    PubMed Central

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-01-01

    Background A fundamental activity in biomedical research is Knowledge Discovery, the ability to search through large amounts of biomedical information such as documents and data. High-performance computational infrastructures, such as Grid technologies, are emerging as a possible means to support the intensive use of information and communication resources in the life sciences. The goal of this work was to develop a software middleware solution that lets knowledge discovery applications exploit scalable, distributed computing systems and thereby make intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text, using a middleware-based methodology, is presented. The system must be able to model a user application and process jobs so as to create many parallel jobs for distribution over the computational nodes. Finally, the system must be aware of the computational resources available and their status, and must be able to monitor the execution of parallel jobs. These operational requirements led to the design of a middleware that is specialized through user application modules. It includes a graphical user interface giving access to a node search system, a load-balancing system, and a transfer optimizer that reduces communication costs. Results A middleware prototype and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4, building a grid infrastructure of GNU/Linux grid nodes. A test was carried out, and results are shown, for named entity recognition of symptoms and pathologies, applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process that extracts new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, Knowledge Discovery in Databases was applied to the output of the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749

  15. A bioinformatics knowledge discovery in text application for grid computing.

    PubMed

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-06-16

    A fundamental activity in biomedical research is Knowledge Discovery, the ability to search through large amounts of biomedical information such as documents and data. High-performance computational infrastructures, such as Grid technologies, are emerging as a possible means to support the intensive use of information and communication resources in the life sciences. The goal of this work was to develop a software middleware solution that lets knowledge discovery applications exploit scalable, distributed computing systems and thereby make intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text, using a middleware-based methodology, is presented. The system must be able to model a user application and process jobs so as to create many parallel jobs for distribution over the computational nodes. Finally, the system must be aware of the computational resources available and their status, and must be able to monitor the execution of parallel jobs. These operational requirements led to the design of a middleware that is specialized through user application modules. It includes a graphical user interface giving access to a node search system, a load-balancing system, and a transfer optimizer that reduces communication costs. A middleware prototype and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4, building a grid infrastructure of GNU/Linux grid nodes. A test was carried out, and results are shown, for named entity recognition of symptoms and pathologies, applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process that extracts new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, Knowledge Discovery in Databases was applied to the output of the KDT user module to extract new knowledge about symptom and pathology bio-entities.

  16. Knowledge Discovery in Variant Databases Using Inductive Logic Programming

    PubMed Central

    Nguyen, Hoan; Luu, Tien-Dao; Poch, Olivier; Thompson, Julie D.

    2013-01-01

    Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/. PMID:23589683
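
For a sense of what such interpretable rules look like when applied, here is a hypothetical rule of the kind ILP can induce over variant features; the feature names and thresholds are invented for illustration, not the rules actually learned from MSV3d:

```python
def is_deleterious(variant):
    # Hypothetical induced rule: buried residues with a large hydrophobicity
    # change and an unfavorable substitution score tend to be deleterious.
    return (
        variant["accessibility"] < 0.05        # solvent accessibility: buried
        and abs(variant["d_hydrophobicity"]) > 2.0  # large physico-chemical shift
        and variant["blosum62"] < 0            # rare substitution
    )

v = {"accessibility": 0.02, "d_hydrophobicity": 3.1, "blosum62": -2}
print(is_deleterious(v))  # → True
```

Rules of this shape are what makes the ILP output directly readable and auditable by domain experts, in contrast to black-box predictors.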

  17. Knowledge discovery in variant databases using inductive logic programming.

    PubMed

    Nguyen, Hoan; Luu, Tien-Dao; Poch, Olivier; Thompson, Julie D

    2013-01-01

    Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/.

  18. Knowledge Discovery and Data Mining in Iran's Climatic Researches

    NASA Astrophysics Data System (ADS)

    Karimi, Mostafa

    2013-04-01

    Advances in measurement technology and data collection have made databases ever larger, and large databases require powerful tools for data analysis. The iterative process of acquiring knowledge from processed data takes place in various forms in all scientific fields; however, when data volumes become large, traditional methods often cannot cope. In recent years the use of databases has expanded in many scientific fields, and in climatology atmospheric databases in particular. In addition, the growing amount of data generated by climate models poses a challenge for the extraction of hidden patterns and knowledge. The approach taken to this problem in recent years applies the process of knowledge discovery and data mining techniques, drawing on concepts from machine learning, artificial intelligence and expert systems. Data mining is an analytic process for mining massive volumes of data; its ultimate goal is access to information and, finally, knowledge. Climatology is a science that uses varied data in massive volumes, and the goal of climate data mining is to derive information from diverse and massive atmospheric and non-atmospheric data. In effect, knowledge discovery performs these activities in a logical, predetermined and almost automatic process. The goal of this research is to study the use of knowledge discovery and data mining techniques in Iranian climate research, through content (descriptive) analysis classified by method and issue. The results show that in Iranian climate research the clustering methods k-means and Ward's are applied most often, and that precipitation and atmospheric circulation patterns are the most frequently addressed issues. Although several studies of geographic and climatic issues have used statistical techniques such as clustering and pattern extraction, given the distinct natures of statistics and data mining one cannot yet say that data mining and knowledge discovery techniques are established in Iranian climate studies. It is nevertheless necessary to apply the KDD approach and DM techniques in climatic studies, in particular for interpreting climate modeling results.
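
As a reminder of what the most commonly applied method looks like, here is a toy k-means over invented station precipitation profiles (the standard algorithm, not the reviewed studies' data):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest center, recompute means."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Recompute centers; keep the old center if a cluster went empty
        centers = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Precipitation profiles (mm, two seasons) for six hypothetical stations
stations = [(10, 12), (11, 14), (9, 11), (80, 95), (85, 90), (78, 99)]
centers, clusters = kmeans(stations, k=2)
print(sorted(len(c) for c in clusters))
```
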

  19. A collaborative filtering-based approach to biomedical knowledge discovery.

    PubMed

    Lever, Jake; Gakkhar, Sitanshu; Gottlieb, Michael; Rashnavadi, Tahereh; Lin, Santina; Siu, Celia; Smith, Maia; Jones, Martin R; Krzywinski, Martin; Jones, Steven J M; Wren, Jonathan

    2018-02-15

    The increase in publication rates makes it challenging for an individual researcher to stay abreast of all relevant research in order to find novel research hypotheses. Literature-based discovery methods make use of knowledge graphs built using text mining and can infer future associations between biomedical concepts that will likely occur in new publications. These predictions are a valuable resource for researchers to explore a research topic. Current methods for prediction are based on the local structure of the knowledge graph. A method that uses global knowledge from across the knowledge graph needs to be developed in order to make knowledge discovery a frequently used tool by researchers. We propose an approach based on the singular value decomposition (SVD) that is able to combine data from across the knowledge graph through a reduced representation. Using cooccurrence data extracted from published literature, we show that SVD performs better than the leading methods for scoring discoveries. We also show the diminishing predictive power of knowledge discovery as we compare our predictions with real associations that appear further into the future. Finally, we examine the strengths and weaknesses of the SVD approach against another well-performing system using several predicted associations. All code and results files for this analysis can be accessed at https://github.com/jakelever/knowledgediscovery. sjones@bcgsc.ca. Supplementary data are available at Bioinformatics online.
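
The core SVD idea can be shown on a toy cooccurrence matrix: factor it, truncate to rank k, and read the reconstructed value for a pair that never cooccurred as a discovery score. The matrix below is invented for illustration:

```python
import numpy as np

# Rows/cols = biomedical concepts; entry = cooccurrence count in the literature
C = np.array([
    [0, 4, 2, 0],
    [4, 0, 3, 1],
    [2, 3, 0, 0],
    [0, 1, 0, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(C)
k = 2                                     # reduced rank
C_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Concepts 0 and 3 never cooccur, yet the rank-k reconstruction assigns them a
# score, which can be read as a prediction of a future association.
print(round(C_hat[0, 3], 3))
```
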

  20. Ontology-guided data preparation for discovering genotype-phenotype relationships.

    PubMed

    Coulet, Adrien; Smaïl-Tabbone, Malika; Benlian, Pascale; Napoli, Amedeo; Devignes, Marie-Dominique

    2008-04-25

    Complexity and amount of post-genomic data constitute two major factors limiting the application of Knowledge Discovery in Databases (KDD) methods in life sciences. Bio-ontologies may nowadays play key roles in knowledge discovery in life science providing semantics to data and to extracted units, by taking advantage of the progress of Semantic Web technologies concerning the understanding and availability of tools for knowledge representation, extraction, and reasoning. This paper presents a method that exploits bio-ontologies for guiding data selection within the preparation step of the KDD process. We propose three scenarios in which domain knowledge and ontology elements such as subsumption, properties, class descriptions, are taken into account for data selection, before the data mining step. Each of these scenarios is illustrated within a case-study relative to the search of genotype-phenotype relationships in a familial hypercholesterolemia dataset. The guiding of data selection based on domain knowledge is analysed and shows a direct influence on the volume and significance of the data mining results. The method proposed in this paper is an efficient alternative to numerical methods for data selection based on domain knowledge. In turn, the results of this study may be reused in ontology modelling and data integration.

  1. Ontology-Based Search of Genomic Metadata.

    PubMed

    Fernandez, Javier D; Lenzerini, Maurizio; Masseroli, Marco; Venco, Francesco; Ceri, Stefano

    2016-01-01

    The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; unknown biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet, search of relevant datasets for knowledge discovery is only weakly supported: metadata describing ENCODE datasets are quite simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata searching approach which uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base by starting with concepts extracted from ENCODE metadata, matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference method is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This allows correctly finding more datasets than those extracted by a purely syntactic search, as supported by the other available systems. We empirically show the relevance of the found datasets to the biologists' queries.

  2. Key Relation Extraction from Biomedical Publications.

    PubMed

    Huang, Lan; Wang, Ye; Gong, Leiguang; Kulikowski, Casimir; Bai, Tian

    2017-01-01

    Within the large body of biomedical knowledge, recent findings and discoveries are most often presented as research articles. Their number has been increasing sharply since the turn of the century, presenting ever-growing challenges for search and discovery of knowledge and information related to specific topics of interest, even with the help of advanced online search tools. This is especially true when the goal of a search is to find or discover key relations between important concepts or topic words. We have developed an innovative method for extracting key relations between concepts from abstracts of articles. The method focuses on relations between keywords or topic words in the articles. Early experiments with the method on PubMed publications have shown promising results in searching and discovering keywords and their relationships that are strongly related to the main topic of an article.

  3. Knowledge Discovery from Posts in Online Health Communities Using Unified Medical Language System.

    PubMed

    Chen, Donghua; Zhang, Runtong; Liu, Kecheng; Hou, Lei

    2018-06-19

    Patient-reported posts in Online Health Communities (OHCs) contain various valuable information that can help establish knowledge-based online support for online patients. However, utilizing these reports to improve online patient services in the absence of appropriate medical and healthcare expert knowledge is difficult. Thus, we propose a comprehensive knowledge discovery method that is based on the Unified Medical Language System for the analysis of narrative posts in OHCs. First, we propose a domain-knowledge support framework for OHCs to provide a basis for post analysis. Second, we develop a Knowledge-Involved Topic Modeling (KI-TM) method to extract and expand explicit knowledge within the text. We propose four metrics, namely, explicit knowledge rate, latent knowledge rate, knowledge correlation rate, and perplexity, for the evaluation of the KI-TM method. Our experimental results indicate that our proposed method outperforms existing methods in terms of providing knowledge support. Our method enhances knowledge support for online patients and can help develop intelligent OHCs in the future.

  4. Design of Automatic Extraction Algorithm of Knowledge Points for MOOCs

    PubMed Central

    Chen, Haijian; Han, Dongmei; Zhao, Lina

    2015-01-01

    In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have had a powerful impact on academic institutions. In the MOOCs environment, knowledge discovery and knowledge sharing are very important and are currently often achieved with ontology techniques. In building an ontology, automatic extraction technology is crucial. Because general text mining algorithms are not very effective on online course material, we designed an automatic extraction of course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. A Vector Space Model (VSM) is used to calculate similarity, and weights are designed to optimize the TF-IDF output values; the highest-scoring terms are selected as knowledge points. Course documents for “C programming language” were selected for the experiment in this study. The results show that the proposed approach achieves satisfactory accuracy and recall. PMID:26448738
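The scoring step this abstract describes can be sketched as a plain TF-IDF ranking over pre-segmented documents. The toy corpus, weighting scheme, and cutoff below are illustrative assumptions, not the actual AECKP implementation.

```python
import math
from collections import Counter

def tfidf_knowledge_points(docs, top_k=2):
    """Rank each document's terms by TF-IDF and keep the top scorers.

    docs: list of pre-segmented documents (lists of tokens); in a setting
    like AECKP these would come from word segmentation and POS filtering.
    """
    n_docs = len(docs)
    # Document frequency: how many documents each term appears in
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # The highest-scoring terms become candidate knowledge points
        ranked = sorted(scores, key=scores.get, reverse=True)
        results.append(ranked[:top_k])
    return results

# Hypothetical mini-corpus of programming-course fragments
docs = [
    ["pointer", "array", "pointer", "loop"],
    ["loop", "function", "recursion", "function"],
    ["array", "struct", "pointer", "struct"],
]
res = tfidf_knowledge_points(docs)
print(res)
```

Terms that are frequent in one document but rare across the corpus rise to the top, which is the intuition behind selecting them as knowledge points.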

  5. KNODWAT: A scientific framework application for testing knowledge discovery methods for the biomedical domain

    PubMed Central

    2013-01-01

    Background: Professionals in the biomedical domain are confronted with an increasing mass of data. Developing methods that assist professional end users in the field of knowledge discovery to identify, extract, visualize, and understand useful information from these huge amounts of data is a major challenge. However, so many diverse methods and methodologies are available that biomedical researchers inexperienced in the use of even relatively popular knowledge discovery methods can find it very difficult to select the most appropriate method for their particular research problem. Results: A web application called KNODWAT (KNOwledge Discovery With Advanced Techniques) has been developed using Java on the Spring Framework 3.1, following a user-centered approach. The software runs on Java 1.6 and above and requires a web server such as Apache Tomcat and a database server such as MySQL Server. Twitter Bootstrap was used for frontend functionality and styling, and jQuery for interactive user interface operations. Conclusions: The framework presented is user-centric, highly extensible, and flexible. Since it enables methods to be tested on existing data to assess their suitability and performance, it is especially suitable for biomedical researchers who are new to the field of knowledge discovery and data mining. For testing purposes, two algorithms, CART and C4.5, were implemented using the WEKA data mining framework. PMID:23763826

  6. KNODWAT: a scientific framework application for testing knowledge discovery methods for the biomedical domain.

    PubMed

    Holzinger, Andreas; Zupan, Mario

    2013-06-13

    Professionals in the biomedical domain are confronted with an increasing mass of data. Developing methods that assist professional end users in the field of knowledge discovery to identify, extract, visualize, and understand useful information from these huge amounts of data is a major challenge. However, so many diverse methods and methodologies are available that biomedical researchers inexperienced in the use of even relatively popular knowledge discovery methods can find it very difficult to select the most appropriate method for their particular research problem. A web application called KNODWAT (KNOwledge Discovery With Advanced Techniques) has been developed using Java on the Spring Framework 3.1, following a user-centered approach. The software runs on Java 1.6 and above and requires a web server such as Apache Tomcat and a database server such as MySQL Server. Twitter Bootstrap was used for frontend functionality and styling, and jQuery for interactive user interface operations. The framework presented is user-centric, highly extensible, and flexible. Since it enables methods to be tested on existing data to assess their suitability and performance, it is especially suitable for biomedical researchers who are new to the field of knowledge discovery and data mining. For testing purposes, two algorithms, CART and C4.5, were implemented using the WEKA data mining framework.

  7. Informing child welfare policy and practice: using knowledge discovery and data mining technology via a dynamic Web site.

    PubMed

    Duncan, Dean F; Kum, Hye-Chung; Weigensberg, Elizabeth Caplick; Flair, Kimberly A; Stewart, C Joy

    2008-11-01

    Proper management and implementation of an effective child welfare agency requires the constant use of information about the experiences and outcomes of children involved in the system, emphasizing the need for comprehensive, timely, and accurate data. In the past 20 years, there have been many advances in technology that can maximize the potential of administrative data to promote better evaluation and management in the field of child welfare. Specifically, this article discusses the use of knowledge discovery and data mining (KDD), which makes it possible to create longitudinal data files from administrative data sources, extract valuable knowledge, and make the information available via a user-friendly public Web site. This article demonstrates a successful project in North Carolina where knowledge discovery and data mining technology was used to develop a comprehensive set of child welfare outcomes available through a public Web site to facilitate information sharing of child welfare data to improve policy and practice.

  8. Knowledge discovery by accuracy maximization

    PubMed Central

    Cacciatore, Stefano; Luchinat, Claudio; Tenori, Leonardo

    2014-01-01

    Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, KODAMA is driven by an integrated procedure of cross-validation of the results: the discovery of a local manifold’s topology is led by a classifier through a Monte Carlo procedure that maximizes cross-validated predictive accuracy. This integrated validation ensures a highly robust solution. The robustness is demonstrated on experimental datasets of gene expression and metabolomics, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in an analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition sharply separating all post-Reagan from all pre-Reagan speeches. The transition occurs during Reagan’s presidency, not at its beginning. PMID:24706821
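The core idea, Monte Carlo moves kept only when cross-validated predictive accuracy does not drop, can be sketched in miniature. The 1-nearest-neighbour classifier, single-label flips, and toy data below are simplifying assumptions, not the published KODAMA procedure.

```python
import random

def cv_accuracy(points, labels):
    """Leave-one-out 1-nearest-neighbour accuracy of a labelling."""
    correct = 0
    for i, p in enumerate(points):
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(points)

def kodama_like(points, n_labels=2, iters=300, seed=0):
    """Monte Carlo maximization of cross-validated accuracy: propose a
    single-label change and keep it only if accuracy does not decrease."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_labels) for _ in points]
    best = cv_accuracy(points, labels)
    for _ in range(iters):
        i = rng.randrange(len(points))
        old = labels[i]
        labels[i] = rng.randrange(n_labels)
        acc = cv_accuracy(points, labels)
        if acc >= best:
            best = acc          # keep the move
        else:
            labels[i] = old     # revert the move
    return labels, best

# Two well-separated clusters of 2-D points (toy data)
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
final_labels, acc = kodama_like(pts)
print(final_labels, acc)
```

The accepted labelling always matches the reported accuracy, so the search can only plateau or improve; the real algorithm layers feature extraction and semisupervision on top of this loop.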

  9. 'Big Data' Collaboration: Exploring, Recording and Sharing Enterprise Knowledge

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sukumar, Sreenivas R; Ferrell, Regina Kay

    2013-01-01

    As data sources and data sizes proliferate, knowledge discovery from "Big Data" is starting to pose several challenges. In this paper, we address a specific challenge in the practice of enterprise knowledge management: extracting actionable nuggets from diverse data sources of seemingly related information. In particular, we address the challenge of archiving knowledge gained through collaboration, dissemination, and visualization as part of the data analysis, inference, and decision-making lifecycle. We motivate the implementation of an enterprise data-discovery and knowledge recorder tool, called SEEKER, based on a real-world case study. We demonstrate SEEKER capturing schema and data-element relationships, tracking the data elements of value based on the queries and the analytical artifacts created by analysts as they use the data. We show how the tool serves as a digital record of institutional domain knowledge and as documentation of the evolution of data elements, queries, and schemas over time. As a knowledge management service, a tool like SEEKER saves enterprise resources and time by avoiding analytic silos, expediting multi-source data integration, and intelligently documenting discoveries from fellow analysts.

  10. Eliciting and Representing High-Level Knowledge Requirements to Discover Ecological Knowledge in Flower-Visiting Data

    PubMed Central

    2016-01-01

    Observations of individual organisms (data) can be combined with expert ecological knowledge of species, especially causal knowledge, to model and extract from flower–visiting data useful information about behavioral interactions between insect and plant organisms, such as nectar foraging and pollen transfer. We describe and evaluate a method to elicit and represent such expert causal knowledge of behavioral ecology, and discuss the potential for wider application of this method to the design of knowledge-based systems for knowledge discovery in biodiversity and ecosystem informatics. PMID:27851814

  11. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McDermott, Jason E.; Wang, Jing; Mitchell, Hugh D.

    2013-01-01

    The advent of high-throughput technologies capable of comprehensive analysis of genes, transcripts, proteins, and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for both purely statistical and expert knowledge-based approaches and would benefit from improved integration of the two. Areas covered: In this review we present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered. We then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible, and objective tools for combining data-driven and knowledge-based approaches to biomarker discovery and characterization are key to future success in the biomarker field. We describe our recommendations of possible approaches to this problem, including metrics for the evaluation of biomarkers.

  12. Automated Knowledge Discovery from Simulators

    NASA Technical Reports Server (NTRS)

    Burl, Michael C.; DeCoste, D.; Enke, B. L.; Mazzoni, D.; Merline, W. J.; Scharenbroich, L.

    2006-01-01

    In this paper, we explore one aspect of knowledge discovery from simulators, the landscape characterization problem, where the aim is to identify regions in the input/parameter/model space that lead to a particular output behavior. Large-scale numerical simulators are in widespread use by scientists and engineers across a range of government agencies, academia, and industry; in many cases, simulators provide the only means to examine processes that are infeasible or impossible to study otherwise. However, the cost of simulation studies can be quite high, both in terms of the time and computational resources required to conduct the trials and the manpower needed to sift through the resulting output. Thus, there is strong motivation to develop automated methods that enable more efficient knowledge extraction.
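The landscape characterization task can be illustrated with a toy stand-in for a simulator: sample the input space on a grid and flag the region whose output shows the behavior of interest. The quadratic "simulator" and the threshold below are illustrative assumptions, not the paper's method.

```python
def toy_simulator(x, y):
    """Stand-in for an expensive numerical simulator (illustrative only)."""
    return x * x + y * y  # scalar output of one simulation run

def characterize_landscape(xs, ys, threshold=1.0):
    """Grid-sample the input space and collect the inputs whose output
    exhibits the target behavior (here: output below the threshold)."""
    interesting = []
    for x in xs:
        for y in ys:
            if toy_simulator(x, y) < threshold:
                interesting.append((x, y))
    return interesting

grid = [i / 2 for i in range(-4, 5)]  # -2.0 .. 2.0 in steps of 0.5
region = characterize_landscape(grid, grid)
print(len(region), "of", len(grid) ** 2, "sampled inputs show the target behavior")
```

In practice the motivation for automation is exactly that each `toy_simulator` call may take hours, so smarter sampling than an exhaustive grid is needed.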

  13. Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts.

    PubMed

    Ng; Wong

    1999-01-01

    We are entering a new era of research where the latest scientific discoveries are often first reported online and are readily accessible by scientists worldwide. This rapid electronic dissemination of research breakthroughs has greatly accelerated the current pace of genomics and proteomics research. The race to the discovery of a gene or a drug has become increasingly dependent on how quickly a scientist can scan through the voluminous amounts of information available online to construct the relevant picture (such as protein-protein interaction pathways) as it takes shape amongst the rapidly expanding pool of globally accessible biological data (e.g. GENBANK) and scientific literature (e.g. MEDLINE). We describe a prototype system for automatic pathway discovery from online text abstracts, combining technologies that (1) retrieve research abstracts from online sources, (2) extract relevant information from the free texts, and (3) present the extracted information graphically and intuitively. Our work demonstrates that this framework allows us to routinely scan the online scientific literature for automatic discovery of knowledge, giving modern scientists the necessary competitive edge in managing the information explosion in this electronic age.

  14. Using Best Practices to Extract, Organize, and Reuse Embedded Decision Support Content Knowledge Rules from Mature Clinical Systems.

    PubMed

    DesAutels, Spencer J; Fox, Zachary E; Giuse, Dario A; Williams, Annette M; Kou, Qing-Hua; Weitkamp, Asli; Patel, Neal R; Bettinsoli Giuse, Nunzia

    2016-01-01

    Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. To have a holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems.

  15. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data

    PubMed Central

    McDermott, Jason E.; Wang, Jing; Mitchell, Hugh; Webb-Robertson, Bobbie-Jo; Hafen, Ryan; Ramey, John; Rodland, Karin D.

    2012-01-01

    Introduction: The advent of high-throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful molecular signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for more sophisticated approaches to integrating purely statistical and expert knowledge-based approaches. Areas covered: In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered in deriving valid and useful signatures of disease. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to identify predictive signatures of disease are key to future success in the biomarker field. We will describe our recommendations for possible approaches to this problem including metrics for the evaluation of biomarkers. PMID:23335946

  16. Using Best Practices to Extract, Organize, and Reuse Embedded Decision Support Content Knowledge Rules from Mature Clinical Systems

    PubMed Central

    DesAutels, Spencer J.; Fox, Zachary E.; Giuse, Dario A.; Williams, Annette M.; Kou, Qing-hua; Weitkamp, Asli; Patel, Neal R.; Bettinsoli Giuse, Nunzia

    2016-01-01

    Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. To have a holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems. PMID:28269846

  17. A 100-year review: Carbohydrates - characterization, digestion, and utilization

    USDA-ARS?s Scientific Manuscript database

    Our knowledge of the role of carbohydrates in dairy cattle nutrition has advanced substantially during the 100 years in which the Journal of Dairy Science has been published. In this review, we traced the history of scientific investigation and discovery from crude fiber, nitrogen-free extract, and ...

  18. Data Mining and Knowledge Discovery tools for exploiting big Earth-Observation data

    NASA Astrophysics Data System (ADS)

    Espinoza Molina, D.; Datcu, M.

    2015-04-01

    The continuous increase in the size of the archives and in the variety and complexity of Earth-Observation (EO) sensors requires new methodologies and tools that allow the end user to access a large image repository, to extract and infer knowledge about the patterns hidden in the images, to retrieve dynamically a collection of relevant images, and to support the creation of emerging applications (e.g.: change detection, global monitoring, disaster and risk management, image time series, etc.). In this context, we are concerned with providing a platform for data mining and knowledge discovery from the content of EO archives. The platform's goal is to implement a communication channel between Payload Ground Segments and the end user, who receives the content of the data coded in an understandable format associated with semantics and ready for immediate exploitation. It provides the user with automated tools to explore and understand the content of highly complex image archives. The challenge lies in extracting meaningful information and understanding observations of large extended areas, over long periods of time, with a broad variety of EO imaging sensors in synergy with other related measurements and data. The platform is composed of several components: (1) ingestion of EO images and related data, providing basic features for image analysis; (2) a query engine based on metadata, semantics, and image content; (3) data mining and knowledge discovery tools for supporting the interpretation and understanding of image content; (4) semantic definition of the image content via machine learning methods. All these components are integrated and supported by a relational database management system, ensuring the integrity and consistency of terabytes of Earth Observation data.

  19. XML-based data model and architecture for a knowledge-based grid-enabled problem-solving environment for high-throughput biological imaging.

    PubMed

    Ahmed, Wamiq M; Lenz, Dominik; Liu, Jia; Paul Robinson, J; Ghafoor, Arif

    2008-03-01

    High-throughput biological imaging uses automated imaging devices to collect a large number of microscopic images for analysis of biological systems and validation of scientific hypotheses. Efficient manipulation of these datasets for knowledge discovery requires high-performance computational resources, efficient storage, and automated tools for extracting and sharing such knowledge among different research sites. Newly emerging grid technologies provide powerful means for exploiting the full potential of these imaging techniques. Efficient utilization of grid resources requires the development of knowledge-based tools and services that combine domain knowledge with analysis algorithms. In this paper, we first investigate how grid infrastructure can facilitate high-throughput biological imaging research, and present an architecture for providing knowledge-based grid services for this field. We identify two levels of knowledge-based services. The first level provides tools for extracting spatiotemporal knowledge from image sets, and the second level provides high-level knowledge management and reasoning services. We then present the cellular imaging markup language, an XML-based language for modeling biological images and representing spatiotemporal knowledge. This scheme can be used for spatiotemporal event composition, matching, and automated knowledge extraction and representation for large biological imaging datasets. We demonstrate the expressive power of this formalism by means of different examples and extensive experimental results.

  20. Virtual Observatories, Data Mining, and Astroinformatics

    NASA Astrophysics Data System (ADS)

    Borne, Kirk

    The historical, current, and future trends in knowledge discovery from data in astronomy are presented here. The story begins with a brief history of data gathering and data organization. A description of the development of new information science technologies for astronomical discovery is then presented. Among these are e-Science and the virtual observatory, with its data discovery, access, display, and integration protocols; astroinformatics and data mining for exploratory data analysis, information extraction, and knowledge discovery from distributed data collections; new sky surveys' databases, including rich multivariate observational parameter sets for large numbers of objects; and the emerging discipline of data-oriented astronomical research, called astroinformatics. Astroinformatics is described as the fourth paradigm of astronomical research, following the three traditional research methodologies: observation, theory, and computation/modeling. Astroinformatics research areas include machine learning, data mining, visualization, statistics, semantic science, and scientific data management. Each of these areas is now an active research discipline, with significant science-enabling applications in astronomy. Research challenges and sample research scenarios are presented in these areas, in addition to sample algorithms for data-oriented research. These information science technologies enable scientific knowledge discovery from the increasingly large and complex data collections in astronomy. The education and training of the modern astronomy student must consequently include skill development in these areas, whose practitioners have traditionally been limited to applied mathematicians, computer scientists, and statisticians. Modern astronomical researchers must cross these traditional discipline boundaries, thereby borrowing best-of-breed methodologies from multiple disciplines. In the era of large sky surveys and numerous large telescopes, the potential for astronomical discovery is equally large, and so the data-oriented research methods, algorithms, and techniques presented here will enable the greatest discovery potential from the ever-growing data and information resources in astronomy.

  1. Educational System Efficiency Improvement Using Knowledge Discovery in Databases

    ERIC Educational Resources Information Center

    Lukaš, Mirko; Leškovic, Darko

    2007-01-01

    This study describes one possible way of using ICT in an education system. We treated the educational system like a business company and developed an appropriate model for clustering the student population. Modern educational systems are forced to extract the most necessary and purposeful information from a large amount of available data. Clustering…

  2. Constructing a Graph Database for Semantic Literature-Based Discovery.

    PubMed

    Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C

    2015-01-01

    Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network; a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which has some problems when dealing with network data. We transformed and uploaded SemMedDB into the Neo4j graph database, and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language when compared with standard SQL.
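The basic open-discovery (ABC) algorithm described here, combining A-B and B-C relations to propose unstated A-C links, can be sketched over toy SemRep-style triples in plain Python. The concept names and predicates below, and the Cypher shown in the comment, are illustrative assumptions rather than SemMedDB's actual schema.

```python
# Toy semantic relations in (subject, predicate, object) form, loosely
# modeled on SemRep output; the names and predicates are illustrative.
relations = [
    ("fish_oil", "AFFECTS", "blood_viscosity"),
    ("blood_viscosity", "ASSOCIATED_WITH", "raynaud_disease"),
    ("fish_oil", "AFFECTS", "platelet_aggregation"),
    ("platelet_aggregation", "ASSOCIATED_WITH", "raynaud_disease"),
    ("aspirin", "AFFECTS", "platelet_aggregation"),
]

def open_discovery(start, relations):
    """Swanson-style ABC open discovery: from A-B and B-C relations,
    propose previously unstated A-C links, with B concepts as evidence."""
    direct = {(s, o) for s, _, o in relations}
    b_concepts = [o for s, _, o in relations if s == start]
    proposals = {}
    for b in b_concepts:
        for s, _, c in relations:
            if s == b and c != start and (start, c) not in direct:
                proposals.setdefault(c, set()).add(b)
    return proposals

# A roughly equivalent Cypher query over a Neo4j graph might look like:
#   MATCH (a:Concept {name: $start})-->(b)-->(c)
#   WHERE NOT (a)-->(c) AND c <> a
#   RETURN c.name, collect(DISTINCT b.name)
hypotheses = open_discovery("fish_oil", relations)
print(hypotheses)
```

Traversing such two-step paths is a single pattern match in a graph query language, which is the conceptual simplicity the authors gain by moving from SQL to Cypher.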

  3. Competitive-Cooperative Automated Reasoning from Distributed and Multiple Source of Data

    NASA Astrophysics Data System (ADS)

    Fard, Amin Milani

    Knowledge extraction from distributed database systems has been investigated during the past decade in order to analyze billions of information records. In this work, a competitive deduction approach for a heterogeneous data grid environment is proposed using classic data mining and statistical methods. By applying a game-theoretic concept in a multi-agent model, we design a policy for hierarchical knowledge discovery and inference fusion. To demonstrate the system, a sample multi-expert system has also been developed.

  4. Systematic identification of latent disease-gene associations from PubMed articles.

    PubMed

    Zhang, Yuji; Shen, Feichen; Mojarad, Majid Rastegar; Li, Dingcheng; Liu, Sijia; Tao, Cui; Yu, Yue; Liu, Hongfang

    2018-01-01

    Recent scientific advances have accumulated a tremendous amount of biomedical knowledge providing novel insights into the relationship between molecular and cellular processes and diseases. Literature mining is one of the commonly used methods to retrieve and extract information from scientific publications for understanding these associations. However, due to the large data volume and complicated associations with noise, the interpretability of such association data for semantic knowledge discovery is challenging. In this study, we describe an integrative computational framework aiming to expedite the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million PubMed-indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their respective capabilities of detecting latent associations and reducing noise in large-volume data. Our results demonstrate that (1) the LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns were enriched in the disease-specific association networks; and (4) genes were enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research.
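The network-based half of such a framework can be sketched by building the bipartite disease-gene graph and examining gene degrees, since hub genes are what scale-free degree distributions highlight (the LDA half is omitted). The associations below are illustrative toy data, not results from the study.

```python
from collections import Counter, defaultdict

# Toy (disease, gene) associations; names are illustrative only.
associations = [
    ("diabetes", "TCF7L2"), ("diabetes", "PPARG"), ("diabetes", "KCNJ11"),
    ("obesity", "FTO"), ("obesity", "PPARG"),
    ("hypertension", "AGT"), ("hypertension", "PPARG"),
]

def gene_degrees(associations):
    """Build the bipartite disease-gene network and count, for each gene,
    how many distinct diseases it is associated with (its degree)."""
    neighbors = defaultdict(set)
    for disease, gene in associations:
        neighbors[gene].add(disease)
    return {gene: len(diseases) for gene, diseases in neighbors.items()}

degrees = gene_degrees(associations)
# In a scale-free network most nodes have low degree and a few are hubs.
hist = Counter(degrees.values())
print(degrees)
print(hist)
```

At the scale of 146,245 associations, plotting this degree histogram on log-log axes is the usual way to check the scale-free property the abstract reports.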

  5. No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects.

    PubMed

    Danchin, Antoine; Ouzounis, Christos; Tokuyasu, Taku; Zucker, Jean-Daniel

    2018-07-01

    Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from 'the sequence tells the structure tells the function' fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader. © 2018 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  6. Systematic identification of latent disease-gene associations from PubMed articles

    PubMed Central

    Mojarad, Majid Rastegar; Li, Dingcheng; Liu, Sijia; Tao, Cui; Yu, Yue; Liu, Hongfang

    2018-01-01

    Recent scientific advances have accumulated a tremendous amount of biomedical knowledge providing novel insights into the relationship between molecular and cellular processes and diseases. Literature mining is one of the commonly used methods to retrieve and extract information from scientific publications for understanding these associations. However, due to the large data volume and complicated associations with noise, the interpretability of such association data for semantic knowledge discovery is challenging. In this study, we describe an integrative computational framework aiming to expedite the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million PubMed-indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their respective capabilities of detecting latent associations and reducing noise in large-volume data. Our results demonstrate that (1) the LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns were enriched in the disease-specific association networks; and (4) genes were enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research. PMID:29373609

  7. Affinity Crystallography: A New Approach to Extracting High-Affinity Enzyme Inhibitors from Natural Extracts.

    PubMed

    Aguda, Adeleke H; Lavallee, Vincent; Cheng, Ping; Bott, Tina M; Meimetis, Labros G; Law, Simon; Nguyen, Nham T; Williams, David E; Kaleta, Jadwiga; Villanueva, Ivan; Davies, Julian; Andersen, Raymond J; Brayer, Gary D; Brömme, Dieter

    2016-08-26

    Natural products are an important source of novel drug scaffolds. The highly variable and unpredictable timelines associated with isolating novel compounds and elucidating their structures have led to the demise of natural product extract library exploration in drug discovery programs. Here we introduce affinity crystallography as a new methodology that significantly shortens the hit-to-active-structure cycle in bioactive natural product discovery research. The affinity crystallography approach is illustrated by using semipure fractions of an actinomycete culture extract to isolate and identify a cathepsin K inhibitor and comparing the outcome with the traditional assay-guided purification/structural analysis approach. The traditional approach resulted in the identification of the known inhibitor antipain (1) and its new but less potent dehydration product 2, while the affinity crystallography approach led to the identification of a new high-affinity inhibitor named lichostatinal (3). The structure and potency of lichostatinal (3) were verified by total synthesis and kinetic characterization. To the best of our knowledge, this is the first example of isolating and characterizing a potent enzyme inhibitor from a partially purified crude natural product extract using a protein crystallographic approach.

  8. Information Fusion - Methods and Aggregation Operators

    NASA Astrophysics Data System (ADS)

    Torra, Vicenç

    Information fusion techniques are commonly applied in Data Mining and Knowledge Discovery. In this chapter, we give an overview of such applications, considering their three main uses. That is, we consider fusion methods for data preprocessing, model building and information extraction. Some aggregation operators (i.e. particular fusion methods) and their properties are briefly described as well.
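Two classic aggregation operators from the information fusion literature can be sketched in a few lines; the weight vectors below are illustrative:

```python
# Sketch of two standard aggregation operators: the weighted mean and
# Yager's ordered weighted averaging (OWA) operator.

def weighted_mean(values, weights):
    # Weights attach importance to sources; positions are fixed.
    return sum(w * v for w, v in zip(weights, values))

def owa(values, weights):
    # OWA attaches weights to ranks: values are sorted in descending order
    # first, so the same weight vector can model max, min or median.
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

vals = [0.2, 0.8, 0.5]
print(weighted_mean(vals, [0.5, 0.3, 0.2]))  # 0.2*0.5 + 0.8*0.3 + 0.5*0.2
print(owa(vals, [1.0, 0.0, 0.0]))            # all weight on the top rank = max
```

Choosing the OWA weight vector is what lets one operator family interpolate between the conjunctive (min) and disjunctive (max) fusion behaviors the chapter discusses.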

  9. Requirement of scientific documentation for the development of Naturopathy.

    PubMed

    Rastogi, Rajiv

    2006-01-01

    The past few decades have witnessed an explosion of knowledge in almost every field. This has not only advanced individual subjects but has also influenced the growth of their allied subjects. The present paper describes how science advances both through efforts within specific areas and through discoveries in allied fields that indirectly influence a subject. In Naturopathy, although nothing in particular has been added to the basic thoughts or fundamental principles of the subject, the entire understanding of treatment has been revolutionised under the influence of the scientific discoveries of the past few decades. The advent of information technology has further added to this boom of knowledge, yet it often proves impossible to use this information for human benefit because it is not logically organised. Against this background, the author defines documentation: we today have an ocean of information and knowledge about all manner of things, living or dead, plants, animals or human beings, geographical conditions, and the changing weather and environment. What remains to be done is to extract the relevant knowledge and information needed to enrich the subject. The author compares documentation to the churning of milk to extract butter: documentation is the churning of an ocean of information to extract the specific, most appropriate, relevant and well-defined information and knowledge related to a particular subject. Besides defining documentation, the paper highlights the areas of Naturopathy in urgent need of proper documentation. It also discusses the present status of Naturopathy in India, proposes short-term and long-term goals, and plans strategies for achieving them. The most important aspect of the paper is its due acknowledgement of the limitations of Naturopathy, combined with a constant effort to improve the field using the advances made so far in the various disciplines of science.

  10. Using Learning Analytics to Identify Medical Student Misconceptions in an Online Virtual Patient Environment

    ERIC Educational Resources Information Center

    Poitras, Eric G.; Naismith, Laura M.; Doleck, Tenzin; Lajoie, Susanne P.

    2016-01-01

    This study aimed to identify misconceptions in medical student knowledge by mining user interactions in the MedU online learning environment. Data from 13000 attempts at a single virtual patient case were extracted from the MedU MySQL database. A subgroup discovery method was applied to identify patterns in learner-generated annotations and…

  11. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    ERIC Educational Resources Information Center

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  12. Automatic extraction of relations between medical concepts in clinical texts

    PubMed Central

    Harabagiu, Sanda; Roberts, Kirk

    2011-01-01

    Objective A supervised machine learning approach to discover relations between medical problems, treatments, and tests mentioned in electronic medical records. Materials and methods A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources, such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric, inform the classifier. Results The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). Alternatively, when concepts and assertions were discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7. Discussion Although a rich set of features was developed for the classifiers presented in this paper, little knowledge mining was performed from medical ontologies such as those found in UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation discovery was treated independently. Joint classification of relations may further improve the quality of results. Also, joint learning of the discovery of concepts, assertions, and relations may improve the results of automatic relation extraction. Conclusion Lexical and contextual features proved to be very important in relation extraction from medical texts. When they are not available to the classifier, the F1 score decreases by 3.7%. In addition, the absence of similarity-based features contributes a decrease of 1.1%. PMID:21846787
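The core idea, a single multiclass SVM assigning a semantic relation type to a concept pair from lexical and contextual features, can be sketched minimally. The context snippets and relation labels below are toy stand-ins, not the i2b2 feature set:

```python
# Minimal sketch: a linear SVM classifies the lexical context between a
# medical problem and a treatment/test mention into a relation type.
# Training snippets and labels are illustrative, not the paper's data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

contexts = [
    "was treated with", "received", "improved after",     # treatment contexts
    "was revealed by", "was confirmed on", "detected by", # test contexts
]
labels = ["TrIP", "TrIP", "TrIP", "TeRP", "TeRP", "TeRP"]

clf = make_pipeline(CountVectorizer(), LinearSVC()).fit(contexts, labels)
print(clf.predict(["was treated with"])[0])
```

In the actual system such bag-of-words context features are augmented with resources like WordNet and Wikipedia, as the abstract notes.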

  13. Isolation, enzyme-bound structure and antibacterial activity of platencin A1 from Streptomyces platensis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Sheo B.; Ondeyka, John G.; Herath, Kithsiri B.

    Natural products continue to serve as one of the best sources for discovery of antibacterial agents, as exemplified by the recent discoveries of platensimycin and platencin. Chemical modifications, as well as the discovery of congeners, are the main sources for gaining knowledge of the structure-activity relationships of natural products. Screening for congeners in extracts of the fermentation broths of Streptomyces platensis led to the isolation of platencin A1, a hydroxy congener of platencin. The hydroxylation of the tricyclic enone moiety negatively affected the antibacterial activity and appears to be consistent with the hydrophobic binding pocket of FabF. The isolation, structure, enzyme-bound structure and activity of platencin A1 and two other congeners are described.

  14. DataHub: Knowledge-based data management for data discovery

    NASA Astrophysics Data System (ADS)

    Handley, Thomas H.; Li, Y. Philip

    1993-08-01

    Currently available database technology is largely designed for business data-processing applications and seems inadequate for scientific applications. The research described in this paper, the DataHub, addresses the issues associated with this shortfall in technology utilization and development. The DataHub development addresses the key issues in scientific data management of scientific database models and resource sharing in a geographically distributed, multi-disciplinary science research environment. Thus, the DataHub will be a server between data suppliers and data consumers to facilitate data exchanges, to assist science data analysis, and to provide a systematic approach to science data management. More specifically, the DataHub's objectives are to provide support for (1) exploratory data analysis (i.e., data-driven analysis); (2) data transformations; (3) data semantics capture and usage; (4) analysis-related knowledge capture and usage; and (5) data discovery, ingestion, and extraction. Applying technologies that range from deductive databases, semantic data models, data discovery, knowledge representation and inferencing, and exploratory data analysis techniques to modern man-machine interfaces, DataHub will provide a prototype, integrated environment to support research scientists' needs in multiple disciplines (i.e., oceanography, geology, and atmospheric science) while addressing the more general science data management issues. Additionally, the DataHub will provide data management services to exploratory data analysis applications such as LinkWinds and NCSA's XIMAGE.

  15. ESIP's Earth Science Knowledge Graph (ESKG) Testbed Project: An Automatic Approach to Building Interdisciplinary Earth Science Knowledge Graphs to Improve Data Discovery

    NASA Astrophysics Data System (ADS)

    McGibbney, L. J.; Jiang, Y.; Burgess, A. B.

    2017-12-01

    Big Earth observation data have been produced, archived and made available online, but discovering the right data in a manner that precisely and efficiently satisfies user needs presents a significant challenge to the Earth Science (ES) community. An emerging trend in the information retrieval community is to utilize knowledge graphs to assist users in quickly finding desired information across knowledge sources. This is particularly prevalent within the fields of social media and complex multimodal information processing, to name but a few; however, building a domain-specific knowledge graph is labour-intensive and hard to keep up-to-date. In this work, we update our progress on the Earth Science Knowledge Graph (ESKG) project, an ESIP-funded testbed project which provides an automatic approach to building a dynamic knowledge graph for ES to improve interdisciplinary data discovery by leveraging implicit, latent knowledge present across several U.S. Federal Agencies, e.g., NASA, NOAA and USGS. ESKG strengthens ties between observations and user communities by: 1) developing a knowledge graph derived from various sources, e.g., Web pages, Web services, etc., via natural language processing and knowledge extraction techniques; 2) allowing users to traverse, explore, query, reason over and navigate ES data via knowledge graph interaction. ESKG has the potential to revolutionize the way in which ES communities interact with ES data in the open world through the entity, spatial and temporal linkages and characteristics that make it up. This project enables the advancement of ESIP collaboration areas, including both Discovery and Semantic Technologies, by putting graph information right at our fingertips in an interactive, modern manner and reducing the effort of constructing ontologies. 
To demonstrate the ESKG concept, we will show our framework in use across NASA JPL's PO.DAAC, NOAA's Earth Observation Requirements Evaluation System (EORES) and various USGS systems.
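The knowledge-graph idea above, extracted entities as nodes, extracted relations as typed edges, and discovery as traversal, can be sketched with a plain adjacency structure. The triples below are illustrative, not ESKG output:

```python
# Sketch: a tiny knowledge graph of extracted (subject, relation, object)
# triples, queried by breadth-first traversal. Triples are illustrative.
from collections import deque

triples = [
    ("PO.DAAC", "archives", "sea surface temperature"),
    ("sea surface temperature", "observed_by", "MODIS"),
    ("MODIS", "flies_on", "Aqua"),
]

graph = {}
for subj, rel, obj in triples:
    graph.setdefault(subj, []).append((rel, obj))

def connect(start, goal):
    """Return one chain of typed relations linking two entities, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

print(connect("PO.DAAC", "Aqua"))
```

Such traversals are what let a user start from a data center and reach related instruments or platforms across agency boundaries.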

  16. Building a discovery partnership with Sarawak Biodiversity Centre: a gateway to access natural products from the rainforests.

    PubMed

    Yeo, Tiong Chia; Naming, Margarita; Manurung, Rita

    2014-03-01

    The Sarawak Biodiversity Centre (SBC) is a state government agency which regulates research and promotes the sustainable use of biodiversity. It has a program on documentation of traditional knowledge (TK) and is well-equipped with facilities for natural product research. SBC maintains a Natural Product Library (NPL) consisting of local plant and microbial extracts for bioprospecting. The NPL is a core discovery platform for screening of bioactive compounds by researchers through a formal agreement with clear benefit sharing obligations. SBC aims to develop partnerships with leading institutions and the industries to explore the benefits of biodiversity.

  17. A knowledgebase system to enhance scientific discovery: Telemakus

    PubMed Central

    Fuller, Sherrilynne S; Revere, Debra; Bugni, Paul F; Martin, George M

    2004-01-01

    Background With the rapid expansion of scientific research, the ability to effectively find or integrate new domain knowledge in the sciences is proving increasingly difficult. Efforts to improve and speed up scientific discovery are being explored on a number of fronts. However, much of this work is based on traditional search and retrieval approaches and the bibliographic citation presentation format remains unchanged. Methods Case study. Results The Telemakus KnowledgeBase System provides flexible new tools for creating knowledgebases to facilitate retrieval and review of scientific research reports. In formalizing the representation of the research methods and results of scientific reports, Telemakus offers a potential strategy to enhance the scientific discovery process. While other research has demonstrated that aggregating and analyzing research findings across domains augments knowledge discovery, the Telemakus system is unique in combining document surrogates with interactive concept maps of linked relationships across groups of research reports. 
Conclusion Based on how scientists conduct research and read the literature, the Telemakus KnowledgeBase System brings together three innovations in analyzing, displaying and summarizing research reports across a domain: (1) research report schema, a document surrogate of extracted research methods and findings presented in a consistent and structured schema format which mimics the research process itself and provides a high-level surrogate to facilitate searching and rapid review of retrieved documents; (2) research findings, used to index the documents, allowing searchers to request, for example, research studies which have studied the relationship between neoplasms and vitamin E; and (3) visual exploration interface of linked relationships for interactive querying of research findings across the knowledgebase and graphical displays of what is known as well as, through gaps in the map, what is yet to be tested. The rationale and system architecture are described and plans for the future are discussed. PMID:15507158

  18. Knowledge Discovery from Vibration Measurements

    PubMed Central

    Li, Jian; Wang, Daoyao

    2014-01-01

    The framework and the particular algorithms of the pattern recognition process are widely adopted in structural health monitoring (SHM). However, as part of the overall process of knowledge discovery from databases (KDD), the results of pattern recognition are only changes, and patterns of changes, of data features. In this paper, based on the similarity between KDD and SHM and considering the particularities of SHM problems, a four-step framework for SHM is proposed which extends the final goal of SHM from detecting damage to extracting knowledge that facilitates decision making. The purposes and appropriate methods of each step of this framework are discussed. To demonstrate the proposed SHM framework, a specific SHM method composed of second-order structural parameter identification, statistical control chart analysis, and system reliability analysis is then presented. To examine the performance of this SHM method, real sensor data measured from a lab-size steel bridge model structure are used. The developed four-step framework of SHM has the potential to clarify the process of SHM and to facilitate the further development of SHM techniques. PMID:24574933
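The control-chart step of the method can be sketched minimally: an identified structural parameter is monitored against limits learned from the healthy baseline. The baseline values and limits below are illustrative, not the bridge-model data:

```python
# Sketch: a Shewhart-style control chart over an identified structural
# parameter (e.g. a stiffness estimate). Numbers are illustrative.
from statistics import mean, stdev

baseline = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0]   # healthy-state identifications
mu, sigma = mean(baseline), stdev(baseline)
ucl, lcl = mu + 3 * sigma, mu - 3 * sigma       # 3-sigma control limits

def out_of_control(sample):
    # A sample outside the limits is a statistically unusual change,
    # flagged for the subsequent reliability-analysis step.
    return sample > ucl or sample < lcl

print(out_of_control(9.95))   # within limits
print(out_of_control(8.0))    # possible damage-induced stiffness drop
```

In the proposed framework such flags are not the end product; they feed the system reliability analysis that turns detected changes into decision-relevant knowledge.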

  19. Large-scale extraction of accurate drug-disease treatment pairs from biomedical literature for drug repurposing

    PubMed Central

    2013-01-01

    Background A large-scale, highly accurate, machine-understandable drug-disease treatment relationship knowledge base is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indications as well as drug-repurposing knowledge that is crucial for applying FDA-approved drugs to new diseases. However, much of this information is buried in free text and not captured in any existing databases. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from the published literature. Results In this study, we developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we have shown that the extracted pairs strongly correlate with both drug target genes and therapeutic classes, and therefore may have high potential in drug discovery. Conclusions We demonstrated that our simple pattern-learning relationship extraction algorithm is able to accurately extract many drug-disease pairs from the free text of biomedical literature that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base resulting from our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks. PMID:23742147
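A pattern-based extractor of the kind described can be sketched with a couple of lexical patterns anchored on "treatment". The patterns and sentences below are illustrative, not the learned pattern set:

```python
# Sketch: lexical patterns pull (drug, disease) pairs out of sentence text.
# Patterns and sentences are illustrative, not the paper's learned set.
import re

patterns = [
    re.compile(r"(\w+) in the treatment of (\w+)"),
    re.compile(r"(\w+) for the treatment of (\w+)"),
]

sentences = [
    "We evaluated methotrexate in the treatment of psoriasis.",
    "Clinicians prescribe aspirin for the treatment of pain.",
]

pairs = set()
for s in sentences:
    for p in patterns:
        for drug, disease in p.findall(s):
            pairs.add((drug.lower(), disease.lower()))

print(sorted(pairs))
```

The trade-off reported in the abstract follows directly from this design: precise patterns keep precision high (0.904) while missing paraphrases keeps recall low (0.131) over all pairs.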

  20. Combining automatic table classification and relationship extraction in extracting anticancer drug-side effect pairs from full-text articles.

    PubMed

    Xu, Rong; Wang, QuanQiu

    2015-02-01

    Anticancer drug-associated side effect knowledge often exists in multiple heterogeneous and complementary data sources. A comprehensive anticancer drug-side effect (drug-SE) relationship knowledge base is important for computation-based drug target discovery, drug toxicity prediction and drug repositioning. In this study, we present a two-step approach combining table classification and relationship extraction to extract drug-SE pairs from a large number of high-profile oncological full-text articles. The data consist of 31,255 tables downloaded from the Journal of Clinical Oncology (JCO). We first trained a statistical classifier to classify tables into SE-related and SE-unrelated categories. We then extracted drug-SE pairs from the SE-related tables. We compared the drug side effect knowledge extracted from JCO tables to that derived from FDA drug labels. Finally, we systematically analyzed the relationships between anticancer drug-associated side effects and drug-associated gene targets, metabolism genes, and disease indications. The statistical table classifier is effective in classifying tables into SE-related and SE-unrelated categories (precision: 0.711; recall: 0.941; F1: 0.810). We extracted a total of 26,918 drug-SE pairs from SE-related tables with a precision of 0.605, a recall of 0.460, and an F1 of 0.520. Drug-SE pairs extracted from JCO tables are largely complementary to those derived from FDA drug labels; as many as 84.7% of the pairs extracted from JCO tables are not included in a side effect database constructed from FDA drug labels. Side effects associated with anticancer drugs positively correlate with drug target genes, drug metabolism genes, and disease indications. Copyright © 2014 Elsevier Inc. All rights reserved.
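The reported scores follow the standard harmonic-mean F1; plugging in the table classifier's precision and recall reproduces the stated F1:

```python
# The harmonic-mean F1 used to report classifier and extraction performance.

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Table classifier from the abstract: precision 0.711, recall 0.941.
print(round(f1(0.711, 0.941), 3))  # ≈ 0.81, matching the reported F1 of 0.810
```

The same formula relates the pair-extraction precision (0.605) and recall (0.460) to its reported F1 of 0.520, up to rounding of the published figures.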

  1. FIR: An Effective Scheme for Extracting Useful Metadata from Social Media.

    PubMed

    Chen, Long-Sheng; Lin, Zue-Cheng; Chang, Jing-Rong

    2015-11-01

    Recently, the use of social media for health information exchange has been expanding among patients, physicians, and other health care professionals. In medical areas, social media allows non-experts to access, interpret, and generate medical information for their own care and the care of others. Researchers have paid much attention to social media in medical education, patient-pharmacist communication, adverse drug reaction detection, the impact of social media on medicine and healthcare, and so on. However, relatively few papers discuss how to effectively extract useful knowledge from the huge volume of textual comments in social media. Therefore, this study proposes a Fuzzy adaptive resonance theory network based Information Retrieval (FIR) scheme that combines a Fuzzy adaptive resonance theory (ART) network, Latent Semantic Indexing (LSI), and association rule (AR) discovery to extract knowledge from social media. In our FIR scheme, a Fuzzy ART network is first employed to segment comments. Next, for each customer segment, the LSI technique is used to retrieve important keywords. Then, to make the extracted keywords understandable, association rule mining is applied to organize them into metadata. The extracted customer voices are transformed into design needs using Quality Function Deployment (QFD) for further decision making. Unlike conventional information retrieval techniques, which retrieve too many keywords to convey the key points, our FIR scheme can extract understandable metadata from social media.
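The LSI step of the scheme, factoring a term-document matrix and ranking terms by their weight in a leading latent dimension, can be sketched with a truncated SVD. The tiny comment vocabulary and counts below are illustrative:

```python
# Sketch of the LSI step: SVD of a term-document matrix, then rank terms by
# their loading on the first latent dimension. Data are illustrative.
import numpy as np

terms = ["dose", "side", "effect", "price", "shipping"]
# Term-document counts for four customer comments (rows = terms).
A = np.array([
    [2, 1, 0, 0],   # dose
    [1, 2, 0, 0],   # side
    [1, 2, 0, 0],   # effect
    [0, 0, 2, 1],   # price
    [0, 0, 1, 2],   # shipping
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
loading = np.abs(U[:, 0])                        # weight on the leading latent dimension
top = [terms[i] for i in np.argsort(loading)[::-1][:3]]
print(top)
```

In the FIR scheme such per-segment keyword rankings are then organized by association rule mining into the metadata handed to QFD.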

  2. High-Throughput Screening Platform for the Discovery of New Immunomodulator Molecules from Natural Product Extract Libraries.

    PubMed

    Pérez Del Palacio, José; Díaz, Caridad; de la Cruz, Mercedes; Annang, Frederick; Martín, Jesús; Pérez-Victoria, Ignacio; González-Menéndez, Víctor; de Pedro, Nuria; Tormo, José R; Algieri, Francesca; Rodriguez-Nogales, Alba; Rodríguez-Cabezas, M Elena; Reyes, Fernando; Genilloud, Olga; Vicente, Francisca; Gálvez, Julio

    2016-07-01

    It is widely accepted that central nervous system inflammation and systemic inflammation play a significant role in the progression of chronic neurodegenerative diseases such as Alzheimer's disease and Parkinson's disease, neurotropic viral infections, stroke, paraneoplastic disorders, traumatic brain injury, and multiple sclerosis. Therefore, it seems reasonable to propose that the use of anti-inflammatory drugs might diminish the cumulative effects of inflammation. Indeed, some epidemiological studies suggest that sustained use of anti-inflammatory drugs may prevent or slow down the progression of neurodegenerative diseases. However, the anti-inflammatory drugs and biologics used clinically have the disadvantage of causing side effects and a high cost of treatment. Alternatively, natural products offer great potential for the identification and development of bioactive lead compounds into drugs for treating inflammatory diseases with an improved safety profile. In this work, we present a validated high-throughput screening approach in 96-well plate format for the discovery of new molecules with anti-inflammatory/immunomodulatory activity. The in vitro models are based on the quantitation of nitrite levels in RAW264.7 murine macrophages and interleukin-8 in Caco-2 cells. We have used this platform in a pilot project to screen a subset of 5976 noncytotoxic crude microbial extracts from the MEDINA microbial natural product collection. To our knowledge, this is the first report of a high-throughput screening of microbial natural product extracts for the discovery of immunomodulators. © 2016 Society for Laboratory Automation and Screening.

  3. Ontology-based content analysis of US patent applications from 2001-2010.

    PubMed

    Weber, Lutz; Böhme, Timo; Irmer, Matthias

    2013-01-01

    Ontology-based semantic text analysis methods allow knowledge relationships and data to be extracted automatically from text documents. In this review, we have applied these technologies to the systematic analysis of pharmaceutical patents. Hierarchical concepts from the knowledge domains of chemical compounds, diseases and proteins were used to annotate full-text US patent applications that deal with pharmacological activities of chemical compounds and were filed in the years 2001-2010. Compounds claimed in these applications have been classified into their respective compound classes to review the distribution of scaffold types or general compound classes, such as natural products, in a time-dependent manner. Similarly, the target proteins and claimed utilities of the compounds have been classified and the most relevant ones extracted. The method presented allows the discovery of the main areas of innovation as well as emerging fields of patenting activity, providing a broad statistical basis for competitor analysis and decision-making efforts.

  4. Knowledge Discovery in Spectral Data by Means of Complex Networks

    PubMed Central

    Zanin, Massimiliano; Papo, David; Solís, José Luis González; Espinosa, Juan Carlos Martínez; Frausto-Reyes, Claudio; Anda, Pascual Palomares; Sevilla-Escoboza, Ricardo; Boccaletti, Stefano; Menasalvas, Ernestina; Sousa, Pedro

    2013-01-01

    In the last decade, complex networks have widely been applied to the study of many natural and man-made systems, and to the extraction of meaningful information from the interaction structures created by genes and proteins. Nevertheless, less attention has been devoted to metabonomics, due to the lack of a natural network representation of spectral data. Here we define a technique for reconstructing networks from spectral data sets, where nodes represent spectral bins, and pairs of them are connected when their intensities follow a pattern associated with a disease. The structural analysis of the resulting network can then be used to feed standard data-mining algorithms, for instance for the classification of new (unlabeled) subjects. Furthermore, we show how the structure of the network is resilient to the presence of external additive noise, and how it can be used to extract relevant knowledge about the development of the disease. PMID:24957895
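The network-reconstruction idea, spectral bins as nodes, with links where intensities follow a disease-associated pattern, can be sketched by keeping only correlations present in the disease group but absent in controls. The data and threshold below are illustrative:

```python
# Sketch: build a network over spectral bins, linking pairs whose intensities
# co-vary in disease samples but not in controls. Data are illustrative.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

# Rows = samples, columns = three spectral bins.
disease = [[1.0, 1.1, 0.2], [2.0, 2.1, 0.9], [3.0, 3.2, 0.1]]
control = [[1.0, 3.0, 0.5], [2.0, 1.0, 0.4], [3.0, 2.0, 0.6]]

def edges(samples, threshold=0.9):
    cols = list(zip(*samples))
    return {(i, j) for i in range(len(cols)) for j in range(i + 1, len(cols))
            if abs(pearson(cols[i], cols[j])) > threshold}

# Keep links present under disease but absent under control.
network = edges(disease) - edges(control)
print(network)
```

Structural features of the resulting graph (degrees, components, and so on) are what the abstract proposes to feed into standard data-mining classifiers.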

  5. Knowledge discovery in spectral data by means of complex networks.

    PubMed

    Zanin, Massimiliano; Papo, David; Solís, José Luis González; Espinosa, Juan Carlos Martínez; Frausto-Reyes, Claudio; Anda, Pascual Palomares; Sevilla-Escoboza, Ricardo; Jaimes-Reategui, Rider; Boccaletti, Stefano; Menasalvas, Ernestina; Sousa, Pedro

    2013-03-11

    In the last decade, complex networks have widely been applied to the study of many natural and man-made systems, and to the extraction of meaningful information from the interaction structures created by genes and proteins. Nevertheless, less attention has been devoted to metabonomics, due to the lack of a natural network representation of spectral data. Here we define a technique for reconstructing networks from spectral data sets, where nodes represent spectral bins, and pairs of them are connected when their intensities follow a pattern associated with a disease. The structural analysis of the resulting network can then be used to feed standard data-mining algorithms, for instance for the classification of new (unlabeled) subjects. Furthermore, we show how the structure of the network is resilient to the presence of external additive noise, and how it can be used to extract relevant knowledge about the development of the disease.

  6. Integrated Bio-Entity Network: A System for Biological Knowledge Discovery

    PubMed Central

    Bell, Lindsey; Chowdhary, Rajesh; Liu, Jun S.; Niu, Xufeng; Zhang, Jinfeng

    2011-01-01

    A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places, and most of it is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates the study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large-scale integration of bio-entity relationship information from databases containing manually annotated, structured information and from automatic information extraction of unstructured text in the scientific literature. The relationship information we integrated includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named the integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph-theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses: the indirect relationships with high probabilities in the network. We show that the IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs. PMID:21738677
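The BFSP idea, breadth-first search that abandons a branch as soon as its accumulated path probability falls below a threshold, can be sketched over a toy probabilistic network. The entities and edge probabilities below are illustrative, not IBN data:

```python
# Sketch of breadth-first search with pruning (BFSP): branches whose
# accumulated path probability drops below a threshold are cut early.
# Entities and probabilities are illustrative.
from collections import deque

edges = {
    "geneA": [("proteinX", 0.9), ("proteinY", 0.3)],
    "proteinX": [("pathwayP", 0.8)],
    "proteinY": [("pathwayP", 0.9)],
    "pathwayP": [("diseaseD", 0.7)],
}

def bfsp(start, goal, threshold=0.4):
    hypotheses = []
    queue = deque([(start, 1.0, [start])])
    while queue:
        node, prob, path = queue.popleft()
        for nxt, p in edges.get(node, []):
            q = prob * p
            if q < threshold or nxt in path:   # prune weak or cyclic branches
                continue
            if nxt == goal:
                hypotheses.append((q, path + [nxt]))
            else:
                queue.append((nxt, q, path + [nxt]))
    return hypotheses

print(bfsp("geneA", "diseaseD"))
```

Here the weak geneA-proteinY edge (0.3) is pruned immediately, so only the high-probability indirect path survives as a hypothesis, which is the behavior that keeps the search tractable on a large network.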

  7. The Analysis of Image Segmentation Hierarchies with a Graph-based Knowledge Discovery System

    NASA Technical Reports Server (NTRS)

    Tilton, James C.; Cook, Diane J.; Ketkar, Nikhil; Aksoy, Selim

    2008-01-01

    Currently available pixel-based analysis techniques do not effectively extract the information content from the increasingly available high spatial resolution remotely sensed imagery data. A general consensus is that object-based image analysis (OBIA) is required to effectively analyze this type of data. OBIA is usually a two-stage process: image segmentation followed by an analysis of the segmented objects. We are exploring an approach to OBIA in which hierarchical image segmentations provided by the Recursive Hierarchical Segmentation (RHSEG) software developed at NASA GSFC are analyzed by the Subdue graph-based knowledge discovery system developed by a team at Washington State University. In this paper we discuss our initial approach to representing the RHSEG-produced hierarchical image segmentations in a graphical form understandable by Subdue, and provide results on real and simulated data. We also discuss planned improvements designed to more effectively and completely convey the hierarchical segmentation information to Subdue and to improve processing efficiency.

  8. Summary of the BioLINK SIG 2013 meeting at ISMB/ECCB 2013.

    PubMed

    Verspoor, Karin; Shatkay, Hagit; Hirschman, Lynette; Blaschke, Christian; Valencia, Alfonso

    2015-01-15

    The ISMB Special Interest Group on Linking Literature, Information and Knowledge for Biology (BioLINK) organized a one-day workshop at ISMB/ECCB 2013 in Berlin, Germany. The theme of the workshop was 'Roles for text mining in biomedical knowledge discovery and translational medicine'. This summary reviews the outcomes of the workshop. Meeting themes included concept annotation methods and applications, extraction of biological relationships and the use of text-mined data for biological data analysis. All articles are available at http://biolinksig.org/proceedings-online/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. Extracting nursing practice patterns from structured labor and delivery data sets.

    PubMed

    Hall, Eric S; Thornton, Sidney N

    2007-10-11

    This study was designed to demonstrate the feasibility of a computerized care process model that provides real-time case profiling and outcome forecasting. A methodology was defined for extracting nursing practice patterns from structured point-of-care data collected using the labor and delivery information system at Intermountain Healthcare. Data collected during January 2006 were retrieved from Intermountain Healthcare's enterprise data warehouse for use in the study. The knowledge discovery in databases process provided a framework for data analysis including data selection, preprocessing, data-mining, and evaluation. Development of an interactive data-mining tool and construction of a data model for stratification of patient records into profiles supported the goals of the study. Five benefits of the practice pattern extraction capability, which extend to other clinical domains, are listed with supporting examples.

  10. Review of procedures used for the extraction of anti-cancer compounds from tropical plants.

    PubMed

    Pandey, Saurabh; Shaw, Paul N; Hewavitharana, Amitha K

    2015-01-01

    Tropical plants are important sources of anti-cancer lead molecules. According to the US National Cancer Institute, out of the 3000 plants identified as active against cancer using in vitro studies, 70% are of tropical origin. The extraction of bioactive compounds from plant materials is a fundamental step whose efficiency is critical for the success of drug discovery efforts. There has been no published review of the procedures used to extract anti-cancer compounds from tropical plants; hence, the following is a critical evaluation of such procedures, undertaken prior to the use of these compounds in cancer cell line studies, over the last five years. It presents a comprehensive analysis of all approaches taken to extract anti-cancer compounds from various tropical plants. (Databases searched were PubMed, SciFinder, Web of Knowledge, Scopus, Embase and Google Scholar.)

  11. Information extraction and knowledge graph construction from geoscience literature

    NASA Astrophysics Data System (ADS)

    Wang, Chengbin; Ma, Xiaogang; Chen, Jianguo; Chen, Jingwen

    2018-03-01

    Geoscience literature published online is an important part of open data, and brings both challenges and opportunities for data analysis. Compared with studies of numerical geoscience data, there are limited works on information extraction and knowledge discovery from textual geoscience data. This paper presents a workflow and a few empirical case studies for that topic, with a focus on documents written in Chinese. First, we set up a hybrid corpus combining the generic and geology terms from geology dictionaries to train Chinese word segmentation rules of the Conditional Random Fields model. Second, we used the word segmentation rules to parse documents into individual words, and removed the stop-words from the segmentation results to get a corpus constituted of content-words. Third, we used a statistical method to analyze the semantic links between content-words, and we selected the chord and bigram graphs to visualize the content-words and their links as nodes and edges in a knowledge graph, respectively. The resulting graph presents a clear overview of key information in an unstructured document. This study proves the usefulness of the designed workflow, and shows the potential of leveraging natural language processing and knowledge graph technologies for geoscience.
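
    As a toy illustration of the bigram step only (segmentation and stop-word removal are assumed already done, and the word list is invented), adjacent content-word pairs can be counted as the weighted edges of a knowledge graph:

```python
from collections import Counter

# Invented token stream, standing in for a segmented Chinese document.
stopwords = {"the", "of", "in"}
tokens = ["gold", "deposit", "in", "the", "fault", "zone",
          "gold", "deposit", "of", "fault", "zone"]

# Keep only content-words, then count adjacent pairs (bigrams).
content = [t for t in tokens if t not in stopwords]
edges = Counter(zip(content, content[1:]))
nodes = set(content)
```

    The resulting `nodes` and `edges` map directly onto the vertices and weighted links of the bigram graph the paper visualizes.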

  12. The relation between prior knowledge and students' collaborative discovery learning processes

    NASA Astrophysics Data System (ADS)

    Gijlers, Hannie; de Jong, Ton

    2005-03-01

    In this study we investigate how prior knowledge influences knowledge development during collaborative discovery learning. Fifteen dyads of students (pre-university education, 15-16 years old) worked on a discovery learning task in the physics field of kinematics. The (face-to-face) communication between students was recorded and the interaction with the environment was logged. Based on students' individual judgments of the truth-value and testability of a series of domain-specific propositions, a detailed description of the knowledge configuration for each dyad was created before they entered the learning environment. Qualitative analyses of two dialogues illustrated that prior knowledge influences the discovery learning processes, and knowledge development in a pair of students. Assessments of student and dyad definitional (domain-specific) knowledge, generic (mathematical and graph) knowledge, and generic (discovery) skills were related to the students' dialogue in different discovery learning processes. Results show that a high level of definitional prior knowledge is positively related to the proportion of communication regarding the interpretation of results. Heterogeneity with respect to generic prior knowledge was positively related to the number of utterances made in the discovery process categories hypotheses generation and experimentation. Results of the qualitative analyses indicated that collaboration between extremely heterogeneous dyads is difficult when the high achiever is not willing to scaffold information and work in the low achiever's zone of proximal development.

  13. Mass spectrometry analysis of terpene lactones in Ginkgo biloba.

    PubMed

    Ding, Shujing; Dudley, Ed; Song, Qingbao; Plummer, Sue; Tang, Jiandong; Newton, Russell P; Brenton, A Gareth

    2008-01-01

    Terpene lactones are a family of compounds with unique chemical structures, first recognised in an extract of Ginkgo biloba. The discovery of terpene lactone derivatives has recently been reported in more and more plant extracts and even food products. In this study, mass spectrometric characteristics of the standard terpene lactones in Ginkgo biloba were comprehensively studied using both an ion trap and a quadrupole time-of-flight (QTOF) mass spectrometer. The mass spectral fragmentation data from both techniques was compared to obtain the mass spectrometric fragmentation pathways of the terpene lactones with high confidence. The data obtained will facilitate the analysis and identification of terpene lactones in future plant research via the fragmentation knowledge reported here.

  14. Information Fusion for Natural and Man-Made Disasters

    DTIC Science & Technology

    2007-01-31

    comprehensively large, and metaphysically accurate model of situations, through which specific tasks such as situation assessment, knowledge discovery, or the...“significance” is always context specific. Event discovery is a very important element of the HLF process, which can lead to knowledge discovery about...expected, given the current state of knowledge. Examples of such behavior may include discovery of a new aggregate or situation, a specific pattern of

  15. A New System To Support Knowledge Discovery: Telemakus.

    ERIC Educational Resources Information Center

    Revere, Debra; Fuller, Sherrilynne S.; Bugni, Paul F.; Martin, George M.

    2003-01-01

    The Telemakus System builds on the areas of concept representation, schema theory, and information visualization to enhance knowledge discovery from scientific literature. This article describes the underlying theories and an overview of a working implementation designed to enhance the knowledge discovery process through retrieval, visual and…

  16. Knowledge Discovery as an Aid to Organizational Creativity.

    ERIC Educational Resources Information Center

    Siau, Keng

    2000-01-01

    This article presents the concept of knowledge discovery, a process of searching for associations in large volumes of computer data, as an aid to creativity. It then discusses the various techniques in knowledge discovery. Mednick's associative theory of creative thought serves as the theoretical foundation for this research. (Contains…

  17. NCI Program for Natural Product Discovery: A Publicly-Accessible Library of Natural Product Fractions for High-Throughput Screening.

    PubMed

    Thornburg, Christopher C; Britt, John R; Evans, Jason R; Akee, Rhone K; Whitt, James A; Trinh, Spencer K; Harris, Matthew J; Thompson, Jerell R; Ewing, Teresa L; Shipley, Suzanne M; Grothaus, Paul G; Newman, David J; Schneider, Joel P; Grkovic, Tanja; O'Keefe, Barry R

    2018-06-13

    The US National Cancer Institute's (NCI) Natural Product Repository is one of the world's largest, most diverse collections of natural products containing over 230,000 unique extracts derived from plant, marine, and microbial organisms that have been collected from biodiverse regions throughout the world. Importantly, this national resource is available to the research community for the screening of extracts and the isolation of bioactive natural products. However, despite the success of natural products in drug discovery, compatibility issues that make extracts challenging for liquid handling systems, extended timelines that complicate natural product-based drug discovery efforts, and the presence of pan-assay interfering compounds have reduced enthusiasm for the high-throughput screening (HTS) of crude natural product extract libraries in targeted assay systems. To address these limitations, the NCI Program for Natural Product Discovery (NPNPD), a newly launched national program to advance natural product discovery technologies and facilitate the discovery of structurally defined, validated lead molecules ready for translation, will create a prefractionated library from over 125,000 natural product extracts with the aim of producing a publicly-accessible, HTS-amenable library of >1,000,000 fractions. This library, representing perhaps the largest accumulation of natural product-based fractions in the world, will be made available free of charge in 384-well plates for screening against all disease states in an effort to reinvigorate natural product-based drug discovery.

  18. Advances in Knowledge Discovery and Data Mining 21st Pacific Asia Conference, PAKDD 2017 Held in Jeju, South Korea, May 23 26, 2017. Proceedings Part I, Part II.

    DTIC Science & Technology

    2017-06-27

    Report date 05-27-2017; final report covering 17-03-2017 to 15-03-2018. Contract number FA2386-17-1-0102. Advances in Knowledge Discovery and...Springer; Switzerland. Abstract: The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) is a leading international conference...in the areas of knowledge discovery and data mining (KDD). We had three keynote speeches, delivered by Sang Cha from Seoul National University

  19. Cutting Silica Aerogel for Particle Extraction

    NASA Technical Reports Server (NTRS)

    Tsou, P.; Brownlee, D. E.; Glesias, R.; Grigoropoulos, C. P.; Weschler, M.

    2005-01-01

    The detailed laboratory analyses of extraterrestrial particles have revolutionized our knowledge of planetary bodies in the last three decades. This knowledge of the chemical composition, morphology, mineralogy, and isotopics of particles cannot be provided by remote sensing. In order to acquire this detailed information in the laboratory, the samples need to be intact and unmelted. Such intact capture of hypervelocity particles was developed in 1996. Subsequently, silica aerogel was introduced as the preferred medium for intact capture of hypervelocity particles and was later shown to be particularly suitable for the space environment. STARDUST, the 4th NASA Discovery mission, which captures samples from 81P/Wild 2 and contemporary interstellar dust, is the culmination of these new technologies. In early laboratory experiments of launching hypervelocity projectiles into aerogel, there was a need to cut the aerogel to isolate or extract captured particles/tracks. This is especially challenging for space captures, since there will be many particles/tracks of widely ranging scales located close together, even collocated. It is critical to isolate and extract one particle without compromising its neighbors, since the full significance of a particle is not known until it is extracted and analyzed. To date, three basic techniques have been explored: mechanical cutting, laser cutting, and ion beam milling. We report our current findings.

  20. Biomedical discovery acceleration, with applications to craniofacial development.

    PubMed

    Leach, Sonia M; Tipney, Hannah; Feng, Weiguo; Baumgartner, William A; Kasliwal, Priyanka; Schuyler, Ronald P; Williams, Trevor; Spritz, Richard A; Hunter, Lawrence

    2009-03-01

    The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
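
    The record does not give the Hanalyzer's actual scoring formula; one common way to combine per-source reliabilities for the same gene pair into a single score is a noisy-OR, sketched here with invented source scores:

```python
def combined_reliability(source_scores):
    """Noisy-OR: probability that at least one source is correct,
    assuming the sources err independently."""
    p_all_wrong = 1.0
    for p in source_scores:
        p_all_wrong *= (1.0 - p)
    return 1.0 - p_all_wrong

# Invented evidence: three sources each assert a geneA-geneB link
# with different per-source reliabilities.
evidence = {("geneA", "geneB"): [0.6, 0.5, 0.3]}
score = combined_reliability(evidence[("geneA", "geneB")])
```

    Here the combined score (0.86) exceeds any single source's reliability, reflecting the intuition that independent corroborating sources strengthen an edge in the knowledge network.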

  1. An Overview of Biomolecular Event Extraction from Scientific Documents

    PubMed Central

    Vanegas, Jorge A.; Matos, Sérgio; González, Fabio; Oliveira, José L.

    2015-01-01

    This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed. PMID:26587051

  2. New approaches to antimicrobial discovery.

    PubMed

    Lewis, Kim

    2017-06-15

    The spread of resistant organisms is producing a human health crisis, as we are witnessing the emergence of pathogens resistant to all available antibiotics. An increase in chronic infections presents an additional challenge - these diseases are difficult to treat due to antibiotic-tolerant persister cells. Overmining of soil Actinomycetes ended the golden era of antibiotic discovery in the 1960s, and efforts to replace this source by screening synthetic compound libraries were not successful. Bacteria have an efficient permeability barrier, preventing penetration of most synthetic compounds. Empirically establishing rules of penetration for antimicrobials will form the knowledge base to produce libraries tailored to antibiotic discovery, and will revive rational drug design. Two untapped sources of natural products hold the promise of reviving natural product discovery. Most bacterial species, over 99%, are uncultured; methods to grow these organisms have been developed, and the first promising compounds are in development. Genome sequencing shows that known producers harbor many more operons coding for secondary metabolites than we can account for, providing an additional rich source of antibiotics. Revival of natural product discovery will require high-throughput identification of novel compounds within a large background of known substances. This could be achieved by rapid acquisition of transcription profiles from active extracts that will point to potentially novel compounds. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. The Relation between Prior Knowledge and Students' Collaborative Discovery Learning Processes

    ERIC Educational Resources Information Center

    Gijlers, Hannie; de Jong, Ton

    2005-01-01

    In this study we investigate how prior knowledge influences knowledge development during collaborative discovery learning. Fifteen dyads of students (pre-university education, 15-16 years old) worked on a discovery learning task in the physics field of kinematics. The (face-to-face) communication between students was recorded and the interaction…

  4. Knowledge discovery of drug data on the example of adverse reaction prediction

    PubMed Central

    2014-01-01

    Background Antibiotics are widely prescribed drugs for children and are the most likely to be associated with adverse reactions. Records of adverse reactions and allergies from antibiotics considerably affect prescription choices. We consider this a biomedical decision-making problem and explore hidden knowledge in survey results on data extracted from a big data pool of health records of children from the Health Center of Osijek, Eastern Croatia. Results We applied and evaluated a k-means algorithm on the dataset to generate clusters of records with similar features. Our results highlight that some types of antibiotics form distinct clusters, an insight that helps clinicians make better-supported decisions. Conclusions Medical professionals can investigate the clusters which our study revealed, thus gaining useful knowledge and insight into this data for their clinical studies. PMID:25079450
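
    The study's actual features and preprocessing are not described in this record; a minimal pure-Python k-means sketch on invented two-dimensional records shows the clustering step in principle:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on numeric tuples: assign each point to the
    nearest center, then move each center to its cluster mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        centers = [tuple(sum(col) / len(c) for col in zip(*c)) if c
                   else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Invented toy records (e.g. two numeric features per child's record)
# forming two obvious groups.
pts = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.1), (7.9, 8.3)]
centers, clusters = kmeans(pts, 2)
```

    On real health-record data the features would first need encoding and scaling; the algorithm itself is unchanged.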

  5. Phytochemical Constituents and Antimicrobial Activity of the Ethanol and Chloroform Crude Leaf Extracts of Spathiphyllum cannifolium (Dryand. ex Sims) Schott.

    PubMed

    Dhayalan, Arunachalam; Gracilla, Daniel E; Dela Peña, Renato A; Malison, Marilyn T; Pangilinan, Christian R

    2018-01-01

    The study investigated the medicinal properties of Spathiphyllum cannifolium (Dryand. ex Sims) Schott as a possible source of antimicrobial compounds. The phytochemical constituents were screened using qualitative methods and the antibacterial and antifungal activities were determined using the agar well diffusion method. One-way analysis of variance and Fisher's least significant difference test were used. The phytochemical screening showed the presence of sterols, flavonoids, alkaloids, saponins, glycosides, and tannins in both ethanol and chloroform leaf extracts, but triterpenes were detected only in the ethanol leaf extract. The antimicrobial assay revealed that the chloroform leaf extract inhibited Candida albicans, Escherichia coli, Staphylococcus aureus, Bacillus subtilis, and Pseudomonas aeruginosa, whereas the ethanol leaf extract inhibited E. coli, S. aureus, and B. subtilis only. The ethanol and chloroform leaf extracts exhibited the highest zone of inhibition against B. subtilis. The antifungal assay showed that both leaf extracts have no bioactivity against Aspergillus niger and C. albicans. Results suggest that chloroform is the better solvent for the extraction of antimicrobial compounds against the test organisms used in this study. Findings of this research will add new knowledge in advancing drug discovery and development in the Philippines.

  6. Mining Hierarchies and Similarity Clusters from Value Set Repositories.

    PubMed

    Peterson, Kevin J; Jiang, Guoqian; Brue, Scott M; Shen, Feichen; Liu, Hongfang

    2017-01-01

    A value set is a collection of permissible values used to describe a specific conceptual domain for a given purpose. By helping to establish a shared semantic understanding across use cases, these artifacts are important enablers of interoperability and data standardization. As the size of repositories cataloging these value sets expand, knowledge management challenges become more pronounced. Specifically, discovering value sets applicable to a given use case may be challenging in a large repository. In this study, we describe methods to extract implicit relationships between value sets, and utilize these relationships to overlay organizational structure onto value set repositories. We successfully extract two different structurings, hierarchy and clustering, and show how tooling can leverage these structures to enable more effective value set discovery.
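
    The record does not spell out the extraction method. As a sketch, under the assumptions that a strict subset relation implies hierarchy and that Jaccard overlap above a threshold implies similarity, both structurings can be derived directly from the value sets themselves (the set names and codes below are invented):

```python
def jaccard(a, b):
    """Overlap of two sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

# Hypothetical value sets mapping a name to its permissible codes.
value_sets = {
    "diabetes_all": {"E10", "E11", "E13"},
    "diabetes_t2": {"E11"},
    "hypertension": {"I10", "I11"},
}

# Implicit hierarchy: child's codes are a strict subset of parent's.
hierarchy = [(child, parent)
             for child, cv in value_sets.items()
             for parent, pv in value_sets.items()
             if child != parent and cv < pv]

# Similarity clusters: pairs whose Jaccard overlap passes a threshold.
similar = [(a, b) for a in value_sets for b in value_sets
           if a < b and jaccard(value_sets[a], value_sets[b]) >= 0.3]
```

    A repository browser could then surface `hierarchy` as a tree view and `similar` as "related value set" suggestions during discovery.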

  7. Knowledge Discovery for Smart Grid Operation, Control, and Situation Awareness -- A Big Data Visualization Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gu, Yi; Jiang, Huaiguang; Zhang, Yingchen

    In this paper, a big data visualization platform is designed to discover the hidden useful knowledge for smart grid (SG) operation, control and situation awareness. The proliferation of smart sensors at both the grid side and the customer side can provide a large volume of heterogeneous data that collects information across all time spectrums. Extracting useful knowledge from this big-data pool is still challenging. In this paper, Apache Spark, an open source cluster computing framework, is used to process the big data to effectively discover the hidden knowledge. A high-speed communication architecture utilizing the Open System Interconnection (OSI) model is designed to transmit the data to a visualization platform. This visualization platform uses Google Earth, a global geographic information system (GIS), to link the geological information with the SG knowledge and visualize the information in a user-defined fashion. The University of Denver's campus grid is used as a SG test bench and several demonstrations are presented for the proposed platform.

  8. SureChEMBL: a large-scale, chemically annotated patent document database.

    PubMed

    Papadatos, George; Davies, Mark; Dedman, Nathan; Chambers, Jon; Gaulton, Anna; Siddle, James; Koks, Richard; Irvine, Sean A; Pettersson, Joe; Goncharoff, Nicko; Hersey, Anne; Overington, John P

    2016-01-04

    SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-mining pipeline on a daily basis. SureChEMBL provides access to a previously unavailable, open and timely set of annotated compound-patent associations, complemented with sophisticated combined structure and keyword-based search capabilities against the compound repository and patent document corpus; given the wealth of knowledge hidden in patent documents, analysis of SureChEMBL data has immediate applications in drug discovery, medicinal chemistry and other commercial areas of chemical science. Currently, the database contains 17 million compounds extracted from 14 million patent documents. Access is available through a dedicated web-based interface and data downloads at: https://www.surechembl.org/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. SureChEMBL: a large-scale, chemically annotated patent document database

    PubMed Central

    Papadatos, George; Davies, Mark; Dedman, Nathan; Chambers, Jon; Gaulton, Anna; Siddle, James; Koks, Richard; Irvine, Sean A.; Pettersson, Joe; Goncharoff, Nicko; Hersey, Anne; Overington, John P.

    2016-01-01

    SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-mining pipeline on a daily basis. SureChEMBL provides access to a previously unavailable, open and timely set of annotated compound-patent associations, complemented with sophisticated combined structure and keyword-based search capabilities against the compound repository and patent document corpus; given the wealth of knowledge hidden in patent documents, analysis of SureChEMBL data has immediate applications in drug discovery, medicinal chemistry and other commercial areas of chemical science. Currently, the database contains 17 million compounds extracted from 14 million patent documents. Access is available through a dedicated web-based interface and data downloads at: https://www.surechembl.org/. PMID:26582922

  10. A New Student Performance Analysing System Using Knowledge Discovery in Higher Educational Databases

    ERIC Educational Resources Information Center

    Guruler, Huseyin; Istanbullu, Ayhan; Karahasan, Mehmet

    2010-01-01

    Knowledge discovery is a wide ranged process including data mining, which is used to find out meaningful and useful patterns in large amounts of data. In order to explore the factors having impact on the success of university students, knowledge discovery software, called MUSKUP, has been developed and tested on student data. In this system a…

  11. On prediction and discovery of lunar ores

    NASA Technical Reports Server (NTRS)

    Haskin, Larry A.; Colson, Russell O.; Vaniman, David

    1991-01-01

    Sampling of lunar material and remote geochemical, mineralogical, and photogeologic sensing of the lunar surface, while meager, provide first-cut information about lunar composition and geochemical separation processes. Knowledge of elemental abundances in known lunar materials indicates which common lunar materials might serve as ores if there is economic demand and if economical extraction processes can be developed; remote sensing can be used to extend the understanding of the Moon's major geochemical separations and to locate potential ore bodies. Observed geochemical processes might lead to ores of less abundant elements under extreme local conditions.

  12. Knowledge Discovery in Databases.

    ERIC Educational Resources Information Center

    Norton, M. Jay

    1999-01-01

    Knowledge discovery in databases (KDD) revolves around the investigation and creation of knowledge, processes, algorithms, and mechanisms for retrieving knowledge from data collections. The article is an introductory overview of KDD. The rationale and environment of its development and applications are discussed. Issues related to database design…

  13. Perceptual learning modules in mathematics: enhancing students' pattern recognition, structure extraction, and fluency.

    PubMed

    Kellman, Philip J; Massey, Christine M; Son, Ji Y

    2010-04-01

    Learning in educational settings emphasizes declarative and procedural knowledge. Studies of expertise, however, point to other crucial components of learning, especially improvements produced by experience in the extraction of information: perceptual learning (PL). We suggest that such improvements characterize both simple sensory and complex cognitive, even symbolic, tasks through common processes of discovery and selection. We apply these ideas in the form of perceptual learning modules (PLMs) to mathematics learning. We tested three PLMs, each emphasizing different aspects of complex task performance, in middle and high school mathematics. In the MultiRep PLM, practice in matching function information across multiple representations improved students' abilities to generate correct graphs and equations from word problems. In the Algebraic Transformations PLM, practice in seeing equation structure across transformations (but not solving equations) led to dramatic improvements in the speed of equation solving. In the Linear Measurement PLM, interactive trials involving extraction of information about units and lengths produced successful transfer to novel measurement problems and fraction problem solving. Taken together, these results suggest (a) that PL techniques have the potential to address crucial, neglected dimensions of learning, including discovery and fluent processing of relations; (b) PL effects apply even to complex tasks that involve symbolic processing; and (c) appropriately designed PL technology can produce rapid and enduring advances in learning. Copyright © 2009 Cognitive Science Society, Inc.

  14. Text mining for traditional Chinese medical knowledge discovery: a survey.

    PubMed

    Zhou, Xuezhong; Peng, Yonghong; Liu, Baoyan

    2010-08-01

    Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the machine learning and data mining fields. Text data mining (or text mining) has become one of the most active research sub-fields in data mining. Significant developments in the area of biomedical text mining during the past years have demonstrated its great promise for supporting scientists in developing novel hypotheses and new knowledge from the biomedical literature. Traditional Chinese medicine (TCM) provides a distinct methodology with which to view human life. It is one of the most complete and distinguished traditional medicines with a history of several thousand years of studying and practicing the diagnosis and treatment of human disease. It has been shown that the TCM knowledge obtained from clinical practice has become a significant complementary source of information for modern biomedical sciences. TCM literature obtained from the historical period and from modern clinical studies has recently been transformed into digital data in the form of relational databases or text documents, which provide an effective platform for information sharing and retrieval. This motivates and facilitates research and development into knowledge discovery approaches and efforts to modernize TCM. In order to contribute to this still growing field, this paper presents (1) a comparative introduction to TCM and modern biomedicine, (2) a survey of the related information sources of TCM, (3) a review and discussion of the state of the art and the development of text mining techniques with applications to TCM, and (4) a discussion of the research issues around TCM text mining and its future directions. Copyright 2010 Elsevier Inc. All rights reserved.

  15. FERN Ethnomedicinal Plant Database: Exploring Fern Ethnomedicinal Plants Knowledge for Computational Drug Discovery.

    PubMed

    Thakar, Sambhaji B; Ghorpade, Pradnya N; Kale, Manisha V; Sonawane, Kailas D

    2015-01-01

    Fern plants are known for their ethnomedicinal applications. A huge amount of fern medicinal plant information is scattered in the form of text, so database development is an appropriate endeavor to cope with the situation. Given the importance of medicinally useful fern plants, we developed a web-based database which contains information about several groups of ferns, their medicinal uses, chemical constituents, and protein/enzyme sequences isolated from different fern plants. The Fern ethnomedicinal plant database is an all-embracing, content-management, web-based database system used to retrieve a collection of factual knowledge related to ethnomedicinal fern species. Most of the protein/enzyme sequences have been extracted from the NCBI Protein sequence database. The fern species, family name, identification, taxonomy ID from NCBI, geographical occurrence, trial for, plant parts used, ethnomedicinal importance, and morphological characteristics were collected from various scientific literature and journals available in text form. Links to the NCBI BLAST, InterPro, phylogeny, and Clustal W web resources have also been provided for future comparative studies, so users can get information related to fern plants and their medicinal applications in one place. This Fern ethnomedicinal plant database includes information on 100 fern medicinal species. This web-based database would be advantageous for deriving information, specifically for computational drug discovery, to botanists and those interested in botany, pharmacologists, researchers, biochemists, plant biotechnologists, ayurvedic practitioners, doctors/pharmacists, traditional medicine users, farmers, agricultural students and teachers from universities and colleges, and finally fern plant lovers.
This effort should provide essential knowledge for users about applications for drug discovery, conservation of fern species around the world, and, finally, help create social awareness.

  16. Chemical Informatics and the Drug Discovery Knowledge Pyramid

    PubMed Central

    Lushington, Gerald H.; Dong, Yinghua; Theertham, Bhargav

    2012-01-01

    The magnitude of the challenges in preclinical drug discovery is evident in the large amount of capital invested in such efforts in pursuit of a small static number of eventually successful marketable therapeutics. An explosion in the availability of potentially drug-like compounds and chemical biology data on these molecules can provide us with the means to improve the eventual success rates for compounds being considered at the preclinical level, but only if the community is able to access available information in an efficient and meaningful way. Thus, chemical database resources are critical to any serious drug discovery effort. This paper explores the basic principles underlying the development and implementation of chemical databases, and examines key issues of how molecular information may be encoded within these databases so as to enhance the likelihood that users will be able to extract meaningful information from data queries. In addition to a broad survey of conventional data representation and query strategies, key enabling technologies such as new context-sensitive chemical similarity measures and chemical cartridges are examined, with recommendations on how such resources may be integrated into a practical database environment. PMID:23782037
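
    The chemical similarity measures discussed above can be illustrated, in their simplest form, by the classic Tanimoto coefficient over binary fingerprints. This is a generic sketch, not the context-sensitive measures the paper examines, and the fingerprint bit sets and molecule names below are invented for illustration:

```python
# Hypothetical binary fingerprints, represented as sets of "on" bit positions.
fp = {
    "aspirin":   {1, 4, 7, 9, 12},
    "salicylic": {1, 4, 7, 12},
    "caffeine":  {2, 5, 8, 11},
}

def tanimoto(a, b):
    """Tanimoto coefficient: |A intersect B| / |A union B| for bit-set fingerprints."""
    return len(a & b) / len(a | b)

# Structurally related molecules share most bits; unrelated ones share few or none.
print(round(tanimoto(fp["aspirin"], fp["salicylic"]), 2))  # 0.8
print(round(tanimoto(fp["aspirin"], fp["caffeine"]), 2))   # 0.0
```

    A chemical cartridge performs this kind of comparison inside the database engine itself, so similarity queries can be expressed directly in SQL rather than in application code.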

  17. A Knowledge Discovery framework for Planetary Defense

    NASA Astrophysics Data System (ADS)

    Jiang, Y.; Yang, C. P.; Li, Y.; Yu, M.; Bambacus, M.; Seery, B.; Barbee, B.

    2016-12-01

    Planetary Defense, a project funded by NASA Goddard and the NSF, is a multi-faceted effort focused on the mitigation of Near Earth Object (NEO) threats to our planet. Currently, information concerning NEOs is dispersed among different organizations and scientists, leading to the lack of a coherent system of information to be used for efficient NEO mitigation. In this paper, a planetary defense knowledge discovery engine is proposed to better assist the development and integration of a NEO response system. Specifically, we have implemented an organized information framework by two means: 1) the development of a semantic knowledge base, which provides a structure for relevant information; it has been developed using web crawling and natural language processing techniques, which allow us to collect and store the most relevant structured information on a regular basis; and 2) the development of a knowledge discovery engine, which allows for the efficient retrieval of information from our knowledge base. The knowledge discovery engine has been built on top of Elasticsearch, an open-source full-text search engine, along with cutting-edge machine learning ranking and recommendation algorithms. This proposed framework is expected to advance knowledge discovery and innovation in the planetary science domain.
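
    The full-text retrieval at the core of such an engine can be sketched in miniature as an inverted index with term-frequency ranking. Elasticsearch does a far more sophisticated version of this (BM25 scoring, analyzers, distributed shards); the documents below are hypothetical stand-ins for crawled NEO material:

```python
from collections import defaultdict

# Toy corpus standing in for crawled NEO documents (invented examples).
docs = {
    "doc1": "asteroid deflection via kinetic impactor mission",
    "doc2": "near earth object orbit determination and tracking",
    "doc3": "kinetic impactor and gravity tractor deflection strategies",
}

# Build an inverted index: term -> {doc_id: term frequency}.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1

def search(query):
    """Rank documents by summed term frequency of the query terms."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores, key=scores.get, reverse=True)

print(search("deflection strategies"))  # ['doc3', 'doc1']
```

    In a production engine the raw term-frequency score would be replaced by a learned ranking model of the kind the abstract mentions.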

  18. Translational Research 2.0: a framework for accelerating collaborative discovery.

    PubMed

    Asakiewicz, Chris

    2014-05-01

    The world wide web has revolutionized the conduct of global, cross-disciplinary research. In the life sciences, interdisciplinary approaches to problem solving and collaboration are becoming increasingly important in facilitating knowledge discovery and integration. Web 2.0 technologies promise to have a profound impact - enabling reproducibility, aiding in discovery, and accelerating and transforming medical and healthcare research across the healthcare ecosystem. However, knowledge integration and discovery require a consistent foundation upon which to operate, one capable of addressing some of the critical issues associated with how research is conducted within the ecosystem today and how it should be conducted in the future. This article discusses a framework for enhancing collaborative knowledge discovery across the medical and healthcare research ecosystem - a framework that could serve as a foundation upon which ecosystem stakeholders can enhance the way data, information, and knowledge are created, shared, and used to accelerate the translation of knowledge from one area of the ecosystem to another.

  19. Finding complex biological relationships in recent PubMed articles using Bio-LDA.

    PubMed

    Wang, Huijun; Ding, Ying; Tang, Jie; Dong, Xiao; He, Bing; Qiu, Judy; Wild, David J

    2011-03-23

    The overwhelming amount of available scholarly literature in the life sciences poses significant challenges to scientists wishing to keep up with important developments related to their research, but also provides a useful resource for the discovery of recent information concerning genes, diseases, compounds and the interactions between them. In this paper, we describe an algorithm called Bio-LDA that uses extracted biological terminology to automatically identify latent topics, and provides a variety of measures to uncover putative relations among topics and bio-terms. Relationships identified using those approaches are combined with existing data in life science datasets to provide additional insight. Three case studies demonstrate the utility of the Bio-LDA model, including association prediction, association search and connectivity map generation. This combined approach offers new opportunities for knowledge discovery in many areas of biology including target identification, lead hopping and drug repurposing.
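
    A simple association measure of the kind used to relate bio-terms is pointwise mutual information over document-level co-occurrence. This sketch is not the Bio-LDA measure itself, and the terms and documents below are invented:

```python
import math
from collections import Counter

# Hypothetical abstracts, each reduced to its set of extracted bio-terms.
docs = [
    {"BRCA1", "breast cancer", "tamoxifen"},
    {"BRCA1", "breast cancer"},
    {"EGFR", "lung cancer", "gefitinib"},
    {"EGFR", "lung cancer"},
    {"BRCA1", "lung cancer"},
]

n_docs = len(docs)
term_count = Counter(t for d in docs for t in d)
# Count each unordered pair once per document.
pair_count = Counter(
    frozenset({a, b}) for d in docs for a in d for b in d if a < b
)

def pmi(a, b):
    """Pointwise mutual information between two bio-terms at document level."""
    p_ab = pair_count[frozenset({a, b})] / n_docs
    if p_ab == 0:
        return float("-inf")
    return math.log2(p_ab / ((term_count[a] / n_docs) * (term_count[b] / n_docs)))

# Terms that co-occur more often than chance score positive; rarer pairings score negative.
print(pmi("BRCA1", "breast cancer") > pmi("BRCA1", "lung cancer"))  # True
```

    Bio-LDA goes further by relating terms through shared latent topics rather than raw co-occurrence, which can surface indirect associations that never appear in the same abstract.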

  20. Finding Complex Biological Relationships in Recent PubMed Articles Using Bio-LDA

    PubMed Central

    Wang, Huijun; Ding, Ying; Tang, Jie; Dong, Xiao; He, Bing; Qiu, Judy; Wild, David J.

    2011-01-01

    The overwhelming amount of available scholarly literature in the life sciences poses significant challenges to scientists wishing to keep up with important developments related to their research, but also provides a useful resource for the discovery of recent information concerning genes, diseases, compounds and the interactions between them. In this paper, we describe an algorithm called Bio-LDA that uses extracted biological terminology to automatically identify latent topics, and provides a variety of measures to uncover putative relations among topics and bio-terms. Relationships identified using those approaches are combined with existing data in life science datasets to provide additional insight. Three case studies demonstrate the utility of the Bio-LDA model, including association prediction, association search and connectivity map generation. This combined approach offers new opportunities for knowledge discovery in many areas of biology including target identification, lead hopping and drug repurposing. PMID:21448266

  1. Conceptualization of an R&D Based Learning-to-Innovate Model for Science Education

    NASA Astrophysics Data System (ADS)

    Lai, Oiki Sylvia

    The purpose of this research was to conceptualize an R&D based learning-to-innovate (LTI) model. The problem to be addressed was the lack of a theoretical LTI model which would inform science pedagogy. The absorptive capacity (ACAP) lens was adopted to untangle the R&D LTI phenomenon into four learning processes: problem-solving via knowledge acquisition, incremental improvement via knowledge participation, scientific discovery via knowledge creation, and product design via knowledge productivity. The four knowledge factors were the latent factors, and each factor had seven manifest elements as measured variables. The key objectives of the non-experimental quantitative survey were to measure the relative importance of the identified elements and to explore the underlying structure of the variables. A questionnaire was prepared and administered to more than 155 R&D professionals from four sectors - business, academic, government, and nonprofit. The results showed that every identified element was important to the R&D professionals in terms of improving the related type of innovation. The most important elements were highlighted to serve as building blocks for elaboration. In search of patterns in the data matrix, exploratory factor analysis (EFA) was performed. Principal component analysis was the first phase of EFA, used to extract factors, while maximum likelihood estimation (MLE) was used to estimate the model. EFA yielded the finding of two aspects in each kind of knowledge. Logical names were assigned to represent the nature of the subsets: problem and knowledge under knowledge acquisition, planning and participation under knowledge participation, exploration and discovery under knowledge creation, and construction and invention under knowledge productivity. These two constructs, within each kind of knowledge, added structure to the vague R&D based LTI model.
The research questions and hypotheses were addressed using correlation analysis. The alternative hypotheses, that there are positive relationships between knowledge factors and their corresponding types of innovation, were accepted. In-depth study of each process is recommended in both research and application. Experimental tests are needed in order to ultimately present the LTI model to enhance the scientific knowledge absorptive capacity of learners and facilitate their innovation performance.

  2. Knowledge Discovery from Biomedical Ontologies in Cross Domains.

    PubMed

    Shen, Feichen; Lee, Yugyung

    2016-01-01

    In recent years, there has been an increasing demand for sharing and integration of medical data in biomedical research. Improving a health care system requires supporting data integration by facilitating semantically interoperable systems and practices. Semantic interoperability is difficult to achieve in these systems because the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and support semantic interoperability between different ontologies. To this end, we focus on the discovery of semantic patterns about the association of relations in the heterogeneous information network representing different types of objects and relationships in multiple biological ontologies, and on the creation of a topic hierarchy through analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner and then to conduct cross-domain knowledge discovery from the multiple biological ontologies; the patterns thus make a substantial contribution to knowledge discovery across multiple ontologies. We have demonstrated cross-domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF and compared it with a cross-domain query processing approach, namely SLAP. We have confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies.

  3. Knowledge Discovery from Biomedical Ontologies in Cross Domains

    PubMed Central

    Shen, Feichen; Lee, Yugyung

    2016-01-01

    In recent years, there has been an increasing demand for sharing and integration of medical data in biomedical research. Improving a health care system requires supporting data integration by facilitating semantically interoperable systems and practices. Semantic interoperability is difficult to achieve in these systems because the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and support semantic interoperability between different ontologies. To this end, we focus on the discovery of semantic patterns about the association of relations in the heterogeneous information network representing different types of objects and relationships in multiple biological ontologies, and on the creation of a topic hierarchy through analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner and then to conduct cross-domain knowledge discovery from the multiple biological ontologies; the patterns thus make a substantial contribution to knowledge discovery across multiple ontologies. We have demonstrated cross-domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF and compared it with a cross-domain query processing approach, namely SLAP. We have confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies. PMID:27548262

  4. Knowledge discovery with classification rules in a cardiovascular dataset.

    PubMed

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.
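
    The IF-THEN classification rules that such induction methods produce can be applied as an ordered rule list with a default class. The attributes, thresholds, and labels below are invented for illustration and are not the rules AREX actually learned:

```python
# Hypothetical rule set of the kind a rule-induction algorithm might produce.
# Each rule is (condition, class label); rules are tried in order.
rules = [
    (lambda p: p["systolic_bp"] > 140 and p["age"] < 16, "at risk"),
    (lambda p: p["heart_rate"] > 120, "at risk"),
]

def classify(patient, default="healthy"):
    """Return the class of the first matching rule, else the default class."""
    for condition, label in rules:
        if condition(patient):
            return label
    return default

print(classify({"systolic_bp": 150, "age": 12, "heart_rate": 80}))  # at risk
print(classify({"systolic_bp": 110, "age": 10, "heart_rate": 90}))  # healthy
```

    The appeal of rule lists in medical settings, as the abstract notes, is that a domain expert can read, assess, and veto individual rules, closing the knowledge discovery loop.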

  5. Progress in Biomedical Knowledge Discovery: A 25-year Retrospective

    PubMed Central

    Sacchi, L.

    2016-01-01

    Objectives: We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992 and now, 25 years later, mainly focused on supervised learning. Methods: We performed a rigorous systematic search of PubMed and used latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery in and between time periods, and to compare these trends. We restricted each result set using a bracket of the five previous years, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set between 2011 and 2015. This was to reflect the literature available to researchers and others at the target dates of 1992 and 2015. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated. Results: A total of 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery, and the refinement and application of machine learning approaches that were nascent or unknown in 1992. Conclusions: Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains. Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources and the evolution of new algorithmic approaches to knowledge discovery, and we consider, from legal, social, and political perspectives, possible explanations for the growth of the field. 
Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods now being developed for the discovery of new knowledge in biomedical data. PMID:27488403

  6. Progress in Biomedical Knowledge Discovery: A 25-year Retrospective.

    PubMed

    Sacchi, L; Holmes, J H

    2016-08-02

    We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992 and now, 25 years later, mainly focused on supervised learning. We performed a rigorous systematic search of PubMed and used latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery in and between time periods, and to compare these trends. We restricted each result set using a bracket of the five previous years, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set between 2011 and 2015. This was to reflect the literature available to researchers and others at the target dates of 1992 and 2015. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated. A total of 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery, and the refinement and application of machine learning approaches that were nascent or unknown in 1992. Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains. Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources and the evolution of new algorithmic approaches to knowledge discovery, and we consider, from legal, social, and political perspectives, possible explanations for the growth of the field. 
Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods now being developed for the discovery of new knowledge in biomedical data.

  7. Communication in Collaborative Discovery Learning

    ERIC Educational Resources Information Center

    Saab, Nadira; van Joolingen, Wouter R.; van Hout-Wolters, Bernadette H. A. M.

    2005-01-01

    Background: Constructivist approaches to learning focus on learning environments in which students have the opportunity to construct knowledge themselves, and negotiate this knowledge with others. "Discovery learning" and "collaborative learning" are examples of learning contexts that cater for knowledge construction processes. We introduce a…

  8. Practice-Based Knowledge Discovery for Comparative Effectiveness Research: An Organizing Framework

    PubMed Central

    Lucero, Robert J.; Bakken, Suzanne

    2014-01-01

    Electronic health information systems can increase the ability of health-care organizations to investigate the effects of clinical interventions. The authors present an organizing framework that integrates outcomes and informatics research paradigms to guide knowledge discovery in electronic clinical databases. They illustrate its application using the example of hospital acquired pressure ulcers (HAPU). The Knowledge Discovery through Informatics for Comparative Effectiveness Research (KDI-CER) framework was conceived as a heuristic to conceptualize study designs and address potential methodological limitations imposed by using a single research perspective. Advances in informatics research can play a complementary role in advancing the field of outcomes research including CER. The KDI-CER framework can be used to facilitate knowledge discovery from routinely collected electronic clinical data. PMID:25278645

  9. Construction of phosphorylation interaction networks by text mining of full-length articles using the eFIP system.

    PubMed

    Tudor, Catalina O; Ross, Karen E; Li, Gang; Vijay-Shanker, K; Wu, Cathy H; Arighi, Cecilia N

    2015-01-01

    Protein phosphorylation is a reversible post-translational modification where a protein kinase adds a phosphate group to a protein, potentially regulating its function, localization and/or activity. Phosphorylation can affect protein-protein interactions (PPIs), abolishing interaction with previous binding partners or enabling new interactions. Extracting phosphorylation information coupled with PPI information from the scientific literature will facilitate the creation of phosphorylation interaction networks of kinases, substrates and interacting partners, toward knowledge discovery of functional outcomes of protein phosphorylation. Increasingly, PPI databases are interested in capturing the phosphorylation state of interacting partners. We have previously developed the eFIP (Extracting Functional Impact of Phosphorylation) text mining system, which identifies phosphorylated proteins and phosphorylation-dependent PPIs. In this work, we present several enhancements for the eFIP system: (i) text mining for full-length articles from the PubMed Central open-access collection; (ii) the integration of the RLIMS-P 2.0 system for the extraction of phosphorylation events with kinase, substrate and site information; (iii) the extension of the PPI module with new trigger words/phrases describing interactions and (iv) the addition of the iSimp tool for sentence simplification to aid in the matching of syntactic patterns. We enhance the website functionality to: (i) support searches based on protein roles (kinases, substrates, interacting partners) or using keywords; (ii) link protein entities to their corresponding UniProt identifiers if mapped and (iii) support visual exploration of phosphorylation interaction networks using Cytoscape. The evaluation of eFIP on full-length articles achieved 92.4% precision, 76.5% recall and 83.7% F-measure on 100 article sections. 
To demonstrate eFIP for knowledge extraction and discovery, we constructed phosphorylation-dependent interaction networks involving 14-3-3 proteins identified from cancer-related versus diabetes-related articles. Comparison of the phosphorylation interaction network of kinases, phosphoproteins and interactants obtained from eFIP searches, along with enrichment analysis of the protein set, revealed several shared interactions, highlighting common pathways discussed in the context of both diseases. © The Author(s) 2015. Published by Oxford University Press.
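
    The trigger-word matching step described above can be illustrated with a single simplified pattern. Real eFIP patterns (and the iSimp sentence simplification that feeds them) are far richer; the pattern and sentence below are a minimal sketch:

```python
import re

# One simplified trigger pattern for a phosphorylation-dependent interaction:
# "phosphorylated <substrate> binds/interacts with <partner>".
pattern = re.compile(
    r"phosphorylat\w+ (?P<substrate>\w+) (?:binds|interacts with) (?P<partner>[\w-]+)",
    re.IGNORECASE,
)

sentence = "Phosphorylated BAD binds 14-3-3 proteins in the cytosol."
m = pattern.search(sentence)
if m:
    print(m.group("substrate"), "->", m.group("partner"))  # BAD -> 14-3-3
```

    Systems like eFIP combine many such patterns with named-entity recognition and normalization (e.g., mapping "BAD" to its UniProt identifier) before assembling the extracted pairs into interaction networks.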

  10. Ethnobotanical perspective of antimalarial plants: traditional knowledge based study.

    PubMed

    Qayum, Abdul; Arya, Rakesh; Lynn, Andrew M

    2016-02-04

    Considering the demand for antimalarial plants, it has become essential to find and locate them for their optimal extraction. This work aims to find plants with antimalarial activities used by local people; to raise the value of the traditional knowledge system (TKS) prevalent in the study region; to compile characteristics of local plants used in malaria treatment (referred to as antimalarial plants); and to analyze their spatial distribution to establish a concept of geographical health. Antimalarial plants are listed based on a literature survey and field data collected during the rainy season from 85 respondents drawn from different ethnic groups. Ethno-medicinal uses of plants were extracted; the botanical name, family, local name, part used, folklore, geographical location, and image of each plant were recorded after cross-validation with the existing literature. Interviews were conducted in three settings: in the field, with Vaidyas/Hakims, and house to house. Graphical analysis was done for major plant families, plant parts used, responses of people and patients, and folklore. Mathematical analysis was done for interviewees' responses, methods of plant identification, and people's preferences for TKS through three plant indices. Fifty-one plants belonging to 27 families were reported with their geographical attributes. Plant roots (31.75 %) are the most commonly used part for malaria treatment, and the main mode of administration is decoction (41.2 %). The study area is dominated by plants of the families Fabaceae (7), Asteraceae (4), Acanthaceae (4), and Amaranthaceae (4). The most popular plants found are Adhatoda vasica, Cassia fistula, and Swertia chirata, while usage of TKS for malaria cure is 82.0 %. The research findings can be used by both the scientific community and rural people for sustainable bio-discovery of these natural resources. 
The former can use the tables to identify a suitable plant for finding a lead molecule in a drug discovery project, while the latter can meet their local needs in treating malaria, scientifically.

  11. Mississippi State University Center for Air Sea Technology. FY93 and FY 94 Research Program in Navy Ocean Modeling and Prediction

    DTIC Science & Technology

    1994-09-30

    relational versus object oriented DBMS, knowledge discovery, data models, metadata, data filtering, clustering techniques, and synthetic data. A secondary...The first was the investigation of AI/ES applications (knowledge discovery, data mining, and clustering). Here CAST collaborated with Dr. Fred Petry...knowledge discovery system based on clustering techniques; implemented an on-line data browser to the DBMS; completed preliminary efforts to apply object

  12. Reducing the Bottleneck in Discovery of Novel Antibiotics.

    PubMed

    Jones, Marcus B; Nierman, William C; Shan, Yue; Frank, Bryan C; Spoering, Amy; Ling, Losee; Peoples, Aaron; Zullo, Ashley; Lewis, Kim; Nelson, Karen E

    2017-04-01

    Most antibiotics were discovered by screening soil actinomycetes, but the efficiency of the discovery platform collapsed in the 1960s. By now, more than 3000 antibiotics have been described and most of the current discovery effort is focused on the rediscovery of known compounds, making the approach impractical. The last marketed broad-spectrum antibiotics discovered were daptomycin, linezolid, and fidaxomicin. The current state of the art in the development of new anti-infectives is a non-existent pipeline in the absence of a discovery platform. This is particularly troubling given the emergence of pan-resistant pathogens. The current practice in dealing with the problem of the background of known compounds is to use chemical dereplication of extracts to assess the relative novelty of the compounds they contain. Dereplication typically requires scale-up, extraction, and often fractionation before an accurate mass and structure can be produced by MS analysis in combination with 2D NMR. Here, we describe a transcriptome analysis approach using RNA sequencing (RNASeq) to identify promising novel antimicrobial compounds from microbial extracts. Our pipeline permits identification of antimicrobial compounds that produce distinct transcription profiles using unfractionated cell extracts. This efficient pipeline will eliminate the requirement for purification and structure determination of compounds from extracts and will facilitate high-throughput screening of cell extracts for the identification of novel compounds.
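
    At its simplest, comparing transcription profiles reduces to a similarity measure between expression vectors: an extract whose profile correlates strongly with a known compound's profile is likely a rediscovery, while a distinct profile flags potential novelty. The sketch below uses Pearson correlation with invented log fold-change profiles and compound class names:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical log fold-change profiles over the same gene panel.
known = {"rifampicin-like": [2.0, -1.0, 0.5, 3.0],
         "vancomycin-like": [-0.5, 2.5, -2.0, 0.0]}
extract = [1.8, -0.9, 0.4, 2.7]   # profile from an unfractionated extract

best = max(known, key=lambda k: pearson(extract, known[k]))
score = pearson(extract, known[best])
# A high correlation suggests rediscovery; a low best score would flag novelty.
print(best, round(score, 2))
```

    Real RNASeq-based dereplication compares genome-wide profiles with more robust statistics, but the decision logic is the same: match against known mechanism-of-action signatures before investing in purification.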

  13. Joint principal trend analysis for longitudinal high-dimensional data.

    PubMed

    Zhang, Yuping; Ouyang, Zhengqing

    2018-06-01

    We consider a research scenario motivated by integrating multiple sources of information for better knowledge discovery in diverse dynamic biological processes. Given two longitudinal high-dimensional datasets for a group of subjects, we want to extract shared latent trends and identify relevant features. To solve this problem, we present a new statistical method named joint principal trend analysis (JPTA). We demonstrate the utility of JPTA through simulations and applications to gene expression data of the mammalian cell cycle and longitudinal transcriptional profiling data in response to influenza viral infections. © 2017, The International Biometric Society.
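
    Extracting a shared latent trend from two datasets measured over the same time points can be sketched as computing the leading right singular vector of the stacked data matrix via power iteration. This is a simplification, not the JPTA estimator (which adds sparsity and smoothness structure), and the data below are invented:

```python
import math

# Two toy longitudinal datasets (rows = features, columns = time points)
# sharing an underlying rising trend; values are invented for illustration.
X1 = [[1.0, 2.0, 3.0, 4.0],
      [2.1, 4.0, 6.2, 8.1]]
X2 = [[0.9, 2.2, 2.9, 4.1],
      [1.0, 1.9, 3.1, 3.9]]

A = X1 + X2  # stack features from both sources over the common time axis

def shared_trend(A, iters=100):
    """Leading right singular vector of A via power iteration on A^T A."""
    n = len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        # w = A^T (A v)
        Av = [sum(row[j] * v[j] for j in range(n)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

trend = shared_trend(A)
print([round(x, 2) for x in trend])  # monotonically increasing weights
```

    Because both toy datasets follow the same rising pattern, the recovered trend vector increases across the four time points, which is the kind of shared dynamic JPTA is designed to extract (with feature selection on top).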

  14. Calling on a million minds for community annotation in WikiProteins

    PubMed Central

    Mons, Barend; Ashburner, Michael; Chichester, Christine; van Mulligen, Erik; Weeber, Marc; den Dunnen, Johan; van Ommen, Gert-Jan; Musen, Mark; Cockerill, Matthew; Hermjakob, Henning; Mons, Albert; Packer, Abel; Pacheco, Roberto; Lewis, Suzanna; Berkeley, Alfred; Melton, William; Barris, Nickolas; Wales, Jimmy; Meijssen, Gerard; Moeller, Erik; Roes, Peter Jan; Borner, Katy; Bairoch, Amos

    2008-01-01

    WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at . PMID:18507872

  15. Interactive knowledge discovery with the doctor-in-the-loop: a practical example of cerebral aneurysms research.

    PubMed

    Girardi, Dominic; Küng, Josef; Kleiser, Raimund; Sonnberger, Michael; Csillag, Doris; Trenkler, Johannes; Holzinger, Andreas

    2016-09-01

    Established process models for knowledge discovery place the domain expert in a customer-like, supervising role. In the field of biomedical research, it is necessary to move domain experts into the center of this process, with far-reaching consequences for both their research output and the process itself. In this paper, we revise the established process models for knowledge discovery and propose a new process model for domain-expert-driven interactive knowledge discovery. Furthermore, we present a research infrastructure which is adapted to this new process model and demonstrate how the domain expert can be deeply integrated even into the highly complex data-mining process and data-exploration tasks. We evaluated this approach in the medical domain for the case of cerebral aneurysms research.

  16. KnowEnG: a knowledge engine for genomics.

    PubMed

    Sinha, Saurabh; Song, Jun; Weinshilboum, Richard; Jongeneel, Victor; Han, Jiawei

    2015-11-01

    We describe here the vision, motivations, and research plans of the National Institutes of Health Center for Excellence in Big Data Computing at the University of Illinois, Urbana-Champaign. The Center is organized around the construction of the "Knowledge Engine for Genomics" (KnowEnG), an E-science framework for genomics in which biomedical scientists will have access to powerful methods of data mining, network mining, and machine learning to extract knowledge out of genomics data. Scientists will come to KnowEnG with their own data sets in the form of spreadsheets and ask KnowEnG to analyze those data sets in the light of a massive knowledge base of community data sets, called the "Knowledge Network", that will be at the heart of the system. The Center is undertaking discovery projects aimed at testing the utility of KnowEnG for transforming big data to knowledge. These projects span a broad range of biological enquiry, from pharmacogenomics (in collaboration with the Mayo Clinic) to the transcriptomics of human behavior. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  17. Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach

    PubMed Central

    Song, Min

    2016-01-01

    In biomedicine, the scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an ever more important task as the volume of scientific literature grows at an unprecedented rate. In this paper, we propose a framework for examining a given disease based on existing information provided by the scientific literature. Disease-related entities, including diseases, drugs, and genes, are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity-specific networks (meso-level). Important diseases, drugs, and genes, as well as salient entity relations (micro-level), are identified from these networks. Results obtained from this literature-based mining can serve to assist clinical applications. PMID:27195695
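The macro-level entity co-occurrence network described above can be sketched in a few lines of Python; the per-paper entity sets here are hypothetical placeholders, not data from the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-paper entity annotations (diseases, drugs, genes).
papers = [
    {"liver cancer", "sorafenib", "TP53"},
    {"liver cancer", "hepatitis B", "TP53"},
    {"liver cancer", "sorafenib", "CTNNB1"},
]

# Count how often each unordered pair of entities co-occurs in a paper.
cooccurrence = Counter()
for entities in papers:
    for a, b in combinations(sorted(entities), 2):
        cooccurrence[(a, b)] += 1

# Edges weighted by co-occurrence frequency form the macro-level network;
# a support threshold keeps only recurring relations.
edges = {pair: w for pair, w in cooccurrence.items() if w >= 2}
```

The meso-level entity-specific networks would then be subgraphs of `edges` restricted to one entity type, and micro-level relations are the highest-weight edges.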

  18. Building Knowledge Graphs for NASA's Earth Science Enterprise

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Lee, T. J.; Ramachandran, R.; Shi, R.; Bao, Q.; Gatlin, P. N.; Weigel, A. M.; Maskey, M.; Miller, J. J.

    2016-12-01

    Inspired by the Google Knowledge Graph, we have been building a prototype knowledge graph for Earth scientists, connecting information and data in NASA's Earth science enterprise. Our primary goal is to advance the state of the art in NASA knowledge extraction capability by going beyond traditional catalog search and linking distributed information of different kinds (such as data, publications, services, tools and people). This will enable a more efficient pathway to knowledge discovery. While the Google Knowledge Graph provides impressive semantic-search and aggregation capabilities, it is limited to search topics for the general public. We use a similar knowledge graph approach to semantically link information gathered from a wide variety of sources within the NASA Earth science enterprise. Our prototype serves as a proof of concept on the viability of building an operational "knowledge base" system for NASA Earth science. Information is pulled from structured sources (such as the NASA CMR catalog, GCMD, and Climate and Forecast Conventions) and unstructured sources (such as research papers). Leveraging modern techniques of machine learning, information retrieval, and deep learning, we provide an integrated data mining and information discovery environment to help Earth scientists use the best data, tools, methodologies, and models available to test a hypothesis. Our knowledge graph would be able to answer questions like: Which articles discuss topics investigating similar hypotheses? How have these methods been tested for accuracy? Which approaches have been highly cited within the scientific community? What variables were used for this method and what datasets were used to represent them? What processing was necessary to use this data? These questions then lead researchers and citizen scientists to investigate the sources where data can be found, available user guides, information on how the data was acquired, and available tools and models to use with this data.
As a proof of concept, we focus on a well-defined domain, Hurricane Science, linking research articles and their findings, data, people and tools/services. Modern information retrieval, natural language processing, machine learning, and deep learning techniques are applied to build the knowledge network.
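A knowledge graph of the kind described can be modeled minimally as a set of subject-predicate-object triples with pattern queries; the entities and relations below are invented for illustration and are not drawn from the NASA prototype.

```python
# Hypothetical triples linking papers, datasets, and variables.
triples = [
    ("PaperA", "discusses", "hurricane intensity"),
    ("PaperA", "uses_dataset", "GPM precipitation"),
    ("PaperB", "discusses", "hurricane intensity"),
    ("GPM precipitation", "has_variable", "rain rate"),
]

def query(subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern (None = wildcard)."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# "Which articles discuss similar topics?"
articles = [s for s, _, _ in query(predicate="discusses",
                                   obj="hurricane intensity")]
```

Production systems would typically use an RDF triple store and SPARQL for this, but the query-by-pattern idea is the same.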

  19. Building Faculty Capacity through the Learning Sciences

    ERIC Educational Resources Information Center

    Moy, Elizabeth; O'Sullivan, Gerard; Terlecki, Melissa; Jernstedt, Christian

    2014-01-01

    Discoveries in the learning sciences (especially in neuroscience) have yielded a rich and growing body of knowledge about how students learn, yet this knowledge is only half of the story. The other half is "know how," i.e. the application of this knowledge. For faculty members, that means applying the discoveries of the learning sciences…

  20. The National Institutes of Health's Big Data to Knowledge (BD2K) initiative: capitalizing on biomedical big data.

    PubMed

    Margolis, Ronald; Derr, Leslie; Dunn, Michelle; Huerta, Michael; Larkin, Jennie; Sheehan, Jerry; Guyer, Mark; Green, Eric D

    2014-01-01

    Biomedical research has and will continue to generate large amounts of data (termed 'big data') in many formats and at all levels. Consequently, there is an increasing need to better understand and mine the data to further knowledge and foster new discovery. The National Institutes of Health (NIH) has initiated a Big Data to Knowledge (BD2K) initiative to maximize the use of biomedical big data. BD2K seeks to better define how to extract value from the data, both for the individual investigator and the overall research community, create the analytic tools needed to enhance utility of the data, provide the next generation of trained personnel, and develop data science concepts and tools that can be made available to all stakeholders. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  1. The center for causal discovery of biomedical knowledge from big data

    PubMed Central

    Bahar, Ivet; Becich, Michael J; Benos, Panayiotis V; Berg, Jeremy; Espino, Jeremy U; Glymour, Clark; Jacobson, Rebecca Crowley; Kienholz, Michelle; Lee, Adrian V; Lu, Xinghua; Scheines, Richard

    2015-01-01

    The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers. PMID:26138794

  2. The Effect of Rules and Discovery in the Retention and Retrieval of Braille Inkprint Letter Pairs.

    ERIC Educational Resources Information Center

    Nagengast, Daniel L.; And Others

    The effects of rule knowledge were investigated using Braille inkprint pairs. Both recognition and recall were studied in three groups of subjects: rule knowledge, rule discovery, and no rule. Two hypotheses were tested: (1) that the group exposed to the rule would score better than would a discovery group and a control group; and (2) that all…

  3. Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.

    PubMed

    Niu, Zhenxing; Hua, Gang; Wang, Le; Gao, Xinbo

    Unsupervised object discovery and localization aims to discover the dominant object classes in a given image collection, and to localize all object instances, without any supervision. Previous work has attempted to tackle this problem with vanilla topic models, such as latent Dirichlet allocation (LDA). However, those methods exploit no prior knowledge about the given image collection to facilitate object discovery. Moreover, the topic models they use suffer from the topic coherence issue: some inferred topics have no clear meaning, which limits the final performance of object discovery. In this paper, prior knowledge in the form of so-called must-links is exploited from Web images on the Internet. Furthermore, a novel knowledge-based topic model, called LDA with mixture of Dirichlet trees, is proposed to incorporate the must-links into topic modeling for object discovery. In particular, to better handle the polysemy of visual words, the must-link is re-defined so that one must-link constrains only one or some topics instead of all topics, which leads to significantly improved topic coherence. Moreover, the must-links are built and grouped with respect to specific object classes; the must-links in our approach are thus semantic-specific, which allows discriminative prior knowledge from Web images to be exploited more efficiently. Extensive experiments validated the efficiency of our proposed approach on several data sets. It is shown that our method significantly improves topic coherence and outperforms unsupervised methods for object discovery and localization.
In addition, compared with discriminative methods, the naturally existing object classes in the given image collection can be subtly discovered, which makes our approach well suited for realistic applications of unsupervised object discovery.
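How must-links group visual words can be seen with a minimal union-find sketch; this illustrates only the grouping step, not the full LDA-with-mixture-of-Dirichlet-trees model, and the words and links are hypothetical.

```python
# Hypothetical must-links between visual words for one object class:
# linked words should be encouraged to land in the same topic.
must_links = [("wheel", "tire"), ("tire", "hubcap"), ("wing", "tail")]

parent = {}

def find(w):
    """Find the representative of w's group (with path halving)."""
    parent.setdefault(w, w)
    while parent[w] != w:
        parent[w] = parent[parent[w]]
        w = parent[w]
    return w

def union(a, b):
    parent[find(a)] = find(b)

for a, b in must_links:
    union(a, b)

# Words sharing a root belong to one group (one Dirichlet subtree).
groups = {}
for w in list(parent):
    groups.setdefault(find(w), set()).add(w)
clusters = {frozenset(g) for g in groups.values()}
```

In the paper's model each such group would parameterize a subtree of a Dirichlet tree prior, so that must-linked words receive correlated probabilities within the constrained topics.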

  4. An Approach to Data Center-Based KDD of Remote Sensing Datasets

    NASA Technical Reports Server (NTRS)

    Lynnes, Christopher; Mack, Robert; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    The data explosion in remote sensing is straining the ability of data centers to deliver the data to the user community, yet many large-volume users actually seek a relatively small information component within the data, which they extract at their sites using Knowledge Discovery in Databases (KDD) techniques. To improve the efficiency of this process, the Goddard Earth Sciences Distributed Active Archive Center (GES DAAC) has implemented a KDD subsystem that supports execution of the user's KDD algorithm at the data center, dramatically reducing the volume that is sent to the user. The data are extracted from the archive in a planned, organized "campaign"; the algorithms are executed, and the output products sent to the users over the network. The first campaign, now complete, has resulted in overall reductions in shipped volume from 3.3 TB to 0.4 TB.
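The idea of running a user's extraction algorithm next to the archive and shipping only the derived product can be sketched as follows; the granule format and the user algorithm are hypothetical, not the GES DAAC's actual interfaces.

```python
# Hypothetical archive granules: metadata plus a large data payload.
granules = [
    {"id": "g1", "region": "pacific", "values": list(range(1000))},
    {"id": "g2", "region": "atlantic", "values": list(range(1000))},
]

def user_algorithm(granule):
    """User-supplied KDD step: reduce a granule to a small summary."""
    vals = granule["values"]
    return {"id": granule["id"], "mean": sum(vals) / len(vals)}

# Executed at the data center during a planned "campaign":
# only the small summaries cross the network to the user.
shipped = [user_algorithm(g) for g in granules if g["region"] == "pacific"]
```

The volume reduction the record reports (3.3 TB archived vs. 0.4 TB shipped) comes from exactly this pattern: the full payloads never leave the archive.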

  5. Exploring patterns of epigenetic information with data mining techniques.

    PubMed

    Aguiar-Pulido, Vanessa; Seoane, José A; Gestal, Marcos; Dorado, Julián

    2013-01-01

    Data mining, part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information which are mitotically and/or meiotically heritable, determining gene expression and cellular differentiation, as well as cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their lives and accumulate with ageing. Both defects, either together or individually, can result in losing control over cell growth and, thus, cause cancer development. Data mining techniques could then be used to extract such patterns. This work reviews some of the most important applications of data mining to epigenetics.
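The pattern-extraction step the abstract refers to can be illustrated with a first-pass frequent-itemset count (the counting stage of Apriori-style mining); the transactions below are toy stand-ins, and real epigenetic analyses would operate on genome-scale feature matrices.

```python
from collections import Counter
from itertools import combinations

# Toy transactions standing in for per-sample epigenetic feature sets.
transactions = [
    {"CpG_island", "methylated", "silenced"},
    {"CpG_island", "methylated", "expressed"},
    {"CpG_island", "methylated", "silenced"},
]

min_support = 2
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# Pairs meeting the support threshold are candidate patterns.
frequent_pairs = {p for p, c in pair_counts.items() if c >= min_support}
```

A full miner would extend frequent pairs to larger itemsets and score association rules; this sketch shows only the support-counting idea.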

  6. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses

    PubMed Central

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi

    2018-01-01

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625

  7. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses.

    PubMed

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V; Ma'ayan, Avi

    2018-02-27

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.

  8. Research Dilemmas with Behavioral Big Data.

    PubMed

    Shmueli, Galit

    2017-06-01

    Behavioral big data (BBD) refers to very large and rich multidimensional data sets on human and social behaviors, actions, and interactions, which have become available to companies, governments, and researchers. A growing number of researchers in social science and management fields acquire and analyze BBD for the purpose of extracting knowledge and scientific discoveries. However, the relationships between the researcher, data, subjects, and research questions differ in the BBD context compared to traditional behavioral data. Behavioral researchers using BBD face not only methodological and technical challenges but also ethical and moral dilemmas. In this article, we discuss several dilemmas, challenges, and trade-offs related to acquiring and analyzing BBD for causal behavioral research.

  9. Concept Formation in Scientific Knowledge Discovery from a Constructivist View

    NASA Astrophysics Data System (ADS)

    Peng, Wei; Gero, John S.

    The central goal of scientific knowledge discovery is to learn cause-effect relationships among natural phenomena, presented as variables, and the consequences of their interactions. Scientific knowledge is normally expressed as scientific taxonomies and qualitative and quantitative laws [1]. This type of knowledge represents intrinsic regularities of the observed phenomena that can be used to explain and predict the behaviors of those phenomena. It is a generalization that is abstracted and externalized from a set of contexts and applicable to a broader scope. Scientific knowledge is a type of third-person knowledge, i.e., knowledge that is independent of a specific enquirer. Artificial intelligence approaches, particularly the data mining algorithms used to identify meaningful patterns in large data sets, aim to facilitate the knowledge discovery process [2]. A broad spectrum of algorithms has been developed to address classification, associative learning, and clustering problems. However, their linkages to the people who use them have not been adequately explored. Issues relating to supporting the interpretation of patterns, applying prior knowledge to the data mining process, and addressing user interactions remain challenges for building knowledge discovery tools [3]. As a consequence, scientists rely on their experience to formulate problems, evaluate hypotheses, reason about untraceable factors and derive new problems. This type of knowledge, which they have developed during their careers, is called “first-person” knowledge. The formation of scientific knowledge (third-person knowledge) is highly influenced by the enquirer’s first-person knowledge construct, which is a result of his or her interactions with the environment. There have been attempts to craft automatic knowledge discovery tools, but these systems are limited in their capabilities to handle the dynamics of personal experience.
There are now trends in developing approaches to assist scientists in applying their expertise to model formation, simulation, and prediction in various domains [4], [5]. On the other hand, first-person knowledge becomes third-person theory only if it is shown to be general by evidence and is acknowledged by a scientific community. Researchers have started to focus on building interactive cooperation platforms [1] to accommodate different views in the knowledge discovery process. There are some fundamental questions in relation to scientific knowledge development. What are the major components of knowledge construction, and how do people construct their knowledge? How is this personal construct assimilated and accommodated into a scientific paradigm? How can one design a computational system to facilitate these processes? This chapter does not attempt to answer all these questions but serves as a basis to foster thinking along this line. A brief literature review of how people develop their knowledge is carried out through a constructivist view. A hydrological modeling scenario is presented to elucidate the approach.

  11. 12 CFR 263.53 - Discovery depositions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 12 Banks and Banking 4 2014-01-01 2014-01-01 false Discovery depositions. 263.53 Section 263.53... Discovery depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts...

  12. 12 CFR 263.53 - Discovery depositions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 12 Banks and Banking 4 2012-01-01 2012-01-01 false Discovery depositions. 263.53 Section 263.53... Discovery depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts...

  13. An integrative model for in-silico clinical-genomics discovery science.

    PubMed

    Lussier, Yves A; Sarkar, Indra Neil; Cantor, Michael

    2002-01-01

    Human Genome discovery research has set the pace for post-genomic discovery research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems with current bioinformatics genomic discovery science. This paper presents an original model enabling novel "in-silico" clinical-genomic discovery science and demonstrates its feasibility. The model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using the knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical-decision-support-system-driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.

  14. Big Data Mining and Adverse Event Pattern Analysis in Clinical Drug Trials

    PubMed Central

    Federer, Callie; Yoo, Minjae

    2016-01-01

    Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from ClinicalTrials.gov (https://clinicaltrials.gov/), a database of clinical studies around the world. By extracting drug and AE information from ClinicalTrials.gov and structuring it into a database, drug-AE relationships could be established for future drug development and repositioning. To our knowledge, current AE databases contain mainly U.S. Food and Drug Administration (FDA)-approved drugs. However, our database contains both FDA-approved and experimental compounds extracted from ClinicalTrials.gov. Our database contains 8,161 clinical trials of 3,102,675 patients and 713,103 reported AEs. We extracted the information from ClinicalTrials.gov using a set of Python scripts, and then used regular expressions and a drug dictionary to process and structure the relevant information into a relational database. We performed data mining and pattern analysis of drug-AEs in our database. Our database can serve as a tool to assist researchers in discovering drug-AE relationships for developing, repositioning, and repurposing drugs. PMID:27631620

  15. Big Data Mining and Adverse Event Pattern Analysis in Clinical Drug Trials.

    PubMed

    Federer, Callie; Yoo, Minjae; Tan, Aik Choon

    2016-12-01

    Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from ClinicalTrials.gov (https://clinicaltrials.gov/), a database of clinical studies around the world. By extracting drug and AE information from ClinicalTrials.gov and structuring it into a database, drug-AE relationships could be established for future drug development and repositioning. To our knowledge, current AE databases contain mainly U.S. Food and Drug Administration (FDA)-approved drugs. However, our database contains both FDA-approved and experimental compounds extracted from ClinicalTrials.gov. Our database contains 8,161 clinical trials of 3,102,675 patients and 713,103 reported AEs. We extracted the information from ClinicalTrials.gov using a set of Python scripts, and then used regular expressions and a drug dictionary to process and structure the relevant information into a relational database. We performed data mining and pattern analysis of drug-AEs in our database. Our database can serve as a tool to assist researchers in discovering drug-AE relationships for developing, repositioning, and repurposing drugs.
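The pipeline the authors describe (regular expressions plus a drug dictionary feeding a relational database) can be sketched like this; the trial texts and dictionary entries are fabricated examples, not records from ClinicalTrials.gov.

```python
import re
import sqlite3

# Hypothetical drug dictionary and free-text adverse-event reports.
drug_dictionary = {"aspirin", "warfarin"}
reports = [
    ("NCT0000001", "Patients on aspirin reported nausea."),
    ("NCT0000002", "Warfarin was associated with bleeding."),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drug_ae (trial_id TEXT, drug TEXT, text TEXT)")

# Tokenize with a regex, then keep tokens found in the drug dictionary.
word = re.compile(r"[A-Za-z]+")
for trial_id, text in reports:
    for token in word.findall(text):
        if token.lower() in drug_dictionary:
            conn.execute("INSERT INTO drug_ae VALUES (?, ?, ?)",
                         (trial_id, token.lower(), text))

rows = conn.execute(
    "SELECT trial_id, drug FROM drug_ae ORDER BY trial_id").fetchall()
```

With the drug mentions structured in a table, drug-AE co-occurrence mining becomes ordinary SQL aggregation over the extracted rows.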

  16. The Adam and Eve Robot Scientists for the Automated Discovery of Scientific Knowledge

    NASA Astrophysics Data System (ADS)

    King, Ross

    A Robot Scientist is a physically implemented robotic system that applies techniques from artificial intelligence to execute cycles of automated scientific experimentation. A Robot Scientist can automatically execute cycles of hypothesis formation, selection of efficient experiments to discriminate between hypotheses, execution of experiments using laboratory automation equipment, and analysis of results. The motivation for developing Robot Scientists is to better understand science, and to make scientific research more efficient. The Robot Scientist `Adam' was the first machine to autonomously discover scientific knowledge: to both formulate and experimentally confirm novel hypotheses. Adam worked in the domain of yeast functional genomics. The Robot Scientist `Eve' was originally developed to automate early-stage drug development, with specific application to neglected tropical diseases such as malaria, African sleeping sickness, etc. We are now adapting Eve to work on cancer. We are also teaching Eve to autonomously extract information from the scientific literature.
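The step of selecting experiments that best discriminate between competing hypotheses can be illustrated with a toy sketch; the hypotheses and candidate experiments are invented, and this is not Adam's or Eve's actual selection algorithm.

```python
# Competing hypotheses: candidate models predicting an outcome per condition.
hypotheses = {
    "H1": lambda x: x * 2,
    "H2": lambda x: x + 3,
    "H3": lambda x: x * 2,  # agrees with H1 everywhere
}
candidate_experiments = [0, 1, 2, 3, 4, 5]

def discrimination(x):
    """Number of distinct predicted outcomes: higher = more informative,
    since observing the result rules out more hypotheses."""
    return len({h(x) for h in hypotheses.values()})

# Choose the experiment whose predicted outcomes disagree most.
best = max(candidate_experiments, key=discrimination)
```

Note that at x = 3 all three hypotheses predict 6, so that experiment could never discriminate between them; the selector avoids it. Real Robot Scientists also weigh experiment cost against expected information gain.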

  17. Top Down Tandem Mass Spectrometric Analysis of a Chemically Modified Rough-Type Lipopolysaccharide Vaccine Candidate.

    PubMed

    Oyler, Benjamin L; Khan, Mohd M; Smith, Donald F; Harberts, Erin M; Kilgour, David P A; Ernst, Robert K; Cross, Alan S; Goodlett, David R

    2018-06-01

    Recent advances in lipopolysaccharide (LPS) biology have led to its use in drug discovery pipelines, including vaccine and vaccine adjuvant discovery. Desirable characteristics for LPS vaccine candidates include both the ability to produce a specific antibody titer in patients and a minimal host inflammatory response directed by the innate immune system. However, in-depth chemical characterization of most LPS extracts has not been performed; hence, biological activities of these extracts are unpredictable. Additionally, the most widely adopted workflow for LPS structure elucidation includes nonspecific chemical decomposition steps before analyses, making structures inferred and not necessarily biologically relevant. In this work, several different mass spectrometry workflows that have not been previously explored were employed to show proof-of-principle for top down LPS primary structure elucidation, specifically for a rough-type mutant (J5) E. coli-derived LPS component of a vaccine candidate. First, ion mobility filtered precursor ions were subjected to collision induced dissociation (CID) to define differences in native J5 LPS v. chemically detoxified J5 LPS (dLPS). Next, ultra-high mass resolving power, accurate mass spectrometry was employed for unequivocal precursor and product ion empirical formulae generation. Finally, MS3 analyses in an ion trap instrument showed that previous knowledge about dissociation of LPS components can be used to reconstruct and sequence LPS in a top down fashion. A structural rationale is also explained for differential inflammatory dose-response curves, in vitro, when HEK-Blue hTLR4 cells were administered increasing concentrations of native J5 LPS v. dLPS, which will be useful in future drug discovery efforts.

  18. Top Down Tandem Mass Spectrometric Analysis of a Chemically Modified Rough-Type Lipopolysaccharide Vaccine Candidate

    NASA Astrophysics Data System (ADS)

    Oyler, Benjamin L.; Khan, Mohd M.; Smith, Donald F.; Harberts, Erin M.; Kilgour, David P. A.; Ernst, Robert K.; Cross, Alan S.; Goodlett, David R.

    2018-02-01

    Recent advances in lipopolysaccharide (LPS) biology have led to its use in drug discovery pipelines, including vaccine and vaccine adjuvant discovery. Desirable characteristics for LPS vaccine candidates include both the ability to produce a specific antibody titer in patients and a minimal host inflammatory response directed by the innate immune system. However, in-depth chemical characterization of most LPS extracts has not been performed; hence, biological activities of these extracts are unpredictable. Additionally, the most widely adopted workflow for LPS structure elucidation includes nonspecific chemical decomposition steps before analyses, making structures inferred and not necessarily biologically relevant. In this work, several different mass spectrometry workflows that have not been previously explored were employed to show proof-of-principle for top down LPS primary structure elucidation, specifically for a rough-type mutant (J5) E. coli-derived LPS component of a vaccine candidate. First, ion mobility filtered precursor ions were subjected to collision induced dissociation (CID) to define differences in native J5 LPS v. chemically detoxified J5 LPS (dLPS). Next, ultra-high mass resolving power, accurate mass spectrometry was employed for unequivocal precursor and product ion empirical formulae generation. Finally, MS3 analyses in an ion trap instrument showed that previous knowledge about dissociation of LPS components can be used to reconstruct and sequence LPS in a top down fashion. A structural rationale is also explained for differential inflammatory dose-response curves, in vitro, when HEK-Blue hTLR4 cells were administered increasing concentrations of native J5 LPS v. dLPS, which will be useful in future drug discovery efforts.

  19. A Semiautomated Framework for Integrating Expert Knowledge into Disease Marker Identification

    DOE PAGES

    Wang, Jing; Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; ...

    2013-01-01

    Background. The availability of large complex data sets generated by high throughput technologies has enabled the recent proliferation of disease biomarker studies. However, a recurring problem in deriving biological information from large data sets is how to best incorporate expert knowledge into the biomarker selection process. Objective. To develop a generalizable framework that can incorporate expert knowledge into data-driven processes in a semiautomated way while providing a metric for optimization in a biomarker selection scheme. Methods. The framework was implemented as a pipeline consisting of five components for the identification of signatures from integrated clustering (ISIC). Expert knowledge was integrated into the biomarker identification process using the combination of two distinct approaches: a distance-based clustering approach and an expert knowledge-driven functional selection. Results. The utility of the developed framework ISIC was demonstrated on proteomics data from a study of chronic obstructive pulmonary disease (COPD). Biomarker candidates were identified in a mouse model using ISIC and validated in a study of a human cohort. Conclusions. Expert knowledge can be introduced into a biomarker discovery process in different ways to enhance the robustness of selected marker candidates. Developing strategies for extracting orthogonal and robust features from large data sets increases the chances of success in biomarker identification.

  20. A Semiautomated Framework for Integrating Expert Knowledge into Disease Marker Identification

    PubMed Central

    Wang, Jing; Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; Varnum, Susan M.; Brown, Joseph N.; Riensche, Roderick M.; Adkins, Joshua N.; Jacobs, Jon M.; Hoidal, John R.; Scholand, Mary Beth; Pounds, Joel G.; Blackburn, Michael R.; Rodland, Karin D.; McDermott, Jason E.

    2013-01-01

    Background. The availability of large complex data sets generated by high throughput technologies has enabled the recent proliferation of disease biomarker studies. However, a recurring problem in deriving biological information from large data sets is how to best incorporate expert knowledge into the biomarker selection process. Objective. To develop a generalizable framework that can incorporate expert knowledge into data-driven processes in a semiautomated way while providing a metric for optimization in a biomarker selection scheme. Methods. The framework was implemented as a pipeline consisting of five components for the identification of signatures from integrated clustering (ISIC). Expert knowledge was integrated into the biomarker identification process using the combination of two distinct approaches: a distance-based clustering approach and an expert knowledge-driven functional selection. Results. The utility of the developed framework ISIC was demonstrated on proteomics data from a study of chronic obstructive pulmonary disease (COPD). Biomarker candidates were identified in a mouse model using ISIC and validated in a study of a human cohort. Conclusions. Expert knowledge can be introduced into a biomarker discovery process in different ways to enhance the robustness of selected marker candidates. Developing strategies for extracting orthogonal and robust features from large data sets increases the chances of success in biomarker identification. PMID:24223463

  1. A Semiautomated Framework for Integrating Expert Knowledge into Disease Marker Identification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Jing; Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.

    2013-10-01

    Background. The availability of large complex data sets generated by high throughput technologies has enabled the recent proliferation of disease biomarker studies. However, a recurring problem in deriving biological information from large data sets is how to best incorporate expert knowledge into the biomarker selection process. Objective. To develop a generalizable framework that can incorporate expert knowledge into data-driven processes in a semiautomated way while providing a metric for optimization in a biomarker selection scheme. Methods. The framework was implemented as a pipeline consisting of five components for the identification of signatures from integrated clustering (ISIC). Expert knowledge was integrated into the biomarker identification process using the combination of two distinct approaches: a distance-based clustering approach and an expert knowledge-driven functional selection. Results. The utility of the developed framework ISIC was demonstrated on proteomics data from a study of chronic obstructive pulmonary disease (COPD). Biomarker candidates were identified in a mouse model using ISIC and validated in a study of a human cohort. Conclusions. Expert knowledge can be introduced into a biomarker discovery process in different ways to enhance the robustness of selected marker candidates. Developing strategies for extracting orthogonal and robust features from large data sets increases the chances of success in biomarker identification.

  2. Hypotheses generation as supervised link discovery with automated class labeling on large-scale biomedical concept networks

    PubMed Central

    2012-01-01

    Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypotheses generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using the Map-Reduce framework. We extract a set of heterogeneous features such as random walk based features, neighborhood features and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network, and to address this scalability problem the features are extracted on a cluster using the Map-Reduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken over two consecutive time durations. A set of heterogeneous features, which cover both topological and semantic features derived from the concept network, have been studied with respect to their impacts on the accuracy of the proposed supervised link discovery process. A case study of hypotheses generation based on the proposed method has been presented in the paper. PMID:22759614
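    Treating hypothesis generation as link discovery, the neighborhood features mentioned above (common-neighbour counts, overlap measures) can be computed directly from an adjacency structure; the toy concept network below is invented for illustration and does not reproduce the paper's features or data:

    ```python
    # Toy concept network as adjacency sets; concept names are illustrative only.
    graph = {
        "migraine":  {"serotonin", "magnesium"},
        "epilepsy":  {"magnesium", "gaba"},
        "serotonin": {"migraine"},
        "magnesium": {"migraine", "epilepsy"},
        "gaba":      {"epilepsy"},
    }

    def neighborhood_features(a, b):
        """Common-neighbour count and Jaccard overlap for a candidate link (a, b).

        In a supervised link-discovery setup, such features would feed a
        classifier trained on links that appeared between two snapshots.
        """
        na, nb = graph[a], graph[b]
        common = len(na & nb)
        union = len(na | nb)
        return common, (common / union if union else 0.0)

    common, jaccard = neighborhood_features("migraine", "epilepsy")
    print(common, round(jaccard, 2))  # 1 0.33
    ```

    At the scale described in the abstract, the same per-pair computation would be distributed with Map-Reduce rather than run in a single process.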

  3. Resource Discovery within the Networked "Hybrid" Library.

    ERIC Educational Resources Information Center

    Leigh, Sally-Anne

    This paper focuses on the development, adoption, and integration of resource discovery, knowledge management, and/or knowledge sharing interfaces such as interactive portals, and the use of the library's World Wide Web presence to increase the availability and usability of information services. The introduction addresses changes in library…

  4. A biological compression model and its applications.

    PubMed

    Cao, Minh Duc; Dix, Trevor I; Allison, Lloyd

    2011-01-01

    A biological compression model, the expert model, is presented which is superior to existing compression algorithms in both compression performance and speed. The model is able to compress whole eukaryotic genomes. Most importantly, the model provides a framework for knowledge discovery from biological data. It can be used for repeat element discovery, sequence alignment and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences where conventional knowledge discovery tools often fail.
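    The expert model itself is not reproduced here, but the general idea of compression-based sequence comparison can be illustrated with the standard normalized compression distance (NCD), a related technique, using zlib as a stand-in compressor; the sequences are invented for illustration:

    ```python
    import zlib

    def csize(x: bytes) -> int:
        """Compressed size in bytes (zlib at maximum effort)."""
        return len(zlib.compress(x, 9))

    def ncd(x: bytes, y: bytes) -> float:
        """Normalized compression distance: near 0 for closely related
        sequences, larger for unrelated ones. A compressor that models one
        sequence well should also compress a related sequence well."""
        cx, cy, cxy = csize(x), csize(y), csize(x + y)
        return (cxy - min(cx, cy)) / max(cx, cy)

    a = b"ACGTACGTACGTACGT" * 20
    b = b"GATTACATTGCCGTAA" * 20  # a different repetitive sequence
    print(ncd(a, a), ncd(a, b))
    ```

    Distances of this kind can feed directly into clustering or tree-building for phylogenetic analysis, which is one of the applications the abstract lists.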

  5. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media

    PubMed Central

    Cameron, Delroy; Smith, Gary A.; Daniulaityte, Raminta; Sheth, Amit P.; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z.; Falck, Russel

    2013-01-01

    Objectives The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel Semantic Web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO) (pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC). A combination of lexical, pattern-based and semantics-based techniques is used together with the domain knowledge to extract fine-grained semantic information from UGC. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Methods Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. 
The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, routes of administration, etc. The DAO is also used to help recognize three types of data, namely: 1) entities, 2) relationships and 3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information from UGC, and support querying, search, trend analysis and overall content analysis of social media related to prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in the evaluation and refinement of information extraction techniques. Results A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University.
Conclusion A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never before been developed for drug abuse research. PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in the future. PMID:23892295
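    The lexicon- and pattern-based entity extraction step described in the Methods can be sketched as follows; the slang lexicon and route-of-administration pattern are hypothetical stand-ins for the DAO's actual content, not PREDOSE's real resources:

    ```python
    import re

    # Tiny illustrative lexicon mapping surface forms (including slang) to
    # normalized concepts; the real DAO models drugs, preparation methods,
    # side effects, routes of administration, and more.
    LEXICON = {
        "bupe": "Buprenorphine",
        "buprenorphine": "Buprenorphine",
        "loperamide": "Loperamide",
    }

    # Hypothetical pattern for route-of-administration mentions.
    ROUTE_PATTERN = re.compile(
        r"\b(snort(?:ed|ing)?|swallow(?:ed)?|inject(?:ed)?)\b", re.IGNORECASE
    )

    def annotate(post):
        """Return (normalized entities, route mentions) found in a forum post."""
        tokens = re.findall(r"[a-z]+", post.lower())
        entities = sorted({LEXICON[t] for t in tokens if t in LEXICON})
        routes = [m.group(1).lower() for m in ROUTE_PATTERN.finditer(post)]
        return entities, routes

    entities, routes = annotate("I swallowed bupe instead of snorting it")
    print(entities, routes)  # ['Buprenorphine'] ['swallowed', 'snorting']
    ```

    In the full platform, annotations like these are the input to the relationship and triple extraction stages, where co-occurring concepts are linked using patterns expressed in the ontology.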

  6. Biomedical Information Extraction: Mining Disease Associated Genes from Literature

    ERIC Educational Resources Information Center

    Huang, Zhong

    2014-01-01

    Disease associated gene discovery is a critical step to realize the future of personalized medicine. However empirical and clinical validation of disease associated genes are time consuming and expensive. In silico discovery of disease associated genes from literature is therefore becoming the first essential step for biomarker discovery to…

  7. Knowledge discovery and system biology in molecular medicine: an application on neurodegenerative diseases.

    PubMed

    Fattore, Matteo; Arrigo, Patrizio

    2005-01-01

    The possibility of studying an organism in terms of system theory was proposed in the past, but only the advancement of molecular biology techniques allows us to investigate the dynamical properties of a biological system in a more quantitative and rational way than before. These new techniques give only a basic-level view of an organism's functionality; comprehension of its dynamical behaviour depends on the ability to perform a multiple-level analysis. Functional genomics has stimulated interest in investigating the dynamical behaviour of an organism as a whole. These activities are commonly known as Systems Biology, whose interests range from molecules to organs. One of the more promising applications is 'disease modeling'. The use of experimental models is a common procedure in pharmacological and clinical research; today this approach is supported by 'in silico' predictive methods. Such investigation can be improved by a combination of experimental and computational tools. Machine Learning (ML) tools are able to process heterogeneous data sources; taking this peculiarity into account, they can fruitfully support multilevel data processing (molecular, cellular and morphological), which is the prerequisite for formal model design, and they allow us to extract the knowledge needed for mathematical model development. The aim of our work is the development and implementation of a system that combines ML with dynamical model simulations. The program is addressed to the virtual analysis of the pathways involved in neurodegenerative diseases. These pathologies are multifactorial diseases, and the relevance of the different factors has not yet been well elucidated. 
This is a very complex task; in order to test the integrative approach, our program has been limited to the analysis of the effects of a specific protein, Cyclin-dependent kinase 5 (CDK5), which is involved in the induction of neuronal apoptosis. The system has a modular structure centred on a textual knowledge discovery approach. Text mining is the only way to enhance the capability to extract, from multiple data sources, the information required for the dynamical simulator. The user may access the publicly available modules through the following site: http://biocomp.ge.ismac.cnr.it.

  8. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    PubMed Central

    2010-01-01

    Background The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245

  9. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships.

    PubMed

    Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong

    2010-01-18

    The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.

  10. Form-Focused Discovery Activities in English Classes

    ERIC Educational Resources Information Center

    Ogeyik, Muhlise Cosgun

    2011-01-01

    Form-focused discovery activities allow language learners to grasp various aspects of a target language by contributing implicit knowledge by using discovered explicit knowledge. Moreover, such activities can assist learners to perceive and discover the features of their language input. In foreign language teaching environments, they can be used…

  11. 75 FR 66766 - NIAID Blue Ribbon Panel Meeting on Adjuvant Discovery and Development

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-29

    ..., identifies gaps in knowledge and capabilities, and defines NIAID's goals for the continued discovery... DEPARTMENT OF HEALTH AND HUMAN SERVICES NIAID Blue Ribbon Panel Meeting on Adjuvant Discovery and... agenda for the discovery, development and clinical evaluation of adjuvants for use with preventive...

  12. 12 CFR 263.53 - Discovery depositions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 12 Banks and Banking 3 2011-01-01 2011-01-01 false Discovery depositions. 263.53 Section 263.53... depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts material to the...

  13. 12 CFR 19.170 - Discovery depositions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 1 2010-01-01 2010-01-01 false Discovery depositions. 19.170 Section 19.170... PROCEDURE Discovery Depositions and Subpoenas § 19.170 Discovery depositions. (a) General rule. In any... deposition of an expert, or of a person, including another party, who has direct knowledge of matters that...

  14. 12 CFR 19.170 - Discovery depositions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 12 Banks and Banking 1 2011-01-01 2011-01-01 false Discovery depositions. 19.170 Section 19.170... PROCEDURE Discovery Depositions and Subpoenas § 19.170 Discovery depositions. (a) General rule. In any... deposition of an expert, or of a person, including another party, who has direct knowledge of matters that...

  15. 12 CFR 263.53 - Discovery depositions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 3 2010-01-01 2010-01-01 false Discovery depositions. 263.53 Section 263.53... depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts material to the...

  16. BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation

    PubMed Central

    2011-01-01

    We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be. PMID:21696594

  17. Knowledge Discovery for Transonic Regional-Jet Wing through Multidisciplinary Design Exploration

    NASA Astrophysics Data System (ADS)

    Chiba, Kazuhisa; Obayashi, Shigeru; Morino, Hiroyuki

    Data mining is an important facet of solving multi-objective optimization problems, because it is an effective means of discovering design knowledge in such problems, which generate large amounts of data. In the present study, data mining has been performed for a large-scale, real-world multidisciplinary design optimization (MDO) to provide knowledge regarding the design space. The MDO among aerodynamics, structures, and aeroelasticity of a regional-jet wing was carried out using high-fidelity evaluation models with the adaptive range multi-objective genetic algorithm. As a result, nine non-dominated solutions were generated and used for tradeoff analysis among the three objectives. All solutions evaluated during the evolution were analyzed for tradeoffs and the influence of design variables using a self-organizing map to extract key features of the design space. Although the MDO results showed inverted gull-wings as non-dominated solutions, one of the key features found by data mining was a non-gull wing geometry. When this knowledge was applied to one optimum solution, the resulting design was found to have better performance than the original geometry designed in the conventional manner.
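    A self-organizing map of the kind used here for design-space mining can be sketched minimally in plain Python; the 1-D grid, learning schedule, and toy "design variable" samples below are all invented for illustration and are far smaller than a real MDO data set:

    ```python
    import random

    def train_som(data, n_units=4, dim=2, epochs=400, lr0=0.5, radius0=2.0):
        """Train a tiny 1-D self-organizing map: for each sample, the nearest
        unit (best-matching unit) and its grid neighbours move toward the
        sample, so nearby units come to represent similar designs."""
        random.seed(0)
        units = [[random.random() for _ in range(dim)] for _ in range(n_units)]
        for t in range(epochs):
            frac = t / epochs
            lr = lr0 * (1 - frac)                      # decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)    # shrinking neighbourhood
            x = random.choice(data)
            bmu = min(range(n_units),
                      key=lambda i: sum((units[i][d] - x[d]) ** 2
                                        for d in range(dim)))
            for i in range(n_units):
                if abs(i - bmu) <= radius:
                    for d in range(dim):
                        units[i][d] += lr * (x[d] - units[i][d])
        return units

    # Two well-separated clusters of 2-D "design variable" samples.
    data = [[0.1, 0.1], [0.12, 0.08], [0.9, 0.9], [0.88, 0.92]]
    som = train_som(data)
    bmu = lambda x: min(range(len(som)),
                        key=lambda i: sum((som[i][d] - x[d]) ** 2 for d in range(2)))
    print("BMU for cluster A:", bmu([0.1, 0.1]), "cluster B:", bmu([0.9, 0.9]))
    ```

    In the study's setting, colouring such a trained map by each objective or design variable is what exposes relationships like the non-gull wing feature.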

  18. Comprehensive Analysis of MILE Gene Expression Data Set Advances Discovery of Leukaemia Type and Subtype Biomarkers.

    PubMed

    Labaj, Wojciech; Papiez, Anna; Polanski, Andrzej; Polanska, Joanna

    2017-03-01

    Large collections of data in studies on cancers such as leukaemia necessitate tailored analysis algorithms to maximize information extraction. In this work, a custom-fit pipeline is demonstrated for thorough investigation of the voluminous MILE gene expression data set. Three analyses are accomplished, each for gaining a deeper understanding of the processes underlying leukaemia types and subtypes. First, the main disease groups are tested for differential expression against the healthy control, as in a standard case-control study. Here, the basic knowledge on molecular mechanisms is confirmed quantitatively and by literature references. Second, pairwise comparison testing is performed to juxtapose the main leukaemia types among each other. In this case, the general relations are pointed out by means of the Dice similarity coefficient. Moreover, lists of candidate biomarkers for the main leukaemia groups are proposed. Finally, with this approach being successful, the third analysis provides insight into all of the studied subtypes, followed by the emergence of four leukaemia subtype biomarkers. In addition, the class-enhanced DEG signature obtained on the basis of the novel pipeline processing leads to significantly better classification power of multi-class data classifiers. The developed methodology, consisting of batch effect adjustment, adaptive noise and feature filtration coupled with adequate statistical testing and biomarker definition, proves to be an effective approach towards knowledge discovery in high-throughput molecular biology experiments.
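    The Dice comparison of differentially expressed gene (DEG) lists mentioned above reduces to a small set computation; the gene lists below are illustrative placeholders, not the study's actual DEG signatures:

    ```python
    def dice(a, b):
        """Dice similarity of two gene sets: 2|A∩B| / (|A| + |B|).
        Ranges from 0 (disjoint lists) to 1 (identical lists)."""
        a, b = set(a), set(b)
        return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0

    # Hypothetical DEG lists for two leukaemia types (names for illustration).
    deg_type_1 = {"FLT3", "NPM1", "CEBPA", "KIT"}
    deg_type_2 = {"BCR", "ABL1", "KIT"}
    print(round(dice(deg_type_1, deg_type_2), 2))  # 0.29
    ```

    Computing this coefficient for every pair of leukaemia types yields a similarity matrix, which is how pairwise relations between the main groups can be summarized.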

  19. Taking stock of current societal, political and academic stakeholders in the Canadian healthcare knowledge translation agenda

    PubMed Central

    Newton, Mandi S; Scott-Findlay, Shannon

    2007-01-01

    Background In the past 15 years, knowledge translation in healthcare has emerged as a multifaceted and complex agenda. Theoretical and polemical discussions, the development of a science to study and measure the effects of translating research evidence into healthcare, and the role of key stakeholders including academe, healthcare decision-makers, the public, and government funding bodies have brought scholarly, organizational, social, and political dimensions to the agenda. Objective This paper discusses the current knowledge translation agenda in Canadian healthcare and how elements in this agenda shape the discovery and translation of health knowledge. Discussion The current knowledge translation agenda in Canadian healthcare involves the influence of values, priorities, and people; stakes which greatly shape the discovery of research knowledge and how it is or is not instituted in healthcare delivery. As this agenda continues to take shape and direction, ensuring that it is accountable for its influences is essential and should be at the forefront of concern to the Canadian public and healthcare community. This transparency will allow for scrutiny, debate, and improvements in health knowledge discovery and health services delivery. PMID:17916256

  20. The center for causal discovery of biomedical knowledge from big data.

    PubMed

    Cooper, Gregory F; Bahar, Ivet; Becich, Michael J; Benos, Panayiotis V; Berg, Jeremy; Espino, Jeremy U; Glymour, Clark; Jacobson, Rebecca Crowley; Kienholz, Michelle; Lee, Adrian V; Lu, Xinghua; Scheines, Richard

    2015-11-01

    The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  1. How does non-formal marine education affect student attitude and knowledge? A case study using SCDNR's Discovery program

    NASA Astrophysics Data System (ADS)

    McGovern, Mary Francis

    Non-formal environmental education provides students the opportunity to learn in ways that would not be possible in a traditional classroom setting. Outdoor learning allows students to make connections to their environment and helps foster an appreciation for nature. This type of education can be interdisciplinary: students develop skills not only in science, but also in mathematics, social studies, technology, and critical thinking. This case study focuses on a non-formal marine education program, the South Carolina Department of Natural Resources' (SCDNR) Discovery vessel-based program. The Discovery curriculum was evaluated to determine its impact on student knowledge about, and attitude toward, the estuary. Students from two South Carolina coastal counties who attended the boat program during fall 2014 were asked to complete a brief survey before, immediately after, and two weeks following the program. The results of this study indicate that both student knowledge about and attitude toward the estuary improved significantly after completion of the Discovery vessel-based program, and knowledge and attitude scores were positively correlated.

  2. Inhibitory capacity of Rhus coriaria L. extract and its major component methyl gallate on Streptococcus mutans biofilm formation by optical profilometry: Potential applications for oral health.

    PubMed

    Kacergius, Tomas; Abu-Lafi, Saleh; Kirkliauskiene, Agne; Gabe, Vika; Adawi, Azmi; Rayan, Mahmoud; Qutob, Mutaz; Stukas, Rimantas; Utkus, Algirdas; Zeidan, Mouhammad; Rayan, Anwar

    2017-07-01

    Streptococcus mutans (S. mutans) is the most widely recognized pathogen involved in the pathogenesis of dental caries. Its virulence arises from its ability to produce a biofilm and from its acidogenicity, causing tooth decay. Discovery of natural products capable of inhibiting biofilm formation is of high importance for developing healthcare products. To the best of our knowledge, all previous scientific reports applied a colorimetric assay to test the effect of sumac and methyl gallate (MG) on S. mutans adherence; quantitative assessment of the developed biofilm should be further performed with an optical profilometry assay, testing the effect on both the surface roughness and the thickness of the biofilm. To the best of our knowledge, this is the first study to report the effect of sumac extract and its constituent MG on biofilm formation using an optical profilometry assay. Testing the antibacterial activity of the sumac extract and its fractions revealed that MG is the most bioactive component against S. mutans. It reduced S. mutans biofilm biomass on the polystyrene surface by 68-93%, whereas 1 mg/ml MG was able to decrease the biofilm roughness and thickness on the glass surface by 99%. MG also prevented a decrease in pH level by 97%. These bioactivities of MG occurred in a dose-dependent manner and were significant vs. untreated bacteria. The findings are important for the development of novel pharmaceuticals and formulations of natural products and extracts that possess anti-biofilm activities, with primary applications for oral health and, in a broader context, for the treatment of various bacterial infections.

  3. Comparative mass spectrometry-based metabolomics strategies for the investigation of microbial secondary metabolites.

    PubMed

    Covington, Brett C; McLean, John A; Bachmann, Brian O

    2017-01-04

    Covering: 2000 to 2016. The labor-intensive process of microbial natural product discovery is contingent upon identifying discrete secondary metabolites of interest within complex biological extracts, which contain inventories of all extractable small molecules produced by an organism or consortium. Historically, compound isolation prioritization has been driven by observed biological activity and/or relative metabolite abundance and followed by dereplication via accurate mass analysis. Decades of discovery using variants of these methods has generated the natural pharmacopeia but also contributes to recent high rediscovery rates. However, genomic sequencing reveals substantial untapped potential in previously mined organisms, and can provide useful prescience of potentially new secondary metabolites that ultimately enables isolation. Recently, advances in comparative metabolomics analyses have been coupled to secondary metabolic predictions to accelerate bioactivity- and abundance-independent discovery work flows. In this review we will discuss the various analytical and computational techniques that enable MS-based metabolomic applications to natural product discovery and discuss the future prospects for comparative metabolomics in natural product discovery.

  4. Enhancing Learning Environments through Solution-based Knowledge Discovery Tools: Forecasting for Self-Perpetuating Systemic Reform.

    ERIC Educational Resources Information Center

    Tsantis, Linda; Castellani, John

    2001-01-01

    This article explores how knowledge-discovery applications can empower educators with the information they need to provide anticipatory guidance for teaching and learning, forecast school and district needs, and find critical markers for making the best program decisions for children and youth with disabilities. Data mining for schools is…

  5. Students and Teacher Academic Evaluation Perceptions: Methodology to Construct a Representation Based on Actionable Knowledge Discovery Framework

    ERIC Educational Resources Information Center

    Molina, Otilia Alejandro; Ratté, Sylvie

    2017-01-01

    This research introduces a method to construct a unified representation of teachers and students perspectives based on the actionable knowledge discovery (AKD) and delivery framework. The representation is constructed using two models: one obtained from student evaluations and the other obtained from teachers' reflections about their teaching…

  6. Application of Knowledge Discovery in Databases Methodologies for Predictive Models for Pregnancy Adverse Events

    ERIC Educational Resources Information Center

    Taft, Laritza M.

    2010-01-01

    In its report "To Err is Human", The Institute of Medicine recommended the implementation of internal and external voluntary and mandatory automatic reporting systems to increase detection of adverse events. Knowledge Discovery in Databases (KDD) allows the detection of patterns and trends that would be hidden or less detectable if analyzed by…

  7. Knowledge Discovery Process: Case Study of RNAV Adherence of Radar Track Data

    NASA Technical Reports Server (NTRS)

    Matthews, Bryan

    2018-01-01

    This talk is an introduction to the knowledge discovery process, beginning with: identifying the problem, choosing data sources, matching the appropriate machine learning tools, and reviewing the results. The overview will be given in the context of an ongoing study that is assessing RNAV adherence of commercial aircraft in the national airspace.

  8. A Virtual Bioinformatics Knowledge Environment for Early Cancer Detection

    NASA Technical Reports Server (NTRS)

    Crichton, Daniel; Srivastava, Sudhir; Johnsey, Donald

    2003-01-01

    Discovery of disease biomarkers for cancer is a leading focus of early detection. The National Cancer Institute created a network of collaborating institutions focused on the discovery and validation of cancer biomarkers, called the Early Detection Research Network (EDRN). Informatics plays a key role in enabling a virtual knowledge environment that provides scientists real-time access to distributed data sets located at research institutions across the nation. The distributed and heterogeneous nature of the collaboration makes data sharing across institutions very difficult. EDRN has developed a comprehensive informatics effort focused on developing a national infrastructure enabling seamless access, sharing and discovery of science data resources across all EDRN sites. This paper discusses the EDRN knowledge system architecture, its objectives and its accomplishments.

  9. Empowering Accelerated Personal, Professional and Scholarly Discovery among Information Seekers: An Educational Vision

    ERIC Educational Resources Information Center

    Harmon, Glynn

    2013-01-01

    The term discovery applies herein to the successful outcome of inquiry in which a significant personal, professional or scholarly breakthrough or insight occurs, and which is individually or socially acknowledged as a key contribution to knowledge. Since discoveries culminate at fixed points in time, discoveries can serve as an outcome metric for…

  10. A Framework of Knowledge Integration and Discovery for Supporting Pharmacogenomics Target Predication of Adverse Drug Events: A Case Study of Drug-Induced Long QT Syndrome.

    PubMed

    Jiang, Guoqian; Wang, Chen; Zhu, Qian; Chute, Christopher G

    2013-01-01

    Knowledge-driven text mining is becoming an important research area for identifying pharmacogenomics target genes. However, few such studies have focused on the pharmacogenomics targets of adverse drug events (ADEs). The objective of the present study is to build a framework of knowledge integration and discovery that aims to support pharmacogenomics target prediction for ADEs. We integrate a semantically annotated literature corpus, Semantic MEDLINE, with a semantically coded ADE knowledgebase known as ADEpedia using a semantic web based framework. We developed a knowledge discovery approach combining a network analysis of a protein-protein interaction (PPI) network and a gene functional classification approach. We performed a case study of drug-induced long QT syndrome to demonstrate the usefulness of the framework in predicting potential pharmacogenomics targets of ADEs.

  11. Extraction of consensus protein patterns in regions containing non-proline cis peptide bonds and their functional assessment.

    PubMed

    Exarchos, Konstantinos P; Exarchos, Themis P; Rigas, Georgios; Papaloukas, Costas; Fotiadis, Dimitrios I

    2011-05-10

    In peptides and proteins, only a small percentage of peptide bonds adopt the cis configuration. Especially in the case of amide peptide bonds, cis conformations are rare enough that systematic studies were hampered until recently. Lately, however, the growing number of 3D protein structures in databases has yielded a considerable number of sequences containing non-proline cis formations (cis-nonPro). In our work, we extract regular-expression-type patterns that describe the regions surrounding cis-nonPro formations. For this purpose, three types of pattern discovery are performed: i) exact pattern discovery, ii) pattern discovery using a chemical equivalency set, and iii) pattern discovery using a structural equivalency set. Afterwards, using each pattern as a predicate, we search the Eukaryotic Linear Motif (ELM) resource to identify potential functional implications of regions with cis-nonPro peptide bonds. The patterns extracted from each type of pattern discovery are further employed to formulate a pattern-based classifier, which is used to discriminate between cis-nonPro and trans-nonPro formations. In terms of functional implications, we observe a significant association of cis-nonPro peptide bonds with ligand/binding functionalities. As for the pattern-based classification scheme, the best results were obtained using the structural equivalency set, which yielded 70% accuracy, 77% sensitivity and 63% specificity.
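
    The regular-expression-type patterns described above lend themselves to a simple pattern-based predicate over sequence windows. A minimal Python sketch; the patterns here are invented illustrations (including a chemical-equivalency group [ILV]), not the consensus patterns actually mined in the study:

```python
import re

# Hypothetical regular-expression-type patterns over residue windows.
CIS_PATTERNS = [
    re.compile(r"G.{2}[ST]W"),  # exact pattern with wildcard positions
    re.compile(r"[ILV]P.G"),    # chemical-equivalency group for I/L/V
]

def matches_cis_pattern(window: str) -> bool:
    """Pattern-based predicate: does any mined pattern hit this window?"""
    return any(p.search(window) for p in CIS_PATTERNS)

print(matches_cis_pattern("AGQKSWL"))  # True: G, two wildcards, S, W
print(matches_cis_pattern("AAAAAAA"))  # False: no pattern matches
```

A classifier of this kind labels a residue window cis-nonPro if at least one pattern fires; equivalency sets simply widen the character classes inside the expressions.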

  12. Mitochondrial lineage A2ah found in a pre-Hispanic individual from the Andean region.

    PubMed

    Russo, M G; Dejean, C B; Avena, S A; Seldes, V; Ramundo, P

    2018-05-10

    The aim of this study was to contribute to the knowledge of pre-Hispanic Andean mitochondrial diversity by analyzing an individual from the archaeological site Pukara de La Cueva (North-western Argentina). The date of the discovery context (540 ± 60 BP) corresponds to the Regional Developments II period. Two separate DNA extractions were performed from dentin powder of one tooth. HVR I was amplified by PCR from each extract in three overlapping fragments and the haplotype was determined by consensus among all obtained sequences. The procedures were carried out under strict protocols developed for working with ancient DNA. The individual belonged to the A2ah lineage due to the presence of the 16097C and 16098G transitions, which constitute its distinctive motif. This lineage is very rare in Native American populations and was described in four individuals from current groups inhabiting the Bolivian Llanos, two from South-eastern Brazil, and one from the Gran Chaco region. In addition, two other mutations (16260T and 16286T) were shared with one of the individuals from the Bolivian Llanos region. Considering that the origin of this lineage was postulated for the South American lowlands, the present pre-Hispanic discovery in the Andean area could be taken as new evidence of gene flow between these regions. It also raises questions about the geographical origin of this mitochondrial lineage. © 2018 Wiley Periodicals, Inc.

  13. Leishmanicidal and cytotoxic activity from plants used in Tacana traditional medicine (Bolivia).

    PubMed

    Arévalo-Lopéz, Diandra; Nina, Nélida; Ticona, Juan C; Limachi, Ivan; Salamanca, Efrain; Udaeta, Enrique; Paredes, Crispin; Espinoza, Boris; Serato, Alcides; Garnica, David; Limachi, Abigail; Coaquira, Dayana; Salazar, Sarah; Flores, Ninoska; Sterner, Olov; Giménez, Alberto

    2018-04-24

    Thirty-eight Tacana medicinal plant species used to treat skin problems, including leishmania ulcers, skin infections, inflammation and wound healing, were collected in the community of Buena Vista, Bolivia, with the Tacana people. Twenty-two species are documented for the first time as medicinal plants for this ethnic group living in the northern area of the Department of La Paz. The aims were to evaluate the leishmanicidal effect (IC50) and cytotoxicity (LD50) of the selected plants; to carry out bioguided studies on the active extracts; and to assess the potential of Bolivian plant biodiversity, associated with traditional knowledge, in the discovery of alternative sources to fight leishmaniasis. Seventy-three ethanol extracts were prepared from the 38 species by maceration and evaluated in vitro against promastigotes of Leishmania amazonensis and L. braziliensis. Active extracts (IC50 ≤ 50 μg/mL) were fractionated by chromatography on a silica gel column, and the fractions were assessed against the two Leishmania strains. The most active fractions and the crude extracts were evaluated against reference strains of L. amazonensis, L. braziliensis and L. aethiopica, two native strains (L. lainsoni and L. braziliensis), and for cytotoxicity against HeLa cells. The chromatographic profile of the active fractions was obtained by reverse-phase HPLC. Of the 73 extracts, 39 (53.4%) were inactive and 34 showed activity. Thirteen species were selected for bioguided studies. The crude extracts and their 36 fractions were evaluated against two Leishmania strains. The most active fractions were tested on a panel of five Leishmania strains and for cytotoxicity. The Selectivity Index (SI = LD50/IC50) was calculated, and the values were generally low. Retention times and UV spectra were recorded for the active fractions by HPLC-DAD using a reverse-phase column; the profiles differed markedly from each other, showing the presence of different compounds. Bolivian traditional knowledge from the Tacana was useful for identifying plants with activity against Leishmania promastigotes. Chromatographic bioguided studies showed stronger leishmanicidal and cytotoxic activity for the medium-polarity fractions. HPLC analysis showed different chromatographic profiles for the active fractions. Copyright © 2018 Elsevier B.V. All rights reserved.
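
    The selectivity index used above is a simple ratio of host-cell toxicity to antiparasitic potency. A minimal Python sketch with hypothetical concentrations, not values from the study:

```python
def selectivity_index(ld50_cytotox, ic50_leish):
    """SI = LD50 (cytotoxicity against host cells) / IC50 (anti-promastigote)."""
    return ld50_cytotox / ic50_leish

# Hypothetical values in ug/mL; an SI near 1 means the extract is roughly
# as toxic to host cells as to the parasite, i.e. poorly selective.
print(selectivity_index(40.0, 25.0))  # 1.6
```

Higher SI values are desirable: a large SI means the extract kills promastigotes at concentrations well below those that harm host cells, which is why the generally low SI values reported above temper the leishmanicidal findings.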

  14. A data-driven, knowledge-based approach to biomarker discovery: application to circulating microRNA markers of colorectal cancer prognosis.

    PubMed

    Vafaee, Fatemeh; Diakos, Connie; Kirschner, Michaela B; Reid, Glen; Michael, Michael Z; Horvath, Lisa G; Alinejad-Rokny, Hamid; Cheng, Zhangkai Jason; Kuncic, Zdenka; Clarke, Stephen

    2018-01-01

    Recent advances in high-throughput technologies have provided an unprecedented opportunity to identify molecular markers of disease processes. This plethora of complex-omics data has simultaneously complicated the problem of extracting meaningful molecular signatures and opened up new opportunities for more sophisticated integrative and holistic approaches. In this era, effective integration of data-driven and knowledge-based approaches for biomarker identification has been recognised as key to improving the identification of high-performance biomarkers, and necessary for translational applications. Here, we have evaluated the role of circulating microRNA as a means of predicting the prognosis of patients with colorectal cancer, which is the second leading cause of cancer-related death worldwide. We have developed a multi-objective optimisation method that effectively integrates a data-driven approach with the knowledge obtained from the microRNA-mediated regulatory network to identify robust plasma microRNA signatures which are reliable in terms of predictive power as well as functional relevance. The proposed multi-objective framework has the capacity to adjust for conflicting biomarker objectives and to incorporate heterogeneous information facilitating systems approaches to biomarker discovery. We have found a prognostic signature of colorectal cancer comprising 11 circulating microRNAs. The identified signature predicts the patients' survival outcome and targets pathways underlying colorectal cancer progression. The altered expression of the identified microRNAs was confirmed in an independent public data set of plasma samples of patients in early stage vs advanced colorectal cancer. Furthermore, the generality of the proposed method was demonstrated across three publicly available miRNA data sets associated with biomarker studies in other diseases.
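
    A weighted combination of a data-driven objective and a knowledge-based one gives the flavour of such integrative signature scoring. This is a deliberately simplified sketch: the actual method is a multi-objective optimisation over microRNA signatures, and the signature names and scores below are invented:

```python
# Score a candidate signature against two objectives as a weighted sum;
# a real multi-objective method would instead keep a Pareto front of
# non-dominated candidates rather than collapsing to one scalar.
def signature_score(predictive_power, functional_relevance, alpha=0.5):
    """Combine a data-driven objective with a knowledge-based one."""
    return alpha * predictive_power + (1 - alpha) * functional_relevance

# Hypothetical candidate signatures: (predictive power, network relevance).
candidates = {
    "sig_A": (0.82, 0.40),  # high accuracy, weak regulatory-network support
    "sig_B": (0.74, 0.75),  # balanced on both objectives
}
best = max(candidates, key=lambda k: signature_score(*candidates[k]))
print(best)  # sig_B
```

The point of the sketch is the trade-off: a signature that wins on predictive power alone can lose overall once functional relevance from the microRNA-mediated regulatory network is weighed in.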

  15. Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts

    PubMed Central

    Zhang, Shaodian; Elhadad, Noémie

    2013-01-01

    Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work. PMID:23954592
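
    The inverse-document-frequency filter over candidate entities can be sketched in a few lines of Python; the toy corpus, candidate terms and threshold below are invented, not drawn from the cited experiments:

```python
import math

def idf(term, documents):
    """Inverse document frequency: log(N / df); rare terms score higher."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else float("inf")

# Toy corpus of tokenised notes; real input would be noun-phrase chunks
# produced by a chunker, not single tokens.
docs = [
    {"patient", "reports", "atrial", "fibrillation"},
    {"patient", "denies", "chest", "pain"},
    {"patient", "has", "atrial", "flutter"},
]

# Terms appearing in every document get IDF 0 and are filtered out;
# rarer, more content-bearing candidate entities survive the filter.
threshold = 0.0
candidates = {"patient", "atrial", "chest"}
kept = {t for t in candidates if idf(t, docs) > threshold}
print(sorted(kept))  # ['atrial', 'chest']
```

This captures the filtering intuition only; the classification of surviving candidates into entity types would then rely on distributional-semantics comparisons, as the abstract describes.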

  16. RHSEG and Subdue: Background and Preliminary Approach for Combining these Technologies for Enhanced Image Data Analysis, Mining and Knowledge Discovery

    NASA Technical Reports Server (NTRS)

    Tilton, James C.; Cook, Diane J.

    2008-01-01

    Under a project recently selected for funding by NASA's Science Mission Directorate under the Applied Information Systems Research (AISR) program, Tilton and Cook will design and implement the integration of the Subdue graph-based knowledge discovery system, developed at the University of Texas at Arlington and Washington State University, with image segmentation hierarchies produced by the RHSEG software, developed at NASA GSFC, and perform pilot demonstration studies of data analysis, mining and knowledge discovery on NASA data. Subdue represents a method for discovering substructures in structural databases. Subdue is devised for general-purpose automated discovery, concept learning, and hierarchical clustering, with or without domain knowledge. Subdue was developed by Cook and her colleague, Lawrence B. Holder. For Subdue to be effective in finding patterns in imagery data, the data must be abstracted up from the pixel domain. An appropriate abstraction of imagery data is a segmentation hierarchy: a set of several segmentations of the same image at different levels of detail, in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. The RHSEG program, a recursive approximation to a Hierarchical Segmentation approach (HSEG), can produce segmentation hierarchies quickly and effectively for a wide variety of images. RHSEG and HSEG were developed at NASA GSFC by Tilton. In this presentation we provide background on the RHSEG and Subdue technologies and present a preliminary analysis of how RHSEG and Subdue may be combined to enhance image data analysis, mining and knowledge discovery.

  17. Predicting future discoveries from current scientific literature.

    PubMed

    Petrič, Ingrid; Cestnik, Bojan

    2014-01-01

    Knowledge discovery in biomedicine is a time-consuming process starting from basic research, through preclinical testing, towards possible clinical applications. Crossing conceptual boundaries is often needed for groundbreaking biomedical research that generates highly inventive discoveries. We demonstrate the ability of a creative literature mining method to advance valuable new discoveries based on rare ideas from existing literature. When emerging ideas from the scientific literature are put together as fragments of knowledge in a systematic way, they may lead to original, sometimes surprising, research findings. If enough scientific evidence is already published for the association of such findings, they can be considered scientific hypotheses. In this chapter, we describe a method for the computer-aided generation of such hypotheses based on the existing scientific literature. Our literature-based discovery of NF-kappaB and its possible connections to autism was recently confirmed by the scientific community, which attests to the ability of our literature mining methodology to accelerate future discoveries based on rare ideas from existing literature.

  18. Progress and Prospects for Stem Cell Engineering

    PubMed Central

    Ashton, Randolph S.; Keung, Albert J.; Peltier, Joseph; Schaffer, David V.

    2018-01-01

    Stem cells offer tremendous biomedical potential owing to their abilities to self-renew and differentiate into cell types of multiple adult tissues. Researchers and engineers have increasingly developed novel discovery technologies, theoretical approaches, and cell culture systems to investigate microenvironmental cues and cellular signaling events that control stem cell fate. Many of these technologies facilitate high-throughput investigation of microenvironmental signals and of the intracellular signaling networks and machinery processing those signals into cell fate decisions. As our aggregate empirical knowledge of stem cell regulation grows, theoretical modeling with systems and computational biology methods has been, and will continue to be, important for developing our ability to analyze and extract important conceptual features of stem cell regulation from complex data. Based on this body of knowledge, stem cell engineers will continue to develop technologies that predictably control stem cell fate, with the ultimate goal of being able to accurately and economically scale up these systems for clinical-grade production of stem cell therapeutics. PMID:22432628

  19. Bioenergy Knowledge Discovery Framework Fact Sheet

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    The Bioenergy Knowledge Discovery Framework (KDF) supports the development of a sustainable bioenergy industry by providing access to a variety of data sets, publications, and collaboration and mapping tools that support bioenergy research, analysis, and decision making. In the KDF, users can search for information, contribute data, and use the tools and map interface to synthesize, analyze, and visualize information in a spatially integrated manner.

  20. Teachers' Journal Club: Bridging between the Dynamics of Biological Discoveries and Biology Teachers

    ERIC Educational Resources Information Center

    Brill, Gilat; Falk, Hedda; Yarden, Anat

    2003-01-01

    Since biology is one of the most dynamic research fields within the natural sciences, the gap between the accumulated knowledge in biology and the knowledge that is taught in schools increases rapidly with time. Our long-term objective is to develop means to bridge between the dynamics of biological discoveries and the biology teachers and…

  1. Discovery of Newer Therapeutic Leads for Prostate Cancer

    DTIC Science & Technology

    2009-06-01

    promising plant extracts and then prepare large-scale quantities of the plant extracts using supercritical fluid extraction techniques and use this...quantities of the plant extracts using supercritical fluid extraction techniques. Large scale plant collections were conducted for 14 of the top 20...material for bioassay-guided fractionation of the biologically active constituents using modern chromatography techniques. The chemical structures of

  2. On the Preservation of Unitarity during Black Hole Evolution and Information Extraction from its Interior

    NASA Astrophysics Data System (ADS)

    Pappas, Nikolaos D.

    2012-06-01

    For more than 30 years, the discovery that black holes radiate like black bodies of a specific temperature has triggered a multitude of puzzling questions concerning their nature and the fate of information that goes down the black hole during its lifetime. The trickiest issue in what is known as the information loss paradox is the apparent violation of unitarity during the formation/evaporation process of black holes. A new idea is proposed, based on combining our knowledge of Hawking radiation with the Einstein-Podolsky-Rosen phenomenon, that could resolve the paradox and spare physicists the unpalatable idea that unitarity can ultimately be irreversibly violated even under special conditions.

  3. Hyperspectral Feature Detection Onboard the Earth Observing One Spacecraft using Superpixel Segmentation and Endmember Extraction

    NASA Technical Reports Server (NTRS)

    Thompson, David R.; Bornstein, Benjamin; Bue, Brian D.; Tran, Daniel Q.; Chien, Steve A.; Castano, Rebecca

    2012-01-01

    We present a demonstration of onboard hyperspectral image processing with the potential to reduce mission downlink requirements. The system detects spectral endmembers and then uses them to map units of surface material. This summarizes the content of the scene, reveals spectral anomalies warranting fast response, and reduces data volume by two orders of magnitude. We have integrated this system into the Autonomous Sciencecraft Experiment for operational use onboard the Earth Observing One (EO-1) spacecraft. The system does not require prior knowledge about spectra of interest. We report on a series of trial overflights in which identical spacecraft commands are effective for autonomous spectral discovery and mapping for varied target features, scenes and imaging conditions.

  4. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework.

    PubMed

    Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org. © The Author(s) 2016. Published by Oxford University Press.

  5. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework

    PubMed Central

    Chen, Yi-An; Tripathi, Lokesh P.; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org PMID:26989145

  6. Computational functional genomics-based approaches in analgesic drug discovery and repurposing.

    PubMed

    Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn

    2018-06-01

    Persistent pain is a major healthcare problem, affecting a fifth of adults worldwide, for which treatment options remain limited. The search for new analgesics increasingly includes the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between the genome and the phenotype. Its use in drug discovery and repurposing for analgesic indications has so far involved knowledge discovery in gene function and drug target-related databases, next-generation sequencing, and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing and highlight the potential of computational functional genomics in this field, including a demonstration of the workflow using the novel R library 'dbtORA'.

  7. PREDOSE: a semantic web platform for drug abuse epidemiology using social media.

    PubMed

    Cameron, Delroy; Smith, Gary A; Daniulaityte, Raminta; Sheth, Amit P; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z; Falck, Russel

    2013-12-01

    The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel semantic web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO, pronounced 'dow'), to facilitate the extraction of semantic information from User Generated Content (UGC) through a combination of lexical, pattern-based and semantics-based techniques. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging-pattern exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now better equipped to support drug abuse research by alleviating traditionally labor-intensive content analysis tasks. Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. The annotation scheme is modeled in the DAO and includes domain-specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, and routes of administration.
The DAO is also used to help recognize three types of data, namely: (1) entities, (2) relationships and (3) triples. PREDOSE then uses a combination of lexical and semantics-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information, which facilitates search, trend analysis and overall content analysis of social media on prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluating and refining the information extraction techniques. A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never before been developed for drug abuse research.
PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging-pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in the future. Copyright © 2013 Elsevier Inc. All rights reserved.
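    The lexicon-based entity extraction and the precision/recall evaluation described above can be illustrated with a minimal sketch. The lexicon, forum post and gold-standard annotations below are invented examples; PREDOSE itself combines lexical, pattern-based and semantics-based techniques rather than simple substring matching.

```python
def extract_entities(text, lexicon):
    """Return the lexicon terms that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return {term for term in lexicon if term.lower() in lowered}

def precision_recall(predicted, gold):
    """Score predicted entities against gold-standard annotations."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical lexicon, forum post, and gold-standard annotations
lexicon = {"buprenorphine", "loperamide", "insufflation"}
post = "Tried insufflation of buprenorphine once; never again."
gold = {"buprenorphine", "insufflation", "tolerance"}

predicted = extract_entities(post, lexicon)
p, r = precision_recall(predicted, gold)
```

    In this toy run the extractor finds both lexicon terms present in the post (perfect precision) but misses the gold entity absent from the lexicon, illustrating how lexicon coverage bounds recall.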

  8. Towards a Semantic Web of Things: A Hybrid Semantic Annotation, Extraction, and Reasoning Framework for Cyber-Physical System.

    PubMed

    Wu, Zhenyu; Xu, Yuan; Yang, Yunong; Zhang, Chunhong; Zhu, Xinning; Ji, Yang

    2017-02-20

    Web of Things (WoT) facilitates the discovery and interoperability of Internet of Things (IoT) devices in a cyber-physical system (CPS). Moreover, a uniform knowledge representation of physical resources is necessary for further composition, collaboration, and decision-making processes in CPS. Although several efforts have integrated semantics with WoT, such as knowledge engineering methods based on semantic sensor networks (SSN), they still cannot represent the complex relationships between devices when dynamic composition and collaboration occur, and they depend entirely on manual construction of a knowledge base, which limits scalability. In this paper, to address these limitations, we propose the semantic Web of Things (SWoT) framework for CPS (SWoT4CPS). SWoT4CPS provides a hybrid solution combining ontological engineering methods (extending SSN) with machine learning methods based on an entity linking (EL) model. To demonstrate feasibility and performance, we implement a temperature anomaly diagnosis and automatic control use case in a building automation system. Evaluation results on the EL method show that linking domain knowledge to DBpedia achieves relatively high accuracy with tolerable time complexity. Advantages and disadvantages of SWoT4CPS, along with future work, are also discussed.
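    The entity-linking step can be sketched as candidate ranking that combines surface-name similarity with context-word overlap. This is a hand-rolled simplification, not the trained EL model of SWoT4CPS, and the candidate entries below are invented stand-ins for DBpedia resources.

```python
from difflib import SequenceMatcher

# Invented knowledge-base candidates: entry name -> context words
candidates = {
    "Thermostat": {"temperature", "control", "heating"},
    "Thermometer": {"temperature", "measurement"},
    "Hygrometer": {"humidity", "measurement"},
}

def link(mention, context_words, candidates):
    """Pick the candidate maximizing name similarity plus context overlap."""
    def score(name):
        name_sim = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        overlap = len(context_words & candidates[name]) / len(candidates[name])
        return name_sim + overlap
    return max(candidates, key=score)

best = link("thermostat", {"heating", "temperature"}, candidates)
```

    A learned EL model would replace the hand-tuned score with weights trained on labeled links, but the candidate-generation-then-ranking shape is the same.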

  9. Learning in the context of distribution drift

    DTIC Science & Technology

    2017-05-09

    published in the leading data mining journal, Data Mining and Knowledge Discovery (Webb et al., 2016). We have shown that the previous qualitative... Figure 7: Architecture for learning from streaming data in the context of variable or unknown... Learning limited dependence Bayesian classifiers, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD

  10. A Bioinformatic Approach to Inter Functional Interactions within Protein Sequences

    DTIC Science & Technology

    2009-02-23

    AFOSR/AOARD Reference Number: USAFAOGA07: FA4869-07-1-4050 AFOSR/AOARD Program Manager: Hiroshi Motoda, Ph.D. Period of...Conference on Knowledge Discovery and Data Mining.) In a separate study we have applied our approaches to the problem of whole genome alignment. We have...SIGKDD Conference on Knowledge Discovery and Data Mining Attached. Interactions: Please list: (a) Participation/presentations at meetings

  11. k-neighborhood Decentralization: A Comprehensive Solution to Index the UMLS for Large Scale Knowledge Discovery

    PubMed Central

    Xiang, Yang; Lu, Kewei; James, Stephen L.; Borlawsky, Tara B.; Huang, Kun; Payne, Philip R.O.

    2011-01-01

    The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous work has shown that knowledge constructs composed of transitively associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, and a corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large-scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large-scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications. PMID:22154838

  12. k-Neighborhood decentralization: a comprehensive solution to index the UMLS for large scale knowledge discovery.

    PubMed

    Xiang, Yang; Lu, Kewei; James, Stephen L; Borlawsky, Tara B; Huang, Kun; Payne, Philip R O

    2012-04-01

    The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous work has shown that knowledge constructs composed of transitively associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, and a corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large-scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large-scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications. Copyright © 2011 Elsevier Inc. All rights reserved.
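    The core idea behind neighborhood-based indexing can be conveyed with a toy sketch: precompute, for every concept, the concepts reachable within k hops, so that discovery queries touch only local neighborhoods rather than traversing the entire graph. This simplification is ours for illustration; it is not the published kDLS labeling scheme, and the mini-graph of concept associations is invented.

```python
from collections import deque

def k_neighborhood(graph, start, k):
    """Breadth-first search out to depth k; return {node: hop distance}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:          # do not expand beyond k hops
            continue
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def build_index(graph, k):
    """Precompute the k-neighborhood of every node once, up front."""
    return {node: k_neighborhood(graph, node, k) for node in graph}

# Invented mini-graph of transitively associated concepts
graph = {
    "Raynaud disease": ["blood viscosity"],
    "blood viscosity": ["fish oil", "Raynaud disease"],
    "fish oil": ["blood viscosity"],
}
index = build_index(graph, k=2)
```

    After indexing, asking whether two concepts are transitively associated within k hops is a dictionary lookup instead of a fresh graph traversal.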

  13. A Graph-Based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications

    PubMed Central

    Cameron, Delroy; Bodenreider, Olivier; Yalamanchili, Hima; Danh, Tu; Vallabhaneni, Sreeram; Thirunarayan, Krishnaprasad; Sheth, Amit P.; Rindflesch, Thomas C.

    2014-01-01

    Objectives This paper presents a methodology for recovering and decomposing Swanson’s Raynaud Syndrome–Fish Oil Hypothesis semi-automatically. The methodology leverages the semantics of assertions extracted from biomedical literature (called semantic predications) along with structured background knowledge and graph-based algorithms to semi-automatically capture the informative associations originally discovered manually by Swanson. Demonstrating that Swanson’s manually intensive techniques can be undertaken semi-automatically paves the way for fully automatic semantics-based hypothesis generation from scientific literature. Methods Semantic predications obtained from biomedical literature allow the construction of labeled directed graphs which contain various associations among concepts from the literature. By aggregating such associations into informative subgraphs, some of the relevant details originally articulated by Swanson have been uncovered. However, by leveraging background knowledge to bridge important knowledge gaps in the literature, a methodology has been developed for semi-automatically capturing the detailed associations originally explicated in natural language by Swanson. Results Our methodology not only recovered the 3 associations commonly recognized as Swanson’s Hypothesis, but also decomposed them into an additional 16 detailed associations, formulated as chains of semantic predications. Altogether, 14 out of the 19 associations that can be attributed to Swanson were retrieved using our approach. To the best of our knowledge, such an in-depth recovery and decomposition of Swanson’s Hypothesis has never been attempted. Conclusion In this work, therefore, we presented a methodology for semi-automatically recovering and decomposing Swanson’s RS-DFO Hypothesis using semantic representations and graph algorithms. Our methodology provides new insights into potential prerequisites for semantics-driven Literature-Based Discovery (LBD).
These suggest that three critical aspects of LBD include: 1) the need for more expressive representations beyond Swanson’s ABC model; 2) an ability to accurately extract semantic information from text; and 3) the semantic integration of scientific literature with structured background knowledge. PMID:23026233
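    The chains of semantic predications described in the Results can be sketched with Swanson's ABC pattern: find intermediate concepts B that connect a source A to a target C through two predications. The triples below are invented stand-ins for SemRep-style extractions, not actual data from the study.

```python
# Invented SemRep-style predications: (subject, predicate, object)
predications = [
    ("fish oil", "AFFECTS", "blood viscosity"),
    ("fish oil", "INHIBITS", "platelet aggregation"),
    ("blood viscosity", "ASSOCIATED_WITH", "Raynaud disease"),
    ("platelet aggregation", "ASSOCIATED_WITH", "Raynaud disease"),
]

def abc_chains(preds, a, c):
    """Return chains (a, pred1, b, pred2, c) linking a to c through some b."""
    chains = []
    for s1, p1, o1 in preds:
        if s1 != a:
            continue
        for s2, p2, o2 in preds:
            if s2 == o1 and o2 == c:
                chains.append((a, p1, o1, p2, c))
    return chains

chains = abc_chains(predications, "fish oil", "Raynaud disease")
```

    Each returned chain is one candidate decomposition of the A-C association; the methodology above goes further by aggregating such chains into informative subgraphs and bridging gaps with structured background knowledge.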

  14. Knowledge Retrieval Solutions.

    ERIC Educational Resources Information Center

    Khan, Kamran

    1998-01-01

    Excalibur RetrievalWare offers true knowledge retrieval solutions. Its fundamental technologies, Adaptive Pattern Recognition Processing and Semantic Networks, have capabilities for knowledge discovery and knowledge management of full-text, structured and visual information. The software delivers a combination of accuracy, extensibility,…

  15. Flood AI: An Intelligent System for Discovery and Communication of Disaster Knowledge

    NASA Astrophysics Data System (ADS)

    Demir, I.; Sermet, M. Y.

    2017-12-01

    Communities are not immune from extreme events or natural disasters that can lead to large-scale consequences for the nation and the public. Improving resilience to better prepare for, plan for, recover from, and adapt to disasters is critical to reducing the impacts of extreme events. The National Research Council (NRC) report discusses how to increase resilience to extreme events through a vision of a resilient nation in the year 2030. The report highlights the importance of data and information, identifies gaps and knowledge challenges that need to be addressed, and suggests that every individual have access to risk and vulnerability information to make their communities more resilient. This project presents Flood AI, an intelligent system for flooding designed to improve societal preparedness by providing a knowledge engine that uses voice recognition, artificial intelligence, and natural language processing based on a generalized ontology for disasters with a primary focus on flooding. The knowledge engine utilizes the flood ontology and concepts to connect user input to relevant knowledge discovery channels on flooding through a data acquisition and processing framework that draws on environmental observations, forecast models, and knowledge bases. Communication channels of the framework include web-based systems, agent-based chat bots, smartphone applications, automated web workflows, and smart home devices, opening up knowledge discovery for flooding to many unique use cases.

  16. Common characteristics of open source software development and applicability for drug discovery: a systematic review.

    PubMed

    Ardal, Christine; Alstadsæter, Annette; Røttingen, John-Arne

    2011-09-28

    Innovation through an open source model has proven to be successful for software development. This success has led many to speculate whether open source can be applied to other industries with similar success. We attempt to provide an understanding of open source software development characteristics for researchers, business leaders and government officials who may be interested in utilizing open source innovation in other contexts, with an emphasis on drug discovery. A systematic review was performed by searching relevant, multidisciplinary databases to extract empirical research regarding the common characteristics and barriers of initiating and maintaining an open source software development project. Characteristics of open source software development pertinent to open source drug discovery were extracted. The characteristics were then grouped into the areas of participant attraction, management of volunteers, control mechanisms, legal framework and physical constraints. Lastly, their applicability to drug discovery was examined. We believe that the open source model is viable for drug discovery, although it is unlikely that it will exactly follow the form used in software development. Hybrids will likely develop that suit the unique characteristics of drug discovery. We suggest potential motivations for organizations to join an open source drug discovery project. We also examine specific differences between software and medicines, specifically how the need for laboratories and physical goods will impact the model, as well as the effect of patents.

  17. Developing integrated crop knowledge networks to advance candidate gene discovery.

    PubMed

    Hassani-Pak, Keywan; Castellote, Martin; Esch, Maria; Hindle, Matthew; Lysenko, Artem; Taubert, Jan; Rawlings, Christopher

    2016-12-01

    The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpin traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer to having the basic information, at the gene level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species wheat and barley. We present the global characteristics of such knowledge networks and, with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.

  18. On the Growth of Scientific Knowledge: Yeast Biology as a Case Study

    PubMed Central

    He, Xionglei; Zhang, Jianzhi

    2009-01-01

    The tempo and mode of human knowledge expansion is an enduring yet poorly understood topic. Through a temporal network analysis of three decades of discoveries of protein interactions and genetic interactions in baker's yeast, we show that the growth of scientific knowledge is exponential over time and that important subjects tend to be studied earlier. However, expansions of different domains of knowledge are highly heterogeneous and episodic such that the temporal turnover of knowledge hubs is much greater than expected by chance. Familiar subjects are preferentially studied over new subjects, leading to a reduced pace of innovation. While research is increasingly done in teams, the number of discoveries per researcher is greater in smaller teams. These findings reveal collective human behaviors in scientific research and help design better strategies in future knowledge exploration. PMID:19300476

  19. On the growth of scientific knowledge: yeast biology as a case study.

    PubMed

    He, Xionglei; Zhang, Jianzhi

    2009-03-01

    The tempo and mode of human knowledge expansion is an enduring yet poorly understood topic. Through a temporal network analysis of three decades of discoveries of protein interactions and genetic interactions in baker's yeast, we show that the growth of scientific knowledge is exponential over time and that important subjects tend to be studied earlier. However, expansions of different domains of knowledge are highly heterogeneous and episodic such that the temporal turnover of knowledge hubs is much greater than expected by chance. Familiar subjects are preferentially studied over new subjects, leading to a reduced pace of innovation. While research is increasingly done in teams, the number of discoveries per researcher is greater in smaller teams. These findings reveal collective human behaviors in scientific research and help design better strategies in future knowledge exploration.

  20. Integrated Computational Analysis of Genes Associated with Human Hereditary Insensitivity to Pain. A Drug Repurposing Perspective

    PubMed Central

    Lötsch, Jörn; Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred

    2017-01-01

    Genes causally involved in human insensitivity to pain provide a unique molecular source for studying the pathophysiology of pain and developing novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic targets in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, the biological processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified that shares important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence. PMID:28848388
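    The similarity analysis between gene functions and drug functions can be sketched as a set-overlap score between annotated biological processes. The annotations below are invented for illustration, and a plain Jaccard ranking stands in for the emergent self-organizing maps used in the study.

```python
# Invented process annotations; the study compared the functions of pain
# insensitivity genes against n = 4,834 database-queried drugs.
pain_processes = {"nervous system development", "ceramide signaling",
                  "sphingosine signaling"}

drug_annotations = {
    "drug_A": {"ceramide signaling", "sphingosine signaling", "lipid storage"},
    "drug_B": {"glycolysis", "lipid storage"},
}

def jaccard(a, b):
    """Jaccard index: overlap of two annotation sets."""
    return len(a & b) / len(a | b)

# Rank drugs by functional similarity to the pain-insensitivity profile
ranked = sorted(drug_annotations,
                key=lambda d: jaccard(drug_annotations[d], pain_processes),
                reverse=True)
```

    Drugs whose annotation sets overlap the pain-insensitivity profile rise to the top of the ranking, giving a narrow shortlist of repurposing candidates.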

  1. Titration-based screening for evaluation of natural product extracts: identification of an aspulvinone family of luciferase inhibitors

    PubMed Central

    Cruz, Patricia G.; Auld, Douglas S.; Schultz, Pamela J.; Lovell, Scott; Battaile, Kevin P.; MacArthur, Ryan; Shen, Min; Tamayo-Castillo, Giselle; Inglese, James; Sherman, David H.

    2011-01-01

    The chemical diversity of nature has tremendous potential for discovery of new molecular probes and medicinal agents. However, sensitivity of HTS assays to interfering components of crude extracts derived from plants, macro- and microorganisms has curtailed their use in lead discovery efforts. Here we describe a process for leveraging the concentration-response curves (CRCs) obtained from quantitative HTS to improve the initial selection of “actives” from a library of partially fractionated natural product extracts derived from marine actinomycetes and fungi. By using pharmacological activity, the first-pass CRC paradigm aims to improve the probability that labor-intensive subsequent steps of re-culturing, extraction and bioassay-guided isolation of active component(s) target the most promising strains and growth conditions. We illustrate how this process identified a family of fungal metabolites as potent inhibitors of firefly luciferase, subsequently resolved in molecular detail by x-ray crystallography. PMID:22118678

  2. Titration-based screening for evaluation of natural product extracts: identification of an aspulvinone family of luciferase inhibitors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cruz, P.G.; Auld, D.S.; Schultz, P.J.

    2011-11-28

    The chemical diversity of nature has tremendous potential for the discovery of molecular probes and medicinal agents. However, sensitivity of HTS assays to interfering components of crude extracts derived from plants, and macro- and microorganisms has curtailed their use in lead discovery. Here, we describe a process for leveraging the concentration-response curves obtained from quantitative HTS to improve the initial selection of actives from a library of partially fractionated natural product extracts derived from marine actinomycetes and fungi. By using pharmacological activity, the first-pass CRC paradigm improves the probability that labor-intensive subsequent steps of reculturing, extraction, and bioassay-guided isolation of active component(s) target the most promising strains and growth conditions. We illustrate how this process identified a family of fungal metabolites as potent inhibitors of firefly luciferase, subsequently resolved in molecular detail by X-ray crystallography.

  3. Intelligent services for discovery of complex geospatial features from remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Yue, Peng; Di, Liping; Wei, Yaxing; Han, Weiguo

    2013-09-01

    Remote sensing imagery has been commonly used by intelligence analysts to discover geospatial features, including complex ones. The overwhelming volume of routine image acquisition requires automated methods or systems for feature discovery instead of manual image interpretation. Methods for extracting elementary ground features such as buildings and roads from remote sensing imagery have been studied extensively. The discovery of complex geospatial features, however, is still rather understudied. A complex feature, such as a Weapon of Mass Destruction (WMD) proliferation facility, is spatially composed of elementary features (e.g., buildings for hosting fuel concentration machines, cooling towers, transportation roads, and fences). Such spatial semantics, together with thematic semantics of feature types, can be used to discover complex geospatial features. This paper proposes a workflow-based approach for the discovery of complex geospatial features that uses geospatial semantics and services. The elementary features extracted from imagery are archived in distributed Web Feature Services (WFSs) and are discoverable from a catalogue service. Using spatial semantics among elementary features and thematic semantics among feature types, workflow-based service chains can be constructed to locate semantically related complex features in imagery. The workflows are reusable and can provide on-demand discovery of complex features in a distributed environment.

  4. University of Washington's eScience Institute Promotes New Training and Career Pathways in Data Science

    NASA Astrophysics Data System (ADS)

    Stone, S.; Parker, M. S.; Howe, B.; Lazowska, E.

    2015-12-01

    Rapid advances in technology are transforming nearly every field from "data-poor" to "data-rich." The ability to extract knowledge from this abundance of data is the cornerstone of 21st century discovery. At the University of Washington eScience Institute, our mission is to engage researchers across disciplines in developing and applying advanced computational methods and tools to real world problems in data-intensive discovery. Our research team consists of individuals with diverse backgrounds in domain sciences such as astronomy, oceanography and geology, with complementary expertise in advanced statistical and computational techniques such as data management, visualization, and machine learning. Two key elements are necessary to foster careers in data science: individuals with cross-disciplinary training in both method and domain sciences, and career paths emphasizing alternative metrics for advancement. We see persistent and deep-rooted challenges for the career paths of people whose skills, activities and work patterns don't fit neatly into the traditional roles and success metrics of academia. To address these challenges the eScience Institute has developed training programs and established new career opportunities for data-intensive research in academia. Our graduate students and post-docs have mentors in both a methodology and an application field. They also participate in coursework and tutorials to advance technical skill and foster community. Professional Data Scientist positions were created to support research independence while encouraging the development and adoption of domain-specific tools and techniques. The eScience Institute also supports the appointment of faculty who are innovators in developing and applying data science methodologies to advance their field of discovery. Our ultimate goal is to create a supportive environment for data science in academia and to establish global recognition for data-intensive discovery across all fields.

  5. Phytotoxic triterpene saponins from Bellis longifolia, an endemic plant of Crete

    USDA-ARS?s Scientific Manuscript database

    In continuation of our research on discovery of bioactive compounds from plants we have screened extracts of 65 plant species of the Cretan flora for their phytotoxic activity. All plants were extracted successively with CH2Cl2, MeOH and H2O. Phytotoxicity evaluation of the 249 generated extracts wa...

  6. PRO-Elicere: A Study to Create a New Process for Dependability Analysis of Space Computer Systems

    NASA Astrophysics Data System (ADS)

    da Silva, Glauco; Netto Lahoz, Carlos Henrique

    2013-09-01

    This paper presents PRO-ELICERE, a new approach to computer system dependability analysis that introduces data mining concepts and intelligent decision-support mechanisms for analyzing the potential hazards and failures of a critical computer system. We also present some techniques and tools that support traditional dependability analysis and briefly discuss the concept of knowledge discovery and intelligent databases for critical computer systems. We then introduce the PRO-ELICERE process, an intelligent approach to automating ELICERE, a process created to extract non-functional requirements for critical computer systems. PRO-ELICERE can be used in the V&V activities of projects at the Institute of Aeronautics and Space, such as the Brazilian Satellite Launcher (VLS-1).

  7. Biomedical Informatics on the Cloud: A Treasure Hunt for Advancing Cardiovascular Medicine.

    PubMed

    Ping, Peipei; Hermjakob, Henning; Polson, Jennifer S; Benos, Panagiotis V; Wang, Wei

    2018-04-27

    In the digital age of cardiovascular medicine, the rate of biomedical discovery can be greatly accelerated by the guidance and resources required to unearth potential collections of knowledge. A unified computational platform leverages metadata to not only provide direction but also empower researchers to mine a wealth of biomedical information and forge novel mechanistic insights. This review takes the opportunity to present an overview of the cloud-based computational environment, including the functional roles of metadata, the architecture schema of indexing and search, and the practical scenarios of machine learning-supported molecular signature extraction. By introducing several established resources and state-of-the-art workflows, we share with our readers a broadly defined informatics framework to phenotype cardiovascular health and disease. © 2018 American Heart Association, Inc.

  8. A semantic web ontology for small molecules and their biological targets.

    PubMed

    Choi, Jooyoung; Davis, Melissa J; Newman, Andrew F; Ragan, Mark A

    2010-05-24

    A wide range of data on sequences, structures, pathways, and networks of genes and gene products is available for hypothesis testing and discovery in biological and biomedical research. However, data describing the physical, chemical, and biological properties of small molecules have not been well-integrated with these resources. Semantically rich representations of chemical data, combined with Semantic Web technologies, have the potential to enable the integration of small molecule and biomolecular data resources, expanding the scope and power of biomedical and pharmacological research. We employed the Semantic Web technologies Resource Description Framework (RDF) and Web Ontology Language (OWL) to generate a Small Molecule Ontology (SMO) that represents concepts and provides unique identifiers for biologically relevant properties of small molecules and their interactions with biomolecules, such as proteins. We instanced SMO using data from three public data sources, i.e., DrugBank, PubChem and UniProt, and converted to RDF triples. Evaluation of SMO by use of predetermined competency questions implemented as SPARQL queries demonstrated that data from chemical and biomolecular data sources were effectively represented and that useful knowledge can be extracted. These results illustrate the potential of Semantic Web technologies in chemical, biological, and pharmacological research and in drug discovery.
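    The triple-based representation and competency-question querying described in the abstract can be sketched with a small in-memory triple store; this is a plain-Python stand-in for an RDF store, and all identifiers below are illustrative, not actual SMO, DrugBank, or UniProt terms:

```python
# Minimal in-memory triple store illustrating the RDF-style workflow:
# represent small-molecule facts as (subject, predicate, object)
# triples, then answer a "competency question" by pattern matching,
# much as a SPARQL query would. Identifiers are illustrative only.

triples = {
    ("drug:aspirin", "smo:hasTarget", "protein:PTGS2"),
    ("drug:aspirin", "smo:molecularWeight", "180.16"),
    ("drug:ibuprofen", "smo:hasTarget", "protein:PTGS2"),
    ("protein:PTGS2", "uniprot:name", "Prostaglandin G/H synthase 2"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Competency question: which drugs target protein PTGS2?
drugs = sorted(t[0] for t in match(p="smo:hasTarget", o="protein:PTGS2"))
print(drugs)  # ['drug:aspirin', 'drug:ibuprofen']
```

    A production system would store such triples in RDF and express the competency questions as SPARQL SELECT queries, as the paper describes.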

  9. Applying Knowledge Discovery in Databases in Public Health Data Set: Challenges and Concerns

    PubMed Central

    Volrathongchia, Kanittha

    2003-01-01

    In attempting to apply Knowledge Discovery in Databases (KDD) to generate a predictive model from a health care dataset that is currently available to the public, the first step is to pre-process the data to overcome the challenges of missing data, redundant observations, and records containing inaccurate data. This study will demonstrate how to use simple pre-processing methods to improve the quality of input data. PMID:14728545
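    The three pre-processing challenges the abstract names (missing data, redundant observations, inaccurate records) can be sketched as follows; field names and the valid range are illustrative assumptions, not from the study:

```python
# Sketch of simple KDD pre-processing: remove redundant (duplicate)
# records, drop records with out-of-range (inaccurate) values, and
# impute missing values with the mean of the observed ones.

def preprocess(records, valid_age=(0, 120)):
    # 1. Remove redundant observations (exact duplicates).
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items(), key=str))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 2. Drop records whose age falls outside a plausible range.
    lo, hi = valid_age
    clean = [r for r in unique
             if r.get("age") is None or lo <= r["age"] <= hi]
    # 3. Impute missing ages with the mean of the observed ones.
    ages = [r["age"] for r in clean if r.get("age") is not None]
    mean_age = sum(ages) / len(ages) if ages else None
    for r in clean:
        if r.get("age") is None:
            r["age"] = mean_age
    return clean

rows = [
    {"id": 1, "age": 34}, {"id": 1, "age": 34},   # duplicate
    {"id": 2, "age": 999},                        # inaccurate
    {"id": 3, "age": None},                       # missing
]
cleaned = preprocess(rows)
print(cleaned)
```

    Each step improves input quality before any model is fit, which is the point the study makes about KDD on public health datasets.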

  10. Big, Deep, and Smart Data in Scanning Probe Microscopy

    DOE PAGES

    Kalinin, Sergei V.; Strelcov, Evgheni; Belianinov, Alex; ...

    2016-09-27

    Scanning probe microscopy (SPM) techniques open the door to nanoscience and nanotechnology by enabling imaging and manipulation of the structure and functionality of matter on nanometer and atomic scales. We analyze the discovery process in SPM in terms of the information flow from the tip-surface junction to knowledge adoption by the scientific community. Furthermore, we discuss the challenges and opportunities offered by merging SPM with advanced data mining, visual analytics, and knowledge discovery technologies.

  11. Exploiting Early Intent Recognition for Competitive Advantage

    DTIC Science & Technology

    2009-01-01

    basketball [Bhandari et al., 1997; Jug et al., 2003], and Robocup soccer simulations [Riley and Veloso, 2000; 2002; Kuhlmann et al., 2006] and non...actions (e.g. before, after, around). Jug et al. [2003] used a similar framework for offline basketball game analysis. More recently, Hess et al...and K. Ramanujam. Advanced Scout: Data mining and knowledge discovery in NBA data. Data Mining and Knowledge Discovery, 1(1):121-125, 1997. [Chang

  12. An Alternative Time for Telling: When Conceptual Instruction Prior to Exploration Improves Mathematical Knowledge

    ERIC Educational Resources Information Center

    Fyfe, Emily R.; DeCaro, Marci S.; Rittle-Johnson, Bethany

    2013-01-01

    An emerging consensus suggests that guided discovery, which combines discovery and instruction, is a more effective educational approach than either one in isolation. The goal of this study was to examine two specific forms of guided discovery, testing whether conceptual instruction should precede or follow exploratory problem solving. In both…

  13. Contributing, Exchanging and Linking for Learning: Supporting Web Co-Discovery in One-to-One Environments

    ERIC Educational Resources Information Center

    Liu, Chen-Chung; Don, Ping-Hsing; Chung, Chen-Wei; Lin, Shao-Jun; Chen, Gwo-Dong; Liu, Baw-Jhiune

    2010-01-01

    While Web discovery is usually undertaken as a solitary activity, Web co-discovery may transform Web learning activities from the isolated individual search process into interactive and collaborative knowledge exploration. Recent studies have proposed Web co-search environments on a single computer, supported by multiple one-to-one technologies.…

  14. Knowledge Management in Higher Education: A Knowledge Repository Approach

    ERIC Educational Resources Information Center

    Wedman, John; Wang, Feng-Kwei

    2005-01-01

    One might expect higher education, where the discovery and dissemination of new and useful knowledge is vital, to be among the first to implement knowledge management practices. Surprisingly, higher education has been slow to implement knowledge management practices (Townley, 2003). This article describes an ongoing research and development effort…

  15. A gossip based information fusion protocol for distributed frequent itemset mining

    NASA Astrophysics Data System (ADS)

    Sohrabi, Mohammad Karim

    2018-07-01

    The computational complexity, huge memory space requirements, and time-consuming nature of the frequent pattern mining process are the most important motivations for distributing and parallelizing this mining process. On the other hand, the emergence of distributed computational and operational environments, in which data are produced and maintained on different distributed data sources, makes the parallelization and distribution of the knowledge discovery process inevitable. In this paper, a gossip-based distributed itemset mining (GDIM) algorithm is proposed to extract frequent itemsets, which are special types of frequent patterns, in a wireless sensor network environment. In this algorithm, local frequent itemsets of each sensor are extracted using a bit-wise horizontal approach (LHPM) from nodes that are clustered using a LEACH-based protocol. Cluster heads use a gossip-based protocol to communicate with each other and find the patterns whose global support is equal to or greater than the specified support threshold. Experimental results show that the proposed algorithm outperforms the best existing gossip-based algorithm in terms of execution time.
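    The core support-threshold step the abstract refers to can be sketched as follows; the clustering and gossip exchange are simulated by simply merging per-node counts, and the transactions and threshold are illustrative, not from the paper:

```python
from collections import Counter
from itertools import combinations

# Sketch of distributed frequent-itemset mining: each sensor node
# counts itemsets locally, cluster heads merge the counts (standing
# in for the gossip exchange), and itemsets whose global support
# meets the threshold are kept.

def local_counts(transactions, max_size=2):
    """Count all itemsets up to max_size in one node's transactions."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    return counts

node1 = [{"a", "b"}, {"a", "c"}, {"a", "b"}]
node2 = [{"b", "c"}, {"a", "b"}]

merged = local_counts(node1) + local_counts(node2)  # "gossip" merge
min_support = 3
frequent = {i for i, n in merged.items() if n >= min_support}
print(sorted(frequent))  # [('a',), ('a', 'b'), ('b',)]
```

    The real algorithm avoids a central merge: cluster heads converge on the global counts by repeated pairwise gossip exchanges, but the thresholding step is the same.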

  16. Crowdsourcing Knowledge Discovery and Innovations in Medicine

    PubMed Central

    2014-01-01

    Clinicians face difficult treatment decisions in contexts that are not well addressed by available evidence as formulated based on research. The digitization of medicine provides an opportunity for clinicians to collaborate with researchers and data scientists on solutions to previously ambiguous and seemingly insolvable questions. But these groups tend to work in isolated environments, and do not communicate or interact effectively. Clinicians are typically buried in the weeds and exigencies of daily practice such that they do not recognize or act on ways to improve knowledge discovery. Researchers may not be able to identify the gaps in clinical knowledge. For data scientists, the main challenge is discerning what is relevant in a domain that is both unfamiliar and complex. Each type of domain expert can contribute skills unavailable to the other groups. “Health hackathons” and “data marathons”, in which diverse participants work together, can leverage the current ready availability of digital data to discover new knowledge. Utilizing the complementary skills and expertise of these talented, but functionally divided groups, innovations are formulated at the systems level. As a result, the knowledge discovery process is simultaneously democratized and improved, real problems are solved, cross-disciplinary collaboration is supported, and innovations are enabled. PMID:25239002

  17. Crowdsourcing knowledge discovery and innovations in medicine.

    PubMed

    Celi, Leo Anthony; Ippolito, Andrea; Montgomery, Robert A; Moses, Christopher; Stone, David J

    2014-09-19

    Clinicians face difficult treatment decisions in contexts that are not well addressed by available evidence as formulated based on research. The digitization of medicine provides an opportunity for clinicians to collaborate with researchers and data scientists on solutions to previously ambiguous and seemingly insolvable questions. But these groups tend to work in isolated environments, and do not communicate or interact effectively. Clinicians are typically buried in the weeds and exigencies of daily practice such that they do not recognize or act on ways to improve knowledge discovery. Researchers may not be able to identify the gaps in clinical knowledge. For data scientists, the main challenge is discerning what is relevant in a domain that is both unfamiliar and complex. Each type of domain expert can contribute skills unavailable to the other groups. "Health hackathons" and "data marathons", in which diverse participants work together, can leverage the current ready availability of digital data to discover new knowledge. Utilizing the complementary skills and expertise of these talented, but functionally divided groups, innovations are formulated at the systems level. As a result, the knowledge discovery process is simultaneously democratized and improved, real problems are solved, cross-disciplinary collaboration is supported, and innovations are enabled.

  18. Empirical study using network of semantically related associations in bridging the knowledge gap.

    PubMed

    Abedi, Vida; Yeasin, Mohammed; Zand, Ramin

    2014-11-27

    Data overload has created a new set of challenges in finding meaningful and relevant information with minimal cognitive effort, and designing robust and scalable knowledge discovery systems remains difficult. Recent innovations in (biological) literature mining tools have opened new avenues for understanding the confluence of various diseases, genes, risk factors, and biological processes, bridging the gaps between massive amounts of scientific data and harvesting useful knowledge. In this paper, we highlight some of the findings obtained using a text analytics tool called ARIANA (Adaptive Robust and Integrative Analysis for finding Novel Associations). An empirical study using ARIANA reveals knowledge discovery instances that illustrate the efficacy of such a tool. For example, ARIANA can capture the connection between the drug hexamethonium and pulmonary inflammation and fibrosis that caused the tragic death of a healthy volunteer in a 2001 Johns Hopkins asthma study, even though the abstract of the study was not part of the semantic model. An integrated system such as ARIANA could assist the human expert in exploratory literature search by bringing forward hidden associations, promoting data reuse and knowledge discovery, and stimulating interdisciplinary projects by connecting information across disciplines.

  19. Intelligent resource discovery using ontology-based resource profiles

    NASA Technical Reports Server (NTRS)

    Hughes, J. Steven; Crichton, Dan; Kelly, Sean; Crichton, Jerry; Tran, Thuy

    2004-01-01

    Successful resource discovery across heterogeneous repositories is strongly dependent on the semantic and syntactic homogeneity of the associated resource descriptions. Ideally, resource descriptions are easily extracted from pre-existing standardized sources, expressed using standard syntactic and semantic structures, and managed and accessed within a distributed, flexible, and scaleable software framework.

  20. Common characteristics of open source software development and applicability for drug discovery: a systematic review

    PubMed Central

    2011-01-01

    Background Innovation through an open source model has proven to be successful for software development. This success has led many to speculate if open source can be applied to other industries with similar success. We attempt to provide an understanding of open source software development characteristics for researchers, business leaders and government officials who may be interested in utilizing open source innovation in other contexts and with an emphasis on drug discovery. Methods A systematic review was performed by searching relevant, multidisciplinary databases to extract empirical research regarding the common characteristics and barriers of initiating and maintaining an open source software development project. Results Common characteristics to open source software development pertinent to open source drug discovery were extracted. The characteristics were then grouped into the areas of participant attraction, management of volunteers, control mechanisms, legal framework and physical constraints. Lastly, their applicability to drug discovery was examined. Conclusions We believe that the open source model is viable for drug discovery, although it is unlikely that it will exactly follow the form used in software development. Hybrids will likely develop that suit the unique characteristics of drug discovery. We suggest potential motivations for organizations to join an open source drug discovery project. We also examine specific differences between software and medicines, specifically how the need for laboratories and physical goods will impact the model as well as the effect of patents. PMID:21955914

  1. Text-based discovery in biomedicine: the architecture of the DAD-system.

    PubMed

    Weeber, M; Klein, H; Aronson, A R; Mork, J G; de Jong-van den Berg, L T; Vos, R

    2000-01-01

    Current scientific research takes place in highly specialized contexts with poor communication between disciplines as a likely consequence. Knowledge from one discipline may be useful for the other without researchers knowing it. As scientific publications are a condensation of this knowledge, literature-based discovery tools may help the individual scientist to explore new useful domains. We report on the development of the DAD-system, a concept-based Natural Language Processing system for PubMed citations that provides the biomedical researcher such a tool. We describe the general architecture and illustrate its operation by a simulation of a well-known text-based discovery: The favorable effects of fish oil on patients suffering from Raynaud's disease [1].

  2. Drug Development and Conservation of Biodiversity in West and Central Africa: Performance of Neurochemical and Radio Receptor Assays of Plant Extracts Drug Discovery for the Central Nervous System

    DTIC Science & Technology

    2004-09-01

    7) Hui, D.; Sao-Xing, C. J. Nat. Prod. 1998, 61, 142-144. (8) Aldrich Library of 13C and 1H FT NMR spectra 1992, 2, 326A. (9) Kadota, S.; Hui, D...Biodiversity in West and Central Africa: Performance of Neurochemical and Radio Receptor Assays of Plant Extracts Drug Discovery for the Central...s) and should not be construed as an official Department of the Army position, policy or decision unless so designated by other documentation

  3. Which are the greatest recent discoveries and the greatest future challenges in nutrition?

    PubMed

    Katan, M B; Boekschoten, M V; Connor, W E; Mensink, R P; Seidell, J; Vessby, B; Willett, W

    2009-01-01

    Nutrition science aims to create new knowledge, but scientists rarely sit back to reflect on what nutrition research has achieved in recent decades. We report the outcome of a 1-day symposium at which the audience was asked to vote on the greatest discoveries in nutrition since 1976 and on the greatest challenges for the coming 30 years. Most of the 128 participants were Dutch scientists working in nutrition or related biomedical and public health fields. Candidate discoveries and challenges were nominated by five invited speakers and by members of the audience. Ballot forms were then prepared on which participants selected one discovery and one challenge. A total of 15 discoveries and 14 challenges were nominated. The audience elected Folic acid prevents birth defects as the greatest discovery in nutrition science since 1976. Controlling obesity and insulin resistance through activity and diet was elected as the greatest challenge for the coming 30 years. This selection was probably biased by the interests and knowledge of the speakers and the audience. For the present review, we therefore added 12 discoveries from the period 1976 to 2006 that we judged worthy of consideration, but that had not been nominated at the meeting. The meeting did not represent an objective selection process, but it did demonstrate that the past 30 years have yielded major new discoveries in nutrition and health.

  4. Translating three states of knowledge--discovery, invention, and innovation

    PubMed Central

    2010-01-01

    Background Knowledge Translation (KT) has historically focused on the proper use of knowledge in healthcare delivery. A knowledge base has been created through empirical research and resides in scholarly literature. Some knowledge is amenable to direct application by stakeholders who are engaged during or after the research process, as shown by the Knowledge to Action (KTA) model. Other knowledge requires multiple transformations before achieving utility for end users. For example, conceptual knowledge generated through science or engineering may become embodied as a technology-based invention through development methods. The invention may then be integrated within an innovative device or service through production methods. To what extent is KT relevant to these transformations? How might the KTA model accommodate these additional development and production activities while preserving the KT concepts? Discussion Stakeholders adopt and use knowledge that has perceived utility, such as a solution to a problem. Achieving a technology-based solution involves three methods that generate knowledge in three states, analogous to the three classic states of matter. Research activity generates discoveries that are intangible and highly malleable like a gas; development activity transforms discoveries into inventions that are moderately tangible yet still malleable like a liquid; and production activity transforms inventions into innovations that are tangible and immutable like a solid. The paper demonstrates how the KTA model can accommodate all three types of activity and address all three states of knowledge. Linking the three activities in one model also illustrates the importance of engaging the relevant stakeholders prior to initiating any knowledge-related activities. Summary Science and engineering focused on technology-based devices or services change the state of knowledge through three successive activities. 
Achieving knowledge implementation requires methods that accommodate these three activities and knowledge states. Accomplishing beneficial societal impacts from technology-based knowledge involves the successful progression through all three activities, and the effective communication of each successive knowledge state to the relevant stakeholders. The KTA model appears suitable for structuring and linking these processes. PMID:20205873

  5. Distributed data mining on grids: services, tools, and applications.

    PubMed

    Cannataro, Mario; Congiusta, Antonio; Pugliese, Andrea; Talia, Domenico; Trunfio, Paolo

    2004-12-01

    Data mining algorithms are widely used today for the analysis of large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce fields often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed and parallel systems. The grid can play a significant role in providing an effective computational support for distributed knowledge discovery applications. For the development of data mining applications on grids we designed a system called Knowledge Grid. This paper describes the Knowledge Grid framework and presents the toolset provided by the Knowledge Grid for implementing distributed knowledge discovery. The paper discusses how to design and implement data mining applications by using the Knowledge Grid tools starting from searching grid resources, composing software and data components, and executing the resulting data mining process on a grid. Some performance results are also discussed.

  6. Knowledge Discovery/A Collaborative Approach, an Innovative Solution

    NASA Technical Reports Server (NTRS)

    Fitts, Mary A.

    2009-01-01

    Collaboration between Medical Informatics and Healthcare Systems (MIHCS) at NASA/Johnson Space Center (JSC) and the Texas Medical Center (TMC) Library was established to investigate technologies for facilitating knowledge discovery across multiple life sciences research disciplines in multiple repositories. After reviewing 14 potential Enterprise Search System (ESS) solutions, Collexis was determined to best meet the expressed needs. A three month pilot evaluation of Collexis produced positive reports from multiple scientists across 12 research disciplines. The joint venture and a pilot-phased approach achieved the desired results without the high cost of purchasing software, hardware or additional resources to conduct the task. Medical research is highly compartmentalized by discipline, e.g. cardiology, immunology, neurology. The medical research community at large, as well as at JSC, recognizes the need for cross-referencing relevant information to generate best evidence. Cross-discipline collaboration at JSC is specifically required to close knowledge gaps affecting space exploration. To facilitate knowledge discovery across these communities, MIHCS combined expertise with the TMC library and found Collexis to best fit the needs of our researchers including:

  7. A Sequence-Independent Strategy for Detection and Cloning of Circular DNA Virus Genomes by Using Multiply Primed Rolling-Circle Amplification

    PubMed Central

    Rector, Annabel; Tachezy, Ruth; Van Ranst, Marc

    2004-01-01

    The discovery of novel viruses has often been accomplished by using hybridization-based methods that necessitate the availability of a previously characterized virus genome probe or knowledge of the viral nucleotide sequence to construct consensus or degenerate PCR primers. In their natural replication cycle, certain viruses employ a rolling-circle mechanism to propagate their circular genomes, and multiply primed rolling-circle amplification (RCA) with φ29 DNA polymerase has recently been applied in the amplification of circular plasmid vectors used in cloning. We employed an isothermal RCA protocol that uses random hexamer primers to amplify the complete genomes of papillomaviruses without the need for prior knowledge of their DNA sequences. We optimized this RCA technique with extracted human papillomavirus type 16 (HPV-16) DNA from W12 cells, using a real-time quantitative PCR assay to determine amplification efficiency, and obtained a 2.4 × 10⁴-fold increase in HPV-16 DNA concentration. We were able to clone the complete HPV-16 genome from this multiply primed RCA product. The optimized protocol was subsequently applied to a bovine fibropapillomatous wart tissue sample. Whereas no papillomavirus DNA could be detected by restriction enzyme digestion of the original sample, multiply primed RCA enabled us to obtain a sufficient amount of papillomavirus DNA for restriction enzyme analysis, cloning, and subsequent sequencing of a novel variant of bovine papillomavirus type 1. The multiply primed RCA method allows the discovery of previously unknown papillomaviruses, and possibly also other circular DNA viruses, without a priori sequence information. PMID:15113879

  8. Big, Deep, and Smart Data in Scanning Probe Microscopy.

    PubMed

    Kalinin, Sergei V; Strelcov, Evgheni; Belianinov, Alex; Somnath, Suhas; Vasudevan, Rama K; Lingerfelt, Eric J; Archibald, Richard K; Chen, Chaomei; Proksch, Roger; Laanait, Nouamane; Jesse, Stephen

    2016-09-27

    Scanning probe microscopy (SPM) techniques have opened the door to nanoscience and nanotechnology by enabling imaging and manipulation of the structure and functionality of matter at nanometer and atomic scales. Here, we analyze the scientific discovery process in SPM by following the information flow from the tip-surface junction, to knowledge adoption by the wider scientific community. We further discuss the challenges and opportunities offered by merging SPM with advanced data mining, visual analytics, and knowledge discovery technologies.

  9. Building Scalable Knowledge Graphs for Earth Science

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Maskey, Manil; Gatlin, Patrick; Zhang, Jia; Duan, Xiaoyi; Miller, J. J.; Bugbee, Kaylin; Christopher, Sundar; Freitag, Brian

    2017-01-01

    Knowledge Graphs link key entities in a specific domain with other entities via relationships. From these relationships, researchers can query knowledge graphs for probabilistic recommendations to infer new knowledge. Scientific papers are an untapped resource which knowledge graphs could leverage to accelerate research discovery. Goal: Develop an end-to-end (semi) automated methodology for constructing Knowledge Graphs for Earth Science.
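    The query-for-recommendations idea described above can be sketched with a toy graph in which entities extracted from papers are linked by edges and new candidate links are suggested by counting shared neighbors; all entity names are illustrative, not drawn from any actual Earth Science corpus:

```python
from collections import Counter

# Toy knowledge graph: undirected edges link entities that co-occur
# in papers. A simple recommendation infers candidate relationships
# by ranking non-neighbors on the number of neighbors they share
# with the query entity.

edges = [
    ("precipitation", "GPM mission"),
    ("precipitation", "soil moisture"),
    ("soil moisture", "SMAP mission"),
    ("soil moisture", "drought"),
    ("drought", "precipitation"),
]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def recommend(entity):
    """Rank non-neighbors by the number of neighbors shared with entity."""
    scores = Counter()
    for neighbor in graph[entity]:
        for candidate in graph[neighbor]:
            if candidate != entity and candidate not in graph[entity]:
                scores[candidate] += 1
    return scores.most_common()

print(recommend("GPM mission"))
```

    Real systems replace this neighbor count with probabilistic or embedding-based scoring, but the shape of the inference (query the graph, rank candidate links) is the same.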

  10. Genetic discoveries and nursing implications for complex disease prevention and management.

    PubMed

    Frazier, Lorraine; Meininger, Janet; Halsey Lea, Dale; Boerwinkle, Eric

    2004-01-01

    The purpose of this article is to examine the management of patients with complex diseases, in light of recent genetic discoveries, and to explore how these genetic discoveries will impact nursing practice and nursing research. The nursing science processes discussed are not comprehensive of all nursing practice but, instead, are concentrated in areas where genetics will have the greatest influence. Advances in genetic science will revolutionize our approach to patients and to health care in the prevention, diagnosis, and treatment of disease, raising many issues for nursing research and practice. As the scope of genetics expands to encompass multifactorial disease processes, a continuing reexamination of the knowledge base is required for nursing practice, with incorporation of genetic knowledge into the repertoire of every nurse, and with advanced knowledge for nurses who select specialty roles in the genetics area. This article explores the impact of this revolution on nursing science and practice as well as the opportunities for nursing science and practice to participate fully in this revolution. Because of the high proportion of the population at risk for complex diseases and because nurses are occupied every day in the prevention, assessment, treatment, and therapeutic intervention of patients with such diseases in practice and research, there is great opportunity for nurses to improve health care through the application (nursing practice) and discovery (nursing research) of genetic knowledge.

  11. Medical data mining: knowledge discovery in a clinical data warehouse.

    PubMed Central

    Prather, J. C.; Lobach, D. F.; Goodwin, L. K.; Hales, J. W.; Hage, M. L.; Hammond, W. E.

    1997-01-01

    Clinical databases have accumulated large quantities of information about patients and their medical conditions. Relationships and patterns within this data could provide new medical knowledge. Unfortunately, few methodologies have been developed and applied to discover this hidden knowledge. In this study, the techniques of data mining (also known as Knowledge Discovery in Databases) were used to search for relationships in a large clinical database. Specifically, data accumulated on 3,902 obstetrical patients were evaluated for factors potentially contributing to preterm birth using exploratory factor analysis. Three factors were identified by the investigators for further exploration. This paper describes the processes involved in mining a clinical database including data warehousing, data query and cleaning, and data analysis. PMID:9357597

  12. Discovery informatics in biological and biomedical sciences: research challenges and opportunities.

    PubMed

    Honavar, Vasant

    2015-01-01

    New discoveries in biological, biomedical and health sciences are increasingly being driven by our ability to acquire, share, integrate, and analyze data, and to construct and simulate predictive models of biological systems. While much attention has focused on automating routine aspects of management and analysis of "big data", realizing the full potential of "big data" to accelerate discovery calls for automating many other aspects of the scientific process that have so far largely resisted automation: identifying gaps in the current state of knowledge; generating and prioritizing questions; designing studies; designing, prioritizing, planning, and executing experiments; interpreting results; forming hypotheses; drawing conclusions; replicating studies; validating claims; documenting studies; communicating results; reviewing results; and integrating results into the larger body of knowledge in a discipline. Against this background, the PSB workshop on Discovery Informatics in Biological and Biomedical Sciences explores the opportunities and challenges of automating discovery, or assisting humans in discovery, through advances in (i) understanding, formalizing, and building information-processing accounts of the entire scientific process; (ii) designing, developing, and evaluating computational artifacts (representations, processes) that embody such understanding; and (iii) applying the resulting artifacts and systems to advance science (by augmenting individual or collective human efforts, or by fully automating science).

  13. Cooperative knowledge evolution: a construction-integration approach to knowledge discovery in medicine.

    PubMed

    Schmalhofer, F J; Tschaitschian, B

    1998-11-01

    In this paper, we perform a cognitive analysis of knowledge discovery processes. As a result of this analysis, construction-integration theory is proposed as a general framework for developing cooperative knowledge evolution systems. We thus suggest that for the acquisition of new domain knowledge in medicine, one should first construct pluralistic views on a given topic, which may contain inconsistencies as well as redundancies. Only thereafter is this knowledge consolidated into a situation-specific circumscription, and the early inconsistencies are eliminated. As proof of the viability of such knowledge acquisition processes in medicine, we present the IDEAS system, which can be used for the intelligent documentation of adverse events in clinical studies. This system provides better documentation of the side-effects of medical drugs. Here, knowledge evolution occurs by achieving consistent explanations in increasingly larger contexts (i.e., more cases and more pharmaceutical substrates). Finally, it is shown how prototypes, model-based approaches, and cooperative knowledge evolution systems can be distinguished as different classes of knowledge-based systems.

  14. Towards a Semantic Web of Things: A Hybrid Semantic Annotation, Extraction, and Reasoning Framework for Cyber-Physical System

    PubMed Central

    Wu, Zhenyu; Xu, Yuan; Yang, Yunong; Zhang, Chunhong; Zhu, Xinning; Ji, Yang

    2017-01-01

    The Web of Things (WoT) facilitates the discovery and interoperability of Internet of Things (IoT) devices in a cyber-physical system (CPS). Moreover, a uniform knowledge representation of physical resources is necessary for further composition, collaboration, and decision-making processes in a CPS. Although several efforts have integrated semantics with the WoT, such as knowledge engineering methods based on semantic sensor networks (SSN), these approaches still cannot represent the complex relationships between devices when dynamic composition and collaboration occur, and they depend entirely on manual construction of a knowledge base, which limits scalability. In this paper, to address these limitations, we propose the Semantic Web of Things (SWoT) framework for CPS (SWoT4CPS). SWoT4CPS provides a hybrid solution combining ontological engineering methods that extend SSN with machine learning methods based on an entity linking (EL) model. To verify its feasibility and performance, we demonstrate the framework by implementing a temperature anomaly diagnosis and automatic control use case in a building automation system. Evaluation results on the EL method show that linking domain knowledge to DBpedia achieves relatively high accuracy and that the time complexity is at a tolerable level. Advantages and disadvantages of SWoT4CPS, along with future work, are also discussed. PMID:28230725

  15. Building Better Decision-Support by Using Knowledge Discovery.

    ERIC Educational Resources Information Center

    Jurisica, Igor

    2000-01-01

    Discusses knowledge-based decision-support systems that use artificial intelligence approaches. Addresses the issue of how to create an effective case-based reasoning system for complex and evolving domains, focusing on automated methods for system optimization and domain knowledge evolution that can supplement knowledge acquired from domain…

  16. Large-Scale Discovery of Disease-Disease and Disease-Gene Associations

    PubMed Central

    Gligorijevic, Djordje; Stojanovic, Jelena; Djuric, Nemanja; Radosavljevic, Vladan; Grbovic, Mihajlo; Kulathinal, Rob J.; Obradovic, Zoran

    2016-01-01

    Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently yielded benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements in disease phenotyping over current computational approaches. In addition, the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set where, again, it revealed very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Thus, our approach provides biomedical researchers with new tools to filter genes of interest, reducing costly lab studies. PMID:27578529
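    As a rough illustration of the general idea of learning disease representations from comorbidity counts (this is not the authors' model; the toy patient records, disease names, and the SVD factorization are illustrative assumptions), one can factor a symmetric co-occurrence matrix into low-dimensional embeddings and compare diseases by cosine similarity:

    ```python
    import numpy as np

    # Hypothetical patient records: each is a set of diagnosed disease codes.
    patients = [
        {"diabetes", "hypertension"},
        {"diabetes", "obesity", "hypertension"},
        {"asthma", "obesity"},
        {"diabetes", "obesity"},
    ]

    diseases = sorted({d for p in patients for d in p})
    idx = {d: i for i, d in enumerate(diseases)}

    # Symmetric comorbidity counts: C[a, b] = number of patients with both a and b.
    C = np.zeros((len(diseases), len(diseases)))
    for p in patients:
        for a in p:
            for b in p:
                if a != b:
                    C[idx[a], idx[b]] += 1

    # Truncated SVD of the count matrix yields low-dimensional disease embeddings;
    # cosine similarity between embeddings suggests candidate disease-disease links.
    U, S, _ = np.linalg.svd(C, full_matrices=False)
    emb = U[:, :2] * S[:2]

    def similarity(a, b):
        va, vb = emb[idx[a]], emb[idx[b]]
        return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12))
    ```

    A real system would work at the scale of millions of cases and would likely reweight raw counts (e.g., to correct for disease prevalence) before factorization.
    
    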

  17. What Tumor Dynamics Modeling Can Teach us About Exploiting the Stem-Cell View for Better Cancer Treatment

    PubMed Central

    Day, Roger S

    2015-01-01

    The cancer stem cell hypothesis is that in human solid cancers, only a small proportion of the cells, the cancer stem cells (CSCs), are self-renewing; the vast majority of the cancer cells are unable to sustain tumor growth indefinitely on their own. In recent years, discoveries have led to the concentration, if not isolation, of putative CSCs. The evidence has mounted that CSCs do exist and are important. This knowledge may promote better understanding of treatment resistance, create opportunities to test agents against CSCs, and open up promise for a fresh approach to cancer treatment. The first clinical trials of new anti-CSC agents have been completed, and many others are under way. Excitement is mounting that this knowledge will lead to major improvements, even breakthroughs, in treating cancer. However, exploitation of this phenomenon may be more successful if informed by insights into the population dynamics of tumor development. We revive some ideas in tumor dynamics modeling to extract guidance in designing anti-CSC treatment regimens and the clinical trials that test them. PMID:25780337

  18. Space technology in the discovery and development of mineral and energy resources

    NASA Technical Reports Server (NTRS)

    Lowman, P. D.

    1977-01-01

    Space technology, applied to the discovery and extraction of mineral and energy resources, is summarized. Orbital remote sensing for geological purposes has been widely applied through the use of LANDSAT satellites. These techniques also have been of value for protection against environmental hazards and for a better understanding of crustal structure.

  19. High content live cell imaging for the discovery of new antimalarial marine natural products

    PubMed Central

    2012-01-01

    Background The human malaria parasite remains a burden in developing nations. It is responsible for up to one million deaths a year, a number that could rise due to increasing multi-drug resistance to all antimalarial drugs currently available. Therefore, there is an urgent need for the discovery of new drug therapies. Recently, our laboratory developed a simple one-step fluorescence-based live cell-imaging assay to integrate the complex biology of the human malaria parasite into drug discovery. Here we used our newly developed live cell-imaging platform to discover novel marine natural products and their cellular phenotypic effects against the most lethal malaria parasite, Plasmodium falciparum. Methods A high content live cell imaging platform was used to screen the effects of marine extracts on malaria. Parasites were grown in vitro in the presence of extracts, stained with an RNA-sensitive dye, and imaged at timed intervals with the BD Pathway HT automated confocal microscope. Results Image analysis validated our new methodology at a larger scale and revealed potential antimalarial activity of selected extracts with minimal cytotoxic effects on host red blood cells. To further validate our assay, we investigated parasite phenotypes during incubation with the purified bioactive natural product bromophycolide A. We show that bromophycolide A has a strong and specific morphological effect on parasites, similar to those observed with the initial extracts. Conclusion Collectively, our results show that high-content live cell-imaging (HCLCI) can be used to screen chemical libraries and identify parasite-specific inhibitors with limited host cytotoxic effects. Altogether, we provide new leads for the discovery of novel antimalarials. PMID:22214291

  20. High content live cell imaging for the discovery of new antimalarial marine natural products.

    PubMed

    Cervantes, Serena; Stout, Paige E; Prudhomme, Jacques; Engel, Sebastian; Bruton, Matthew; Cervantes, Michael; Carter, David; Tae-Chang, Young; Hay, Mark E; Aalbersberg, William; Kubanek, Julia; Le Roch, Karine G

    2012-01-03

    The human malaria parasite remains a burden in developing nations. It is responsible for up to one million deaths a year, a number that could rise due to increasing multi-drug resistance to all antimalarial drugs currently available. Therefore, there is an urgent need for the discovery of new drug therapies. Recently, our laboratory developed a simple one-step fluorescence-based live cell-imaging assay to integrate the complex biology of the human malaria parasite into drug discovery. Here we used our newly developed live cell-imaging platform to discover novel marine natural products and their cellular phenotypic effects against the most lethal malaria parasite, Plasmodium falciparum. A high content live cell imaging platform was used to screen the effects of marine extracts on malaria. Parasites were grown in vitro in the presence of extracts, stained with an RNA-sensitive dye, and imaged at timed intervals with the BD Pathway HT automated confocal microscope. Image analysis validated our new methodology at a larger scale and revealed potential antimalarial activity of selected extracts with minimal cytotoxic effects on host red blood cells. To further validate our assay, we investigated parasite phenotypes during incubation with the purified bioactive natural product bromophycolide A. We show that bromophycolide A has a strong and specific morphological effect on parasites, similar to those observed with the initial extracts. Collectively, our results show that high-content live cell-imaging (HCLCI) can be used to screen chemical libraries and identify parasite-specific inhibitors with limited host cytotoxic effects. Altogether, we provide new leads for the discovery of novel antimalarials. © 2011 Cervantes et al; licensee BioMed Central Ltd.

  1. A novel association rule mining approach using TID intermediate itemset.

    PubMed

    Aqra, Iyad; Herawan, Tutut; Abdul Ghani, Norjihan; Akhunzada, Adnan; Ali, Akhtar; Bin Razali, Ramdan; Ilahi, Manzoor; Raymond Choo, Kim-Kwang

    2018-01-01

    Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is appropriate for real-world deployments is of paramount concern. However, dynamic decision making that requires modifying the threshold, either to minimize or maximize the output knowledge, forces extant state-of-the-art algorithms to rescan the entire database. The process consequently incurs heavy computation cost and is not feasible for real-time applications. This paper efficiently addresses the problem of dynamic threshold updating for a given purpose. The paper contributes a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values, improving overall efficiency because the whole database no longer needs to be rescanned. After the entire itemset is built, the real support can be obtained without rebuilding the itemset (e.g., itemset lists are intersected to obtain the actual support). Moreover, the algorithm supports extracting many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, experimental results demonstrate that the proposed approach can be deployed in any mining system in a fully parallel mode, consequently increasing the efficiency of the real-time association rule discovery process. The proposed approach outperforms the extant state of the art, showing promising results that reduce computation cost, increase accuracy, and produce all possible itemsets.
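    The intersection step mentioned in the abstract (obtaining actual support from itemset lists without rescanning the database) can be sketched with a vertical, TID-list representation. This is a minimal Eclat-style illustration, not the authors' algorithm; the toy transactions and item names are assumptions:

    ```python
    from functools import reduce

    # Toy transaction database: transaction ID -> items.
    transactions = {
        1: {"bread", "milk"},
        2: {"bread", "butter"},
        3: {"bread", "milk", "butter"},
        4: {"milk", "butter"},
    }

    # Vertical layout: item -> set of TIDs containing it (built in one pass).
    tid_lists = {}
    for tid, items in transactions.items():
        for item in items:
            tid_lists.setdefault(item, set()).add(tid)

    def support(itemset):
        """Support of an itemset = size of the intersection of its TID lists.
        No database rescan is needed once the TID lists exist."""
        return len(reduce(set.intersection, (tid_lists[i] for i in itemset)))

    def frequent_items(min_support):
        """Changing the minimum-support threshold only re-filters the same
        lists; the database is never read again."""
        return {item for item, tids in tid_lists.items() if len(tids) >= min_support}
    ```

    The key property the abstract relies on is visible here: once the intermediate (vertical) structure is built, any new threshold is answered by set operations alone.
    
    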

  2. A novel association rule mining approach using TID intermediate itemset

    PubMed Central

    Ali, Akhtar; Bin Razali, Ramdan; Ilahi, Manzoor; Raymond Choo, Kim-Kwang

    2018-01-01

    Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is appropriate for real-world deployments is of paramount concern. However, dynamic decision making that requires modifying the threshold, either to minimize or maximize the output knowledge, forces extant state-of-the-art algorithms to rescan the entire database. The process consequently incurs heavy computation cost and is not feasible for real-time applications. This paper efficiently addresses the problem of dynamic threshold updating for a given purpose. The paper contributes a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values, improving overall efficiency because the whole database no longer needs to be rescanned. After the entire itemset is built, the real support can be obtained without rebuilding the itemset (e.g., itemset lists are intersected to obtain the actual support). Moreover, the algorithm supports extracting many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, experimental results demonstrate that the proposed approach can be deployed in any mining system in a fully parallel mode, consequently increasing the efficiency of the real-time association rule discovery process. The proposed approach outperforms the extant state of the art, showing promising results that reduce computation cost, increase accuracy, and produce all possible itemsets. PMID:29351287

  3. Project X: competitive intelligence data mining and analysis

    NASA Astrophysics Data System (ADS)

    Gilmore, John F.; Pagels, Michael A.; Palk, Justin

    2001-03-01

    Competitive Intelligence (CI) is a systematic and ethical program for gathering and analyzing information about your competitors' activities and general business trends to further your own company's goals. CI allows companies to gather extensive information on their competitors and to analyze what the competition is doing in order to maintain or gain a competitive edge. In commercial business this potentially translates into millions of dollars in annual savings or losses. The Internet provides an overwhelming portal of information for CI analysis. The problem is how a company can automate the translation of voluminous information into valuable and actionable knowledge. This paper describes Project X, an agent-based data mining system specifically developed for extracting and analyzing competitive information from the Internet. Project X gathers CI information from a variety of sources including online newspapers, corporate websites, industry sector reporting sites, speech archiving sites, video newscasts, stock news sites, weather sites, and rumor sites. It uses individual industry-specific (e.g., pharmaceutical, financial, aerospace, etc.) commercial sector ontologies to form the knowledge filtering and discovery structures/content required to filter and identify valuable competitive knowledge. Project X is described in detail and an example competitive intelligence case is shown, demonstrating the system's performance and utility for business intelligence.

  4. Biological network extraction from scientific literature: state of the art and challenges.

    PubMed

    Li, Chen; Liakata, Maria; Rebholz-Schuhmann, Dietrich

    2014-09-01

    Networks of molecular interactions explain complex biological processes, and all known information on molecular events is contained in a number of public repositories, including the scientific literature. Metabolic and signalling pathways are often viewed separately, even though both types are composed of interactions involving proteins and other chemical entities. It is necessary to be able to combine data from all available resources to judge the functionality, complexity and completeness of any given network overall, but the full integration of relevant information from the scientific literature, in particular, remains an ongoing and complex task. Currently, the text-mining research community is steadily moving towards processing the full body of the scientific literature, making use of rich linguistic features such as full-text parsing to extract biological interactions. The next step will be to combine these with information from scientific databases to support hypothesis generation for the discovery of new knowledge and the extension of biological networks. The generation of comprehensive networks requires technologies such as entity grounding, coordination resolution and co-reference resolution, which are not yet fully solved and must be improved to further raise the quality of results. Here, we analyse the state of the art for the extraction of network information from the scientific literature and the evaluation of extraction methods against reference corpora, discuss the challenges involved and identify directions for future research. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  5. VAiRoma: A Visual Analytics System for Making Sense of Places, Times, and Events in Roman History.

    PubMed

    Cho, Isaac; Dou, Wenwen; Wang, Derek Xiaoyu; Sauda, Eric; Ribarsky, William

    2016-01-01

    Learning and gaining knowledge of Roman history is an area of interest for students and citizens at large. It is an example of a subject with great sweep (many interrelated sub-topics over, in this case, a 3,000-year history) that is hard for any individual to grasp and, in its full detail, is not available as a coherent story. In this paper, we propose a visual analytics approach to constructing a data-driven view of Roman history based on a large collection of Wikipedia articles. Extracting and enabling the discovery of useful knowledge on events, places, times, and their connections from large amounts of textual data has always been a challenging task. To this aim, we introduce VAiRoma, a visual analytics system that couples state-of-the-art text analysis methods with an intuitive visual interface to help users make sense of events, places, times, and, more importantly, the relationships between them. VAiRoma goes beyond textual content exploration, as it permits users to compare, make connections, and externalize findings all within the visual interface. As a result, VAiRoma allows users to learn and create new knowledge regarding Roman history in an informed way. We evaluated VAiRoma with 16 participants through a user study in which the task was to learn about Roman piazzas by finding relevant articles and new relationships. Our study results showed that VAiRoma enabled the participants to find more relevant articles and connections compared to Web searches and a literature search conducted in a Roman library. Subjective feedback on VAiRoma was also very positive. In addition, we ran two case studies that demonstrate how VAiRoma can be used for deeper analysis, permitting the rapid discovery and analysis of a small number of key documents even when the original collection contains hundreds of thousands of documents.

  6. To ontologise or not to ontologise: An information model for a geospatial knowledge infrastructure

    NASA Astrophysics Data System (ADS)

    Stock, Kristin; Stojanovic, Tim; Reitsma, Femke; Ou, Yang; Bishr, Mohamed; Ortmann, Jens; Robertson, Anne

    2012-08-01

    A geospatial knowledge infrastructure consists of a set of interoperable components, including software, information, hardware, procedures and standards, that work together to support advanced discovery and creation of geoscientific resources, including publications, data sets and web services. The focus of the work presented is the development of such an infrastructure for resource discovery. Advanced resource discovery is intended to support scientists in finding resources that meet their needs, and focuses on representing the semantic details of the scientific resources, including the detailed aspects of the science that led to the resource being created. This paper describes an information model for a geospatial knowledge infrastructure that uses ontologies to represent these semantic details, including knowledge about domain concepts, the scientific elements of the resource (analysis methods, theories and scientific processes) and web services. This semantic information can be used to enable more intelligent search over scientific resources, and to support new ways to infer and visualise scientific knowledge. The work describes the requirements for semantic support of a knowledge infrastructure, and analyses the different options for information storage based on the twin goals of semantic richness and syntactic interoperability to allow communication between different infrastructures. Such interoperability is achieved by the use of open standards, and the architecture of the knowledge infrastructure adopts such standards, particularly from the geospatial community. The paper then describes an information model that uses a range of different types of ontologies, explaining those ontologies and their content. The information model was successfully implemented in a working geospatial knowledge infrastructure, but the evaluation identified some issues in creating the ontologies.

  7. HSQC-TOCSY Fingerprinting for Prioritization of Polyketide- and Peptide-Producing Microbial Isolates.

    PubMed

    Buedenbender, Larissa; Habener, Leesa J; Grkovic, Tanja; Kurtböke, D İpek; Duffy, Sandra; Avery, Vicky M; Carroll, Anthony R

    2018-04-27

    Microbial products are a promising source for drug leads as a result of their unique structural diversity. However, reisolation of already known natural products significantly hampers the discovery process, and it is therefore important to incorporate effective microbial isolate selection and dereplication protocols early in microbial natural product studies. We have developed a systematic approach for prioritization of microbial isolates for natural product discovery based on heteronuclear single-quantum correlation-total correlation spectroscopy (HSQC-TOCSY) nuclear magnetic resonance profiles in combination with antiplasmodial activity of extracts. The HSQC-TOCSY experiments allowed for unfractionated microbial extracts containing polyketide and peptidic natural products to be rapidly identified. Here, we highlight how this approach was used to prioritize extracts derived from a library of 119 ascidian-associated actinomycetes that possess a higher potential to produce bioactive polyketides and peptides.

  8. Liquid-liquid extraction of actinides, lanthanides, and fission products by use of ionic liquids: from discovery to understanding.

    PubMed

    Billard, Isabelle; Ouadi, Ali; Gaillard, Clotilde

    2011-06-01

    Liquid-liquid extraction of actinides and lanthanides by use of ionic liquids is reviewed, considering, first, phenomenological aspects, then looking more deeply at the various mechanisms. Future trends in this developing field are presented.

  9. Discovery of New Compounds Active against Plasmodium falciparum by High Throughput Screening of Microbial Natural Products.

    PubMed

    Pérez-Moreno, Guiomar; Cantizani, Juan; Sánchez-Carrasco, Paula; Ruiz-Pérez, Luis Miguel; Martín, Jesús; El Aouad, Noureddine; Pérez-Victoria, Ignacio; Tormo, José Rubén; González-Menendez, Víctor; González, Ignacio; de Pedro, Nuria; Reyes, Fernando; Genilloud, Olga; Vicente, Francisca; González-Pacanowska, Dolores

    2016-01-01

    Due to the low structural diversity within the set of antimalarial drugs currently available in the clinic and the increasing number of cases of resistance, there is an urgent need to find new compounds with novel modes of action to treat the disease. Microbial natural products are characterized by large diversity in terms of the chemical complexity of the compounds and the novelty of their structures. Microbial natural product extracts have been underexplored in the search for new antiparasitic drugs, and even more so in the discovery of new antimalarials. Our objective was to find new druggable natural products with antimalarial properties from the MEDINA natural products collection, one of the largest natural product libraries, harboring more than 130,000 microbial extracts. In this work, we describe the optimization process and the results of a phenotypic high throughput screen (HTS) based on measurements of Plasmodium lactate dehydrogenase. A subset of more than 20,000 extracts from the MEDINA microbial products collection has been explored, leading to the discovery of 3 new compounds with antimalarial activity. In addition, we report on the novel antiplasmodial activity of 4 previously described natural products.

  10. Inhibition of Swarming motility of Pseudomonas aeruginosa by Methanol extracts of Alpinia officinarum Hance. and Cinnamomum tamala T. Nees and Eberm.

    PubMed

    Lakshmanan, Divya; Nanda, Jishudas; Jeevaratnam, K

    2018-06-01

    Bacterial drug resistance is a challenge in clinical settings, especially in countries like India. Hence, the discovery of novel alternative therapeutics has become a necessity in the fight against drug resistance. Compounds that inhibit bacterial virulence properties form new therapeutic alternatives. Pseudomonas aeruginosa is an opportunistic, nosocomial pathogen that infects immune-compromised patients. Swarming motility is an important virulence property of Pseudomonas that aids it in reaching host cells under nutrient-limiting conditions. Here, we report the screening of five plant extracts against the swarming motility of P. aeruginosa and show that methanol extracts of Alpinia officinarum and Cinnamomum tamala inhibit swarming motility at 5 μg mL⁻¹ without inhibiting growth. These extracts did not inhibit swimming and twitching motilities, indicating a mode of action specific to the swarming pathway. Preliminary experiments indicated that rhamnolipid production was not affected. This study reveals the potential of the two plants in anti-virulence drug discovery.

  11. Toward a Unified Theory of Visual Area V4

    PubMed Central

    Roe, Anna W.; Chelazzi, Leonardo; Connor, Charles E.; Conway, Bevil R.; Fujita, Ichiro; Gallant, Jack L.; Lu, Haidong; Vanduffel, Wim

    2016-01-01

    Visual area V4 is a midtier cortical area in the ventral visual pathway. It is crucial for visual object recognition and has been a focus of many studies on visual attention. However, there is no unifying view of V4's role in visual processing. Neither is there an understanding of how its role in feature processing interfaces with its role in visual attention. This review captures our current knowledge of V4, largely derived from electrophysiological and imaging studies in the macaque monkey. Based on the recent discovery of functionally specific domains in V4, we propose that the unifying function of V4 circuitry is to enable selective extraction of specific functional domain-based networks, whether by bottom-up specification of object features or by top-down attentionally driven selection. PMID:22500626

  12. Wains: a pattern-seeking artificial life species.

    PubMed

    de Buitléir, Amy; Russell, Michael; Daly, Mark

    2012-01-01

    We describe the initial phase of a research project to develop an artificial life framework designed to extract knowledge from large data sets with minimal preparation or ramp-up time. In this phase, we evolved an artificial life population with a new brain architecture. The agents have sufficient intelligence to discover patterns in data and to make survival decisions based on those patterns. The species uses diploid reproduction, Hebbian learning, and Kohonen self-organizing maps, in combination with novel techniques such as using pattern-rich data as the environment and framing the data analysis as a survival problem for artificial life. The first generation of agents mastered the pattern discovery task well enough to thrive. Evolution further adapted the agents to their environment by making them a little more pessimistic, and also by making their brains more efficient.
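    A Kohonen self-organizing map, one ingredient of the agents' brains described above, can be sketched minimally as follows; the toy data, map size, and learning schedule are illustrative assumptions, not the authors' architecture:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 2-D data from two clusters; after training, some map node should
    # sit near each cluster centre.
    data = np.vstack([
        rng.normal([0.0, 0.0], 0.1, size=(50, 2)),
        rng.normal([1.0, 1.0], 0.1, size=(50, 2)),
    ])

    weights = rng.random((4, 2))  # a 1-D map of 4 nodes with 2-D weight vectors

    def train(w, data, epochs=20, lr=0.5, radius=1.0):
        w = w.copy()
        for _ in range(epochs):
            for x in data[rng.permutation(len(data))]:  # shuffle each epoch
                # Best-matching unit: the node whose weights are closest to x.
                bmu = int(np.argmin(np.linalg.norm(w - x, axis=1)))
                for j in range(len(w)):
                    # Gaussian neighborhood: update strength decays with
                    # grid distance from the BMU.
                    h = np.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                    w[j] += lr * h * (x - w[j])
            lr *= 0.9  # decay the learning rate between epochs
        return w

    weights = train(weights, data)
    ```

    After training, each node's weight vector approximates a prototype of a region of the data, which is the sense in which such maps let agents discover patterns.
    
    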

  13. Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes.

    PubMed

    Hassani-Pak, Keywan; Rawlings, Christopher

    2017-06-13

    Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.

  14. Great Originals of Modern Physics

    ERIC Educational Resources Information Center

    Decker, Fred W.

    1972-01-01

    European travel can provide an intimate view of the implements and locales of great discoveries in physics for the knowledgeable traveler. The four museums at Cambridge, London, Remscheid-Lennep, and Munich display a full range of discovery apparatus in modern physics as outlined here. (Author/TS)

  15. Dulse on the Distaff Side.

    ERIC Educational Resources Information Center

    MacKenzie, Marion

    1983-01-01

    Scientific research leading to the discovery of female plants of the red alga Palmaria palmata (dulse) is described. This discovery has not only advanced knowledge of marine organisms and taxonomic relationships but also has practical implications. The complete life cycle of this organism is included. (JN)

  16. 43 CFR 4.1132 - Scope of discovery.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ..., the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature, custody... persons having knowledge of any discoverable matter. (b) It is not ground for objection that information...

  17. 43 CFR 4.1132 - Scope of discovery.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ..., the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature, custody... persons having knowledge of any discoverable matter. (b) It is not ground for objection that information...

  18. 43 CFR 4.1132 - Scope of discovery.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ..., the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature, custody... persons having knowledge of any discoverable matter. (b) It is not ground for objection that information...

  19. 43 CFR 4.1132 - Scope of discovery.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ..., the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature, custody... persons having knowledge of any discoverable matter. (b) It is not ground for objection that information...

  20. Scientific Knowledge Discovery in Complex Semantic Networks of Geophysical Systems

    NASA Astrophysics Data System (ADS)

    Fox, P.

    2012-04-01

    The vast majority of explorations of the Earth's systems are limited in their ability to effectively explore the most important (often most difficult) problems because they are forced to interconnect at the data-element, or syntactic, level rather than at a higher scientific, or semantic, level. Recent successes in the application of complex network theory and algorithms to climate data raise expectations that more general graph-based approaches offer the opportunity for new discoveries. In the past ~5 years, the natural sciences have made substantial progress in giving both specialists and non-specialists the ability to describe, in machine-readable form, geophysical quantities and the relations among them in meaningful and natural ways, effectively breaking the prior syntax barrier. The corresponding open-world semantics and reasoning provide higher-level interconnections: semantics provided around the data structures, semantically equipped tools, and semantically aware interfaces between science application components allow for discovery at the knowledge level. More recently, formal semantic approaches to continuous and aggregate physical processes are beginning to show promise and are soon likely to be ready to apply to geoscientific systems. To illustrate these opportunities, this presentation describes two application examples featuring domain vocabulary (ontology) and property relations (named and typed edges in the graphs): first, a climate knowledge discovery pilot encoding and exploring CMIP5 catalog information, with the eventual goal of encoding and exploring CMIP5 data; second, a multi-stakeholder knowledge network for integrated assessments in marine ecosystems, where the data is highly interdisciplinary.

  1. Abstracting Attribute Space for Transfer Function Exploration and Design.

    PubMed

    Maciejewski, Ross; Jang, Yun; Woo, Insoo; Jänicke, Heike; Gaither, Kelly P; Ebert, David S

    2013-01-01

Currently, user-centered transfer function design begins with the user interacting with a one- or two-dimensional histogram of the volumetric attribute space. The attribute space is visualized as a function of the number of voxels, allowing the user to explore the data in terms of the attribute size/magnitude. However, such visualizations provide the user with no information on the relationship between various attribute spaces (e.g., density, temperature, pressure, x, y, z) within the multivariate data. In this work, we propose a modification to the attribute space visualization in which the user is no longer presented with the magnitude of the attribute; instead, the user is presented with an information metric detailing the relationship between attributes of the multivariate volumetric data. In this way, the user can guide their exploration based on the relationship between the attribute magnitude and user-selected attribute information, as opposed to being constrained to visualizing only the magnitude of the attribute. We refer to this modification of the traditional histogram widget as an abstract attribute space representation. Our system utilizes common one- and two-dimensional histogram widgets where the bins of the abstract attribute space now correspond to an attribute relationship in terms of the mean, standard deviation, entropy, or skewness. In this manner, we exploit the relationships and correlations present in the underlying data with respect to the dimension(s) under examination. These relationships are often key to insight and allow us to guide attribute discovery, as opposed to automatic extraction schemes that try to calculate and extract distinct attributes a priori. In this way, our system aids in the knowledge discovery of the interaction of properties within volumetric data.
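    The per-bin statistics this abstract describes (mean, standard deviation, entropy, or skewness of a second attribute within each bin of a first attribute) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation; the function name, bin counts, and binning choices are hypothetical:

    ```python
    import numpy as np

    def abstract_attribute_histogram(binned_attr, related_attr, n_bins=32, stat="entropy"):
        """For each bin of one attribute, summarize the co-located values of a
        second attribute with a relationship metric instead of a voxel count."""
        edges = np.linspace(binned_attr.min(), binned_attr.max(), n_bins + 1)
        idx = np.clip(np.digitize(binned_attr, edges) - 1, 0, n_bins - 1)
        out = np.zeros(n_bins)
        for b in range(n_bins):
            vals = related_attr[idx == b]
            if vals.size == 0:
                continue  # empty bin: leave the metric at zero
            if stat == "mean":
                out[b] = vals.mean()
            elif stat == "std":
                out[b] = vals.std()
            elif stat == "skewness":
                s = vals.std()
                out[b] = ((vals - vals.mean()) ** 3).mean() / s ** 3 if s > 0 else 0.0
            elif stat == "entropy":
                # Shannon entropy of the value distribution inside this bin
                counts, _ = np.histogram(vals, bins=16)
                p = counts[counts > 0] / counts.sum()
                out[b] = -(p * np.log2(p)).sum()
        return edges, out
    ```

    A histogram widget would then display `out` per bin, so brushing selections are guided by attribute relationships rather than by raw voxel counts.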

  2. Beginning to manage drug discovery and development knowledge.

    PubMed

    Sumner-Smith, M

    2001-05-01

    Knowledge management approaches and technologies are beginning to be implemented by the pharmaceutical industry in support of new drug discovery and development processes aimed at greater efficiencies and effectiveness. This trend coincides with moves to reduce paper, coordinate larger teams with more diverse skills that are distributed around the globe, and to comply with regulatory requirements for electronic submissions and the associated maintenance of electronic records. Concurrently, the available technologies have implemented web-based architectures with a greater range of collaborative tools and personalization through portal approaches. However, successful application of knowledge management methods depends on effective cultural change management, as well as proper architectural design to match the organizational and work processes within a company.

  3. Revisiting History: Encountering Iodine Then and Now--A General Chemistry Laboratory to Observe Iodine from Seaweed

    ERIC Educational Resources Information Center

    Wahab, M. Farooq

    2009-01-01

    The history of the discovery of iodine is retold using brown-colored seaweed found commonly along the ocean shore. The seaweed is ashed at a low temperature and the iodides are extracted into boiling water. The iodides are oxidized in acidic medium. Solvent extraction of iodine by oxidation of iodides as well as simple aqueous extraction of iodide…

  4. Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources

    PubMed Central

    Solovyev, Valery; Ivanov, Vladimir

    2016-01-01

Automatic event extraction from text is an important step in knowledge acquisition and knowledge base population. Manual work is indispensable in the development of an extraction system, whether in corpus annotation or in the creation of vocabularies and patterns for a knowledge-based system. Recent works have focused on adapting existing systems (for extraction from English texts) to new domains. Event extraction in other languages has received little study due to the lack of resources and algorithms necessary for natural language processing. In this paper we define a set of linguistic resources that are necessary for the development of a knowledge-based event extraction system in Russian: a vocabulary of subordination models, a vocabulary of event triggers, and a vocabulary of Frame Elements, which are basic building blocks for semantic patterns. We propose a set of methods for the creation of such vocabularies in Russian and other languages using the Google Books NGram Corpus. The methods are evaluated in the development of an event extraction system for Russian. PMID:26955386

  5. From Information Center to Discovery System: Next Step for Libraries?

    ERIC Educational Resources Information Center

    Marcum, James W.

    2001-01-01

    Proposes a discovery system model to guide technology integration in academic libraries that fuses organizational learning, systems learning, and knowledge creation techniques with constructivist learning practices to suggest possible future directions for digital libraries. Topics include accessing visual and continuous media; information…

  6. Foreword to "The Secret of Childhood."

    ERIC Educational Resources Information Center

    Stephenson, Margaret E.

    2000-01-01

    Discusses the basic discoveries of Montessori's Casa dei Bambini. Considers principles of Montessori's organizing theory: the absorbent mind, the unfolding nature of life, the spiritual embryo, self-construction, acquisition of culture, creativity of life, repetition of exercise, freedom within limits, children's discovery of knowledge, the secret…

  7. Cosmic Discovery

    NASA Astrophysics Data System (ADS)

    Harwit, Martin

    1984-04-01

In the remarkable opening section of this book, a well-known Cornell astronomer gives precise thumbnail histories of the 43 basic cosmic discoveries - stars, planets, novae, pulsars, comets, gamma-ray bursts, and the like - that form the core of our knowledge of the universe. Many of them, he points out, were made accidentally and outside the mainstream of astronomical research and funding. This observation leads him to speculate on how many more major phenomena there might be and how they might be most effectively sought out in a field now dominated by large instruments and complex investigative modes and observational conditions. The book also examines discovery in terms of its political, financial, and sociological context - the role of new technologies and of industry and the military in revealing new knowledge; and methods of funding, of peer review, and of allotting time on our largest telescopes. It concludes with specific recommendations for organizing astronomy in ways that will best lead to the discovery of the many - at least sixty - phenomena that Harwit estimates are still waiting to be found.

  8. The discovery of medicines for rare diseases

    PubMed Central

    Swinney, David C; Xia, Shuangluo

    2015-01-01

    There is a pressing need for new medicines (new molecular entities; NMEs) for rare diseases as few of the 6800 rare diseases (according to the NIH) have approved treatments. Drug discovery strategies for the 102 orphan NMEs approved by the US FDA between 1999 and 2012 were analyzed to learn from past success: 46 NMEs were first in class; 51 were followers; and five were imaging agents. First-in-class medicines were discovered with phenotypic assays (15), target-based approaches (12) and biologic strategies (18). Identification of genetic causes in areas with more basic and translational research such as cancer and in-born errors in metabolism contributed to success regardless of discovery strategy. In conclusion, greater knowledge increases the chance of success and empirical solutions can be effective when knowledge is incomplete. PMID:25068983

  9. The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery

    PubMed Central

    2014-01-01

The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level comprised of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a Creative Commons Attribution license. See website for further information: http://sio.semanticscience.org. PMID:24602174

  10. Structuring and extracting knowledge for the support of hypothesis generation in molecular biology

    PubMed Central

    Roos, Marco; Marshall, M Scott; Gibson, Andrew P; Schuemie, Martijn; Meij, Edgar; Katrenko, Sophia; van Hage, Willem Robert; Krommydas, Konstantinos; Adriaans, Pieter W

    2009-01-01

    Background Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit the control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes. Results We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence. Conclusion We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. 
Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation. PMID:19796406

  11. African Flora Has the Potential to Fight Multidrug Resistance of Cancer

    PubMed Central

    Kuete, Victor; Efferth, Thomas

    2015-01-01

    Background. Continuous efforts from scientists of diverse fields are necessary not only to better understand the mechanism by which multidrug-resistant (MDR) cancer cells occur, but also to boost the discovery of new cytotoxic compounds to fight MDR phenotypes. Objectives. The present review reports on the contribution of African flora in the discovery of potential cytotoxic phytochemicals against MDR cancer cells. Methodology. Scientific databases such as PubMed, ScienceDirect, Scopus, Google Scholar, and Web of Knowledge were used to retrieve publications related to African plants, isolated compounds, and drug resistant cancer cells. The data were analyzed to highlight cytotoxicity and the modes of actions of extracts and compounds of the most prominent African plants. Also, thresholds and cutoff points for the cytotoxicity and modes of action of phytochemicals have been provided. Results. Most published data related to the antiproliferative potential of African medicinal plants were from Cameroon, Egypt, Nigeria, or Madagascar. The cytotoxicity of phenolic compounds isolated in African plants was generally much better documented than that of terpenoids and alkaloids. Conclusion. African flora represents an enormous resource for novel cytotoxic compounds. To unravel the full potential, efforts should be strengthened throughout the continent, to meet the challenge of a successful fight against MDR cancers. PMID:25961047

  12. Knowledge discovery from structured mammography reports using inductive logic programming.

    PubMed

    Burnside, Elizabeth S; Davis, Jesse; Costa, Victor Santos; Dutra, Inês de Castro; Kahn, Charles E; Fine, Jason; Page, David

    2005-01-01

    The development of large mammography databases provides an opportunity for knowledge discovery and data mining techniques to recognize patterns not previously appreciated. Using a database from a breast imaging practice containing patient risk factors, imaging findings, and biopsy results, we tested whether inductive logic programming (ILP) could discover interesting hypotheses that could subsequently be tested and validated. The ILP algorithm discovered two hypotheses from the data that were 1) judged as interesting by a subspecialty trained mammographer and 2) validated by analysis of the data itself.

  13. A Metadata based Knowledge Discovery Methodology for Seeding Translational Research.

    PubMed

    Kothari, Cartik R; Payne, Philip R O

    2015-01-01

    In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.

  14. ASIS 2000: Knowledge Innovations: Celebrating Our Heritage, Designing Our Future. Proceedings of the ASIS Annual Meeting (63rd, Chicago, Illinois, November 12-16, 2000). Volume 37.

    ERIC Educational Resources Information Center

    Kraft, Donald H., Ed.

    The 2000 ASIS (American Society for Information Science) conference explored knowledge innovation. The tracks in the conference program included knowledge discovery, capture, and creation; classification and representation; information retrieval; knowledge dissemination; and social, behavioral, ethical, and legal aspects. This proceedings is…

  15. Evaluating the Science of Discovery in Complex Health Systems

    ERIC Educational Resources Information Center

    Norman, Cameron D.; Best, Allan; Mortimer, Sharon; Huerta, Timothy; Buchan, Alison

    2011-01-01

    Complex health problems such as chronic disease or pandemics require knowledge that transcends disciplinary boundaries to generate solutions. Such transdisciplinary discovery requires researchers to work and collaborate across boundaries, combining elements of basic and applied science. At the same time, calls for more interdisciplinary health…

  16. 29 CFR 18.14 - Scope of discovery.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... administrative law judge in accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the... things and the identity and location of persons having knowledge of any discoverable matter. (b) It is...

  17. 49 CFR 386.38 - Scope of discovery.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature... location of persons having knowledge of any discoverable matter. (b) It is not ground for objection that...

  18. 49 CFR 386.38 - Scope of discovery.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature... location of persons having knowledge of any discoverable matter. (b) It is not ground for objection that...

  19. 29 CFR 18.14 - Scope of discovery.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... administrative law judge in accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the... things and the identity and location of persons having knowledge of any discoverable matter. (b) It is...

  20. 49 CFR 386.38 - Scope of discovery.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature... location of persons having knowledge of any discoverable matter. (b) It is not ground for objection that...

  1. 29 CFR 18.14 - Scope of discovery.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... administrative law judge in accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the... things and the identity and location of persons having knowledge of any discoverable matter. (b) It is...

  2. 29 CFR 18.14 - Scope of discovery.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... administrative law judge in accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the... things and the identity and location of persons having knowledge of any discoverable matter. (b) It is...

  3. 49 CFR 386.38 - Scope of discovery.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature... location of persons having knowledge of any discoverable matter. (b) It is not ground for objection that...

  4. A taxonomy has been developed for outcomes in medical research to help improve knowledge discovery.

    PubMed

    Dodd, Susanna; Clarke, Mike; Becker, Lorne; Mavergames, Chris; Fish, Rebecca; Williamson, Paula R

    2018-04-01

    There is increasing recognition that insufficient attention has been paid to the choice of outcomes measured in clinical trials. The lack of a standardized outcome classification system results in inconsistencies due to ambiguity and variation in how outcomes are described across different studies. Being able to classify by outcome would increase efficiency in searching sources such as clinical trial registries, patient registries, the Cochrane Database of Systematic Reviews, and the Core Outcome Measures in Effectiveness Trials (COMET) database of core outcome sets (COS), thus aiding knowledge discovery. A literature review was carried out to determine existing outcome classification systems, none of which were sufficiently comprehensive or granular for classification of all potential outcomes from clinical trials. A new taxonomy for outcome classification was developed, and as proof of principle, outcomes extracted from all published COS in the COMET database, selected Cochrane reviews, and clinical trial registry entries were classified using this new system. Application of this new taxonomy to COS in the COMET database revealed that 274/299 (92%) COS include at least one physiological outcome, whereas only 177 (59%) include at least one measure of impact (global quality of life or some measure of functioning) and only 105 (35%) made reference to adverse events. This outcome taxonomy will be used to annotate outcomes included in COS within the COMET database and is currently being piloted for use in Cochrane Reviews within the Cochrane Linked Data Project. Wider implementation of this standard taxonomy in trial and systematic review databases and registries will further promote efficient searching, reporting, and classification of trial outcomes. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  5. Trends in Modern Drug Discovery.

    PubMed

    Eder, Jörg; Herrling, Paul L

    2016-01-01

    Drugs discovered by the pharmaceutical industry over the past 100 years have dramatically changed the practice of medicine and impacted on many aspects of our culture. For many years, drug discovery was a target- and mechanism-agnostic approach that was based on ethnobotanical knowledge often fueled by serendipity. With the advent of modern molecular biology methods and based on knowledge of the human genome, drug discovery has now largely changed into a hypothesis-driven target-based approach, a development which was paralleled by significant environmental changes in the pharmaceutical industry. Laboratories became increasingly computerized and automated, and geographically dispersed research sites are now more and more clustered into large centers to capture technological and biological synergies. Today, academia, the regulatory agencies, and the pharmaceutical industry all contribute to drug discovery, and, in order to translate the basic science into new medical treatments for unmet medical needs, pharmaceutical companies have to have a critical mass of excellent scientists working in many therapeutic fields, disciplines, and technologies. The imperative for the pharmaceutical industry to discover breakthrough medicines is matched by the increasing numbers of first-in-class drugs approved in recent years and reflects the impact of modern drug discovery approaches, technologies, and genomics.

  6. The extraction of drug-disease correlations based on module distance in incomplete human interactome.

    PubMed

    Yu, Liang; Wang, Bingbo; Ma, Xiaoke; Gao, Lin

    2016-12-23

Extracting drug-disease correlations is crucial for unveiling disease mechanisms and for discovering new indications of available drugs, i.e., drug repositioning. Both the interactome and the knowledge of disease-associated and drug-associated genes remain incomplete. We present a new method to predict the associations between drugs and diseases. Our method is based on a module distance originally proposed to calculate distances between modules in the incomplete human interactome. We first map all the disease genes and drug genes to a combined protein interaction network. Then, based on the module distance, we calculate the distances between drug gene sets and disease gene sets and take these distances as the relationships of drug-disease pairs. We also filter possible false-positive drug-disease correlations by p-value. Finally, we validate the top-100 drug-disease associations related to six drugs in the predicted results. The overlap between our predicted correlations and those reported in the Comparative Toxicogenomics Database (CTD) and the literature, together with their enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, demonstrates that our approach can not only effectively identify new drug indications but also provide new insight into drug-disease discovery.
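    The abstract does not spell out its module distance. One widely used formulation on incomplete interactomes, which methods of this kind build on, is the network separation s_AB = <d_AB> - (<d_AA> + <d_BB>)/2 of Menche et al. (2015), where <d_XY> averages each node's shortest distance into the other set. A minimal sketch on an unweighted interaction network, with hypothetical names and no claim to match this paper's exact measure:

    ```python
    from collections import deque

    def bfs_dist(adj, src):
        """Unweighted shortest-path lengths from src (adj: node -> neighbor list)."""
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def mean_min_dist(adj, sources, targets, exclude_self=False):
        """Average, over sources, of each source's shortest distance to the target
        set. Unreachable sources are skipped, tolerating an incomplete interactome."""
        total, n = 0.0, 0
        for a in sources:
            d = bfs_dist(adj, a)
            ds = [d[b] for b in targets if b in d and not (exclude_self and b == a)]
            if ds:
                total += min(ds)
                n += 1
        return total / n if n else float("inf")

    def separation(adj, drug_genes, disease_genes):
        """s_AB = <d_AB> - (<d_AA> + <d_BB>)/2; lower (especially negative) values
        suggest overlapping modules, i.e., a candidate drug-disease association."""
        d_ab = (mean_min_dist(adj, drug_genes, disease_genes)
                + mean_min_dist(adj, disease_genes, drug_genes)) / 2
        d_aa = mean_min_dist(adj, drug_genes, drug_genes, exclude_self=True)
        d_bb = mean_min_dist(adj, disease_genes, disease_genes, exclude_self=True)
        return d_ab - (d_aa + d_bb) / 2
    ```

    On a toy path network A1-A2-X-B1-B2 with drug genes {A1, A2} and disease genes {B1, B2}, the separation is 2.5 - 1 = 1.5, i.e., well-separated modules.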

  7. Reuniting Virtue and Knowledge

    ERIC Educational Resources Information Center

    Culham, Tom

    2015-01-01

    Einstein held that intuition is more important than rational inquiry as a source of discovery. Further, he explicitly and implicitly linked the heart, the sacred, devotion and intuitive knowledge. The raison d'être of universities is the advance of knowledge; however, they have primarily focused on developing student's skills in working with…

  8. FEX: A Knowledge-Based System For Planimetric Feature Extraction

    NASA Astrophysics Data System (ADS)

    Zelek, John S.

    1988-10-01

    Topographical planimetric features include natural surfaces (rivers, lakes) and man-made surfaces (roads, railways, bridges). In conventional planimetric feature extraction, a photointerpreter manually interprets and extracts features from imagery on a stereoplotter. Visual planimetric feature extraction is a very labour intensive operation. The advantages of automating feature extraction include: time and labour savings; accuracy improvements; and planimetric data consistency. FEX (Feature EXtraction) combines techniques from image processing, remote sensing and artificial intelligence for automatic feature extraction. The feature extraction process co-ordinates the information and knowledge in a hierarchical data structure. The system simulates the reasoning of a photointerpreter in determining the planimetric features. Present efforts have concentrated on the extraction of road-like features in SPOT imagery. Keywords: Remote Sensing, Artificial Intelligence (AI), SPOT, image understanding, knowledge base, apars.

  9. Integrative Systems Biology for Data Driven Knowledge Discovery

    PubMed Central

    Greene, Casey S.; Troyanskaya, Olga G.

    2015-01-01

    Integrative systems biology is an approach that brings together diverse high throughput experiments and databases to gain new insights into biological processes or systems at molecular through physiological levels. These approaches rely on diverse high-throughput experimental techniques that generate heterogeneous data by assaying varying aspects of complex biological processes. Computational approaches are necessary to provide an integrative view of these experimental results and enable data-driven knowledge discovery. Hypotheses generated from these approaches can direct definitive molecular experiments in a cost effective manner. Using integrative systems biology approaches, we can leverage existing biological knowledge and large-scale data to improve our understanding of yet unknown components of a system of interest and how its malfunction leads to disease. PMID:21044756

  10. 18 CFR 385.402 - Scope of discovery (Rule 402).

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 18 Conservation of Power and Water Resources 1 2010-04-01 2010-04-01 false Scope of discovery (Rule 402). 385.402 Section 385.402 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY... persons having any knowledge of any discoverable matter. It is not ground for objection that the...

  11. Doors to Discovery[TM]. What Works Clearinghouse Intervention Report

    ERIC Educational Resources Information Center

    What Works Clearinghouse, 2013

    2013-01-01

"Doors to Discovery"[TM] is a preschool literacy curriculum that uses eight thematic units of activities to help children build fundamental early literacy skills in oral language, phonological awareness, concepts of print, alphabet knowledge, writing, and comprehension. The eight thematic units cover topics such as nature, friendship,…

  12. 78 FR 12933 - Proceedings Before the Commodity Futures Trading Commission

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-26

    ... proceedings. These new amendments also provide that Judgment Officers may conduct sua sponte discovery in... discovery; (4) sound risk management practices; and (5) other public interest considerations. The amendments... representative capacity, it was done with full power and authority to do so; (C) To the best of his knowledge...

  13. 76 FR 64803 - Rules of Adjudication and Enforcement

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-19

    ...) is also amended to clarify the limits on discovery when the Commission orders the ALJ to consider the... that the complainant identify, to the best of its knowledge, the ``like or directly competitive... the taking of discovery by the parties shall be at the discretion of the presiding ALJ. The ITCTLA...

  14. 78 FR 63253 - Davidson Kempner Capital Management LLC; Notice of Application

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-10-23

    ... employees of the Adviser other than the Contributor have any knowledge of the Contribution prior to its discovery by the Adviser on November 2, 2011. The Contribution was discovered by the Adviser's compliance... names of employees. After discovery of the Contribution, the Adviser and Contributor obtained the...

  15. Concurrent profiling of polar metabolites and lipids in human plasma using HILIC-FTMS

    NASA Astrophysics Data System (ADS)

    Cai, Xiaoming; Li, Ruibin

    2016-11-01

Blood plasma is the most widely used sample matrix for metabolite profiling studies, which aim at global metabolite profiling and biomarker discovery. However, most current studies on plasma metabolite profiling have focused on either the polar metabolites or the lipids. In this study, a comprehensive analysis approach based on HILIC-FTMS was developed to examine polar metabolites and lipids concurrently. The HILIC-FTMS method was developed using mixed standards of polar metabolites and lipids, whose separation efficiency is better in HILIC mode than in C5 and C18 reversed-phase (RP) chromatography. The method exhibits good reproducibility in retention times (CVs < 3.43%) and high mass accuracy (<3.5 ppm). In addition, we found that MeOH/ACN/Acetone (1:1:1, v/v/v) as an extraction cocktail achieved desirable recovery of the required extracts from plasma samples. We further integrated the MeOH/ACN/Acetone extraction with the HILIC-FTMS method for metabolite profiling and smoking-related biomarker discovery in human plasma samples. Heavy smokers could be successfully distinguished from non-smokers by univariate and multivariate statistical analysis of the profiling data, and 62 biomarkers for cigarette smoke were found. These results indicate that our concurrent analysis approach could potentially be used for clinical biomarker discovery, metabolite-based diagnosis, etc.

  16. Network-based approaches to climate knowledge discovery

    NASA Astrophysics Data System (ADS)

    Budich, Reinhard; Nyberg, Per; Weigel, Tobias

    2011-11-01

    Climate Knowledge Discovery Workshop; Hamburg, Germany, 30 March to 1 April 2011 Do complex networks combined with semantic Web technologies offer the next generation of solutions in climate science? To address this question, a first Climate Knowledge Discovery (CKD) Workshop, hosted by the German Climate Computing Center (Deutsches Klimarechenzentrum (DKRZ)), brought together climate and computer scientists from major American and European laboratories, data centers, and universities, as well as representatives from industry, the broader academic community, and the semantic Web communities. The participants, representing six countries, were concerned with large-scale Earth system modeling and computational data analysis. The motivation for the meeting was the growing problem that climate scientists generate data faster than it can be interpreted and the need to prepare for further exponential data increases. Current analysis approaches are focused primarily on traditional methods, which are best suited for large-scale phenomena and coarse-resolution data sets. The workshop focused on the open discussion of ideas and technologies to provide the next generation of solutions to cope with the increasing data volumes in climate science.

  17. Bridging the Gap in Neurotherapeutic Discovery and Development: The Role of the National Institute of Neurological Disorders and Stroke in Translational Neuroscience.

    PubMed

    Mott, Meghan; Koroshetz, Walter

    2015-07-01

    The mission of the National Institute of Neurological Disorders and Stroke (NINDS) is to seek fundamental knowledge about the brain and nervous system and to use that knowledge to reduce the burden of neurological disease. NINDS supports early- and late-stage therapy development funding programs to accelerate preclinical discovery and the development of new therapeutic interventions for neurological disorders. The NINDS Office of Translational Research facilitates and funds the movement of discoveries from the laboratory to patients. Its grantees include academics, often with partnerships with the private sector, as well as small businesses, which, by Congressional mandate, receive > 3% of the NINDS budget for small business innovation research. This article provides an overview of NINDS-funded therapy development programs offered by the NINDS Office of Translational Research.

  18. Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases

    PubMed Central

    Frijters, Raoul; van Vugt, Marianne; Smeets, Ruben; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand

    2010-01-01

    The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs. PMID:20885778
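The ABC-principle described above can be sketched in a few lines: build a co-occurrence graph and report concept pairs that never co-occur directly but share at least one B-intermediate. The concept names and co-occurrence pairs below are invented for illustration and are not taken from CoPub Discovery.

```python
from collections import defaultdict

def abc_candidates(cooccurrences):
    """Given (concept, concept) co-occurrence pairs, return hidden A-C
    candidates: pairs that never co-occur directly but share at least
    one B-intermediate, mapped to their shared B's."""
    neighbors = defaultdict(set)
    for a, b in cooccurrences:
        neighbors[a].add(b)
        neighbors[b].add(a)
    candidates = {}
    concepts = sorted(neighbors)
    for i, a in enumerate(concepts):
        for c in concepts[i + 1:]:
            if c in neighbors[a]:
                continue  # direct A-C relationship already known
            shared = neighbors[a] & neighbors[c]
            if shared:
                candidates[(a, c)] = shared
    return candidates

# Hypothetical literature co-occurrences: drug_D and disease_X never
# co-occur directly, but both co-occur with gene_G.
pairs = {("drug_D", "gene_G"), ("gene_G", "disease_X"), ("drug_D", "pathway_P")}
print(abc_candidates(pairs))
```

In a real system the shared-B sets would additionally be scored (e.g. by co-occurrence strength) before ranking candidate A-C relationships.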

  19. Interdisciplinary Laboratory Course Facilitating Knowledge Integration, Mutualistic Teaming, and Original Discovery.

    PubMed

    Full, Robert J; Dudley, Robert; Koehl, M A R; Libby, Thomas; Schwab, Cheryl

    2015-11-01

    Experiencing the thrill of an original scientific discovery can be transformative to students unsure about becoming a scientist, yet few courses offer authentic research experiences. Increasingly, cutting-edge discoveries require an interdisciplinary approach not offered in current departmental-based courses. Here, we describe a one-semester, learning laboratory course on organismal biomechanics offered at our large research university that enables interdisciplinary teams of students from biology and engineering to grow intellectually, collaborate effectively, and make original discoveries. To attain this goal, we avoid traditional "cookbook" laboratories by training 20 students to use a dozen research stations. Teams of five students rotate to a new station each week where a professor, graduate student, and/or team member assists in the use of equipment, guides students through stages of critical thinking, encourages interdisciplinary collaboration, and moves them toward authentic discovery. Weekly discussion sections that involve the entire class offer exchange of discipline-specific knowledge, advice on experimental design, methods of collecting and analyzing data, a statistics primer, and best practices for writing and presenting scientific papers. The building of skills in concert with weekly guided inquiry facilitates original discovery via a final research project that can be presented at a national meeting or published in a scientific journal.

  20. Application of an automated natural language processing (NLP) workflow to enable federated search of external biomedical content in drug discovery and development.

    PubMed

    McEntire, Robin; Szalkowski, Debbie; Butler, James; Kuo, Michelle S; Chang, Meiping; Chang, Man; Freeman, Darren; McQuay, Sarah; Patel, Jagruti; McGlashen, Michael; Cornell, Wendy D; Xu, Jinghai James

    2016-05-01

    External content sources such as MEDLINE(®), National Institutes of Health (NIH) grants and conference websites provide access to the latest breaking biomedical information, which can inform pharmaceutical and biotechnology company pipeline decisions. The value of the sites for industry, however, is limited by the use of the public internet, the limited synonyms, the rarity of batch searching capability and the disconnected nature of the sites. Fortunately, many sites now offer their content for download and we have developed an automated internal workflow that uses text mining and tailored ontologies for programmatic search and knowledge extraction. We believe such an efficient and secure approach provides a competitive advantage to companies needing access to the latest information for a range of use cases and complements manually curated commercial sources.

  1. Independent Research Projects Using Protein Extraction: Affordable Ways to Inquire, Discover & Publish for Undergraduate Students

    ERIC Educational Resources Information Center

    Pu, Rongsun

    2010-01-01

    This article describes how to use protein extraction, quantification, and analysis in the undergraduate teaching laboratory to engage students in inquiry-based, discovery-driven learning. Detailed instructions for obtaining proteins from animal tissues, using BCA assay to quantify the proteins, and data analysis are provided. The experimental…

  2. The petroleum exponential (again)

    NASA Astrophysics Data System (ADS)

    Bell, Peter M.

    U.S. production and reserves of liquid and gaseous petroleum have declined since 1960, at least in the lower 48 states. This decline stems from decreased discovery rates, as predicted by M. King Hubbert in the mid-1950s. Hubbert's once unpopular views were based on statistical analysis of the production history of the petroleum industry, and now, even with inclusion of the statistical perturbation caused by the Prudhoe Bay-North Alaskan Slope discovery (the largest oil field ever found in the United States), it seems clear again that production is following the exponential curve to depletion of the resource—to the end of the ultimate yield of petroleum from wells in the United States. In a recent report, C. Hall and C. Cleveland of Cornell University show that large atypical discoveries, such as the Prudhoe Bay find, are but minor influences on what now appears to be the crucial intersection of two exponentials [Science, 211, 576-579, 1981]: one, the production-per-drilled-foot curve of Hubbert, which crosses zero production no later than the year 2005; the other, a curve that plots the energy cost of drilling and extraction over time, that is, the rate at which oil is used to drill for and extract oil from the ground. The intersection, if no other discoveries the size of the Prudhoe Bay field are made, could come as early as 1990, the end of the present decade. Each additional Prudhoe-Bay-size find extends the year of intersection by only about 6 years. Beyond that point, more than one barrel of petroleum would be expended for each barrel extracted from the ground. The oil exploration-extraction and refining industry is currently the second most energy-intensive industry in the U.S., and the message seems clear: either more efficient drilling and production techniques are discovered, or domestic production will cease well before the end of this century, if the Hubbert analysis as modified by Hall and Cleveland is correct.

  3. Asymmetric threat data mining and knowledge discovery

    NASA Astrophysics Data System (ADS)

    Gilmore, John F.; Pagels, Michael A.; Palk, Justin

    2001-03-01

    Asymmetric threats differ from the conventional force-on-force military encounters that the Defense Department has historically been trained to engage. Terrorism, by its nature, is an operational activity that is neither easily detected nor countered, as its very existence depends on small covert attacks exploiting the element of surprise. But terrorism does have defined forms, motivations, tactics, and organizational structures. Exploiting a terrorism taxonomy provides the opportunity to discover and assess knowledge of terrorist operations. This paper describes the Asymmetric Threat Terrorist Assessment, Countering, and Knowledge (ATTACK) system. ATTACK was developed to (a) data mine open source intelligence (OSINT) from web-based newspaper sources, video news webcasts, and actual terrorist web sites, (b) evaluate this information against a terrorism taxonomy, (c) exploit country- and region-specific social, economic, political, and religious knowledge, and (d) discover and predict potential terrorist activities and association links. Details of the asymmetric threat structure and the ATTACK system architecture are presented, with results of an actual terrorist data mining and knowledge discovery test case shown.

  4. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning.

    PubMed

    Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Rebholz-Schuhmann, Dietrich; Schofield, Paul N; Gkoutos, Georgios V

    2011-01-01

    Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.

  5. Conceptual dissonance: evaluating the efficacy of natural language processing techniques for validating translational knowledge constructs.

    PubMed

    Payne, Philip R O; Kwok, Alan; Dhaval, Rakesh; Borlawsky, Tara B

    2009-03-01

    The conduct of large-scale translational studies presents significant challenges related to the storage, management and analysis of integrative data sets. Ideally, the application of methodologies such as conceptual knowledge discovery in databases (CKDD) provides a means for moving beyond intuitive hypothesis discovery and testing in such data sets, and towards the high-throughput generation and evaluation of knowledge-anchored relationships between complex bio-molecular and phenotypic variables. However, the induction of such high-throughput hypotheses is non-trivial, and requires correspondingly high-throughput validation methodologies. In this manuscript, we describe an evaluation of the efficacy of a natural language processing-based approach to validating such hypotheses. As part of this evaluation, we examine a phenomenon that we have labeled "conceptual dissonance," in which conceptual knowledge derived from two or more sources of comparable scope and granularity cannot be readily integrated or compared using conventional methods and automated tools.

  6. Oak Ridge Graph Analytics for Medical Innovation (ORiGAMI)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roberts, Larry W.; Lee, Sangkeun

    2016-01-01

    In this era of data-driven decisions and discovery where Big Data is producing Bigger Data, data scientists at the Oak Ridge National Laboratory are leveraging unique leadership infrastructure (e.g., Urika XA and Urika GD appliances) to develop scalable algorithms for semantic, logical and statistical reasoning with Big Data (i.e., data stored in databases as well as unstructured data in documents). ORiGAMI is a next-generation knowledge-discovery framework that is: (a) knowledge nurturing (i.e., evolves seamlessly with newer knowledge and data), (b) smart and curious (i.e., using information-foraging and reasoning algorithms to digest content) and (c) synergistic (i.e., interfaces computers with what they do best to help subject-matter experts do their best). ORiGAMI has been demonstrated using the National Library of Medicine's SEMANTIC MEDLINE (archive of medical knowledge since 1994).

  7. Reversal of pentylenetetrazole-altered swimming and neural activity-regulated gene expression in zebrafish larvae by valproic acid and valerian extract.

    PubMed

    Torres-Hernández, Bianca A; Colón, Luis R; Rosa-Falero, Coral; Torrado, Aranza; Miscalichi, Nahira; Ortíz, José G; González-Sepúlveda, Lorena; Pérez-Ríos, Naydi; Suárez-Pérez, Erick; Bradsher, John N; Behra, Martine

    2016-07-01

    Ethnopharmacology has documented hundreds of psychoactive plants awaiting exploitation for drug discovery. A robust and inexpensive in vivo system allowing systematic screening would be critical to exploiting this knowledge. The objective of this study was to establish a cheap and accurate screening method that can be used to test the psychoactive efficacy of complex mixtures of unknown composition, such as crude plant extracts. We used automated recording of zebrafish larval swimming behavior during light vs. dark periods, which we reproducibly altered with an anxiogenic compound, pentylenetetrazole (PTZ). First, we reversed this PTZ-altered swimming by co-treatment with a well-defined synthetic anxiolytic drug, valproic acid (VPA). Next, we aimed at reversing it by adding crude root extracts of Valeriana officinalis (Val), from which VPA was originally derived. Finally, we assessed how expression of neural activity-regulated genes (c-fos, npas4a, and bdnf) known to be upregulated by PTZ treatment was affected in the presence of Val. Both VPA and Val significantly reversed the PTZ-altered swimming behaviors. Noticeably, Val at higher doses affected swimming independently of the presence of PTZ. A strong regulation of all three neural-activity genes was observed in Val-treated larvae, which fully supported the behavioral results. We demonstrated in a combined behavioral-molecular approach the strong psychoactivity of a natural extract of unknown composition made from V. officinalis. Our results highlight the efficacy and sensitivity of such an approach, offering a novel in vivo screening system amenable to high-throughput testing of promising ethnobotanical candidates.

  8. The Service Environment for Enhanced Knowledge and Research (SEEKR) Framework

    NASA Astrophysics Data System (ADS)

    King, T. A.; Walker, R. J.; Weigel, R. S.; Narock, T. W.; McGuire, R. E.; Candey, R. M.

    2011-12-01

    The Service Environment for Enhanced Knowledge and Research (SEEKR) Framework is a configurable service oriented framework to enable the discovery, access and analysis of data shared in a community. The SEEKR framework integrates many existing independent services through the use of web technologies and standard metadata. Services are hosted on systems by using an application server and are callable by using REpresentational State Transfer (REST) protocols. Messages and metadata are transferred with eXtensible Markup Language (XML) encoding which conform to a published XML schema. Space Physics Archive Search and Extract (SPASE) metadata is central to utilizing the services. Resources (data, documents, software, etc.) are described with SPASE and the associated Resource Identifier is used to access and exchange resources. The configurable options for the service can be set by using a web interface. Services are packaged as web application resource (WAR) files for direct deployment on application services such as Tomcat or Jetty. We discuss the composition of the SEEKR framework, how new services can be integrated and the steps necessary to deploying the framework. The SEEKR Framework emerged from NASA's Virtual Magnetospheric Observatory (VMO) and other systems and we present an overview of these systems from a SEEKR Framework perspective.
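The role of the SPASE Resource Identifier can be illustrated with a minimal sketch. The record below is a simplified, hypothetical SPASE-like description (a real SPASE document uses a namespaced schema with many more required elements, and the ResourceID value here is invented), parsed with the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal SPASE-like record. A real SPASE document is
# namespaced and carries many more required elements.
record = """
<Spase>
  <NumericalData>
    <ResourceID>spase://VMO/NumericalData/ExampleMission/ExampleInstrument</ResourceID>
    <ResourceHeader>
      <ResourceName>Example magnetometer dataset</ResourceName>
    </ResourceHeader>
  </NumericalData>
</Spase>
"""

root = ET.fromstring(record)
# The ResourceID is the handle a SEEKR-style service would use to
# locate, access and exchange the described resource.
resource_id = root.findtext(".//ResourceID")
resource_name = root.findtext(".//ResourceName")
print(resource_id, "->", resource_name)
```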

  9. Nanoinformatics knowledge infrastructures: bringing efficient information management to nanomedical research.

    PubMed

    de la Iglesia, D; Cachau, R E; García-Remesal, M; Maojo, V

    2013-11-27

    Nanotechnology represents an area of particular promise and significant opportunity across multiple scientific disciplines. Ongoing nanotechnology research ranges from the characterization of nanoparticles and nanomaterials to the analysis and processing of experimental data seeking correlations between nanoparticles and their functionalities and side effects. Due to their special properties, nanoparticles are suitable for cellular-level diagnostics and therapy, offering numerous applications in medicine, e.g. development of biomedical devices, tissue repair, drug delivery systems and biosensors. In nanomedicine, recent studies are producing large amounts of structural and property data, highlighting the role for computational approaches in information management. While in vitro and in vivo assays are expensive, the cost of computing is falling. Furthermore, improvements in the accuracy of computational methods (e.g. data mining, knowledge discovery, modeling and simulation) have enabled effective tools to automate the extraction, management and storage of these vast data volumes. Since this information is widely distributed, one major issue is how to locate and access data where it resides (which also poses data-sharing limitations). The novel discipline of nanoinformatics addresses the information challenges related to nanotechnology research. In this paper, we summarize the needs and challenges in the field and present an overview of extant initiatives and efforts.

  10. EmbryoMiner: A new framework for interactive knowledge discovery in large-scale cell tracking data of developing embryos.

    PubMed

    Schott, Benjamin; Traub, Manuel; Schlagenhauf, Cornelia; Takamiya, Masanari; Antritter, Thomas; Bartschat, Andreas; Löffler, Katharina; Blessing, Denis; Otte, Jens C; Kobitski, Andrei Y; Nienhaus, G Ulrich; Strähle, Uwe; Mikut, Ralf; Stegmaier, Johannes

    2018-04-01

    State-of-the-art light-sheet and confocal microscopes allow recording of entire embryos in 3D and over time (3D+t) for many hours. Fluorescently labeled structures can be segmented and tracked automatically in these terabyte-scale 3D+t images, resulting in thousands of cell migration trajectories that provide detailed insights into large-scale tissue reorganization at the cellular level. Here we present EmbryoMiner, a new interactive open-source framework suitable for in-depth analyses and comparisons of entire embryos, including an extensive set of trajectory features. Starting at the whole-embryo level, the framework can be used to iteratively focus on a region of interest within the embryo, to investigate and test specific trajectory-based hypotheses, and to extract quantitative features from the isolated trajectories. Thus, the new framework provides a valuable new way to quantitatively compare corresponding anatomical regions in different embryos that were manually selected based on biological prior knowledge. As a proof of concept, we analyzed 3D+t light-sheet microscopy images of zebrafish embryos, showcasing potential user applications that can be performed using the new framework.
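Trajectory features of the kind mentioned above can be illustrated with a minimal sketch. The coordinates are invented, and the three features computed (path length, net displacement, straightness) are generic cell-migration statistics, not EmbryoMiner's actual feature set.

```python
import math

def trajectory_features(points):
    """Compute simple features of a 3D cell trajectory given as a list
    of (x, y, z) positions at successive time points."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    path_length = sum(dist(points[i], points[i + 1])
                      for i in range(len(points) - 1))
    net_displacement = dist(points[0], points[-1])
    # Straightness in (0, 1]: 1 means perfectly directed migration.
    straightness = net_displacement / path_length if path_length else 0.0
    return {"path_length": path_length,
            "net_displacement": net_displacement,
            "straightness": straightness}

# Invented trajectory: a cell moving along x with a small detour in y.
track = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 0, 0)]
features = trajectory_features(track)
```

Applied to every tracked cell, such per-trajectory feature vectors are what make region-to-region comparisons between embryos quantitative.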

  11. Data mining for blood glucose prediction and knowledge discovery in diabetic patients: the METABO diabetes modeling and management system.

    PubMed

    Georga, Eleni; Protopappas, Vasilios; Guillen, Alejandra; Fico, Giuseppe; Ardigo, Diego; Arredondo, Maria Teresa; Exarchos, Themis P; Polyzos, Demosthenes; Fotiadis, Dimitrios I

    2009-01-01

    METABO is a diabetes monitoring and management system that aims at recording and interpreting the patient's context, as well as at providing decision support to both the patient and the doctor. The METABO system consists of (a) a Patient's Mobile Device (PMD), (b) different types of unobtrusive biosensors, (c) a Central Subsystem (CS) located remotely at the hospital, and (d) the Control Panel (CP), from which physicians can follow up their patients and also gain access to the CS. METABO provides multi-parametric monitoring that facilitates the efficient and systematic recording of dietary, physical activity, medication and medical information (continuous and discontinuous glucose measurements). Based on all recorded contextual information, data mining schemes running on the PMD are responsible for modeling patients' metabolism, predicting hypo-/hyperglycaemic events, and providing the patient with short- and long-term alerts. In addition, all past and recently recorded data are analyzed to extract patterns of behavior, discover new knowledge and provide explanations to the physician through the CP. Advanced tools in the CP allow the physician to prescribe personalized treatment plans and to regularly quantify the patient's adherence to treatment.

  12. Nanoinformatics knowledge infrastructures: bringing efficient information management to nanomedical research

    NASA Astrophysics Data System (ADS)

    de la Iglesia, D.; Cachau, R. E.; García-Remesal, M.; Maojo, V.

    2013-01-01

    Nanotechnology represents an area of particular promise and significant opportunity across multiple scientific disciplines. Ongoing nanotechnology research ranges from the characterization of nanoparticles and nanomaterials to the analysis and processing of experimental data seeking correlations between nanoparticles and their functionalities and side effects. Due to their special properties, nanoparticles are suitable for cellular-level diagnostics and therapy, offering numerous applications in medicine, e.g. development of biomedical devices, tissue repair, drug delivery systems and biosensors. In nanomedicine, recent studies are producing large amounts of structural and property data, highlighting the role for computational approaches in information management. While in vitro and in vivo assays are expensive, the cost of computing is falling. Furthermore, improvements in the accuracy of computational methods (e.g. data mining, knowledge discovery, modeling and simulation) have enabled effective tools to automate the extraction, management and storage of these vast data volumes. Since this information is widely distributed, one major issue is how to locate and access data where it resides (which also poses data-sharing limitations). The novel discipline of nanoinformatics addresses the information challenges related to nanotechnology research. In this paper, we summarize the needs and challenges in the field and present an overview of extant initiatives and efforts.

  13. Knowledge representation and management: transforming textual information into useful knowledge.

    PubMed

    Rassinoux, A-M

    2010-01-01

    To summarize current outstanding research in the field of knowledge representation and management. Synopsis of the articles selected for the IMIA Yearbook 2010. Four interesting papers, dealing with structured knowledge, have been selected for the section on knowledge representation and management. Combining the newest techniques in computational linguistics and natural language processing with the latest methods in statistical data analysis, machine learning and text mining has proved to be efficient for turning unstructured textual information into meaningful knowledge. Three of the four selected papers corroborate this approach and depict various experiments conducted to extract meaningful knowledge from unstructured free text, such as extracting cancer disease characteristics from pathology reports, extracting protein-protein interactions from biomedical papers, and extracting knowledge from the Medline literature to support hypothesis generation in molecular biology. Finally, the last paper addresses the formal representation and structuring of information within clinical terminologies in order to render such information easily available and shareable among the health informatics community. Delivering common powerful tools able to automatically extract meaningful information from the huge amount of electronically available unstructured free text is an essential step towards promoting sharing and reusability across applications, domains, and institutions, thus contributing to building capacities worldwide.

  14. Knowledge discovery from data and Monte-Carlo DEA to evaluate technical efficiency of mental health care in small health areas

    PubMed Central

    García-Alonso, Carlos; Pérez-Naranjo, Leonor

    2009-01-01

    Introduction: Knowledge management, based on information transfer between experts and analysts, is crucial for the validity and usability of data envelopment analysis (DEA). Aim: To design and develop a methodology (i) to assess the technical efficiency of small health areas (SHAs) in an uncertainty environment, and (ii) to transfer information between experts and operational models, in both directions, in order to improve experts' knowledge. Method: A procedure derived from knowledge discovery from data (KDD) is used to select, interpret and weigh DEA inputs and outputs. Based on the KDD results, an expert-driven Monte-Carlo DEA model was designed to assess the technical efficiency of SHAs in Andalusia. Results: In terms of probability, SHA 29 is the most efficient, whereas SHA 22 is very inefficient. 73% of the analysed SHAs have a probability of being efficient (Pe) > 0.9, and 18% have Pe < 0.5. Conclusions: Expert knowledge is necessary to design and validate any operational model. KDD techniques ease the transfer of information from experts to an operational model, and the results obtained from the model in turn improve the experts' knowledge.
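The "probability of being efficient" (Pe) reported above can be illustrated with a toy Monte-Carlo sketch. Real DEA scores each unit by solving a linear program over multiple inputs and outputs; the stand-in below uses a single-input, single-output output/input ratio, and the SHA figures are invented.

```python
import random

def efficiency_probabilities(units, noise=0.1, runs=2000, tol=1e-9, seed=42):
    """Toy Monte-Carlo 'efficiency': in each run, perturb every unit's
    input with multiplicative noise, score units by output/input, and
    count how often each unit attains the best score."""
    rng = random.Random(seed)
    wins = {name: 0 for name in units}
    for _ in range(runs):
        scores = {}
        for name, (inp, out) in units.items():
            noisy_inp = inp * (1.0 + rng.uniform(-noise, noise))
            scores[name] = out / noisy_inp
        best = max(scores.values())
        for name, score in scores.items():
            if score >= best - tol:
                wins[name] += 1
    return {name: wins[name] / runs for name in units}

# Invented (input, output) figures for three small health areas:
# SHA_A and SHA_B are close competitors; SHA_C is clearly inefficient.
areas = {"SHA_A": (10.0, 9.0), "SHA_B": (10.0, 8.8), "SHA_C": (10.0, 5.0)}
probs = efficiency_probabilities(areas)
```

Close competitors end up with intermediate Pe values, while a clearly dominated unit gets Pe near zero, mirroring the probabilistic ranking described in the abstract.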

  15. 18 CFR 385.403 - Methods of discovery; general provisions (Rule 403).

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 18 Conservation of Power and Water Resources 1 2010-04-01 2010-04-01 false Methods of discovery; general provisions (Rule 403). 385.403 Section 385.403 Conservation of Power and Water Resources FEDERAL... the response is true and accurate to the best of that person's knowledge, information, and belief...

  16. Evaluation Techniques for the Sandy Point Discovery Center, Great Bay National Estuarine Research Reserve.

    ERIC Educational Resources Information Center

    Heffernan, Bernadette M.

    1998-01-01

    Describes work done to provide staff of the Sandy Point Discovery Center with methods for evaluating exhibits and interpretive programming. Quantitative and qualitative evaluation measures were designed to assess the program's objective of estuary education. Pretest-posttest questionnaires and interviews are used to measure subjects' knowledge and…

  17. The Prehistory of Discovery: Precursors of Representational Change in Solving Gear System Problems.

    ERIC Educational Resources Information Center

    Dixon, James A.; Bangert, Ashley S.

    2002-01-01

    This study investigated whether the process of representational change undergoes developmental change or different processes occupy different niches in the course of knowledge acquisition. Subjects--college, third-, and sixth-grade students--solved gear system problems over two sessions. Findings indicated that for all grades, discovery of the…

  18. 40 CFR 300.300 - Phase I-Discovery or notification.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 27 2010-07-01 2010-07-01 false Phase I-Discovery or notification. 300.300 Section 300.300 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) SUPERFUND... person in charge of a vessel or a facility shall, as soon as he or she has knowledge of any discharge...

  19. Background Knowledge in Learning-Based Relation Extraction

    ERIC Educational Resources Information Center

    Do, Quang Xuan

    2012-01-01

    In this thesis, we study the importance of background knowledge in relation extraction systems. We not only demonstrate the benefits of leveraging background knowledge to improve the systems' performance but also propose a principled framework that allows one to effectively incorporate knowledge into statistical machine learning models for…

  20. An Automated High-Throughput System to Fractionate Plant Natural Products for Drug Discovery

    PubMed Central

    Tu, Ying; Jeffries, Cynthia; Ruan, Hong; Nelson, Cynthia; Smithson, David; Shelat, Anang A.; Brown, Kristin M.; Li, Xing-Cong; Hester, John P.; Smillie, Troy; Khan, Ikhlas A.; Walker, Larry; Guy, Kip; Yan, Bing

    2010-01-01

    The development of an automated, high-throughput fractionation procedure to prepare and analyze natural product libraries for drug discovery screening is described. Natural products obtained from plant materials worldwide were extracted and first prefractionated on polyamide solid-phase extraction cartridges to remove polyphenols, followed by high-throughput automated fractionation, drying, weighing, and reformatting for screening and storage. The analysis of fractions with UPLC coupled with MS, PDA and ELSD detectors provides information that facilitates characterization of compounds in active fractions. Screening of a portion of fractions yielded multiple assay-specific hits in several high-throughput cellular screening assays. This procedure modernizes the traditional natural product fractionation paradigm by seamlessly integrating automation, informatics, and multimodal analytical interrogation capabilities. PMID:20232897

  1. Serendipity: Accidental Discoveries in Science

    NASA Astrophysics Data System (ADS)

    Roberts, Royston M.

    1989-06-01

    Many of the things discovered by accident are important in our everyday lives: Teflon, Velcro, nylon, x-rays, penicillin, safety glass, sugar substitutes, and polyethylene and other plastics. And we owe a debt to accident for some of our deepest scientific knowledge, including Newton's theory of gravitation, the Big Bang theory of Creation, and the discovery of DNA. Even the Rosetta Stone, the Dead Sea Scrolls, and the ruins of Pompeii came to light through chance. This book tells the fascinating stories of these and other discoveries and reveals how the inquisitive human mind turns accident into discovery. Written for the layman, yet scientifically accurate, this illuminating collection of anecdotes portrays invention and discovery as quintessentially human acts, due in part to curiosity, perseverance, and luck.

  2. Closed-Loop Multitarget Optimization for Discovery of New Emulsion Polymerization Recipes

    PubMed Central

    2015-01-01

    Self-optimization of chemical reactions enables faster optimization of reaction conditions or discovery of molecules with required target properties. The technology of self-optimization has been expanded to the discovery of new process recipes for the manufacture of complex functional products. A new machine-learning algorithm, specifically designed for multiobjective target optimization with an explicit aim of minimizing the number of “expensive” experiments, guides the discovery process. This “black-box” approach assumes no a priori knowledge of the chemical system and is hence particularly suited to rapid development of processes to manufacture specialist low-volume, high-value products. The approach was demonstrated in the discovery of process recipes for a semibatch emulsion copolymerization, targeting a specific particle size and full conversion. PMID:26435638
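
    The closed-loop idea described above (propose candidates cheaply, spend "expensive" experiments only on the most promising recipe, repeat) can be sketched in a few lines. Everything below is illustrative: the simulated experiment, the two recipe parameters, and the nearest-neighbour surrogate are invented stand-ins, not the paper's algorithm.

```python
import random

# Hypothetical "expensive" experiment: maps a recipe (surfactant, feed rate)
# to (particle_size_nm, conversion). Stands in for a real emulsion run.
def run_experiment(recipe):
    s, f = recipe
    particle_size = 40.0 + 120.0 * s - 30.0 * f
    conversion = min(1.0, 0.5 + 0.4 * f + 0.1 * s)
    return particle_size, conversion

TARGET = (100.0, 1.0)  # target particle size (nm) and full conversion

def loss(result):
    # Scalarized multiobjective distance to the target properties.
    return abs(result[0] - TARGET[0]) / 100.0 + abs(result[1] - TARGET[1])

def closed_loop(budget=15, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(budget):
        # Propose a batch of cheap candidate recipes ...
        candidates = [(rng.random(), rng.random()) for _ in range(200)]
        if history:
            # ... score each by the loss of evaluated neighbours (a crude
            # surrogate model), and run only the most promising one.
            def surrogate(c):
                return min(loss(r) + abs(c[0] - x[0]) + abs(c[1] - x[1])
                           for x, r in history)
            best = min(candidates, key=surrogate)
        else:
            best = candidates[0]
        history.append((best, run_experiment(best)))  # one "expensive" run
    return min(history, key=lambda h: loss(h[1]))

recipe, result = closed_loop()
```

    A real system would replace the toy surrogate with a proper machine-learning model, but the loop structure (cheap proposals, scarce evaluations, best-so-far tracking) is the same.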

  3. From the EBM pyramid to the Greek temple: a new conceptual approach to Guidelines as implementation tools in mental health.

    PubMed

    Salvador-Carulla, L; Lukersmith, S; Sullivan, W

    2017-04-01

    Guideline methods to develop recommendations dedicate most effort to organising discovery and corroboration knowledge following the evidence-based medicine (EBM) framework. Guidelines typically use a single dimension of information, and generally discard contextual evidence, formal expert knowledge and consumers' experiences in the process. In recognition of the limitations of guidelines in complex cases, complex interventions and systems research, there has been significant effort to develop new tools, guides, resources and structures to use alongside EBM methods of guideline development. In addition to these advances, a new framework based on the philosophy of science is required. Guidelines should be defined as implementation decision support tools for improving the decision-making process in real-world practice and not only as a procedure to optimise the knowledge base of scientific discovery and corroboration. A shift from the model of the EBM pyramid of corroboration of evidence to a broader multi-domain perspective graphically depicted as a 'Greek temple' could be considered. This model takes into account the different stages of scientific knowledge (discovery, corroboration and implementation); the sources of knowledge relevant to guideline development (experimental, observational, contextual, expert-based and experiential); their underlying inference mechanisms (deduction, induction, abduction, means-end inferences); and a more precise definition of evidence and related terms. The applicability of this broader approach is presented for the development of the Canadian Consensus Guidelines for the Primary Care of People with Developmental Disabilities.

  4. Applying data mining techniques to medical time series: an empirical case study in electroencephalography and stabilometry.

    PubMed

    Anguera, A; Barreiro, J M; Lara, J A; Lizcano, D

    2016-01-01

    One of the major challenges in the medical domain today is how to exploit the huge amount of data that this field generates. To do this, approaches are required that are capable of discovering knowledge that is useful for decision making in the medical field. Time series are data types that are common in the medical domain and require specialized analysis techniques and tools, especially if the information of interest to specialists is concentrated within particular time series regions, known as events. This research followed the steps specified by the so-called knowledge discovery in databases (KDD) process to discover knowledge from medical time series derived from stabilometric (396 series) and electroencephalographic (200 series) patient electronic health records (EHR). The view offered in the paper is based on the experience gathered as part of the VIIP project. Knowledge discovery in medical time series has a number of difficulties and implications that are highlighted by illustrating the application of several techniques that cover the entire KDD process through two case studies. This paper illustrates the application of different knowledge discovery techniques for the purposes of classification within the above domains. The accuracy of this application for the two classes considered in each case is 99.86% and 98.11% for epilepsy diagnosis in the electroencephalography (EEG) domain and 99.4% and 99.1% for early-age sports talent classification in the stabilometry domain. The KDD techniques achieve better results than other traditional neural network-based classification techniques.
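
    The event-centred KDD pipeline described above can be illustrated with a minimal, self-contained sketch: synthetic series stand in for the EHR data, features are extracted from high-amplitude "event" regions, and a simple classifier is evaluated. The series generator, the features, and the 1-NN classifier are invented for illustration, not the project's actual techniques.

```python
import random
import statistics

# Generate a synthetic time series; class 1 contains a high-amplitude event.
def make_series(has_event, rng, n=200):
    s = [rng.gauss(0, 1) for _ in range(n)]
    if has_event:                       # inject a high-amplitude "event" region
        start = rng.randrange(n - 30)
        for i in range(start, start + 30):
            s[i] += rng.gauss(4, 1)
    return s

def features(series):
    # Event-oriented features: peak amplitude and overall variance.
    return (max(series), statistics.pvariance(series))

def one_nn(train, query):
    # 1-nearest-neighbour classification in feature space.
    return min(train,
               key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], query)))[1]

rng = random.Random(1)
train = [(features(make_series(lbl, rng)), lbl) for lbl in [0, 1] * 20]
test = [(features(make_series(lbl, rng)), lbl) for lbl in [0, 1] * 10]
accuracy = sum(one_nn(train, f) == y for f, y in test) / len(test)
```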

  5. A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations

    PubMed Central

    2017-01-01

    Evidence-based dietary information represented as unstructured text is crucial information that needs to be accessed in order to help dietitians keep up with the new knowledge that arrives daily in newly published scientific reports. Different named-entity recognition (NER) methods have been introduced previously to extract useful information from the biomedical literature. They focus on, for example, extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts, and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge this is the first attempt at extracting dietary concepts. DrNER is a rule-based NER method that consists of two phases. The first involves the detection and determination of entity mentions, and the second involves the selection and extraction of the entities. We evaluate the method using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. Evaluation of the method showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations. PMID:28644863
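
    The two-phase structure of a rule-based NER method like drNER can be illustrated with a toy sketch: phase 1 detects candidate mentions with a small lexicon and a unit pattern, phase 2 selects and extracts the entities. The lexicon, the quantity pattern, and the selection rule below are invented examples, not the published drNER rules.

```python
import re

# Toy dietary lexicon and a regex for quantities with units (illustrative).
LEXICON = {"vitamin d": "NUTRIENT", "calcium": "NUTRIENT", "fibre": "NUTRIENT"}
QUANTITY = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|g|IU)\b")

def detect(text):
    # Phase 1: detect candidate entity mentions (lexicon terms + quantities).
    low = text.lower()
    hits = [(m.start(), m.group(), "QUANTITY") for m in QUANTITY.finditer(text)]
    for term, label in LEXICON.items():
        for m in re.finditer(re.escape(term), low):
            hits.append((m.start(), text[m.start():m.start() + len(term)], label))
    return hits

def extract(text):
    # Phase 2: select one entity per start offset, ordered by position.
    seen, out = set(), []
    for start, surface, label in sorted(detect(text)):
        if start not in seen:
            seen.add(start)
            out.append((surface, label))
    return out

entities = extract("Adults should take 800 IU vitamin D and 1000 mg calcium daily.")
```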

  6. Integrated Genomic Biomarkers to Identify Aggressive Disease in African Americans with Prostate Cancer

    DTIC Science & Technology

    2016-09-01

    300 of these men; have completed pathology review of 70 of the discovery sample tumors; macrodissected and performed DNA extraction from 50 tumors...block, and sections cut and tumor areas marked by histopathologist. Target completion September 1st 2017; Discovery sample 35% completed Pathology ...African American population. Target completion March 2017; 50% completed. What was accomplished under these goals? In the current reporting

  7. Equation Discovery for Model Identification in Respiratory Mechanics of the Mechanically Ventilated Human Lung

    NASA Astrophysics Data System (ADS)

    Ganzert, Steven; Guttmann, Josef; Steinmann, Daniel; Kramer, Stefan

    Lung protective ventilation strategies reduce the risk of ventilator-associated lung injury. To develop such strategies, knowledge about the mechanical properties of the mechanically ventilated human lung is essential. This study was designed to develop an equation discovery system to identify mathematical models of the respiratory system in time-series data obtained from mechanically ventilated patients. Two techniques were combined: (i) the use of declarative bias to reduce search-space complexity while inherently providing for the processing of background knowledge, and (ii) a newly developed heuristic for traversing the hypothesis space with a greedy, randomized strategy analogous to the GSAT algorithm. In 96.8% of all runs the equation discovery system was capable of detecting the well-established equation-of-motion model of the respiratory system in the provided data. We see the potential of this semi-automatic approach to detect more complex mathematical descriptions of the respiratory system from respiratory data.
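
    The search strategy can be illustrated with a toy equation discovery sketch: a library of candidate terms is traversed with a greedy, randomized restart strategy, and coefficients are fitted by least squares. The term library, the complexity penalty, and the simulated ventilation data below are invented; only the equation-of-motion target (pressure as a linear combination of volume, flow, and a constant) follows the abstract.

```python
import random

def lstsq(X, y):
    # Solve the normal equations (X^T X) b = X^T y by Gaussian elimination.
    n = len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(n)]
         for i in range(n)]
    b = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            f = A[j][i] / A[i][i]
            A[j] = [a - f * c for a, c in zip(A[j], A[i])]
            b[j] -= f * b[i]
    out = [0.0] * n
    for i in reversed(range(n)):
        out[i] = (b[i] - sum(A[i][j] * out[j] for j in range(i + 1, n))) / A[i][i]
    return out

def sse(X, y, coef):
    return sum((sum(c * x for c, x in zip(coef, row)) - yi) ** 2
               for row, yi in zip(X, y))

rng = random.Random(0)
# Simulated ventilation data: pressure = 25*V + 8*flow + 5 (+ noise).
data = [(rng.random(), rng.random()) for _ in range(100)]
y = [25 * v + 8 * f + 5 + rng.gauss(0, 0.1) for v, f in data]
# Candidate term library (the "declarative bias" restricting the search).
TERMS = {"V": lambda v, f: v, "flow": lambda v, f: f, "1": lambda v, f: 1.0,
         "V*flow": lambda v, f: v * f, "V^2": lambda v, f: v * v}

def score(subset):
    X = [[TERMS[t](v, f) for t in subset] for v, f in data]
    return sse(X, y, lstsq(X, y)) + 0.5 * len(subset)  # penalize complexity

# Greedy randomized hill climbing with restarts over term subsets (GSAT-like).
best = None
for _ in range(20):
    cur = rng.sample(sorted(TERMS), 2)
    for _ in range(10):
        t = rng.choice(sorted(TERMS))
        cand = sorted(set(cur) ^ {t})       # toggle one term
        if cand and score(cand) < score(cur):
            cur = cand
    if best is None or score(cur) < score(best):
        best = cur
```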

  8. Antisense oligonucleotide technologies in drug discovery.

    PubMed

    Aboul-Fadl, Tarek

    2006-09-01

    The principle of antisense oligonucleotide (AS-OD) technologies is based on the specific inhibition of unwanted gene expression by blocking mRNA activity. It has long appeared to be an ideal strategy to leverage new genomic knowledge for drug discovery and development. In recent years, AS-OD technologies have been widely used as potent and promising tools for this purpose. There is a rapid increase in the number of antisense molecules progressing in clinical trials. AS-OD technologies provide a simple and efficient approach for drug discovery and development and are expected to become a reality in the near future. This editorial describes the established and emerging AS-OD technologies in drug discovery.

  9. 100 years of elementary particles [Beam Line, vol. 27, issue 1, Spring 1997]

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pais, Abraham; Weinberg, Steven; Quigg, Chris

    1997-04-01

    This issue of Beam Line commemorates the 100th anniversary of the April 30, 1897 report of the discovery of the electron by J.J. Thomson and the ensuing discovery of other subatomic particles. In the first three articles, theorists Abraham Pais, Steven Weinberg, and Chris Quigg provide their perspectives on the discoveries of elementary particles as well as the implications and future directions resulting from these discoveries. In the following three articles, Michael Riordan, Wolfgang Panofsky, and Virginia Trimble apply our knowledge about elementary particles to high-energy research, electronics technology, and understanding the origin and evolution of our Universe.

  10. 100 years of Elementary Particles [Beam Line, vol. 27, issue 1, Spring 1997]

    DOE R&D Accomplishments Database

    Pais, Abraham; Weinberg, Steven; Quigg, Chris; Riordan, Michael; Panofsky, Wolfgang K. H.; Trimble, Virginia

    1997-04-01

    This issue of Beam Line commemorates the 100th anniversary of the April 30, 1897 report of the discovery of the electron by J.J. Thomson and the ensuing discovery of other subatomic particles. In the first three articles, theorists Abraham Pais, Steven Weinberg, and Chris Quigg provide their perspectives on the discoveries of elementary particles as well as the implications and future directions resulting from these discoveries. In the following three articles, Michael Riordan, Wolfgang Panofsky, and Virginia Trimble apply our knowledge about elementary particles to high-energy research, electronics technology, and understanding the origin and evolution of our Universe.

  11. A network model of knowledge accumulation through diffusion and upgrade

    NASA Astrophysics Data System (ADS)

    Zhuang, Enyu; Chen, Guanrong; Feng, Gang

    2011-07-01

    In this paper, we introduce a model to describe knowledge accumulation through knowledge diffusion and knowledge upgrade in a multi-agent network. Here, knowledge diffusion refers to the distribution of existing knowledge in the network, while knowledge upgrade means the discovery of new knowledge. It is found that the population of the network and the number of each agent’s neighbors affect the speed of knowledge accumulation. Four different policies for updating the neighboring agents are thus proposed, and their influence on the speed of knowledge accumulation and the topology evolution of the network are also studied.
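
    The diffusion and upgrade mechanisms can be sketched as a small agent-based simulation in which each agent holds a set of knowledge items, diffusion copies items from neighbours, and upgrade discovers brand-new items with small probability. The update rules, probabilities, and ring topology below are illustrative choices, not the model's exact dynamics.

```python
import random

def simulate(n_agents=50, k_neighbors=4, steps=200, seed=0):
    rng = random.Random(seed)
    knowledge = [set() for _ in range(n_agents)]
    next_item = 0
    for _ in range(steps):
        for i in range(n_agents):
            if rng.random() < 0.05:          # upgrade: discover new knowledge
                knowledge[i].add(next_item)
                next_item += 1
            # diffusion: learn one item from a random neighbour on a ring
            j = (i + rng.randint(1, k_neighbors)) % n_agents
            if knowledge[j]:
                knowledge[i].add(rng.choice(sorted(knowledge[j])))
    return sum(len(k) for k in knowledge) / n_agents  # mean knowledge stock

# Compare accumulation speed for different neighbourhood sizes.
sparse = simulate(k_neighbors=2)
dense = simulate(k_neighbors=8)
```

    Varying `n_agents`, `k_neighbors`, or the neighbour-update rule is the kind of experiment the paper's four update policies explore.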

  12. Interfaith Education: An Islamic Perspective

    ERIC Educational Resources Information Center

    Pallavicini, Yahya Sergio Yahe

    2016-01-01

    According to a teaching of the Prophet Muhammad, "the quest for knowledge is the duty of each Muslim, male or female", where knowledge is meant as the discovery of the real value of things and of oneself in relationship with the world in which God has placed us. This universal dimension of knowledge is in fact a wealth of wisdom of the…

  13. Microfluidic-based mini-metagenomics enables discovery of novel microbial lineages from complex environmental samples.

    PubMed

    Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R

    2017-07-05

    Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell.

  14. Integrating semantic web technologies and geospatial catalog services for geospatial information discovery and processing in cyberinfrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yue, Peng; Gong, Jianya; Di, Liping

    A geospatial catalogue service provides a network-based meta-information repository and interface for advertising and discovering shared geospatial data and services. Descriptive information (i.e., metadata) for geospatial data and services is structured and organized in catalogue services. The approaches currently available for searching and using that information are often inadequate. Semantic Web technologies show promise for better discovery methods by exploiting the underlying semantics. Such development needs special attention from the Cyberinfrastructure perspective, so that the traditional focus on discovery of and access to geospatial data can be expanded to support the increased demand for processing of geospatial information and discovery of knowledge. Semantic descriptions for geospatial data, services, and geoprocessing service chains are structured, organized, and registered through extending elements in the ebXML Registry Information Model (ebRIM) of a geospatial catalogue service, which follows the interface specifications of the Open Geospatial Consortium (OGC) Catalogue Services for the Web (CSW). The process models for geoprocessing service chains, as a type of geospatial knowledge, are captured, registered, and discoverable. Semantics-enhanced discovery for geospatial data, services/service chains, and process models is described. Semantic search middleware that can support virtual data product materialization is developed for the geospatial catalogue service. The creation of such a semantics-enhanced geospatial catalogue service is important in meeting the demands for geospatial information discovery and analysis in Cyberinfrastructure.

  15. Comparative study of the antimicrobial activity of native and exotic plants from the Caatinga and Atlantic Forest selected through an ethnobotanical survey.

    PubMed

    Castelo Branco Rangel de Almeida, Cecília de Fátima; de Vasconcelos Cabral, Daniela Lyra; Rangel de Almeida, Camila Castelo Branco; Cavalcanti de Amorim, Elba Lúcia; de Araújo, Janete Magali; de Albuquerque, Ulysses Paulino

    2012-02-01

    The idea that many commonly used medicinal plants may lead to the discovery of new drugs has encouraged the study of local knowledge of these resources. An ethnobotanical survey of species traditionally used for the treatment of infectious diseases was undertaken in two areas of northeastern Brazil: one in the Caatinga (dry forest) and another in the Atlantic Forest (humid forest). Initially, diffusion tests using paper disks and subsequently, for extracts presenting significant results (inhibition halos above 15 mm), minimum inhibitory concentrations were determined. The activity was evaluated as a percentage for each species, comparing the diameters of the inhibition halos and the number of positive results against the seven microorganisms studied. Extracts were classified into three categories: strong activity (species with halos exceeding 16 mm), moderate activity (species with halos between 13 mm and 15 mm) and low activity (species with halos below 12 mm). We selected 34 species, 20 from the Caatinga and 14 from the Atlantic Forest. In the Caatinga, 50% of the 20 plant extracts studied had strong antimicrobial activity, 25% had moderate activity and 15% had low activity. In the Atlantic Forest, 28.5% of the 14 plant extracts studied showed strong activity, with 14.5% having moderate activity and 28.5% having low activity. The microorganism that was most susceptible to the extracts from the Caatinga was Mycobacterium smegmatis; 85% of the species tested were able to inhibit its growth. The organism that was susceptible to the highest number of plant species (71%) from the Atlantic Forest was Staphylococcus aureus. Extracts from the Caatinga showed a trend of superior antimicrobial activity compared to the species from the Atlantic Forest, in terms of both inhibiting a greater variety of microorganisms and demonstrating higher activity against susceptible strains.
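
    The three-band activity classification used in this study is straightforward to encode. Note that the stated thresholds leave gaps (halos of 12-13 mm and 15-16 mm); the sketch below assigns those to the lower band as an assumption, and the example halo values are invented.

```python
def activity_category(halo_mm):
    # Bands as stated in the study; gap values (12-13, 15-16 mm) are mapped
    # to the lower band here as an assumption.
    if halo_mm > 16:
        return "strong"
    if halo_mm >= 13:
        return "moderate"
    return "low"

def summarize(halos_mm):
    # Fraction of extracts per category, as reported for each forest type.
    cats = [activity_category(h) for h in halos_mm]
    return {c: cats.count(c) / len(cats) for c in ("strong", "moderate", "low")}

# Invented halo diameters (mm) for ten hypothetical extracts.
caatinga_share = summarize([18, 17, 20, 14, 15, 10, 19, 13, 22, 11])
```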

  16. 77 FR 75459 - Self-Regulatory Organizations; BATS Exchange, Inc.; Notice of Filing of a Proposed Rule Change To...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-12-20

    ... both sides would participate in an Exchange Auction, this proposed change would aid in price discovery... auction price. This proposed change would aid in price discovery and help to reduce the likelihood of... Sell Shares and, therefore, a User would never have complete knowledge of liquidity available on both...

  17. Essential Skills and Knowledge for Troubleshooting E-Resources Access Issues in a Web-Scale Discovery Environment

    ERIC Educational Resources Information Center

    Carter, Sunshine; Traill, Stacie

    2017-01-01

    Electronic resource access troubleshooting is familiar work in most libraries. The added complexity introduced when a library implements a web-scale discovery service, however, creates a strong need for well-organized, rigorous training to enable troubleshooting staff to provide the best service possible. This article outlines strategies, tools,…

  18. Revealing Significant Relations between Chemical/Biological Features and Activity: Associative Classification Mining for Drug Discovery

    ERIC Educational Resources Information Center

    Yu, Pulan

    2012-01-01

    Classification, clustering and association mining are major tasks of data mining and have been widely used for knowledge discovery. Associative classification mining, the combination of both association rule mining and classification, has emerged as an indispensable way to support decision making and scientific research. In particular, it offers a…

  19. Mothers' Initial Discovery of Childhood Disability: Exploring Maternal Identification of Developmental Issues in Young Children

    ERIC Educational Resources Information Center

    Silbersack, Elionora W.

    2014-01-01

    The purpose of this qualitative study was to expand the scarce information available on how mothers first observe their children's early development, assess potential problems, and then come to recognize their concerns. In-depth knowledge about mothers' perspectives on the discovery process can help social workers to promote identification of…

  20. Augmented Reality-Based Simulators as Discovery Learning Tools: An Empirical Study

    ERIC Educational Resources Information Center

    Ibáñez, María-Blanca; Di-Serio, Ángela; Villarán-Molina, Diego; Delgado-Kloos, Carlos

    2015-01-01

    This paper reports empirical evidence on having students use AR-SaBEr, a simulation tool based on augmented reality (AR), to discover the basic principles of electricity through a series of experiments. AR-SaBEr was enhanced with knowledge-based support and inquiry-based scaffolding mechanisms, which proved useful for discovery learning in…

  1. 76 FR 36320 - Rules of Practice in Proceedings Relative to False Representation and Lottery Orders

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-22

    ... officers. 952.18 Evidence. 952.19 Subpoenas. 952.20 Witness fees. 952.21 Discovery. 952.22 Transcript. 952..., motions, proposed orders, and other documents for the record. Discovery need not be filed except as may be... witnesses, that the statement correctly states the witness's opinion or knowledge concerning the matters in...

  2. Making the Long Tail Visible: Social Networking Sites and Independent Music Discovery

    ERIC Educational Resources Information Center

    Gaffney, Michael; Rafferty, Pauline

    2009-01-01

    Purpose: The purpose of this paper is to investigate users' knowledge and use of social networking sites and folksonomies to discover if social tagging and folksonomies, within the area of independent music, aid in its information retrieval and discovery. The sites examined in this project are MySpace, Lastfm, Pandora and Allmusic. In addition,…

  3. TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas.

    PubMed

    Cumbo, Fabio; Fiscon, Giulia; Ceri, Stefano; Masseroli, Marco; Weitschek, Emanuel

    2017-01-03

    Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of high-throughput experiments, mainly Next Generation Sequencing, for more than 30 cancer types. We propose TCGA2BED, a software tool to search and retrieve TCGA data, and convert them into the structured BED format for their seamless use and integration. Additionally, it supports conversion into the CSV, GTF, JSON, and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We also provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1,V2) experimental data of TCGA converted into the BED format, together with their associated clinical and biospecimen metadata in attribute-value text format. The availability of the valuable TCGA data in BED format reduces the time needed to take advantage of them: it is possible to deal with huge amounts of cancer genomic data efficiently, effectively, and integratively, and to search, retrieve, and extend them with additional information. The BED format helps investigators by enabling several knowledge discovery analyses on all tumor types in TCGA, with the final aim of understanding pathological mechanisms and aiding cancer treatments.
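
    The core conversion idea, serializing a genomic measurement into a tab-separated BED line (chrom, start, end, plus extra columns) alongside alternative formats such as JSON, can be sketched as follows. The field names and extra columns are illustrative, not TCGA2BED's actual extended schema.

```python
import json

def to_bed(record):
    # First three BED columns are chrom/start/end; extra columns here carry
    # a gene name and a measurement value (illustrative extension).
    cols = [record["chrom"], str(record["start"]), str(record["end"]),
            record.get("gene", "."), str(record.get("value", 0))]
    return "\t".join(cols)

def to_json(record):
    # The same record in one of the alternative serialization formats.
    return json.dumps(record, sort_keys=True)

rec = {"chrom": "chr7", "start": 140453135, "end": 140453136,
       "gene": "BRAF", "value": 0.83}
bed_line = to_bed(rec)
```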

  4. Combining knowledge discovery from databases (KDD) and case-based reasoning (CBR) to support diagnosis of medical images

    NASA Astrophysics Data System (ADS)

    Stranieri, Andrew; Yearwood, John; Pham, Binh

    1999-07-01

    The development of data warehouses for the storage and analysis of very large corpora of medical image data represents a significant trend in health care and research. Amongst other benefits, the trend toward warehousing enables the use of techniques for automatically discovering knowledge from large and distributed databases. In this paper, we present an application design for knowledge discovery from databases (KDD) techniques that enhance the performance of the problem-solving strategy known as case-based reasoning (CBR) for the diagnosis of radiological images. The problem of diagnosing abnormality of the cervical spine is used to illustrate the method. The design of a case-based medical image diagnostic support system has three essential characteristics. The first is a case representation that comprises textual descriptions of the image, visual features that are known to be useful for indexing images, and additional visual features to be discovered by data mining many existing images. The second characteristic involves the development of a case base that comprises an optimal number and distribution of cases. The third characteristic involves the automatic discovery, using KDD techniques, of adaptation knowledge to enhance the performance of the case-based reasoner. Together, the three characteristics of our approach can overcome the real-time efficiency obstacles that otherwise militate against applying CBR to the domain of medical image analysis.

  5. Discovery learning model with geogebra assisted for improvement mathematical visual thinking ability

    NASA Astrophysics Data System (ADS)

    Juandi, D.; Priatna, N.

    2018-05-01

    The main goal of this study is to improve the mathematical visual thinking ability of high school students through implementation of the discovery learning model with GeoGebra assistance. This objective was pursued in a quasi-experimental study with a non-random pretest-posttest control design. The sample consisted of 62 grade XI senior high school students at a school in Bandung district. The required data were collected through documentation, observation, written tests, interviews, daily journals, and student worksheets. The results of this study are: 1) the improvement in mathematical visual thinking ability of students taught with the discovery learning model with GeoGebra assistance is significantly higher than that of students who received conventional instruction; 2) there is a difference in the improvement of students' mathematical visual thinking ability between groups based on prior mathematical ability (high, medium, and low); 3) the improvement in the high-ability group is significantly higher than in the medium and low groups; 4) the quality of the improvement is in the moderate category for students with high and low prior knowledge, while the high category is achieved by students with medium prior knowledge.

  6. Temporal data mining for the quality assessment of hemodialysis services.

    PubMed

    Bellazzi, Riccardo; Larizza, Cristiana; Magni, Paolo; Bellazzi, Roberto

    2005-05-01

    This paper describes the temporal data mining aspects of a research project that deals with the definition of methods and tools for the assessment of the clinical performance of hemodialysis (HD) services, on the basis of the time series automatically collected during hemodialysis sessions. Intelligent data analysis and temporal data mining techniques are applied to gain insight and to discover knowledge on the causes of unsatisfactory clinical results. In particular, two new methods for association rule discovery and temporal rule discovery are applied to the time series. Such methods exploit several pre-processing techniques, comprising data reduction, multi-scale filtering and temporal abstractions. We have analyzed the data of more than 5800 dialysis sessions coming from 43 different patients monitored for 19 months. The qualitative rules associating the outcome parameters and the measured variables were examined by the domain experts, who were able to distinguish between rules confirming available background knowledge and unexpected but plausible rules. The new methods proposed in the paper are suitable tools for knowledge discovery in clinical time series. Their use in the context of an auditing system for dialysis management helped clinicians to improve their understanding of the patients' behavior.
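
    The temporal-abstraction preprocessing mentioned above can be illustrated with a short sketch that turns a numeric monitoring signal into labelled state intervals, the typical input for temporal rule discovery. The thresholds and labels below are invented for illustration.

```python
def abstract_states(series, low=60.0, high=90.0):
    # State temporal abstraction: map each sample to LOW/NORMAL/HIGH and
    # merge consecutive samples with the same label into intervals.
    label = lambda v: "LOW" if v < low else "HIGH" if v > high else "NORMAL"
    intervals, start = [], 0
    for i in range(1, len(series) + 1):
        if i == len(series) or label(series[i]) != label(series[start]):
            intervals.append((start, i - 1, label(series[start])))
            start = i
    return intervals

# Invented monitoring values for one session.
states = abstract_states([70, 72, 95, 97, 96, 55, 58, 71])
```

    Rules such as "a HIGH interval preceding a LOW interval" can then be mined over these intervals instead of raw samples.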

  7. DrugQuest - a text mining workflow for drug association discovery.

    PubMed

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis

    2016-06-06

    Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques on these fields to identify chemicals, proteins, genes, pathways and diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a variety of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views, such as clusters of chemicals based on their textual information and tag clouds consisting of significant terms along with the terms used for clustering, are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest.
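
    The record-clustering step can be illustrated with a toy sketch that groups records by term overlap using Jaccard similarity. The tokenizer, the threshold, the single-pass clustering rule, and the example record texts are invented simplifications of the similarity and partitional methods the abstract mentions.

```python
def terms(text):
    # Crude tokenizer: keep lowercased words longer than three characters.
    return {w.strip(".,").lower() for w in text.split() if len(w) > 3}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def cluster(records, threshold=0.2):
    # Single-pass clustering: join the first cluster whose accumulated term
    # set overlaps enough with the record's terms, else start a new cluster.
    clusters = []
    for name, text in records.items():
        t = terms(text)
        for c in clusters:
            if jaccard(t, c["terms"]) >= threshold:
                c["members"].append(name)
                c["terms"] |= t
                break
        else:
            clusters.append({"members": [name], "terms": set(t)})
    return [c["members"] for c in clusters]

# Invented stand-ins for DrugBank description fields.
records = {
    "drugA": "beta-adrenergic receptor antagonist lowers blood pressure",
    "drugB": "selective beta-adrenergic receptor blocker reduces blood pressure",
    "drugC": "proton pump inhibitor suppresses gastric acid secretion",
}
groups = cluster(records)
```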

  8. Chromatogram-Bioactivity Correlation-Based Discovery and Identification of Three Bioactive Compounds Affecting Endothelial Function in Ginkgo Biloba Extract.

    PubMed

    Liu, Hong; Tan, Li-Ping; Huang, Xin; Liao, Yi-Qiu; Zhang, Wei-Jian; Li, Pei-Bo; Wang, Yong-Gang; Peng, Wei; Wu, Zhong; Su, Wei-Wei; Yao, Hong-Liang

    2018-05-03

    We report the discovery and identification of three bioactive compounds affecting endothelial function in Ginkgo biloba extract (GBE), based on chromatogram-bioactivity correlation analysis. Three portions were separated from GBE via D101 macroporous resin and then re-combined to prepare nine GBE samples. 21 compounds in the GBE samples were identified through UFLC-DAD-Q-TOF-MS/MS. Correlation analysis between compound differences and endothelin-1 (ET-1) in vivo across the nine GBE samples was conducted. The analysis indicated that three bioactive compounds had close relevance to ET-1: Kaempferol-3- O -α-l-glucoside, 3- O -{2- O -{6- O -[P-OH-trans-cinnamoyl]-β-d-glucosyl}-α-rhamnosyl} Quercetin isomers, and 3- O -{2- O -{6- O -[P-OH-trans-cinnamoyl]-β-d-glucosyl}-α-rhamnosyl} Kaempferide. The discovery of these bioactive compounds could provide references for the quality control of GBE and the development of novel pharmaceuticals. The present work proposes a feasible chromatogram-bioactivity correlation-based approach to discover compounds and define their bioactivities in complex multi-component systems.

  9. Nature Bank and the Queensland Compound Library: unique international resources at the Eskitis Institute for Drug Discovery.

    PubMed

    Camp, David; Newman, Stuart; Pham, Ngoc B; Quinn, Ronald J

    2014-03-01

    The Eskitis Institute for Drug Discovery is home to two unique resources, Nature Bank and the Queensland Compound Library (QCL), that differentiate it from many other academic institutes pursuing chemical biology or early phase drug discovery. Nature Bank is a comprehensive collection of plants and marine invertebrates that have been subjected to a process which aligns downstream extracts and fractions with lead- and drug-like physicochemical properties. Considerable expertise in screening natural product extracts/fractions was developed at Eskitis over the last two decades. Importantly, biodiscovery activities have been conducted from the beginning in accordance with the UN Convention on Biological Diversity (CBD) to ensure compliance with all international and national legislative requirements. The QCL is a compound management and logistics facility that was established from public funds to augment previous investments in high throughput and phenotypic screening in the region. A unique intellectual property (IP) model has been developed in the case of the QCL to stimulate applied, basic and translational research in the chemical and life sciences by industry, non-profit, and academic organizations.

  10. Perspective: Materials Informatics and Big Data: Realization of the Fourth Paradigm of Science in Materials Science

    DTIC Science & Technology

    2016-08-17

    thereby opening up new avenues for accelerated materials discovery and design. The need for such data analytics has also been emphasized by the...and design. The construction of inverse models is typically formulated as an optimization problem wherein a property or performance metric of...discovery and design. extraction, feature selection, etc. Such data preprocessing can either be supervised or unsupervised, based on whether the

  11. Protein crystallography and drug discovery: recollections of knowledge exchange between academia and industry

    PubMed Central

    2017-01-01

    The development of structure-guided drug discovery is a story of knowledge exchange where new ideas originate from all parts of the research ecosystem. Dorothy Crowfoot Hodgkin obtained insulin from Boots Pure Drug Company in the 1930s and insulin crystallization was optimized in the company Novo in the 1950s, allowing the structure to be determined at Oxford University. The structure of renin was determined in academia, on this occasion in London, in response to a need to develop antihypertensives in pharma. The idea of a dimeric aspartic protease came from an international academic team and was discovered in HIV; it eventually led to new HIV antivirals being developed in industry. Structure-guided fragment-based discovery was developed in large pharma and biotechs, but has been exploited in academia for the development of new inhibitors targeting protein–protein interactions and also antimicrobials to combat mycobacterial infections such as tuberculosis. These observations provide a strong argument against the so-called ‘linear model’, where ideas flow only in one direction from academic institutions to industry. Structure-guided drug discovery is a story of applications of protein crystallography and knowledge exchange between academia and industry that has led to new drug approvals for cancer and other common medical conditions by the Food and Drug Administration in the USA, as well as hope for the treatment of rare genetic diseases and infectious diseases that are a particular challenge in the developing world. PMID:28875019

  12. Choosing experiments to accelerate collective discovery

    PubMed Central

    Rzhetsky, Andrey; Foster, Jacob G.; Foster, Ian T.

    2015-01-01

    A scientist’s choice of research problem affects his or her personal career trajectory. Scientists’ combined choices affect the direction and efficiency of scientific discovery as a whole. In this paper, we infer preferences that shape problem selection from patterns of published findings and then quantify their efficiency. We represent research problems as links between scientific entities in a knowledge network. We then build a generative model of discovery informed by qualitative research on scientific problem selection. We map salient features from this literature to key network properties: an entity’s importance corresponds to its degree centrality, and a problem’s difficulty corresponds to the network distance it spans. Drawing on millions of papers and patents published over 30 years, we use this model to infer the typical research strategy used to explore chemical relationships in biomedicine. This strategy generates conservative research choices focused on building up knowledge around important molecules. These choices become more conservative over time. The observed strategy is efficient for initial exploration of the network and supports scientific careers that require steady output, but is inefficient for science as a whole. Through supercomputer experiments on a sample of the network, we study thousands of alternatives and identify strategies much more efficient at exploring mature knowledge networks. We find that increased risk-taking and the publication of experimental failures would substantially improve the speed of discovery. We consider institutional shifts in grant making, evaluation, and publication that would help realize these efficiencies. PMID:26554009
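The two network properties the model maps onto can be sketched concretely. Below is a minimal, self-contained illustration (the toy biomedical knowledge graph and entity names are invented for demonstration): degree centrality as a proxy for an entity's importance, and shortest-path distance as a proxy for a problem's difficulty.

```python
from collections import deque

# Toy knowledge network: entities are nodes, studied relationships are edges.
graph = {
    "aspirin":      ["cox1", "cox2"],
    "cox1":         ["aspirin", "inflammation"],
    "cox2":         ["aspirin", "inflammation"],
    "inflammation": ["cox1", "cox2", "il6"],
    "il6":          ["inflammation"],
}

def degree_centrality(g, node):
    """Importance proxy: fraction of other entities directly linked to `node`."""
    return len(g[node]) / (len(g) - 1)

def network_distance(g, src, dst):
    """Difficulty proxy: length of the shortest path a new link would span (BFS)."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == dst:
            return d
        for nb in g[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return None  # disconnected

print(degree_centrality(graph, "inflammation"))  # 3 of 4 other nodes -> 0.75
print(network_distance(graph, "aspirin", "il6"))  # aspirin -> cox -> inflammation -> il6
```

Under this reading, a "conservative" choice links a high-centrality entity across a short distance; a "risky" one spans a long distance between peripheral entities.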

  13. Knowledge Discovery, Integration and Communication for Extreme Weather and Flood Resilience Using Artificial Intelligence: Flood AI Alpha

    NASA Astrophysics Data System (ADS)

    Demir, I.; Sermet, M. Y.

    2016-12-01

    Nobody is immune to extreme events or natural hazards that can lead to large-scale consequences for the nation and the public. One solution for reducing the impacts of extreme events is to invest in improving resilience: the ability to better prepare for, plan for, recover from, and adapt to disasters. A National Research Council (NRC) report discusses how to increase resilience to extreme events through a vision of a resilient nation in the year 2030. The report highlights the importance of data and information, identifies gaps and knowledge challenges that need to be addressed, and suggests that every individual have access to risk and vulnerability information to make their communities more resilient. This abstract presents our project on developing a resilience framework for flooding to improve societal preparedness, with the following objectives: (a) develop a generalized ontology for extreme events with a primary focus on flooding; (b) develop a knowledge engine with voice recognition, artificial intelligence, natural language processing, and an inference engine. The knowledge engine will utilize the flood ontology and concepts to connect user input to relevant knowledge discovery outputs on flooding; (c) develop a data acquisition and processing framework from existing environmental observations, forecast models, and social networks. The system will utilize the framework, capabilities, and user base of the Iowa Flood Information System (IFIS) to populate and test the system; (d) develop a communication framework to support user interaction and delivery of information to users. The interaction and delivery channels will include voice and text input via a web-based system (e.g. IFIS), agent-based bots (e.g. Microsoft Skype, Facebook Messenger), smartphone and augmented reality applications (e.g. smart assistant), and automated web workflows (e.g. IFTTT, CloudWork) to open flood knowledge discovery to thousands of community-extensible web workflows.

  14. Cytotoxic, Virucidal, and Antiviral Activity of South American Plant and Algae Extracts

    PubMed Central

    Faral-Tello, Paula; Mirazo, Santiago; Dutra, Carmelo; Pérez, Andrés; Geis-Asteggiante, Lucía; Frabasile, Sandra; Koncke, Elina; Davyt, Danilo; Cavallaro, Lucía; Heinzen, Horacio; Arbiza, Juan

    2012-01-01

    Herpes simplex virus type 1 (HSV-1) infection has a prevalence of 70% in the human population. Treatment is based on acyclovir, valacyclovir, and foscarnet, three drugs that share the same mechanism of action and against which resistant strains have been isolated from patients. In this respect, innovative drug therapies are required. Natural products offer unlimited opportunities for the discovery of antiviral compounds. In this study, 28 extracts corresponding to 24 plant species and 4 alga species were assayed in vitro to detect antiviral activity against HSV-1. Six of the methanolic extracts inactivated viral particles by direct interaction, and 14 presented antiviral activity when incubated with cells already infected. The most interesting antiviral activity values are those of Limonium brasiliense, Psidium guajava, and Phyllanthus niruri, which inhibit HSV-1 replication in vitro with 50% effective concentration (EC50) values of 185, 118, and 60 μg/mL, respectively. For these extracts, toxicity values were calculated and selectivity indexes (SI) derived. Further characterization of the bioactive components of antiviral plants will pave the way for the discovery of new compounds against HSV-1. PMID:22619617

  15. Applying knowledge-anchored hypothesis discovery methods to advance clinical and translational research: the OAMiner project

    PubMed Central

    Jackson, Rebecca D; Best, Thomas M; Borlawsky, Tara B; Lai, Albert M; James, Stephen; Gurcan, Metin N

    2012-01-01

    The conduct of clinical and translational research regularly involves the use of a variety of heterogeneous and large-scale data resources. Scalable methods for the integrative analysis of such resources, particularly when attempting to leverage computable domain knowledge in order to generate actionable hypotheses in a high-throughput manner, remain an open area of research. In this report, we describe both a generalizable design pattern for such integrative knowledge-anchored hypothesis discovery operations and our experience in applying that design pattern in the experimental context of a set of driving research questions related to the publicly available Osteoarthritis Initiative data repository. We believe that this ‘test bed’ project and the lessons learned during its execution are both generalizable and representative of common clinical and translational research paradigms. PMID:22647689

  16. Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery

    PubMed Central

    Huo, Zhiguang; Tseng, George

    2017-01-01

    Cancer subtype discovery is the first step toward delivering personalized medicine to cancer patients. With the accumulation of massive multi-level omics datasets and established biological knowledge databases, omics data integration that incorporates rich existing biological knowledge is essential for deciphering the biological mechanisms behind complex diseases. In this manuscript, we propose an integrative sparse K-means (is-Kmeans) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso. An algorithm using the alternating direction method of multipliers (ADMM) is applied for fast optimization. Simulation and three real applications in breast cancer and leukemia are used to compare is-Kmeans with existing methods and demonstrate its superior clustering accuracy, feature selection, functional annotation of detected molecular features, and computing efficiency. PMID:28959370

  17. Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.

    PubMed

    Huo, Zhiguang; Tseng, George

    2017-06-01

    Cancer subtype discovery is the first step toward delivering personalized medicine to cancer patients. With the accumulation of massive multi-level omics datasets and established biological knowledge databases, omics data integration that incorporates rich existing biological knowledge is essential for deciphering the biological mechanisms behind complex diseases. In this manuscript, we propose an integrative sparse K-means (is-Kmeans) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso. An algorithm using the alternating direction method of multipliers (ADMM) is applied for fast optimization. Simulation and three real applications in breast cancer and leukemia are used to compare is-Kmeans with existing methods and demonstrate its superior clustering accuracy, feature selection, functional annotation of detected molecular features, and computing efficiency.
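The sparse-K-means family of methods alternates clustering with feature-weight updates so that uninformative features are driven to zero. The following is a heavily simplified pure-Python sketch in the spirit of sparse K-means, not the authors' is-Kmeans with overlapping group lasso and ADMM: it replaces the L1-bound binary search of the published methods with a crude fixed soft-threshold, and the toy data are invented.

```python
import math
import random

def weighted_dist(a, b, w):
    """Squared Euclidean distance with per-feature weights."""
    return sum(wj * (x - y) ** 2 for wj, x, y in zip(w, a, b))

def kmeans(X, k, w, iters=20, seed=0):
    """Plain K-means under feature weights w; returns cluster labels."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: weighted_dist(x, centers[c], w))
                  for x in X]
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(col) for col in zip(*members)]
    return labels

def between_cluster_ss(X, labels, k):
    """Per-feature between-cluster sum of squares: high = discriminative."""
    n, p = len(X), len(X[0])
    overall = [sum(x[j] for x in X) / n for j in range(p)]
    b = [0.0] * p
    for c in range(k):
        members = [x for x, l in zip(X, labels) if l == c]
        if not members:
            continue
        mean_c = [sum(x[j] for x in members) / len(members) for j in range(p)]
        for j in range(p):
            b[j] += len(members) * (mean_c[j] - overall[j]) ** 2
    return b

def sparse_kmeans(X, k, rounds=5):
    """Alternate clustering and weight updates; weights of noise features -> 0."""
    p = len(X[0])
    w = [1 / math.sqrt(p)] * p
    labels = [0] * len(X)
    for _ in range(rounds):
        labels = kmeans(X, k, w)
        b = between_cluster_ss(X, labels, k)
        delta = 0.5 * max(b)  # crude fixed threshold (see caveat above)
        w = [max(bj - delta, 0.0) for bj in b]
        norm = math.sqrt(sum(wj * wj for wj in w)) or 1.0
        w = [wj / norm for wj in w]  # soft-threshold, then unit L2 norm
    return labels, w

# Feature 0 separates the two groups; features 1 and 2 are uninformative.
X = [[0.0, 5, 5], [0.2, 5, 5], [0.1, 5, 5],
     [5.0, 5, 5], [5.2, 5, 5], [5.1, 5, 5]]
labels, w = sparse_kmeans(X, 2)
print(labels, [round(wj, 3) for wj in w])
```

On this data the weight vector collapses onto the single informative feature, which is the behaviour the group-lasso penalty generalizes to overlapping gene groups.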

  18. How can knowledge discovery methods uncover spatio-temporal patterns in environmental data?

    NASA Astrophysics Data System (ADS)

    Wachowicz, Monica

    2000-04-01

    This paper proposes the integration of KDD, GVis and STDB as a long-term strategy, which will allow users to apply knowledge discovery methods for uncovering spatio-temporal patterns in environmental data. The main goal is to combine innovative techniques and associated tools for exploring very large environmental data sets in order to arrive at valid, novel, potentially useful, and ultimately understandable spatio-temporal patterns. The GeoInsight approach is described using the principles and key developments in the research domains of KDD, GVis, and STDB. The GeoInsight approach aims at the integration of these research domains in order to provide tools for performing information retrieval, exploration, analysis, and visualization. The result is a knowledge-based design, which involves visual thinking (perceptual-cognitive process) and automated information processing (computer-analytical process).

  19. A Semantic Lexicon-Based Approach for Sense Disambiguation and Its WWW Application

    NASA Astrophysics Data System (ADS)

    di Lecce, Vincenzo; Calabrese, Marco; Soldo, Domenico

    This work proposes a basic framework for resolving sense disambiguation through the use of a Semantic Lexicon, a machine-readable dictionary managing both word senses and lexico-semantic relations. More specifically, the polysemous ambiguity characterizing Web documents is discussed. The adopted Semantic Lexicon is WordNet, a lexical knowledge base of English words widely adopted in many research studies on knowledge discovery. The proposed approach extends recent works on knowledge discovery by focusing on the sense disambiguation aspect. By exploiting the structure of the WordNet database, lexico-semantic features are used to resolve the inherent sense ambiguity of written text, with particular reference to HTML resources. The obtained results may be extended to generic hypertextual repositories as well. Experiments show that polysemy reduction can be used to infer the meaning of specific senses in given contexts.
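The gloss-overlap idea behind such lexicon-based disambiguation can be illustrated with a Lesk-style toy example. The two-sense mini lexicon below is invented for demonstration and stands in for WordNet; the approach simply picks the sense whose gloss shares the most tokens with the surrounding context.

```python
# Toy machine-readable dictionary: each sense of a word has a gloss.
LEXICON = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "a financial institution that accepts deposits and lends money",
    },
}

def tokens(text):
    """Naive tokenizer: lowercase, split on whitespace."""
    return set(text.lower().split())

def disambiguate(word, context, lexicon=LEXICON):
    """Pick the sense whose gloss overlaps most with the context (Lesk-style)."""
    ctx = tokens(context)
    senses = lexicon[word]
    return max(senses, key=lambda s: len(ctx & tokens(senses[s])))

print(disambiguate("bank", "they moored the boat on the bank of the river"))
```

A real system would add stemming, stop-word removal, and WordNet's lexico-semantic relations, but the sense-scoring step is the same.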

  20. State of the Art in Tumor Antigen and Biomarker Discovery

    PubMed Central

    Even-Desrumeaux, Klervi; Baty, Daniel; Chames, Patrick

    2011-01-01

    Our knowledge of tumor immunology has resulted in multiple approaches for the treatment of cancer. However, a gap has emerged between research on new tumor markers and the development of immunotherapy, and very few markers exist that can be used for treatment. The challenge is now to discover new targets for active and passive immunotherapy. This review aims at describing recent advances in biomarker and tumor antigen discovery in terms of antigen nature and localization, and highlights the most recent approaches used for their discovery, including “omics” technology. PMID:24212823

  1. Discovering and Articulating What Is Not yet Known: Using Action Learning and Grounded Theory as a Knowledge Management Strategy

    ERIC Educational Resources Information Center

    Pauleen, David J.; Corbitt, Brian; Yoong, Pak

    2007-01-01

    Purpose: To provide a conceptual model for the discovery and articulation of emergent organizational knowledge, particularly knowledge that develops when people work with new technologies. Design/methodology/approach: The model is based on two widely accepted research methods--action learning and grounded theory--and is illustrated using a case…

  2. Cache-Cache Comparison for Supporting Meaningful Learning

    ERIC Educational Resources Information Center

    Wang, Jingyun; Fujino, Seiji

    2015-01-01

    The paper presents a meaningful discovery learning environment called "cache-cache comparison" for a personalized learning support system. The processing of seeking hidden relations or concepts in "cache-cache comparison" is intended to encourage learners to actively locate new knowledge in their knowledge framework and check…

  3. From Wisdom to Innocence: Passing on the Knowledge of the Night Sky

    NASA Technical Reports Server (NTRS)

    Shope, R.

    1996-01-01

    Memorable learning can happen when the whole family shares the thrill of discovery together. The fascination of the night sky presents a perfect opportunity for gifted parents and children to experience the tradition of passing on knowledge from generation to generation.

  4. Automatic Line Network Extraction from Aerial Imagery of Urban Areas through Knowledge Based Image Analysis

    DTIC Science & Technology

    1989-08-01

    Automatic Line Network Extraction from Aerial Imagery of Urban Areas through Knowledge Based Image Analysis. Final Technical Report, December... Automatic Line Network Extraction from Aerial Imagery of Urban Areas through Knowledge Based Image Analysis... pattern recognition, blackboard oriented symbolic processing, knowledge based image analysis, image understanding, aerial imagery, urban area

  5. A framework for interval-valued information system

    NASA Astrophysics Data System (ADS)

    Yin, Yunfei; Gong, Guanghong; Han, Liang

    2012-09-01

    Interval-valued information systems transform a conventional dataset into interval-valued form. To conduct interval-valued data mining, we carry out two investigations: (1) constructing the interval-valued information system, and (2) conducting the interval-valued knowledge discovery. In constructing the interval-valued information system, we first discover the paired attributes in the database and then store them in neighbouring locations in a common database, regarding them as 'one' new field. In conducting the interval-valued knowledge discovery, we utilise related prior knowledge, regarding it as the control objective, and design an approximate closed-loop control mining system. On the implemented experimental platform (prototype), we conduct the corresponding experiments and compare the proposed algorithms with several typical algorithms, such as the Apriori algorithm, the FP-growth algorithm and the CLOSE+ algorithm. The experimental results show that the interval-valued information system method is more effective than the conventional algorithms in discovering interval-valued patterns.
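The two steps described above (pairing attributes into a single interval 'field', then filtering under a prior-knowledge control objective) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the column names and target intervals are invented.

```python
# Conventional table with separate min/max columns per measured quantity.
records = [
    {"temp_min": 12, "temp_max": 18, "hum_min": 40, "hum_max": 55},
    {"temp_min": 20, "temp_max": 27, "hum_min": 60, "hum_max": 80},
]

def to_intervals(row, pairs):
    """Store each paired attribute as one interval 'field' (lo, hi)."""
    return {name: (row[lo], row[hi]) for name, (lo, hi) in pairs.items()}

PAIRS = {"temp": ("temp_min", "temp_max"), "hum": ("hum_min", "hum_max")}
table = [to_intervals(r, PAIRS) for r in records]

def contains(outer, inner):
    """Interval containment: a building block for interval-valued patterns."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

# Prior knowledge as a control objective: keep rows inside target intervals.
target = {"temp": (10, 30), "hum": (50, 90)}
matches = [row for row in table if all(contains(target[k], row[k]) for k in target)]
print(matches)  # only the second record's humidity interval lies within (50, 90)
```

The closed-loop idea in the paper would then adjust the target intervals based on how well the mined patterns meet the objective; here the filter runs once for clarity.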

  6. A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets

    NASA Astrophysics Data System (ADS)

    Bikku, Thulasi; Sambasiva Rao, N., Dr; Rao, Akepogu Ananda, Dr

    2017-08-01

    This paper mainly focuses on developing a Hadoop-based framework for feature selection and classification models to classify high-dimensionality data in heterogeneous biomedical databases. Extensive research has been performed in the fields of machine learning, big data and data mining for identifying patterns. The main challenge is extracting useful features generated from diverse biological systems. The proposed model can be used for predicting diseases in various applications and identifying the features relevant to particular diseases. Given the exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in a Hadoop environment. Extracting key features from unstructured documents often leads to uncertain results due to outliers and missing values. In this paper, we propose a two-phase map-reduce framework with a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method was designed to eliminate irrelevant features, missing values and outliers from the biomedical data. In the second phase, a Map-Reduce based multi-class ensemble decision tree model was designed and applied to the preprocessed mapper data to improve the true positive rate and computational time. The experimental results on complex biomedical datasets show that our proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.
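The two-phase design can be mimicked outside Hadoop with plain map and reduce-by-key functions. The sketch below is only a toy stand-in for the proposed framework: the mapper filters records with missing values or out-of-range outliers, and the 'reducer' merely groups cleaned values per class where the real system would train the ensemble decision tree. The record layout and thresholds are invented.

```python
from itertools import groupby

# Phase 1 mapper: drop records with missing values or out-of-range outliers.
def preprocess_mapper(record):
    label, value = record
    if value is None or not (0 <= value <= 100):
        return []            # filtered out: missing value or outlier
    return [(label, value)]  # emit (key, value) pairs

# Phase 2 reducer stand-in: aggregate per class; a real second phase would
# fit the multi-class ensemble decision tree on each group here.
def reduce_by_key(pairs):
    pairs = sorted(pairs, key=lambda kv: kv[0])  # groupby needs sorted input
    return {k: [v for _, v in grp] for k, grp in groupby(pairs, key=lambda kv: kv[0])}

raw = [("disease_a", 42), ("disease_a", None), ("disease_b", 10_000), ("disease_b", 7)]
mapped = [kv for rec in raw for kv in preprocess_mapper(rec)]
grouped = reduce_by_key(mapped)
print(grouped)  # {'disease_a': [42], 'disease_b': [7]}
```

Hadoop performs the shuffle-and-sort between the two phases automatically; the explicit sort here plays that role.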

  7. Learning and Relevance in Information Retrieval: A Study in the Application of Exploration and User Knowledge to Enhance Performance

    ERIC Educational Resources Information Center

    Hyman, Harvey

    2012-01-01

    This dissertation examines the impact of exploration and learning upon eDiscovery information retrieval; it is written in three parts. Part I contains foundational concepts and background on the topics of information retrieval and eDiscovery. This part informs the reader about the research frameworks, methodologies, data collection, and…

  8. Using Discovery Maps as a Free-Choice Learning Process Can Enhance the Effectiveness of Environmental Education in a Botanical Garden

    ERIC Educational Resources Information Center

    Yang, Xi; Chen, Jin

    2017-01-01

    Botanical gardens (BGs) are important agencies that enhance human knowledge and attitude towards flora conservation. By following free-choice learning model, we developed a "Discovery map" and distributed the map to visitors at the Xishuangbanna Tropical Botanical Garden in Yunnan, China. Visitors, who did and did not receive discovery…

  9. Discovery and Observations of a Stem-Boring Weevil (Myrmex sp.) a Potentially Useful Biocontrol of Mistletoe

    Treesearch

    J. D. Solomon; L. Newsome; T. H. Filer

    1984-01-01

    A stem-boring weevil obtained from infested clusters of mistletoe was subsequently reared and identified as Myrmex sp. To our knowledge its discovery in Mississippi is the easternmost record of mistletoe-feeding Myrmex, previously recorded only from the West and Southwest. Based on current studies, the weevil overwinters as larvae in tunnels within mistletoe stems....

  10. The importance of Leonhard Euler's discoveries in the field of shipbuilding for the scientific evolution of academician A. N. Krylov

    NASA Astrophysics Data System (ADS)

    Sharkov, N. A.; Sharkova, O. A.

    2018-05-01

    The paper identifies the importance of the Leonhard Euler's discoveries in the field of shipbuilding for the scientific evolution of academician A. N. Krylov and for the modern knowledge in survivability and safety of ships. The works by Leonard Euler "Marine Science" and "The Moon Motion New Theory" are discussed.

  11. Cost-Benefit Analysis of Confidentiality Policies for Advanced Knowledge Management Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    May, D

    Knowledge Discovery (KD) processes can create new information within a Knowledge Management (KM) system. In many domains, including government, this new information must be secured against unauthorized disclosure. Applying an appropriate confidentiality policy achieves this. However, it is not evident which confidentiality policy to apply, especially when the goals of sharing and disseminating knowledge have to be balanced with the requirements to secure knowledge. This work proposes to solve this problem by developing a cost-benefit analysis technique for examining the tradeoffs between securing and sharing discovered knowledge.
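The proposed tradeoff can be framed as a simple expected-value calculation. The sketch below is a hypothetical illustration (policy names, benefits, probabilities and costs are invented): each candidate policy's net benefit is its sharing value minus the expected loss from unauthorized disclosure, and the policy maximizing net benefit is selected.

```python
# Candidate confidentiality policies for discovered knowledge (toy numbers):
# (name, sharing_benefit, breach_probability, breach_cost)
policies = [
    ("open",         100.0, 0.30, 250.0),
    ("need-to-know",  60.0, 0.05, 250.0),
    ("locked-down",   10.0, 0.01, 250.0),
]

def net_benefit(benefit, p_breach, cost):
    """Expected value of a policy: sharing gains minus expected disclosure loss."""
    return benefit - p_breach * cost

best = max(policies, key=lambda p: net_benefit(p[1], p[2], p[3]))
print(best[0])  # the middle ground wins under these toy numbers
```

Real analyses would need defensible estimates for each quantity; the point is only that "share vs. secure" becomes a comparable scalar per policy.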

  12. A novel sample preparation and on-line HPLC-DAD-MS/MS-BCD analysis for rapid screening and characterization of specific enzyme inhibitors in herbal extracts: case study of α-glucosidase.

    PubMed

    Li, D Q; Zhao, J; Xie, J; Li, S P

    2014-01-01

    Drug discovery from complex mixtures like Chinese herbs is a challenge, and extensive false positives make it difficult to obtain specific bioactive compounds. In the present study, a novel sample preparation method was proposed to rapidly reveal the specific bioactive compounds in complex mixtures, using α-glucosidase as a case study. Firstly, aqueous and methanol extracts of 500 traditional Chinese medicines were screened with the aim of finding new sources of α-glucosidase inhibitors. As a result, the extracts of the fruit of Terminalia chebula (FTC), the flowers of Rosa rugosa (FRR) and Eugenia caryophyllata (FEC), as well as the husk of Punica granatum (HPG), showed high inhibition of α-glucosidase. On-line liquid chromatography-diode array detection-tandem mass spectrometry and biochemical detection (HPLC-DAD-MS/MS-BCD) was performed to rapidly screen and characterize α-glucosidase inhibitors in these four extracts. After tentative identification, most of the compounds with inhibitory activity in the investigated crude extracts were found to be tannins, commonly recognized as non-specific enzyme inhibitors in vitro. Subsequently, the four extracts were treated with gelatin to improve the specificity of the on-line system. Finally, two compounds with specific α-glucosidase inhibition were identified as corilagin and ellagic acid. The developed method can reveal specific α-glucosidase inhibitors in complex mixtures such as plant extracts and could also be used for the discovery of specific inhibitors of other enzymes.

  13. Using decision-tree classifier systems to extract knowledge from databases

    NASA Technical Reports Server (NTRS)

    St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.

    1990-01-01

    One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.
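The core of knowledge extraction with decision-tree classifiers is turning a trained tree into human-readable rules: every root-to-leaf path becomes one IF-THEN rule. The sketch below uses a hand-built toy tree (features and thresholds invented); it is not the cited system, but shows the path-walking step such extraction relies on.

```python
# A trained decision tree as nested dicts; leaves are class labels.
tree = {
    "feature": "temperature",
    "threshold": 37.5,
    "left":  "healthy",  # branch taken when temperature <= 37.5
    "right": {"feature": "cough", "threshold": 0.5,
              "left": "healthy", "right": "flu"},
}

def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path and emit one IF-THEN rule per leaf."""
    if not isinstance(node, dict):  # reached a leaf: emit the accumulated rule
        cond = " AND ".join(conditions) or "TRUE"
        return [f"IF {cond} THEN class = {node}"]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"],  conditions + (f"{f} <= {t}",)) +
            extract_rules(node["right"], conditions + (f"{f} > {t}",)))

for rule in extract_rules(tree):
    print(rule)
```

A concept-strength-style metric could then score each rule, e.g. by the purity of the training examples reaching its leaf, to decide whether more knowledge must be extracted.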

  14. Discovery of the leinamycin family of natural products by mining actinobacterial genomes

    PubMed Central

    Xu, Zhengren; Guo, Zhikai; Hindra; Ma, Ming; Zhou, Hao; Gansemans, Yannick; Zhu, Xiangcheng; Huang, Yong; Zhao, Li-Xing; Jiang, Yi; Cheng, Jinhua; Van Nieuwerburgh, Filip; Suh, Joo-Won; Duan, Yanwen

    2017-01-01

    Nature’s ability to generate diverse natural products from simple building blocks has inspired combinatorial biosynthesis. The knowledge-based approach to combinatorial biosynthesis has allowed the production of designer analogs by rational metabolic pathway engineering. While successful, structural alterations are limited, with designer analogs often produced in compromised titers. The discovery-based approach to combinatorial biosynthesis complements the knowledge-based approach by exploring the vast combinatorial biosynthesis repertoire found in Nature. Here we showcase the discovery-based approach to combinatorial biosynthesis by targeting the domain of unknown function and cysteine lyase domain (DUF–SH) didomain, specific for sulfur incorporation from the leinamycin (LNM) biosynthetic machinery, to discover the LNM family of natural products. By mining bacterial genomes from public databases and the actinomycetes strain collection at The Scripps Research Institute, we discovered 49 potential producers that could be grouped into 18 distinct clades based on phylogenetic analysis of the DUF–SH didomains. Further analysis of the representative genomes from each of the clades identified 28 lnm-type gene clusters. Structural diversities encoded by the LNM-type biosynthetic machineries were predicted based on bioinformatics and confirmed by in vitro characterization of selected adenylation proteins and isolation and structural elucidation of the guangnanmycins and weishanmycins. These findings demonstrate the power of the discovery-based approach to combinatorial biosynthesis for natural product discovery and structural diversity and highlight Nature’s rich biosynthetic repertoire. Comparative analysis of the LNM-type biosynthetic machineries provides outstanding opportunities to dissect Nature’s biosynthetic strategies and apply these findings to combinatorial biosynthesis for natural product discovery and structural diversity. PMID:29229819

  15. Discovery of the leinamycin family of natural products by mining actinobacterial genomes.

    PubMed

    Pan, Guohui; Xu, Zhengren; Guo, Zhikai; Hindra; Ma, Ming; Yang, Dong; Zhou, Hao; Gansemans, Yannick; Zhu, Xiangcheng; Huang, Yong; Zhao, Li-Xing; Jiang, Yi; Cheng, Jinhua; Van Nieuwerburgh, Filip; Suh, Joo-Won; Duan, Yanwen; Shen, Ben

    2017-12-26

    Nature's ability to generate diverse natural products from simple building blocks has inspired combinatorial biosynthesis. The knowledge-based approach to combinatorial biosynthesis has allowed the production of designer analogs by rational metabolic pathway engineering. While successful, structural alterations are limited, with designer analogs often produced in compromised titers. The discovery-based approach to combinatorial biosynthesis complements the knowledge-based approach by exploring the vast combinatorial biosynthesis repertoire found in Nature. Here we showcase the discovery-based approach to combinatorial biosynthesis by targeting the domain of unknown function and cysteine lyase domain (DUF-SH) didomain, specific for sulfur incorporation from the leinamycin (LNM) biosynthetic machinery, to discover the LNM family of natural products. By mining bacterial genomes from public databases and the actinomycetes strain collection at The Scripps Research Institute, we discovered 49 potential producers that could be grouped into 18 distinct clades based on phylogenetic analysis of the DUF-SH didomains. Further analysis of the representative genomes from each of the clades identified 28 lnm -type gene clusters. Structural diversities encoded by the LNM-type biosynthetic machineries were predicted based on bioinformatics and confirmed by in vitro characterization of selected adenylation proteins and isolation and structural elucidation of the guangnanmycins and weishanmycins. These findings demonstrate the power of the discovery-based approach to combinatorial biosynthesis for natural product discovery and structural diversity and highlight Nature's rich biosynthetic repertoire. Comparative analysis of the LNM-type biosynthetic machineries provides outstanding opportunities to dissect Nature's biosynthetic strategies and apply these findings to combinatorial biosynthesis for natural product discovery and structural diversity.

  16. Modeling & Informatics at Vertex Pharmaceuticals Incorporated: our philosophy for sustained impact

    NASA Astrophysics Data System (ADS)

    McGaughey, Georgia; Patrick Walters, W.

    2017-03-01

    Molecular modelers and informaticians have the unique opportunity to integrate cross-functional data using a myriad of tools, methods and visuals to generate information. Using their drug discovery expertise, they transform information into knowledge that impacts drug discovery. These insights are often formulated locally and then applied more broadly, influencing the discovery of new medicines. This is particularly true in an organization whose members are exposed to projects throughout the organization, as in the case of the global Modeling & Informatics group at Vertex Pharmaceuticals. From its inception, Vertex has been a leader in the development and use of computational methods for drug discovery. In this paper, we describe the Modeling & Informatics group at Vertex and the underlying philosophy that has driven this team to sustain impact on the discovery of first-in-class transformative medicines.

  17. 36 CFR 292.41 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... disabilities. Mining means any activity related to the discovery, extraction and exploitation of minerals under.... Paleontological resources means any remains, trace, or imprint of a plant or animal that has been preserved in the...

  18. Ensuring the Quality of Outreach: The Critical Role of Evaluating Individual and Collective Initiatives and Performance

    ERIC Educational Resources Information Center

    Lynton, Ernest A.

    2016-01-01

    New knowledge is created in the course of the application of outreach. Each complex problem in the real world is likely to have unique aspects and thus it requires some modification of standard approaches. Hence, each engagement in outreach is likely to have an element of inquiry and discovery, leading to new knowledge. The flow of knowledge is in…

  19. Learning the Structure of Biomedical Relationships from Unstructured Text

    PubMed Central

    Percha, Bethany; Altman, Russ B.

    2015-01-01

    The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. PMID:26219079
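    The starting point for approaches like EBC is a co-occurrence matrix pairing (drug, gene) mentions with the textual patterns that connect them; pairs sharing patterns are candidates for the same relation type. The sketch below is a toy illustration only: the gazetteers and sentences are invented, and the exact-phrase grouping stands in for EBC's ensemble biclustering of the pair-by-dependency-path matrix.

    ```python
    from collections import defaultdict

    # Hypothetical gazetteers and example sentences (invented for illustration).
    drugs = {"tamoxifen", "warfarin"}
    genes = {"CYP2D6", "VKORC1"}
    sentences = [
        "tamoxifen is metabolized by CYP2D6",
        "warfarin is metabolized by VKORC1",
        "warfarin dose depends on VKORC1 variants",
    ]

    # Collect, for each (drug, gene) pair, the connecting phrases between mentions.
    patterns = defaultdict(set)
    for s in sentences:
        toks = s.split()
        for i, a in enumerate(toks):
            for j, b in enumerate(toks):
                if a in drugs and b in genes and i < j:
                    patterns[(a, b)].add(" ".join(toks[i + 1:j]))

    # Pairs sharing a connecting phrase are grouped; EBC instead clusters this
    # pair-by-pattern matrix, tolerating differences in word choice.
    by_pattern = defaultdict(set)
    for pair, phrases in patterns.items():
        for p in phrases:
            by_pattern[p].add(pair)
    for p, pairs in by_pattern.items():
        if len(pairs) > 1:
            print(f"'{p}' links {sorted(pairs)}")
    ```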

  20. Concepts of formal concept analysis

    NASA Astrophysics Data System (ADS)

    Žáček, Martin; Homola, Dan; Miarka, Rostislav

    2017-07-01

    The aim of this article is to apply Formal Concept Analysis to the concept of a world. Formal concept analysis (FCA), as a methodology of data analysis, information management and knowledge representation, has the potential to be applied to a variety of linguistic problems. FCA is a mathematical theory of concepts and concept hierarchies that reflects an understanding of what a concept is. Formal concept analysis explicitly formalizes the extension and intension of a concept and their mutual relationships. A distinguishing feature of FCA is an inherent integration of three components of conceptual processing of data and knowledge, namely, the discovery of and reasoning with concepts in data, the discovery of and reasoning with dependencies in data, and the visualization of data, concepts, and dependencies with folding/unfolding capabilities.
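    The extension/intension duality described above can be made concrete with a toy formal context (the objects and attributes below are invented for illustration): a formal concept is a pair in which a set of objects and a set of attributes determine each other exactly. A minimal brute-force sketch:

    ```python
    from itertools import combinations

    # Toy formal context: each object maps to the set of attributes it has.
    context = {
        "lion":  {"predator", "mammal"},
        "eagle": {"predator", "flies"},
        "bat":   {"mammal", "flies"},
    }
    attributes = {"predator", "mammal", "flies"}

    def extent(attrs):
        """Objects possessing all of the given attributes."""
        return {o for o, a in context.items() if attrs <= a}

    def intent(objs):
        """Attributes shared by all of the given objects."""
        if not objs:
            return set(attributes)
        shared = set(attributes)
        for o in objs:
            shared &= context[o]
        return shared

    # Enumerate formal concepts: closed (extent, intent) pairs.
    concepts = set()
    for r in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), r):
            e = extent(set(combo))
            concepts.add((frozenset(e), frozenset(intent(e))))

    for e, i in sorted(concepts, key=lambda c: -len(c[0])):
        print(sorted(e), "<->", sorted(i))
    ```

    Ordering these concepts by inclusion of their extents yields the concept lattice that FCA visualizes.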

  1. Using a computer-based simulation with an artificial intelligence component and discovery learning to formulate training needs for a new technology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hillis, D.R.

    A computer-based simulation with an artificial intelligence component and discovery learning was investigated as a method to formulate training needs for new or unfamiliar technologies. Specifically, the study examined whether this simulation method would provide for the recognition of applications and knowledge/skills which would be the basis for establishing training needs. The study also examined the effect of field-dependence/independence on recognition of applications and knowledge/skills. A pretest-posttest control group experimental design involving fifty-eight college students from an industrial technology program was used. The study concluded that the simulation was effective in developing recognition of applications and the knowledge/skills for a new or unfamiliar technology. Moreover, the simulation's effectiveness for providing this recognition was not limited by an individual's field-dependence/independence.

  2. Semi-automated knowledge discovery: identifying and profiling human trafficking

    NASA Astrophysics Data System (ADS)

    Poelmans, Jonas; Elzinga, Paul; Ignatov, Dmitry I.; Kuznetsov, Sergei O.

    2012-11-01

    We propose an iterative and human-centred knowledge discovery methodology based on formal concept analysis. The proposed approach recognizes the important role of the domain expert in mining real-world enterprise applications and makes use of specific domain knowledge, including human intelligence and domain-specific constraints. Our approach was empirically validated at the Amsterdam-Amstelland police to identify suspects and victims of human trafficking in 266,157 suspicious activity reports. Based on guidelines of the Attorneys General of the Netherlands, we first defined multiple early warning indicators that were used to index the police reports. Using concept lattices, we revealed numerous unknown human trafficking and loverboy suspects. In-depth investigation by the police confirmed their involvement in illegal activities and led to actual arrests. Our human-centred approach was embedded into operational policing practice and is now successfully used on a daily basis to cope with the vastly growing amount of unstructured information.

  3. Database systems for knowledge-based discovery.

    PubMed

    Jagarlapudi, Sarma A R P; Kishan, K V Radha

    2009-01-01

    Several database systems have been developed to provide valuable information in a structured format to users ranging from the bench chemist and biologist to the medical practitioner and pharmaceutical scientist. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database where one could do compilation, searching, archiving, analysis, and finally knowledge derivation. Although data are of variable types, the tools used for database creation, searching and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity so that the structured databases containing vast data could be used in several areas of research. These databases were classified as reference centric or compound centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery.

  4. The role of indirect evidence and traditional ecological knowledge in the discovery and description of new ape and monkey species since 1980.

    PubMed

    Rossi, Lorenzo; Gippoliti, Spartaco; Angelici, Francesco Maria

    2018-06-04

    Although empirical data are necessary to describe new species, their discoveries can be guided by the survey of so-called circumstantial evidence (evidence that indirectly determines the existence or nonexistence of a fact). Yet this type of evidence, generally linked to traditional ecological knowledge (TEK), is often disputed by field biologists due to its uncertain nature and, on that account, generally goes untapped by them. To verify this behavior and the utility of circumstantial evidence, we reviewed the existing literature on the species of apes and monkeys described or rediscovered since January 1, 1980 and submitted a poll to the authors. The results show that circumstantial evidence proved useful in 40.5% of the examined cases and point to the possibility that its use could speed up the process at the heart of the discovery and description of new species, an essential step for conservation purposes.

  5. CONSTRUCTING KNOWLEDGE FROM MULTIVARIATE SPATIOTEMPORAL DATA: INTEGRATING GEOGRAPHIC VISUALIZATION WITH KNOWLEDGE DISCOVERY IN DATABASE METHODS. (R825195)

    EPA Science Inventory

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  6. Dewey: How to Make It Work for You

    ERIC Educational Resources Information Center

    Panzer, Michael

    2013-01-01

    As knowledge brokers, librarians are living in interesting times for themselves and libraries. It causes them to wonder sometimes if the traditional tools like the Dewey Decimal Classification (DDC) system can cope with the onslaught of information. The categories provided do not always seem adequate for the knowledge-discovery habits of…

  7. 78 FR 29071 - Assessment of Mediation and Arbitration Procedures

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-05-17

    ... proceeding. Program participants in the new arbitration program will have prior knowledge of the issues to be... final rules, all parties opting into the arbitration program will have full prior knowledge that these... including discovery, the submission of evidence, and the treatment of confidential information, and the...

  8. Streamlining the Discovery, Evaluation, and Integration of Data, Models, and Decision Support Systems: a Big Picture View

    EPA Science Inventory

    21st century environmental problems are wicked and require holistic systems thinking and solutions that integrate social and economic knowledge with knowledge of the environment. Computer-based technologies are fundamental to our ability to research and understand the relevant sy...

  9. Teaching Practice: A Perspective on Inter-Text and Prior Knowledge

    ERIC Educational Resources Information Center

    Costley, Kevin C.; West, Howard G.

    2012-01-01

    The use of teaching practices that involve intertextual relationship discovery in today's elementary classrooms is increasingly essential to the success of young learners of reading. Teachers must constantly strive to expand their perspective of how to incorporate the dialogue included in prior knowledge assessment. Teachers must also consider how…

  10. Globalization of Knowledge Discovery and Information Retrieval in Teaching and Learning

    ERIC Educational Resources Information Center

    Zaidel, Mark; Guerrero, Osiris

    2008-01-01

    Developments in communication and information technologies in the last decade have had a significant impact on instructional and learning activities. For many students and educators, the Internet became the significant medium for sharing instruction, learning and communication. Access to knowledge beyond boundaries and cultures has an impact on…

  11. An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature.

    ERIC Educational Resources Information Center

    Trybula, Walter J.; Wyllys, Ronald E.

    2000-01-01

    Addresses an approach to the discovery of scientific knowledge through an examination of data mining and text mining techniques. Presents the results of experiments that investigated knowledge acquisition from a selected set of technical documents by domain experts. (Contains 15 references.) (Author/LRW)

  12. Vocational Education Institutions' Role in National Innovation

    ERIC Educational Resources Information Center

    Moodie, Gavin

    2006-01-01

    This article distinguishes research--the discovery of new knowledge--from innovation, which is understood to be the transformation of practice in a community or the incorporation of existing knowledge into economic activity. From a survey of roles served by vocational education institutions in a number of OECD countries the paper argues that…

  13. A Python Geospatial Language Toolkit

    NASA Astrophysics Data System (ADS)

    Fillmore, D.; Pletzer, A.; Galloy, M.

    2012-12-01

    The volume and scope of geospatial data archives, such as collections of satellite remote sensing or climate model products, has been rapidly increasing and will continue to do so in the near future. The recently launched (October 2011) Suomi National Polar-orbiting Partnership satellite (NPP) for instance, is the first of a new generation of Earth observation platforms that will monitor the atmosphere, oceans, and ecosystems, and its suite of instruments will generate several terabytes each day in the form of multi-spectral images and derived datasets. Full exploitation of such data for scientific analysis and decision support applications has become a major computational challenge. Geophysical data exploration and knowledge discovery could benefit, in particular, from intelligent mechanisms for extracting and manipulating subsets of data relevant to the problem of interest. Potential developments include enhanced support for natural language queries and directives to geospatial datasets. The translation of natural language (that is, human spoken or written phrases) into complex but unambiguous objects and actions can be based on a context, or knowledge domain, that represents the underlying geospatial concepts. This poster describes a prototype Python module that maps English phrases onto basic geospatial objects and operations. This module, along with the associated computational geometry methods, enables the resolution of natural language directives that include geographic regions of arbitrary shape and complexity.

  14. Nanoinformatics knowledge infrastructures: bringing efficient information management to nanomedical research

    PubMed Central

    de la Iglesia, D; Cachau, R E; García-Remesal, M; Maojo, V

    2014-01-01

    Nanotechnology represents an area of particular promise and significant opportunity across multiple scientific disciplines. Ongoing nanotechnology research ranges from the characterization of nanoparticles and nanomaterials to the analysis and processing of experimental data seeking correlations between nanoparticles and their functionalities and side effects. Due to their special properties, nanoparticles are suitable for cellular-level diagnostics and therapy, offering numerous applications in medicine, e.g. development of biomedical devices, tissue repair, drug delivery systems and biosensors. In nanomedicine, recent studies are producing large amounts of structural and property data, highlighting the role for computational approaches in information management. While in vitro and in vivo assays are expensive, the cost of computing is falling. Furthermore, improvements in the accuracy of computational methods (e.g. data mining, knowledge discovery, modeling and simulation) have enabled effective tools to automate the extraction, management and storage of these vast data volumes. Since this information is widely distributed, one major issue is how to locate and access data where it resides (which also poses data-sharing limitations). The novel discipline of nanoinformatics addresses the information challenges related to nanotechnology research. In this paper, we summarize the needs and challenges in the field and present an overview of extant initiatives and efforts. PMID:24932210

  15. Using Data Mining to Detect Health Care Fraud and Abuse: A Review of Literature

    PubMed Central

    Joudaki, Hossein; Rashidian, Arash; Minaei-Bidgoli, Behrouz; Mahmoodi, Mahmood; Geraili, Bijan; Nasiri, Mahdi; Arab, Mohammad

    2015-01-01

    Inappropriate payments by insurance organizations or third party payers occur because of errors, abuse and fraud. The scale of this problem is large enough to make it a priority issue for health systems. Traditional methods of detecting health care fraud and abuse are time-consuming and inefficient. Combining automated methods and statistical knowledge led to the emergence of a new interdisciplinary branch of science named Knowledge Discovery from Databases (KDD). Data mining is at the core of the KDD process. Data mining can help third-party payers such as health insurance organizations to extract useful information from thousands of claims and identify a smaller subset of the claims or claimants for further assessment. We reviewed studies that performed data mining techniques for detecting health care fraud and abuse, using supervised and unsupervised data mining approaches. Most available studies have focused on algorithmic data mining without an emphasis on or application to fraud detection efforts in the context of health service provision or health insurance policy. More studies are needed to connect sound, evidence-based diagnosis and treatment approaches to the detection of fraudulent or abusive behaviors. Ultimately, based on available studies, we recommend seven general steps to data mining of health care claims. PMID:25560347

  16. Reflective practice and guided discovery: clinical supervision.

    PubMed

    Todd, G; Freshwater, D

    This article explores the parallels between reflective practice as a model for clinical supervision, and guided discovery as a skill in cognitive psychotherapy. A description outlining the historical development of clinical supervision in relationship to positional papers and policies is followed by an exposé of the difficulties in developing a clear, consistent model of clinical supervision with a coherent focus; reflective practice is proposed as a model of choice for clinical supervision in nursing. The article examines the parallels and processes of a model of reflection in an individual clinical supervision session, and the use of guided discovery through Socratic dialogue with a depressed patient in cognitive psychotherapy. Extracts from both sessions are used to illuminate the subsequent discussion.

  17. KAM (Knowledge Acquisition Module): A tool to simplify the knowledge acquisition process

    NASA Technical Reports Server (NTRS)

    Gettig, Gary A.

    1988-01-01

    Analysts, knowledge engineers and information specialists are faced with increasing volumes of time-sensitive data in text form, either as free text or highly structured text records. Rapid access to the relevant data in these sources is essential. However, due to the volume and organization of the contents, and limitations of human memory and association, frequently: (1) important information is not located in time; (2) reams of irrelevant data are searched; and (3) interesting or critical associations are missed due to physical or temporal gaps involved in working with large files. The Knowledge Acquisition Module (KAM) is a microcomputer-based expert system designed to assist knowledge engineers, analysts, and other specialists in extracting useful knowledge from large volumes of digitized text and text-based files. KAM formulates non-explicit, ambiguous, or vague relations, rules, and facts into a manageable and consistent formal code. A library of system rules or heuristics is maintained to control the extraction of rules, relations, assertions, and other patterns from the text. These heuristics can be added, deleted or customized by the user. The user can further control the extraction process with optional topic specifications. This allows the user to cluster extracts based on specific topics. Because KAM formalizes diverse knowledge, it can be used by a variety of expert systems and automated reasoning applications. KAM can also perform important roles in computer-assisted training and skill development. Current research efforts include the applicability of neural networks to aid in the extraction process and the conversion of these extracts into standard formats.

  18. Automated In Vivo Platform for the Discovery of Functional Food Treatments of Hypercholesterolemia

    PubMed Central

    Littleton, Robert M.; Haworth, Kevin J.; Tang, Hong; Setchell, Kenneth D. R.; Nelson, Sandra; Hove, Jay R.

    2013-01-01

    The zebrafish is becoming an increasingly popular model system for both automated drug discovery and investigating hypercholesterolemia. Here we combine these aspects and for the first time develop an automated high-content confocal assay for treatments of hypercholesterolemia. We also create two algorithms for automated analysis of cardiodynamic data acquired by high-speed confocal microscopy. The first algorithm computes cardiac parameters solely from the frequency-domain representation of cardiodynamic data while the second uses both frequency- and time-domain data. The combined approach resulted in smaller differences relative to manual measurements. The methods are implemented to test the ability of a methanolic extract of the hawthorn plant (Crataegus laevigata) to treat hypercholesterolemia and its peripheral cardiovascular effects. Results demonstrate the utility of these methods and suggest the extract has both antihypercholesterolemic and positively inotropic properties. PMID:23349685

  19. Automated in vivo platform for the discovery of functional food treatments of hypercholesterolemia.

    PubMed

    Littleton, Robert M; Haworth, Kevin J; Tang, Hong; Setchell, Kenneth D R; Nelson, Sandra; Hove, Jay R

    2013-01-01

    The zebrafish is becoming an increasingly popular model system for both automated drug discovery and investigating hypercholesterolemia. Here we combine these aspects and for the first time develop an automated high-content confocal assay for treatments of hypercholesterolemia. We also create two algorithms for automated analysis of cardiodynamic data acquired by high-speed confocal microscopy. The first algorithm computes cardiac parameters solely from the frequency-domain representation of cardiodynamic data while the second uses both frequency- and time-domain data. The combined approach resulted in smaller differences relative to manual measurements. The methods are implemented to test the ability of a methanolic extract of the hawthorn plant (Crataegus laevigata) to treat hypercholesterolemia and its peripheral cardiovascular effects. Results demonstrate the utility of these methods and suggest the extract has both antihypercholesterolemic and positively inotropic properties.

  20. Exploring relation types for literature-based discovery.

    PubMed

    Preiss, Judita; Stevenson, Mark; Gaizauskas, Robert

    2015-09-01

    Literature-based discovery (LBD) aims to identify "hidden knowledge" in the medical literature by: (1) analyzing documents to identify pairs of explicitly related concepts (terms), then (2) hypothesizing novel relations between pairs of unrelated concepts that are implicitly related via a shared concept to which both are explicitly related. Many LBD approaches use simple techniques to identify semantically weak relations between concepts, for example, document co-occurrence. These generate huge numbers of hypotheses, difficult for humans to assess. More complex techniques rely on linguistic analysis, for example, shallow parsing, to identify semantically stronger relations. Such approaches generate fewer hypotheses, but may miss hidden knowledge. The authors investigate this trade-off in detail, comparing techniques for identifying related concepts to discover which are most suitable for LBD. A generic LBD system that can utilize a range of relation types was developed. Experiments were carried out comparing a number of techniques for identifying relations. Two approaches were used for evaluation: replication of existing discoveries and the "time slicing" approach. Previous LBD discoveries could be replicated using relations based either on document co-occurrence or linguistic analysis. Using relations based on linguistic analysis generated many fewer hypotheses, but a significantly greater proportion of them were candidates for hidden knowledge. The use of linguistic analysis-based relations improves accuracy of LBD without overly damaging coverage. LBD systems often generate huge numbers of hypotheses, which are infeasible to manually review. Improving their accuracy has the potential to make these systems significantly more usable. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
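    The two-step pattern described above, explicit A-B and B-C relations licensing a hypothesized A-C link, is the classic ABC model of literature-based discovery. A minimal co-occurrence-based sketch (the concept pairs are hard-coded here purely for illustration, echoing Swanson's well-known fish-oil/Raynaud's example):

    ```python
    # Explicit concept pairs, as would be mined from document co-occurrence.
    explicit = {
        ("fish oil", "blood viscosity"),
        ("blood viscosity", "Raynaud's disease"),
        ("fish oil", "platelet aggregation"),
        ("platelet aggregation", "Raynaud's disease"),
    }

    def hypotheses(pairs):
        """ABC model: if A-B and B-C are explicit but A-C is not,
        hypothesize a hidden A-C relation via the bridging concept B."""
        known = set(pairs) | {(b, a) for a, b in pairs}  # treat relations as symmetric
        hyp = {}
        for a, b in known:
            for b2, c in known:
                if b == b2 and a != c and (a, c) not in known:
                    hyp.setdefault(tuple(sorted((a, c))), set()).add(b)
        return hyp

    for (a, c), bridges in sorted(hypotheses(explicit).items()):
        print(f"{a} -?- {c} via {sorted(bridges)}")
    ```

    As the abstract notes, real co-occurrence relations at corpus scale produce far more hypotheses than this; filtering by stronger, linguistically derived relations shrinks that set.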

  1. An Analysis Pipeline with Statistical and Visualization-Guided Knowledge Discovery for Michigan-Style Learning Classifier Systems

    PubMed Central

    Urbanowicz, Ryan J.; Granizo-Mackenzie, Ambrose; Moore, Jason H.

    2014-01-01

    Michigan-style learning classifier systems (M-LCSs) represent an adaptive and powerful class of evolutionary algorithms which distribute the learned solution over a sizable population of rules. However their application to complex real world data mining problems, such as genetic association studies, has been limited. Traditional knowledge discovery strategies for M-LCS rule populations involve sorting and manual rule inspection. While this approach may be sufficient for simpler problems, the confounding influence of noise and the need to discriminate between predictive and non-predictive attributes calls for additional strategies. Additionally, tests of significance must be adapted to M-LCS analyses in order to make them a viable option within fields that require such analyses to assess confidence. In this work we introduce an M-LCS analysis pipeline that combines uniquely applied visualizations with objective statistical evaluation for the identification of predictive attributes, and reliable rule generalizations in noisy single-step data mining problems. This work considers an alternative paradigm for knowledge discovery in M-LCSs, shifting the focus from individual rules to a global, population-wide perspective. We demonstrate the efficacy of this pipeline applied to the identification of epistasis (i.e., attribute interaction) and heterogeneity in noisy simulated genetic association data. PMID:25431544

  2. Microfluidic-based mini-metagenomics enables discovery of novel microbial lineages from complex environmental samples

    PubMed Central

    Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R

    2017-01-01

    Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell. DOI: http://dx.doi.org/10.7554/eLife.26580.001 PMID:28678007

  3. A framework for extracting and representing project knowledge contexts using topic models and dynamic knowledge maps

    NASA Astrophysics Data System (ADS)

    Xu, Jin; Li, Zheng; Li, Shuliang; Zhang, Yanyan

    2015-07-01

    There is still a lack of effective paradigms and tools for analysing and discovering the contents and relationships of project knowledge contexts in the field of project management. In this paper, a new framework for extracting and representing project knowledge contexts using topic models and dynamic knowledge maps under big data environments is proposed and developed. The conceptual paradigm, theoretical underpinning, extended topic model, and illustration examples of the ontology model for project knowledge maps are presented, with further research work envisaged.

  4. UltiMatch-NL: A Web Service Matchmaker Based on Multiple Semantic Filters

    PubMed Central

    Mohebbi, Keyvan; Ibrahim, Suhaimi; Zamani, Mazdak; Khezrian, Mojtaba

    2014-01-01

    In this paper, a Semantic Web service matchmaker called UltiMatch-NL is presented. UltiMatch-NL applies two filters, namely Signature-based and Description-based, on different abstraction levels of a service profile to achieve more accurate results. More specifically, the proposed filters rely on semantic knowledge to extract the similarity between a given pair of service descriptions. Thus it is a further step towards fully automated Web service discovery via making this process more semantic-aware. In addition, a new technique is proposed to weight and combine the results of different filters of UltiMatch-NL automatically. Moreover, an innovative approach is introduced to predict the relevance of requests and Web services and eliminate the need for setting a threshold value of similarity. In order to evaluate UltiMatch-NL, the repository of OWLS-TC is used. The performance evaluation based on standard measures from the information retrieval field shows that semantic matching of OWL-S services can be significantly improved by incorporating designed matching filters. PMID:25157872

  5. Assessing semantic similarity of texts - Methods and algorithms

    NASA Astrophysics Data System (ADS)

    Rozeva, Anna; Zerkova, Silvia

    2017-12-01

    Assessing the semantic similarity of texts is an important part of many text-related applications such as educational systems, information retrieval, and text summarization. This task is performed by sophisticated analysis that applies text-mining techniques. Text mining involves several pre-processing steps, which yield a structured representative model of the documents in a corpus by extracting and selecting the features that characterize their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at the syntactic and semantic levels. An important text-mining method and similarity measure is latent semantic analysis (LSA). It reduces the dimensionality of the document vector space and better captures the text semantics. The mathematical background of LSA, which derives the meaning of the words in a given text by exploring their co-occurrence, is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space, as well as the similarity calculation, is presented.
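    As a minimal sketch of the pre-processing stage the abstract describes, the fragment below builds a tf-idf vector model of a tiny invented corpus and compares documents by cosine similarity. The LSA step proper, a truncated SVD of the resulting term-document matrix, is omitted here.

    ```python
    import math
    from collections import Counter

    # Tiny illustrative corpus (invented for this sketch).
    docs = [
        "data mining extracts knowledge from data",
        "text mining extracts patterns from text",
        "the cat sat on the mat",
    ]
    tokenized = [d.split() for d in docs]

    # Inverse document frequency for every term in the corpus.
    vocab = sorted({t for doc in tokenized for t in doc})
    idf = {t: math.log(len(docs) / sum(t in doc for doc in tokenized)) for t in vocab}

    def tfidf(doc):
        """Weight raw term counts by idf to form a document vector."""
        tf = Counter(doc)
        return [tf[t] * idf[t] for t in vocab]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    vecs = [tfidf(d) for d in tokenized]
    print(cosine(vecs[0], vecs[1]))  # the two mining documents share terms
    print(cosine(vecs[0], vecs[2]))  # no shared terms, so similarity is zero
    ```

    LSA would factor the matrix of such vectors and keep only the top singular dimensions, so that documents with related but non-identical vocabulary also score as similar.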

  6. Interactive Visualization of Large-Scale Hydrological Data using Emerging Technologies in Web Systems and Parallel Programming

    NASA Astrophysics Data System (ADS)

    Demir, I.; Krajewski, W. F.

    2013-12-01

    As geoscientists are confronted with increasingly massive datasets from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data and communicate the understanding to stakeholders. Recent developments in web technologies make it easy to manage, visualize and share large data sets with general public. Novel visualization techniques and dynamic user interfaces allow users to interact with data, and modify the parameters to create custom views of the data to gain insight from simulations and environmental observations. This requires developing new data models and intelligent knowledge discovery techniques to explore and extract information from complex computational simulations or large data repositories. Scientific visualization will be an increasingly important component to build comprehensive environmental information platforms. This presentation provides an overview of the trends and challenges in the field of scientific visualization, and demonstrates information visualization and communication tools developed within the light of these challenges.

  7. UltiMatch-NL: a Web service matchmaker based on multiple semantic filters.

    PubMed

    Mohebbi, Keyvan; Ibrahim, Suhaimi; Zamani, Mazdak; Khezrian, Mojtaba

    2014-01-01

    In this paper, a Semantic Web service matchmaker called UltiMatch-NL is presented. UltiMatch-NL applies two filters, Signature-based and Description-based, to different abstraction levels of a service profile to achieve more accurate results. More specifically, the proposed filters rely on semantic knowledge to extract the similarity between a given pair of service descriptions, making the process more semantics-aware and taking a further step towards fully automated Web service discovery. In addition, a new technique is proposed to automatically weight and combine the results of UltiMatch-NL's filters. Moreover, an innovative approach is introduced to predict the relevance of requests and Web services, eliminating the need to set a similarity threshold. UltiMatch-NL was evaluated on the OWLS-TC repository. Performance evaluation based on standard measures from the information retrieval field shows that semantic matching of OWL-S services can be significantly improved by incorporating the designed matching filters.

  8. Discovering Peripheral Arterial Disease Cases from Radiology Notes Using Natural Language Processing

    PubMed Central

    Savova, Guergana K.; Fan, Jin; Ye, Zi; Murphy, Sean P.; Zheng, Jiaping; Chute, Christopher G.; Kullo, Iftikhar J.

    2010-01-01

    As part of the Electronic Medical Records and Genomics Network, we applied, extended and evaluated an open source clinical Natural Language Processing system, Mayo’s Clinical Text Analysis and Knowledge Extraction System, for the discovery of peripheral arterial disease cases from radiology reports. The manually created gold standard consisted of 223 positive, 19 negative, 63 probable and 150 unknown cases. Overall accuracy agreement between the system and the gold standard was 0.93, as compared to a named entity recognition baseline of 0.46. Sensitivity for the positive, probable and unknown cases was 0.93–0.96, and for the negative cases was 0.72. Specificity and negative predictive value for all categories were in the 0.90s. The positive predictive value for the positive and unknown categories was in the high 0.90s, for the negative category was 0.84, and for the probable category was 0.63. We outline the main sources of errors and suggest improvements. PMID:21347073

  9. Discovering Structural Regularity in 3D Geometry

    PubMed Central

    Pauly, Mark; Mitra, Niloy J.; Wallner, Johannes; Pottmann, Helmut; Guibas, Leonidas J.

    2010-01-01

    We introduce a computational framework for discovering regular or repeated geometric structures in 3D shapes. We describe and classify possible regular structures and present an effective algorithm for detecting such repeated geometric patterns in point- or mesh-based models. Our method assumes no prior knowledge of the geometry or spatial location of the individual elements that define the pattern. Structure discovery is made possible by a careful analysis of pairwise similarity transformations that reveals prominent lattice structures in a suitable model of transformation space. We introduce an optimization method for detecting such uniform grids specifically designed to deal with outliers and missing elements. This yields a robust algorithm that successfully discovers complex regular structures amidst clutter, noise, and missing geometry. The accuracy of the extracted generating transformations is further improved using a novel simultaneous registration method in the spatial domain. We demonstrate the effectiveness of our algorithm on a variety of examples and show applications to compression, model repair, and geometry synthesis. PMID:21170292

  10. A knowledge engineering approach to recognizing and extracting sequences of nucleic acids from scientific literature.

    PubMed

    García-Remesal, Miguel; Maojo, Victor; Crespo, José

    2010-01-01

    In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. These candidates are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated on a test set of 211 full-text articles in PDF format containing 3134 genetic sequences, achieving 87.76% precision and 97.70% recall. This method can facilitate research tasks such as text mining, information extraction, and information retrieval over large collections of documents containing genetic sequences.
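    The two-stage pipeline in this record (a permissive recognizer followed by a refinement pass) can be sketched with a regular expression standing in for the finite state machine. The pattern, length threshold and example text below are illustrative assumptions, not the authors' actual rules.

```python
import re

# A regex stands in for the paper's finite-state recognizer: it accepts
# runs of at least 10 nucleotide codes (A/C/G/T/U plus the IUPAC ambiguity
# codes N, R, Y used here as an assumption), possibly interrupted by
# whitespace or digits from line-number gutters in the PDF text.
CANDIDATE = re.compile(r"(?:[ACGTUNRY][\s\d]*){10,}", re.IGNORECASE)

def extract_candidates(text):
    """Return cleaned candidate sequences found in free text."""
    hits = []
    for m in CANDIDATE.finditer(text):
        # A second, knowledge-based refinement pass (as in the paper)
        # would go here; this sketch only strips layout digits/whitespace.
        seq = re.sub(r"[\s\d]", "", m.group(0)).upper()
        if len(seq) >= 10:
            hits.append(seq)
    return hits

text = ("The primer 5'-ACGT TGCA 10 GGCC TTAA-3' was used, "
        "whereas the word CATALOG is a false positive.")
print(extract_candidates(text))  # → ['ACGTTGCAGGCCTTAA']
```

    Short nucleotide-like runs inside ordinary English words fall below the length threshold, which is what keeps the permissive first stage manageable for the refinement stage.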

  11. Computational methods in drug discovery

    PubMed Central

    Leelananda, Sumudu P

    2016-01-01

    The process for drug discovery and development is challenging, time consuming and expensive. Computer-aided drug discovery (CADD) tools can act as a virtual shortcut, assisting in the expedition of this long process and potentially reducing the cost of research and development. Today CADD has become an effective and indispensable tool in therapeutic development. The human genome project has made available a substantial amount of sequence data that can be used in various drug discovery projects. Additionally, increasing knowledge of biological structures, as well as increasing computer power have made it possible to use computational methods effectively in various phases of the drug discovery and development pipeline. The importance of in silico tools is greater than ever before and has advanced pharmaceutical research. Here we present an overview of computational methods used in different facets of drug discovery and highlight some of the recent successes. In this review, both structure-based and ligand-based drug discovery methods are discussed. Advances in virtual high-throughput screening, protein structure prediction methods, protein–ligand docking, pharmacophore modeling and QSAR techniques are reviewed. PMID:28144341

  12. Computational methods in drug discovery.

    PubMed

    Leelananda, Sumudu P; Lindert, Steffen

    2016-01-01

    The process for drug discovery and development is challenging, time consuming and expensive. Computer-aided drug discovery (CADD) tools can act as a virtual shortcut, assisting in the expedition of this long process and potentially reducing the cost of research and development. Today CADD has become an effective and indispensable tool in therapeutic development. The human genome project has made available a substantial amount of sequence data that can be used in various drug discovery projects. Additionally, increasing knowledge of biological structures, as well as increasing computer power have made it possible to use computational methods effectively in various phases of the drug discovery and development pipeline. The importance of in silico tools is greater than ever before and has advanced pharmaceutical research. Here we present an overview of computational methods used in different facets of drug discovery and highlight some of the recent successes. In this review, both structure-based and ligand-based drug discovery methods are discussed. Advances in virtual high-throughput screening, protein structure prediction methods, protein-ligand docking, pharmacophore modeling and QSAR techniques are reviewed.

  13. An interactive visualization tool for mobile objects

    NASA Astrophysics Data System (ADS)

    Kobayashi, Tetsuo

    Recent advancements in mobile devices---such as Global Positioning System (GPS) receivers, cellular phones, car navigation systems, and radio-frequency identification (RFID)---have greatly influenced the nature and volume of data about individual-based movement in space and time. Due to the prevalence of mobile devices, vast amounts of mobile object data are being produced and stored in databases, overwhelming the capacity of traditional spatial analytical methods. There is a growing need for discovering unexpected patterns, trends, and relationships hidden in these massive data. Geographic visualization (GVis) and knowledge discovery in databases (KDD) are two major research fields associated with knowledge discovery and construction. Their major research challenges are the integration of GVis and KDD, the ability to handle large volumes of mobile object data, and high interactivity between the computer and users of GVis and KDD tools. This dissertation proposes a visualization toolkit to enable highly interactive visual data exploration of mobile object datasets. Vector algebraic representation and online analytical processing (OLAP) are utilized for managing and querying the mobile object data to achieve high interactivity in the visualization tool. In addition, reconstructing trajectories at user-defined levels of temporal granularity with time aggregation methods allows exploration of individual objects at different levels of movement generality. At a given level of generality, individual paths can be combined into synthetic summary paths based on three similarity measures, namely locational, directional, and geometric similarity functions. A visualization toolkit based on the space-time cube concept exploits these functionalities to create a user-interactive environment for exploring mobile object data. Furthermore, the characteristics of visualized trajectories are exported for data mining, which leads to the integration of GVis and KDD. Case studies using three movement datasets (a personal travel survey in Lexington, Kentucky, wild chicken movement data in Thailand, and self-tracking data in Utah) demonstrate the potential of the system to extract meaningful patterns from otherwise difficult-to-comprehend collections of space-time trajectories.
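    Of the three similarity measures named in this record, directional similarity is the simplest to illustrate. The sketch below is one hypothetical reading of it (mean cosine between matched heading vectors of two trajectories sampled at the same time steps), not the dissertation's exact formulation.

```python
import math

def headings(traj):
    """Per-step heading vectors of a trajectory of (x, y) points."""
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(traj, traj[1:])]

def directional_similarity(a, b):
    """Mean cosine between matched heading vectors of two trajectories.

    Returns 1.0 for parallel motion, 0.0 for perpendicular, -1.0 for
    opposite. Assumes equal-length trajectories with nonzero steps.
    """
    total = 0.0
    segs = list(zip(headings(a), headings(b)))
    for (ux, uy), (vx, vy) in segs:
        nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
        total += (ux * vx + uy * vy) / (nu * nv)
    return total / len(segs)

east  = [(0, 0), (1, 0), (2, 0)]   # moves due east
north = [(0, 0), (0, 1), (0, 2)]   # moves due north
print(directional_similarity(east, east))   # → 1.0
print(directional_similarity(east, north))  # → 0.0
```

    Locational and geometric similarity would compare positions and path shapes analogously, and the three scores together decide which paths merge into a summary path.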

  14. Aflatoxin control--how a regulatory agency managed risk from an unavoidable natural toxicant in food and feed.

    PubMed

    Park, D L; Stoloff, L

    1989-04-01

    The control by the Food and Drug Administration (FDA) of aflatoxin, a relatively recently discovered, unavoidable natural contaminant produced by specific molds that invade a number of basic foods and feedstuffs, provides an example of the varying forces that affect risk assessment and management by a regulatory agency. This is the story of how the FDA responded to the initial discovery of a potential carcinogenic hazard to humans in a domestic commodity, to the developing information concerning the nature of the hazard, to the economic and political pressures created by the impact of natural forces on regulatory controls, and to the restraints of the laws within which the Agency must work. The story covers four periods: the years of discovery and of action decisions made on the basis of meager knowledge and the fear of cancer; the years of tinkering on paper with the regulatory process; the years of digesting the accumulating knowledge and applying it to actions forced by natural events; and an audit of the current status of knowledge about the hazard from aflatoxin, with proposals for regulatory control based on that knowledge.

  15. Discovery and development of anticancer agents from marine sponges: perspectives based on a chemistry-experimental therapeutics collaborative program.

    PubMed

    Valeriote, Frederick A; Tenney, Karen; Media, Joseph; Pietraszkiewicz, Halina; Edelstein, Matthew; Johnson, Tyler A; Amagata, Taro; Crews, Phillip

    2012-01-01

    A collaborative program was initiated in 1990 between the natural product chemistry laboratory of Dr. Phillip Crews at the University of California Santa Cruz and the experimental therapeutics laboratory of Dr. Fred Valeriote at the Henry Ford Hospital in Detroit. The program focused on the discovery and development of anticancer drugs from sponge extracts. A novel in vitro disk diffusion, solid tumor selective assay was used to examine 2,036 extracts from 683 individual sponges. The bioassay-directed fractionation discovery component led to the identification of active pure compounds from many of these sponges. In most cases, pure compound was prepared in sufficient quantities to both chemically identify the active compound(s) as well as pursue one or more of the biological development components. The latter included IC50, clonogenic survival-concentration exposure, maximum tolerated dose, pharmacokinetics and therapeutic assessment studies. Solid tumor selective compounds included fascaplysin and 10-bromofascaplysin (Fascaplysinopsis), neoamphimedine, 5-methoxyneoamphimedine and alpkinidine (Xestospongia), makaluvamine C and makaluvamine H (Zyzzya), psymberin (Psammocinia and Ircinia), and ethylplakortide Z and ethyldidehydroplakortide Z (Plakortis). These compounds or analogs thereof continue to have therapeutic potential.

  16. Postgenomic strategies in antibacterial drug discovery.

    PubMed

    Brötz-Oesterhelt, Heike; Sass, Peter

    2010-10-01

    During the last decade the field of antibacterial drug discovery has changed in many aspects including bacterial organisms of primary interest, discovery strategies applied and pharmaceutical companies involved. Target-based high-throughput screening had been disappointingly unsuccessful for antibiotic research. Understanding of this lack of success has increased substantially and the lessons learned refer to characteristics of targets, screening libraries and screening strategies. The 'genomics' approach was replaced by a diverse array of discovery strategies, for example, searching for new natural product leads among previously abandoned compounds or new microbial sources, screening for synthetic inhibitors by targeted approaches including structure-based design and analyses of focused libraries and designing resistance-breaking properties into antibiotics of established classes. Furthermore, alternative treatment options are being pursued including anti-virulence strategies and immunotherapeutic approaches. This article summarizes the lessons learned from the genomics era and describes discovery strategies resulting from that knowledge.

  17. Priority of discovery in the life sciences

    PubMed Central

    Vale, Ronald D; Hyman, Anthony A

    2016-01-01

    The job of a scientist is to make a discovery and then communicate this new knowledge to others. To be successful, scientists need to be able to claim credit or priority for discoveries throughout their careers. However, despite being fundamental to the reward system of science, the principles for establishing the "priority of discovery" are rarely discussed. Here we break priority down into two steps: disclosure, in which the discovery is released to the worldwide community; and validation, in which other scientists assess the accuracy, quality and importance of the work. Currently, in biology, disclosure and an initial validation are combined in a journal publication. Here, we discuss the advantages of separating these steps into disclosure via a preprint, and validation via a combination of peer review at a journal and additional evaluation by the wider scientific community. PMID:27310529

  18. TrawlerWeb: an online de novo motif discovery tool for next-generation sequencing datasets.

    PubMed

    Dang, Louis T; Tondl, Markus; Chiu, Man Ho H; Revote, Jerico; Paten, Benedict; Tano, Vincent; Tokolyi, Alex; Besse, Florence; Quaife-Ryan, Greg; Cumming, Helen; Drvodelic, Mark J; Eichenlaub, Michael P; Hallab, Jeannette C; Stolper, Julian S; Rossello, Fernando J; Bogoyevitch, Marie A; Jans, David A; Nim, Hieu T; Porrello, Enzo R; Hudson, James E; Ramialison, Mirana

    2018-04-05

    A strong focus of the post-genomic era is mining of the non-coding regulatory genome in order to unravel the function of regulatory elements that coordinate gene expression (Nat 489:57-74, 2012; Nat 507:462-70, 2014; Nat 507:455-61, 2014; Nat 518:317-30, 2015). Whole-genome approaches based on next-generation sequencing (NGS) have provided insight into the genomic location of regulatory elements throughout different cell types, organs and organisms. These technologies are now widespread and commonly used in laboratories from various fields of research. This highlights the need for fast and user-friendly software tools dedicated to extracting cis-regulatory information contained in these regulatory regions; for instance transcription factor binding site (TFBS) composition. Ideally, such tools should not require prior programming knowledge to ensure they are accessible for all users. We present TrawlerWeb, a web-based version of the Trawler_standalone tool (Nat Methods 4:563-5, 2007; Nat Protoc 5:323-34, 2010), to allow for the identification of enriched motifs in DNA sequences obtained from next-generation sequencing experiments in order to predict their TFBS composition. TrawlerWeb is designed for online queries with standard options common to web-based motif discovery tools. In addition, TrawlerWeb provides three unique new features: 1) TrawlerWeb allows the input of BED files directly generated from NGS experiments, 2) it automatically generates an input-matched biologically relevant background, and 3) it displays resulting conservation scores for each instance of the motif found in the input sequences, which assists the researcher in prioritising the motifs to validate experimentally. Finally, to date, this web-based version of Trawler_standalone remains the fastest online de novo motif discovery tool compared to other popular web-based software, while generating predictions with high accuracy. 
TrawlerWeb provides users with a fast, simple and easy-to-use web interface for de novo motif discovery, assisting in the rapid analysis of the NGS datasets that are now routinely generated. TrawlerWeb is freely available at http://trawler.erc.monash.edu.au.
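    As general background to the motif-enrichment idea underlying tools like this one (not TrawlerWeb's actual algorithm), a minimal sketch scores each k-mer by its over-representation in the input sequences relative to a background set; the toy sequences and pseudocount are assumptions for illustration.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count every length-k substring across a list of sequences."""
    c = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            c[s[i:i + k]] += 1
    return c

def enrichment(input_seqs, background_seqs, k=6, pseudo=1.0):
    """Score k-mers by foreground/background count ratio (pseudocounted)."""
    fg = kmer_counts(input_seqs, k)
    bg = kmer_counts(background_seqs, k)
    return {m: (fg[m] + pseudo) / (bg.get(m, 0) + pseudo) for m in fg}

peaks = ["AAGATAAGCC", "TTGATAAGTT", "CCGATAAGAA"]   # all share GATAAG
background = ["ACGTACGTAC", "TGCATGCATG", "CCCCAAAATT"]
scores = enrichment(peaks, background, k=6)
best = max(scores, key=scores.get)
print(best)  # → GATAAG
```

    Real de novo tools add statistics, degenerate-letter motifs and, as in TrawlerWeb, an automatically generated input-matched background, but the core signal is this over-representation ratio.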

  19. Inhibitory activities of selected Sudanese medicinal plants on Porphyromonas gingivalis and matrix metalloproteinase-9 and isolation of bioactive compounds from Combretum hartmannianum (Schweinf) bark.

    PubMed

    Mohieldin, Ebtihal Abdalla M; Muddathir, Ali Mahmoud; Mitsunaga, Tohru

    2017-04-20

    Periodontal diseases are among the major health problems and the most important preventable global infectious diseases. Porphyromonas gingivalis is an anaerobic Gram-negative bacterium that has been strongly implicated in the etiology of periodontitis. Additionally, matrix metalloproteinase-9 (MMP-9) is an important factor contributing to periodontal tissue destruction by a variety of mechanisms. The purpose of this study was to evaluate selected Sudanese medicinal plants against P. gingivalis and their inhibitory activities on MMP-9. Sixty-two methanolic and 50% ethanolic extracts from 24 plant species were tested for antibacterial activity against P. gingivalis using a microplate dilution assay to determine the minimum inhibitory concentration (MIC). The inhibitory activity against MMP-9 of seven methanol extracts selected from the 62 was determined by a colorimetric drug discovery kit. In search of bioactive lead compounds, Combretum hartmannianum bark, which was among the most active plant extracts, was subjected to various chromatographic methods (medium pressure liquid chromatography, column chromatography on Sephadex LH-20, preparative high performance liquid chromatography) and spectroscopic methods (liquid chromatography-mass spectrometry, nuclear magnetic resonance (NMR)) to isolate and characterize flavogalonic acid dilactone and terchebulin as bioactive compounds. About 80% of the crude extracts gave a MIC value ≤4 mg/ml against the bacterium. The most potent extracts were the methanolic extract of Terminalia laxiflora (wood; MIC = 0.25 mg/ml), followed by Acacia tortilis (bark), Ambrosia maritima (aerial part), Argemone mexicana (seed), C. hartmannianum (bark), Terminalia brownii (wood) and the 50% ethanolic extract of T. brownii (bark), all with MIC values of 0.5 mg/ml. T. laxiflora (wood) and C. hartmannianum (bark), which belong to the family Combretaceae, showed over 50% inhibitory activity against MMP-9 at a concentration of 10 μg/ml. Additionally, MMP-9 was significantly inhibited by terchebulin, with an IC50 value of 6.7 μM. To the best of our knowledge, this study is the first to isolate flavogalonic acid dilactone and terchebulin from C. hartmannianum bark. Because terchebulin and some of the crude extracts act on both P. gingivalis and MMP-9, they are promising natural candidates for preventing and treating periodontal diseases.

  20. Computational biology for cardiovascular biomarker discovery.

    PubMed

    Azuaje, Francisco; Devaux, Yvan; Wagner, Daniel

    2009-07-01

    Computational biology is essential in the process of translating biological knowledge into clinical practice, as well as in the understanding of biological phenomena based on the resources and technologies originating from the clinical environment. One such key contribution of computational biology is the discovery of biomarkers for predicting clinical outcomes using 'omic' information. This process involves the predictive modelling and integration of different types of data and knowledge for screening, diagnostic or prognostic purposes. Moreover, this requires the design and combination of different methodologies based on statistical analysis and machine learning. This article introduces key computational approaches and applications to biomarker discovery based on different types of 'omic' data. Although we emphasize applications in cardiovascular research, the computational requirements and advances discussed here are also relevant to other domains. We will start by introducing some of the contributions of computational biology to translational research, followed by an overview of methods and technologies used for the identification of biomarkers with predictive or classification value. The main types of 'omic' approaches to biomarker discovery will be presented with specific examples from cardiovascular research. This will include a review of computational methodologies for single-source and integrative data applications. Major computational methods for model evaluation will be described together with recommendations for reporting models and results. We will present recent advances in cardiovascular biomarker discovery based on the combination of gene expression and functional network analyses. The review will conclude with a discussion of key challenges for computational biology, including perspectives from the biosciences and clinical areas.

  1. High-throughput strategies for the discovery and engineering of enzymes for biocatalysis.

    PubMed

    Jacques, Philippe; Béchet, Max; Bigan, Muriel; Caly, Delphine; Chataigné, Gabrielle; Coutte, François; Flahaut, Christophe; Heuson, Egon; Leclère, Valérie; Lecouturier, Didier; Phalip, Vincent; Ravallec, Rozenn; Dhulster, Pascal; Froidevaux, Rénato

    2017-02-01

    Innovations in enzyme discovery impact a wide range of industries for which biocatalysis and biotransformations represent a great challenge, e.g., the food, polymer and chemical industries. Key tools and technologies, such as bioinformatics tools to guide mutant library design, molecular biology tools to create mutant libraries, microfluidics/microplates, parallel miniscale bioreactors and mass spectrometry technologies to create high-throughput screening methods, and experimental design tools for screening and optimization, are advancing the discovery, development and implementation of enzymes and whole cells in (bio)processes. These technological innovations are accompanied by the development and implementation of clean and sustainable integrated processes to meet the growing needs of the chemical, pharmaceutical, environmental and biorefinery industries. This review gives an overview of the benefits of the high-throughput screening approach, from the discovery and engineering of biocatalysts to cell culture, for optimizing their production in integrated processes and their extraction/purification.

  2. The top quark (20 years after the discovery)

    DOE PAGES

    Boos, Eduard; Brandt, Oleg; Denisov, Dmitri; ...

    2015-09-10

    On the twentieth anniversary of the observation of the top quark, we trace our understanding of this heaviest of all known particles from the prediction of its existence, through the searches and discovery, to the current knowledge of its production mechanisms and properties. We also discuss the central role of the top quark in the Standard Model and the windows that it opens for seeking new physics beyond the Standard Model.

  3. Facilitating knowledge discovery and visualization through mining contextual data from published studies: lessons from JournalMap

    USDA-ARS?s Scientific Manuscript database

    Valuable information on the location and context of ecological studies are locked up in publications in myriad formats that are not easily machine readable. This presents significant challenges to building geographic-based tools to search for and visualize sources of ecological knowledge. JournalMap...

  4. 77 FR 11345 - Harmonization of Compliance Obligations for Registered Investment Companies Required To Register...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-02-24

    ... his or her knowledge and belief, the information contained in the document is accurate and complete. The first item in the certification required by SEC Form N-CSR is: ``Based on my knowledge, this..., competitiveness and financial integrity of futures markets; (3) price discovery; (4) sound risk management...

  5. Incremental Knowledge Discovery in Social Media

    ERIC Educational Resources Information Center

    Tang, Xuning

    2013-01-01

    In light of the prosperity of online social media, Web users are shifting from data consumers to data producers. To catch the pulse of this rapidly changing world, it is critical to transform online social media data to information and to knowledge. This dissertation centers on the issue of modeling the dynamics of user communities, trending…

  6. Effects of Students' Prior Knowledge on Scientific Reasoning in Density.

    ERIC Educational Resources Information Center

    Yang, Il-Ho; Kwon, Yong-Ju; Kim, Young-Shin; Jang, Myoung-Duk; Jeong, Jin-Woo; Park, Kuk-Tae

    2002-01-01

    Investigates the effects of students' prior knowledge on the scientific reasoning processes of performing the task of controlling variables with computer simulation and identifies a number of problems that students encounter in scientific discovery. Involves (n=27) 5th grade students and (n=33) 7th grade students. Indicates that students' prior…

  7. Sea buckthorn bud extract displays activity against cell-cultured Influenza virus.

    PubMed

    Torelli, A; Gianchecchi, E; Piccirella, S; Manenti, A; Piccini, G; Llorente Pastor, E; Canovi, B; Montomoli, E

    2015-08-05

    Vaccines and antiviral drugs are the most widely used methods of preventing or treating Influenza virus infection. The role of sea buckthorn (SBT) bud dry extract as a natural antiviral drug against Influenza was investigated. Influenza virus was cultured in the MDCK cell line, with or without SBT bud extract, and virus growth was assessed by HA and TCID50 virus titration in terms of the cytopathic effect on cells. Several concentrations of extract were tested, the virus titer being measured on day 4 after infection. After infection, the virus titer in the control sample was calculated to be 2.5 TCID50/ml; treatment with 50 μg/ml SBT bud extract reduced the virus titer to 2.0 TCID50/ml, while the HA titer was reduced from 1431 (control) to 178. Concentrations lower than 50 μg/ml displayed an inhibitory effect in the HA assay, but not in the TCID50 virus titration; however, observation of the viral cultures confirmed a slowdown of viral growth at all concentrations. Natural dietary supplements and phytotherapy are a growing market and offer new opportunities for the treatment of several diseases and disorders. These preliminary experiments are the first to show that SBT bud extract is able to reduce the growth of the Influenza A H1N1 virus in vitro at a concentration of 50 μg/ml. This discovery opens up the possibility of using SBT bud extract against Influenza and, in addition, as a starting point for the discovery of new drugs. © Copyright by Pacini Editore SpA, Pisa, Italy.

  8. Developing New Antimicrobial Therapies: Are Synergistic Combinations of Plant Extracts/Compounds with Conventional Antibiotics the Solution?

    PubMed Central

    Cheesman, Matthew J.; Ilanko, Aishwarya; Blonk, Baxter; Cock, Ian E.

    2017-01-01

    The discovery of penicillin nearly 90 years ago revolutionized the treatment of bacterial disease. Since that time, numerous other antibiotics have been discovered from bacteria and fungi, or developed by chemical synthesis and have become effective chemotherapeutic options. However, the misuse of antibiotics has lessened the efficacy of many commonly used antibiotics. The emergence of resistant strains of bacteria has seriously limited our ability to treat bacterial illness, and new antibiotics are desperately needed. Since the discovery of penicillin, most antibiotic development has focused on the discovery of new antibiotics derived from microbial sources, or on the synthesis of new compounds using existing antibiotic scaffolds to the detriment of other lines of discovery. Both of these methods have been fruitful. However, for a number of reasons discussed in this review, these strategies are unlikely to provide the same wealth of new antibiotics in the future. Indeed, the number of newly developed antibiotics has decreased dramatically in recent years. Instead, a reexamination of traditional medicines has become more common and has already provided several new antibiotics. Traditional medicine plants are likely to provide further new antibiotics in the future. However, the use of plant extracts or pure natural compounds in combination with conventional antibiotics may hold greater promise for rapidly providing affordable treatment options. Indeed, some combinational antibiotic therapies are already clinically available. This study reviews the recent literature on combinational antibiotic therapies to highlight their potential and to guide future research in this field. PMID:28989242

  9. Traditional Medicine Collection Tracking System (TM-CTS): a database for ethnobotanically driven drug-discovery programs.

    PubMed

    Harris, Eric S J; Erickson, Sean D; Tolopko, Andrew N; Cao, Shugeng; Craycroft, Jane A; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E; Eisenberg, David M

    2011-05-17

    Ethnobotanically driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically driven natural product collection and drug-discovery programs. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  10. Traditional Medicine Collection Tracking System (TM-CTS): A Database for Ethnobotanically-Driven Drug-Discovery Programs

    PubMed Central

    Harris, Eric S. J.; Erickson, Sean D.; Tolopko, Andrew N.; Cao, Shugeng; Craycroft, Jane A.; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E.; Eisenberg, David M.

    2011-01-01

    Aim of the study. Ethnobotanically-driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine-Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. Materials and Methods. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. Results. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. Conclusions. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically-driven natural product collection and drug-discovery programs. PMID:21420479

  11. Developing New Antimicrobial Therapies: Are Synergistic Combinations of Plant Extracts/Compounds with Conventional Antibiotics the Solution?

    PubMed

    Cheesman, Matthew J; Ilanko, Aishwarya; Blonk, Baxter; Cock, Ian E

    2017-01-01

    The discovery of penicillin nearly 90 years ago revolutionized the treatment of bacterial disease. Since that time, numerous other antibiotics have been discovered from bacteria and fungi, or developed by chemical synthesis and have become effective chemotherapeutic options. However, the misuse of antibiotics has lessened the efficacy of many commonly used antibiotics. The emergence of resistant strains of bacteria has seriously limited our ability to treat bacterial illness, and new antibiotics are desperately needed. Since the discovery of penicillin, most antibiotic development has focused on the discovery of new antibiotics derived from microbial sources, or on the synthesis of new compounds using existing antibiotic scaffolds to the detriment of other lines of discovery. Both of these methods have been fruitful. However, for a number of reasons discussed in this review, these strategies are unlikely to provide the same wealth of new antibiotics in the future. Indeed, the number of newly developed antibiotics has decreased dramatically in recent years. Instead, a reexamination of traditional medicines has become more common and has already provided several new antibiotics. Traditional medicine plants are likely to provide further new antibiotics in the future. However, the use of plant extracts or pure natural compounds in combination with conventional antibiotics may hold greater promise for rapidly providing affordable treatment options. Indeed, some combinational antibiotic therapies are already clinically available. This study reviews the recent literature on combinational antibiotic therapies to highlight their potential and to guide future research in this field.

  12. Advances in the understanding and use of the genomic base of microbial secondary metabolite biosynthesis for the discovery of new natural products.

    PubMed

    McAlpine, James B

    2009-03-27

    Over the past decade major changes have occurred in the access to genome sequences that encode the enzymes responsible for the biosynthesis of secondary metabolites, knowledge of how those sequences translate into the final structure of the metabolite, and the ability to alter the sequence to obtain predicted products via both homologous and heterologous expression. Novel genera have been discovered leading to new chemotypes, but more surprisingly several instances have been uncovered where the apparently general rules of modular translation have not applied. Several new biosynthetic pathways have been unearthed, and our general knowledge grows rapidly. This review aims to highlight some of the more striking discoveries and advances of the decade.

  13. Influenza neuraminidase: a druggable target for natural products.

    PubMed

    Grienke, Ulrike; Schmidtke, Michaela; von Grafenstein, Susanne; Kirchmair, Johannes; Liedl, Klaus R; Rollinger, Judith M

    2012-01-01

The imminent threat of influenza pandemics and the repeatedly reported emergence of new drug-resistant influenza virus strains demonstrate the urgent need for developing innovative and effective antiviral agents for prevention and treatment. At present, influenza neuraminidase (NA), a key enzyme in viral replication, spread, and pathogenesis, is considered to be one of the most promising targets for combating influenza. Despite the substantial medical potential of NA inhibitors (NAIs), only three of these drugs are currently on the market (zanamivir, oseltamivir, and peramivir). Moreover, sudden changes in NAI susceptibility have revealed the urgent need for the discovery and identification of novel inhibitors. Nature offers an abundance of biosynthesized compounds comprising chemical scaffolds of high diversity, which present an infinite pool of chemical entities for target-oriented drug discovery in the battle against this highly contagious pathogen. This review illuminates the increasing research efforts of the past decade (2000-2011), focusing on the structure, function, and druggability of influenza NA, as well as its inhibition by natural products. Following a critical discussion of publications describing some 150 secondary plant metabolites tested for their inhibitory potential against influenza NA, the impact of three different strategies to identify and develop novel NAIs is presented: (i) bioactivity screening of herbal extracts, (ii) exploitation of empirical knowledge, and (iii) computational approaches. This work addresses the latest developments in theoretical and experimental research on properties of NA that are and will be driving anti-influenza drug development now and in the near future.

  14. BEAM web server: a tool for structural RNA motif discovery.

    PubMed

    Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

    2018-03-15

RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable of handling tens of thousands of RNAs as input, with a motif discovery procedure limited only by current secondary structure prediction accuracies. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated with a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding, which transforms each RNA secondary structure into a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice of the preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified and, for each motif, its logo, significance, graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and is implemented in NodeJS and Python, with all major browsers supported. marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.
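
    The speed-up described above rests on reducing structure comparison to string comparison. A minimal sketch of that encoding idea, assuming a toy three-letter alphabet (the real BEAR alphabet assigns distinct characters per structural element type and length):

```python
# Toy sketch (illustrative only): collapse a dot-bracket secondary structure
# to a coarse string so motif search becomes plain substring/alignment search.
# The real BEAR encoding is far richer than this three-letter mapping.

def encode_structure(dot_bracket: str) -> str:
    """Map each dot-bracket position to a coarse element code."""
    mapping = {'(': 'S',   # stem, 5' side
               ')': 's',   # stem, 3' side
               '.': 'L'}   # unpaired (loops/bulges not distinguished here)
    try:
        return ''.join(mapping[ch] for ch in dot_bracket)
    except KeyError as e:
        raise ValueError(f"unexpected character {e.args[0]!r}") from None

print(encode_structure("((..((...))..))"))  # SSLLSSLLLssLLss
```

    Once structures are strings, standard string-matching machinery (and substitution matrices over the alphabet) can be applied directly.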

  15. New Method for Knowledge Management Focused on Communication Pattern in Product Development

    NASA Astrophysics Data System (ADS)

    Noguchi, Takashi; Shiba, Hajime

In the field of manufacturing, the importance of utilizing knowledge and know-how has been growing. Against this background, new methods are needed to efficiently accumulate and extract effective knowledge and know-how. To facilitate the extraction of the knowledge and know-how that engineers need, we first defined business process information, which includes schedule/progress information, document data, information about communication among the parties concerned, and the correspondences among these three types of information. Based on these definitions, we proposed an IT system (FlexPIM: Flexible and collaborative Process Information Management) to register and accumulate business process information with minimal effort. To efficiently extract effective information from huge volumes of accumulated business process information, focusing attention on "actions", we propose a new extraction method based on communication patterns. The validity of this method has been verified for several communication patterns.

  16. Modelling and enhanced molecular dynamics to steer structure-based drug discovery.

    PubMed

    Kalyaanamoorthy, Subha; Chen, Yi-Ping Phoebe

    2014-05-01

The ever-increasing gap between the availability of genome sequences and the crystal structures of proteins remains one of the significant challenges to modern drug discovery efforts. Knowledge of the structure-dynamics-function relationships of proteins is important for understanding several key aspects of structure-based drug discovery, such as drug-protein interactions, drug binding and unbinding mechanisms, and protein-protein interactions. This review presents a brief overview of the different state-of-the-art computational approaches that are applied for protein structure modelling and molecular dynamics simulations of biological systems. We give an essence of how different enhanced-sampling molecular dynamics approaches, together with regular molecular dynamics methods, assist in steering structure-based drug discovery processes. Copyright © 2013 Elsevier Ltd. All rights reserved.

  17. MSClique: Multiple Structure Discovery through the Maximum Weighted Clique Problem.

    PubMed

    Sanroma, Gerard; Penate-Sanchez, Adrian; Alquézar, René; Serratosa, Francesc; Moreno-Noguer, Francesc; Andrade-Cetto, Juan; González Ballester, Miguel Ángel

    2016-01-01

We present a novel approach for feature correspondence and multiple structure discovery in computer vision. In contrast to existing methods, we exploit the fact that point-sets on the same structure usually lie close to each other, thus forming clusters in the image. Given a pair of input images, we initially extract points of interest and build hierarchical representations by agglomerative clustering. We use the maximum weighted clique problem to find the set of corresponding clusters with the maximum number of inliers, representing the multiple structures at the correct scales. Our method is parameter-free and only needs two sets of points along with their tentative correspondences, thus being extremely easy to use. We demonstrate the effectiveness of our method in multiple-structure fitting experiments on both publicly available and in-house datasets. As shown in the experiments, our approach finds a higher number of structures containing fewer outliers compared to state-of-the-art methods.
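
    The central combinatorial step named in the abstract, selecting a mutually consistent set of weighted candidates via a maximum-weight clique, can be sketched as follows. The paper's graph construction and scale handling are not reproduced; the nodes, weights, and edges are invented for illustration, and the exhaustive search is only adequate for small candidate sets:

```python
from itertools import combinations

# Sketch: candidate cluster correspondences become weighted nodes
# (weight = inlier count), edges join mutually compatible candidates,
# and the best consistent set is the maximum-weight clique.

def max_weight_clique(nodes, weights, edges):
    """Exhaustive maximum-weight clique search (small inputs only)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    best, best_w = [], 0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            # a subset is a clique if every pair is connected
            if all(b in adj[a] for a, b in combinations(subset, 2)):
                w = sum(weights[n] for n in subset)
                if w > best_w:
                    best, best_w = list(subset), w
    return best, best_w

nodes = ['a', 'b', 'c', 'd']
weights = {'a': 5, 'b': 3, 'c': 4, 'd': 10}
edges = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'd')]
clique, total = max_weight_clique(nodes, weights, edges)
print(clique, total)  # ['c', 'd'] 14
```

    Note that the heavier two-node clique {c, d} beats the three-node clique {a, b, c}; weight, not size, decides.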

  18. A bilateral integrative health-care knowledge service mechanism based on 'MedGrid'.

    PubMed

    Liu, Chao; Jiang, Zuhua; Zhen, Lu; Su, Hai

    2008-04-01

Current health-care organizations face a perceived paucity of medical knowledge. This paper classifies medical knowledge from new perspectives. The discovery of the health-care 'knowledge flow' motivates a bilateral integrative health-care knowledge service, in which medical knowledge 'flows' around and gains comprehensive effectiveness through six operations (such as knowledge refreshing...). Seizing on the active demands of the Chinese health-care reform, this paper presents 'MedGrid', a platform with medical ontology and knowledge content services. Each level and its detailed contents are described in the MedGrid info-structure. Moreover, a new diagnosis and treatment mechanism is formed by technically connecting with electronic health-care records (EHRs).

  19. Harnessing the potential of natural products in drug discovery from a cheminformatics vantage point.

    PubMed

    Rodrigues, Tiago

    2017-11-15

    Natural products (NPs) present a privileged source of inspiration for chemical probe and drug design. Despite the biological pre-validation of the underlying molecular architectures and their relevance in drug discovery, the poor accessibility to NPs, complexity of the synthetic routes and scarce knowledge of their macromolecular counterparts in phenotypic screens still hinder their broader exploration. Cheminformatics algorithms now provide a powerful means of circumventing the abovementioned challenges and unlocking the full potential of NPs in a drug discovery context. Herein, I discuss recent advances in the computer-assisted design of NP mimics and how artificial intelligence may accelerate future NP-inspired molecular medicine.

  20. An information extraction framework for cohort identification using electronic health records.

    PubMed

    Liu, Hongfang; Bielinski, Suzette J; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B; Jonnalagadda, Siddhartha R; Ravikumar, K E; Wu, Stephen T; Kullo, Iftikhar J; Chute, Christopher G

    2013-01-01

Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at the point of care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high-performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report a knowledge-driven IE framework for cohort identification using EHRs, developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.
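
    To make the task concrete, here is a deliberately tiny rule-based sketch of clinical IE for cohort selection. The record's actual system is a UIMA-based framework driven by curated knowledge resources, not hand-written patterns like these; the note text and the ejection-fraction pattern are invented:

```python
import re

# Toy pattern: find an ejection fraction (EF) percentage in free text.
# Real clinical IE must also handle negation, hedging, section context, etc.
EF_PATTERN = re.compile(r'\b(?:EF|ejection fraction)\D{0,15}?(\d{1,2})\s*%', re.I)

def extract_ef(note: str):
    """Return the EF percentage mentioned in a note, or None."""
    m = EF_PATTERN.search(note)
    return int(m.group(1)) if m else None

note = "Echo today: LV ejection fraction estimated at 35%. Denies chest pain."
ef = extract_ef(note)
# A toy cohort rule: include if EF below 40 (e.g. a heart-failure cohort)
include = ef is not None and ef < 40
print(ef, include)  # 35 True
```

    Externalizing such patterns and rules into editable knowledge resources, rather than code, is the design choice the framework above is built around.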

  1. 32 CFR 34.2 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... increasing knowledge or understanding in science and engineering. Applied research is defined as efforts that attempt to determine and exploit the potential of scientific discoveries or improvements in technology...

  2. 76 FR 4452 - Privacy Act of 1974; Report of Modified or Altered System of Records

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-01-25

    ... Disease Control and Prevention (CDC) for more complete knowledge of the disease/condition in the following... the light of future discoveries and proven associations so that relevant data collected at the time of... professional staff at the Centers for Disease Control and Prevention (CDC) for more complete knowledge of the...

  3. Trying to Teach Well: A Story of Small Discoveries

    ERIC Educational Resources Information Center

    Lewis, P. J.

    2004-01-01

    ''Stories do not simply contain knowledge, they are themselves the knowledge'' (Jackson (In: K. Eagan, H. McEwan (Eds.), Narrative in Teaching, Learning and Research, Teacher College Press, New York, 1995, p. 5)). How can we teach well? Perhaps we can find answers through our stories from the classroom. It is through our stories that we make sense…

  4. Knowledge Representation and Data Mining of Neuronal Morphologies Using Neuroinformatics Tools and Formal Ontologies

    ERIC Educational Resources Information Center

    Polavaram, Sridevi

    2016-01-01

    Neuroscience can greatly benefit from using novel methods in computer science and informatics, which enable knowledge discovery in unexpected ways. Currently one of the biggest challenges in Neuroscience is to map the functional circuitry of the brain. The applications of this goal range from understanding structural reorganization of neurons to…

  5. Knowledge Translation versus Knowledge Integration: A "Funder's" Perspective

    ERIC Educational Resources Information Center

    Kerner, Jon F.

    2006-01-01

    Each year, billions of US tax dollars are spent on basic discovery, intervention development, and efficacy research, while hundreds of billions of US tax dollars are also spent on health service delivery programs. However, little is spent on or known about how best to ensure that the lessons learned from science inform and improve the quality of…

  6. The Assessment of Self-Directed Learning Readiness in Medical Education

    ERIC Educational Resources Information Center

    Monroe, Katherine Swint

    2014-01-01

The rapid pace of scientific discovery has catalyzed the need for medical students to be able to find and assess new information. The knowledge required for physicians' skillful practice will change over the course of their careers, and, to keep up, they must be able to recognize their deficiencies, search for new knowledge, and critically evaluate…

  7. EPA Web Taxonomy

    EPA Pesticide Factsheets

EPA's Web Taxonomy is a faceted hierarchical vocabulary used to tag web pages with terms from a controlled vocabulary. Tagging enables search and discovery of EPA's Web-based information assets. EPA's Web Taxonomy is provided in Simple Knowledge Organization System (SKOS) format. SKOS is a standard for sharing and linking knowledge organization systems that promises to make Federal terminology resources more interoperable.
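
    A hedged sketch of what a SKOS hierarchy buys a search application: concepts linked by skos:broader/skos:narrower can be walked upward to expand queries. The concepts and CURIEs below are invented for illustration and are not EPA's actual taxonomy:

```python
# Minimal in-memory stand-in for a SKOS vocabulary: each concept carries a
# prefLabel and its skos:broader links. Real SKOS data would be RDF.
concepts = {
    "ex:Water": {"prefLabel": "Water", "broader": []},
    "ex:DrinkingWater": {"prefLabel": "Drinking Water", "broader": ["ex:Water"]},
}

def ancestors(vocab, concept):
    """Walk skos:broader links, e.g. to expand a tag-based search upward."""
    out, stack = [], list(vocab.get(concept, {}).get("broader", []))
    while stack:
        c = stack.pop()
        out.append(c)
        stack.extend(vocab.get(c, {}).get("broader", []))
    return out

print(ancestors(concepts, "ex:DrinkingWater"))  # ['ex:Water']
```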

  8. First Discovery of Acetone Extract from Cottonseed Oil Sludge as a Novel Antiviral Agent against Plant Viruses

    PubMed Central

    Zhao, Lei; Feng, Chaohong; Hou, Caiting; Hu, Lingyun; Wang, Qiaochun; Wu, Yunfeng

    2015-01-01

A novel acetone extract from cottonseed oil sludge was discovered for the first time to act against plant viruses including Tobacco mosaic virus (TMV), Rice stripe virus (RSV) and Southern rice black streaked dwarf virus (SRBSDV). Gossypol and β-sitosterol separated from the acetone extract were tested for their anti-TMV effects and analysed by nuclear magnetic resonance (NMR) assay. In vivo and field trials across different geographic distributions and different host varieties showed that this extract mixture was more efficient than the commercial agent Ningnanmycin, with a broad spectrum of anti-plant-virus activity. No phytotoxic activity was observed in the treated plants, and environmental toxicology showed that this new acetone extract was environmentally friendly, indicating that it has potential application in the control of plant viruses in the future. PMID:25705894

  9. First discovery of acetone extract from cottonseed oil sludge as a novel antiviral agent against plant viruses.

    PubMed

    Zhao, Lei; Feng, Chaohong; Hou, Caiting; Hu, Lingyun; Wang, Qiaochun; Wu, Yunfeng

    2015-01-01

A novel acetone extract from cottonseed oil sludge was discovered for the first time to act against plant viruses including Tobacco mosaic virus (TMV), Rice stripe virus (RSV) and Southern rice black streaked dwarf virus (SRBSDV). Gossypol and β-sitosterol separated from the acetone extract were tested for their anti-TMV effects and analysed by nuclear magnetic resonance (NMR) assay. In vivo and field trials across different geographic distributions and different host varieties showed that this extract mixture was more efficient than the commercial agent Ningnanmycin, with a broad spectrum of anti-plant-virus activity. No phytotoxic activity was observed in the treated plants, and environmental toxicology showed that this new acetone extract was environmentally friendly, indicating that it has potential application in the control of plant viruses in the future.

  10. Data quality enhancement and knowledge discovery from relevant signals in acoustic emission

    NASA Astrophysics Data System (ADS)

    Mejia, Felipe; Shyu, Mei-Ling; Nanni, Antonio

    2015-10-01

The increasing popularity of structural health monitoring has brought with it a growing need for automated data management and data analysis tools. Of great importance are filters that can systematically detect unwanted signals in acoustic emission datasets. This study presents a semi-supervised data mining scheme that detects data belonging to unfamiliar distributions. This type of outlier detection scheme is useful for detecting the presence of new acoustic emission sources, given a training dataset of unwanted signals. In addition to classifying new observations (herein referred to as "outliers") within a dataset, the scheme generates a decision tree that classifies sub-clusters within the outlier context set. The obtained tree can be interpreted as a series of characterization rules for newly observed data, and these rules can potentially describe the basic structure of different modes within the outlier distribution. The data mining scheme is first validated on a synthetic dataset, and an attempt is made to confirm the algorithm's ability to discriminate outlier acoustic emission sources in a controlled pencil-lead-break experiment. Finally, the scheme is applied to data from two fatigue crack-growth steel specimens, where it is shown that the extracted rules can adequately describe crack-growth-related acoustic emission sources while filtering out background "noise." Results show promising performance in filter generation, thereby allowing analysts to extract, characterize, and focus only on meaningful signals.
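
    A minimal sketch of the semi-supervised idea above: model the training set of known "unwanted" signals, then flag observations that fall far outside that distribution as outliers (candidate new acoustic emission sources). The record's scheme additionally grows a decision tree over the outliers, which is omitted here; the scalar feature values are synthetic:

```python
import math

def fit(train):
    """Estimate mean and (population) std of the familiar distribution."""
    n = len(train)
    mean = sum(train) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in train) / n) or 1.0
    return mean, std

def is_outlier(x, mean, std, k=3.0):
    """Flag anything more than k standard deviations from the mean."""
    return abs(x - mean) > k * std

# One scalar feature (say, peak amplitude) of known background hits:
background = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.05, 0.95]
mean, std = fit(background)
hits = [1.05, 0.97, 4.8]  # the last hit comes from an unfamiliar source
flags = [is_outlier(h, mean, std) for h in hits]
print(flags)  # [False, False, True]
```

    Real AE data is multivariate and multimodal, so the actual scheme clusters rather than thresholds a single Gaussian, but the decision logic is the same: familiar in, unfamiliar out.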

  11. Assessment of Anti-Influenza Activity and Hemagglutination Inhibition of Plumbago indica and Allium sativum Extracts

    PubMed Central

    Chavan, Rahul Dilip; Shinde, Pramod; Girkar, Kaustubh; Madage, Rajendra; Chowdhary, Abhay

    2016-01-01

Background: Human influenza is a seasonal disease associated with significant morbidity and mortality. Anti-flu ayurvedic/herbal medicines have played a significant role in fighting virus pandemics. Plumbagin and allicin are commonly used ingredients in many therapeutic remedies, either alone or in conjunction with other natural substances. Evidence suggests that these extracts are associated with a variety of pharmacological activities. Objective: To evaluate the anti-influenza activity of Plumbago indica and Allium sativum extracts against Influenza A (H1N1)pdm09. Materials and Methods: Different extraction procedures were used to isolate the active ingredient in the solvent system, and quantitative HPTLC confirmed the presence of plumbagin and allicin. Cytotoxicity testing was carried out on Madin-Darby canine kidney cells, and the 50% cytotoxic concentration (CC50) values were below 20 mg/mL for both plant extracts. To assess anti-influenza activity, two assays were employed: a simultaneous and a posttreatment assay. Results: A. sativum methanolic and ethanolic extracts showed only a 14% reduction in hemagglutination, in contrast to P. indica, which exhibited 100% reduction in both the simultaneous and posttreatment assays at concentrations of 10 mg/mL, 5 mg/mL, and 1 mg/mL. Conclusions: Our results suggest that P. indica extracts are good candidates for anti-influenza therapy and, after further research, could be used in medical treatment. SUMMARY The search for natural antiviral compounds from plants is a promising approach in the development of new therapeutic agents. In the past century, several scientific efforts have been directed toward identifying phytochemicals capable of inhibiting viruses. Knowledge of ethnopharmacology can lead to new bioactive plant compounds suitable for drug discovery and development. Macromolecular docking studies provide the most detailed possible view of drug-receptor interaction, in which the structure of a drug is designed based on its fit to the three-dimensional structure of the receptor site rather than by analogy to other active structures or random leads. Our previous studies indicate that allicin and plumbagin could act as potent multi-target agents against the neuraminidase, hemagglutinin, and M2 protein channel of influenza A (H1N1)pdm09. This in vitro study has shown that P. indica L. and A. sativum extracts can inhibit influenza A (H1N1)pdm09 virus by inhibiting viral nucleoprotein synthesis and polymerase activity. PMID:27034600

  12. Assessment of Anti-Influenza Activity and Hemagglutination Inhibition of Plumbago indica and Allium sativum Extracts.

    PubMed

    Chavan, Rahul Dilip; Shinde, Pramod; Girkar, Kaustubh; Madage, Rajendra; Chowdhary, Abhay

    2016-01-01

Human influenza is a seasonal disease associated with significant morbidity and mortality. Anti-flu ayurvedic/herbal medicines have played a significant role in fighting virus pandemics. Plumbagin and allicin are commonly used ingredients in many therapeutic remedies, either alone or in conjunction with other natural substances. Evidence suggests that these extracts are associated with a variety of pharmacological activities. The objective was to evaluate the anti-influenza activity of Plumbago indica and Allium sativum extracts against Influenza A (H1N1)pdm09. Different extraction procedures were used to isolate the active ingredient in the solvent system, and quantitative HPTLC confirmed the presence of plumbagin and allicin. Cytotoxicity testing was carried out on Madin-Darby canine kidney cells, and the 50% cytotoxic concentration (CC50) values were below 20 mg/mL for both plant extracts. To assess anti-influenza activity, two assays were employed: a simultaneous and a posttreatment assay. A. sativum methanolic and ethanolic extracts showed only a 14% reduction in hemagglutination, in contrast to P. indica, which exhibited 100% reduction in both the simultaneous and posttreatment assays at concentrations of 10 mg/mL, 5 mg/mL, and 1 mg/mL. Our results suggest that P. indica extracts are good candidates for anti-influenza therapy and, after further research, could be used in medical treatment. The search for natural antiviral compounds from plants is a promising approach in the development of new therapeutic agents. In the past century, several scientific efforts have been directed toward identifying phytochemicals capable of inhibiting viruses. Knowledge of ethnopharmacology can lead to new bioactive plant compounds suitable for drug discovery and development. Macromolecular docking studies provide the most detailed possible view of drug-receptor interaction, in which the structure of a drug is designed based on its fit to the three-dimensional structure of the receptor site rather than by analogy to other active structures or random leads. Our previous studies indicate that allicin and plumbagin could act as potent multi-target agents against the neuraminidase, hemagglutinin, and M2 protein channel of influenza A (H1N1)pdm09. This in vitro study has shown that P. indica L. and A. sativum extracts can inhibit influenza A (H1N1)pdm09 virus by inhibiting viral nucleoprotein synthesis and polymerase activity.

  13. The Emergence of Organizing Structure in Conceptual Representation.

    PubMed

    Lake, Brenden M; Lawrence, Neil D; Tenenbaum, Joshua B

    2018-06-01

Both scientists and children make important structural discoveries, yet their computational underpinnings are not well understood. Structure discovery has previously been formalized as probabilistic inference about the right structural form, where the form could be a tree, ring, chain, grid, etc. (Kemp & Tenenbaum, 2008). Although this approach can learn intuitive organizations, including a tree for animals and a ring for the color circle, it assumes a strong inductive bias that considers only these particular forms, and each form is explicitly provided as initial knowledge. Here we introduce a new computational model of how organizing structure can be discovered, utilizing a broad hypothesis space with a preference for sparse connectivity. Given that the inductive bias is more general, the model's initial knowledge shows little qualitative resemblance to some of the discoveries it supports. As a consequence, the model can also learn complex structures for domains that lack intuitive description, as well as predict human property induction judgments without explicit structural forms. By allowing form to emerge from sparsity, our approach clarifies how both the richness and flexibility of human conceptual organization can coexist. Copyright © 2018 Cognitive Science Society, Inc.

  14. The Knowledge-Integrated Network Biomarkers Discovery for Major Adverse Cardiac Events

    PubMed Central

    Jin, Guangxu; Zhou, Xiaobo; Wang, Honghui; Zhao, Hong; Cui, Kemi; Zhang, Xiang-Sun; Chen, Luonan; Hazen, Stanley L.; Li, King; Wong, Stephen T. C.

    2010-01-01

Mass spectrometry (MS) technology in clinical proteomics holds great promise for the discovery of new biomarkers for disease management. To overcome the obstacle of data noise in MS analysis, we proposed a new approach to knowledge-integrated biomarker discovery using data from Major Adverse Cardiac Events (MACE) patients. We first built a cardiovascular-related network based on protein information from protein annotations in UniProt, protein-protein interaction (PPI) data, and a signal transduction database. Distinct from previous machine learning methods for MS data processing, we then used statistical methods to discover biomarkers in the cardiovascular-related network. Through the tradeoff between known protein information and the noise in mass spectrometry data, we could firmly identify high-confidence biomarkers. Most importantly, aided by the protein-protein interaction (cardiovascular-related) network, we proposed a new type of biomarker, the network biomarker, composed of a set of proteins and the interactions among them. The candidate network biomarkers classify the two groups of patients more accurately than current single biomarkers, which do not consider biological molecular interactions. PMID:18665624

  15. Semi-automated surface mapping via unsupervised classification

    NASA Astrophysics Data System (ADS)

    D'Amore, M.; Le Scaon, R.; Helbert, J.; Maturilli, A.

    2017-09-01

Due to the increasing volume of data returned from space missions, the human search for correlations and identification of interesting features is becoming more and more unfeasible. Statistical extraction of features via machine learning methods will increase the scientific output of remote sensing missions and aid the discovery of as-yet-unknown features hidden in datasets. These methods exploit algorithms trained on features from multiple instruments, returning classification maps that explore intra-dataset correlations and allow for the discovery of unknown features. We present two applications, one for Mercury and one for Vesta.

  16. Discovery, synthesis, and pharmacological evaluation of spiropiperidine hydroxamic acid based derivatives as structurally novel histone deacetylase (HDAC) inhibitors.

    PubMed

    Varasi, Mario; Thaler, Florian; Abate, Agnese; Bigogno, Chiara; Boggio, Roberto; Carenzi, Giacomo; Cataudella, Tiziana; Dal Zuffo, Roberto; Fulco, Maria Carmela; Rozio, Marco Giulio; Mai, Antonello; Dondio, Giulio; Minucci, Saverio; Mercurio, Ciro

    2011-04-28

    New spiro[chromane-2,4'-piperidine] and spiro[benzofuran-2,4'-piperidine] hydroxamic acid derivatives as HDAC inhibitors have been identified by combining privileged structures with a hydroxamic acid moiety as zinc binding group. The compounds were evaluated for their ability to inhibit nuclear extract HDACs and for their in vitro antiproliferative activity on different tumor cell lines. This work resulted in the discovery of spirocycle 30d that shows good oral bioavailability and tumor growth inhibition in an HCT-116 murine xenograft model.

  17. Lynx: a database and knowledge extraction engine for integrative medicine.

    PubMed

    Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T Conrad; Maltsev, Natalia

    2014-01-01

    We have developed Lynx (http://lynx.ci.uchicago.edu)--a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.

  18. Big data to smart data in Alzheimer's disease: The brain health modeling initiative to foster actionable knowledge.

    PubMed

    Geerts, Hugo; Dacks, Penny A; Devanarayan, Viswanath; Haas, Magali; Khachaturian, Zaven S; Gordon, Mark Forrest; Maudsley, Stuart; Romero, Klaus; Stephenson, Diane

    2016-09-01

    Massive investment and technological advances in the collection of extensive and longitudinal information on thousands of Alzheimer's patients result in large amounts of data. These "big-data" databases can potentially advance CNS research and drug development. However, although necessary, they are not sufficient, and we posit that they must be matched with analytical methods that go beyond retrospective data-driven associations with various clinical phenotypes. Although these empirically derived associations can generate novel and useful hypotheses, they need to be organically integrated in a quantitative understanding of the pathology that can be actionable for drug discovery and development. We argue that mechanism-based modeling and simulation approaches, in which existing domain knowledge is formally integrated using complexity science and quantitative systems pharmacology, can be combined with data-driven analytics to generate predictive actionable knowledge for drug discovery programs, target validation, and optimization of clinical development. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  19. A knowledge discovery object model API for Java

    PubMed Central

    Zuyderduyn, Scott D; Jones, Steven JM

    2003-01-01

    Background Biological data resources have become heterogeneous and derive from multiple sources. This introduces challenges in the management and utilization of this data in software development. Although efforts are underway to create a standard format for the transmission and storage of biological data, this objective has yet to be fully realized. Results This work describes an application programming interface (API) that provides a framework for developing an effective biological knowledge ontology for Java-based software projects. The API provides a robust framework for the data acquisition and management needs of an ontology implementation. In addition, the API contains classes to assist in creating GUIs to represent this data visually. Conclusions The Knowledge Discovery Object Model (KDOM) API is particularly useful for medium to large applications, or for a number of smaller software projects with common characteristics or objectives. KDOM can be coupled effectively with other biologically relevant APIs and classes. Source code, libraries, documentation and examples are available at . PMID:14583100

  20. Emergence of Chinese drug discovery research: impact of hit and lead identification.

    PubMed

    Zhou, Caihong; Zhou, Yan; Wang, Jia; Zhu, Yue; Deng, Jiejie; Wang, Ming-Wei

    2015-03-01

    The identification of hits and the generation of viable leads is an early and yet crucial step in drug discovery. In the West, the main players of drug discovery are pharmaceutical and biotechnology companies, while in China, academic institutions remain central in the field of drug discovery. There has been a tremendous amount of investment from the public as well as private sectors to support infrastructure buildup and expertise consolidation relative to drug discovery and development in the past two decades. A large-scale compound library has been established in China, and a series of high-impact discoveries of lead compounds have been made by integrating information obtained from different technology-based strategies. Natural products are a major source in China's drug discovery efforts. Knowledge has been enhanced via disruptive breakthroughs such as the discovery of Boc5 as a nonpeptidic agonist of glucagon-like peptide 1 receptor (GLP-1R), one of the class B G protein-coupled receptors (GPCRs). Most of the original hit identification and lead generation were carried out by academic institutions, including universities and specialized research institutes. The Chinese pharmaceutical industry is gradually transforming itself from manufacturing low-end generics and active pharmaceutical ingredients to inventing new drugs. © 2014 Society for Laboratory Automation and Screening.

  1. Mathematical modeling for novel cancer drug discovery and development.

    PubMed

    Zhang, Ping; Brusic, Vladimir

    2014-10-01

    Mathematical modeling enables the in silico classification of cancers, the prediction of disease outcomes, optimization of therapy, identification of promising drug targets, and prediction of resistance to anticancer drugs. In silico pre-screened drug targets can be validated by a small number of carefully selected experiments. This review discusses the basics of mathematical modeling in cancer drug discovery and development. The topics include in silico discovery of novel molecular drug targets, optimization of immunotherapies, personalized medicine and guiding preclinical and clinical trials. Breast cancer has been used to demonstrate the applications of mathematical modeling in cancer diagnostics, the identification of high-risk populations, cancer screening strategies, prediction of tumor growth and guiding cancer treatment. Mathematical models are the key components of the toolkit used in the fight against cancer. The combinatorial complexity of new drug discovery is enormous, making systematic drug discovery by experimentation alone difficult, if not impossible. The biggest challenges include seamless integration of growing data, information and knowledge, and making them available for a multiplicity of analyses. Mathematical models are essential for bringing cancer drug discovery into the era of Omics, Big Data and personalized medicine.
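
    The kind of tumor-growth prediction mentioned above can be sketched with a forward-Euler integration of the classic logistic growth equation, dV/dt = rV(1 - V/K). The rate, carrying capacity, initial volume, and step size below are illustrative values, not parameters taken from the review.

```python
# Forward-Euler integration of logistic tumor growth.
r, K = 0.2, 1000.0        # growth rate (1/day), carrying capacity (mm^3)
V, dt, days = 10.0, 0.1, 60
steps = int(round(days / dt))

trajectory = [V]
for _ in range(steps):
    V += dt * r * V * (1 - V / K)   # Euler step of dV/dt = r*V*(1 - V/K)
    trajectory.append(V)
# The volume rises sigmoidally and saturates near the carrying capacity K.
```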

  2. Lifeomics leads the age of grand discoveries.

    PubMed

    He, Fuchu

    2013-03-01

    When our knowledge of a field accumulates to a certain level, we are bound to see the rise of one or more great scientists. They will make a series of grand discoveries/breakthroughs and push the discipline into an 'age of grand discoveries'. Mathematics, geography, physics and chemistry have all experienced their ages of grand discoveries; and in life sciences, the age of grand discoveries has appeared countless times since the 16th century. Thanks to the ever-changing development of molecular biology over the past 50 years, contemporary life science is once again approaching its breaking point, and the trigger for this is most likely to be 'lifeomics'. At the end of the 20th century, genomics wrote out the 'script of life'; proteomics decoded the script; and RNAomics, glycomics and metabolomics came into bloom. These 'omics', with their unique epistemology and methodology, quickly became the thrust of life sciences, pushing the discipline to new highs. Lifeomics, which encompasses all omics, has taken shape and is now signalling the dawn of a new era, the age of grand discoveries.

  3. Recent advances in inkjet dispensing technologies: applications in drug discovery.

    PubMed

    Zhu, Xiangcheng; Zheng, Qiang; Yang, Hu; Cai, Jin; Huang, Lei; Duan, Yanwen; Xu, Zhinan; Cen, Peilin

    2012-09-01

    Inkjet dispensing technology is a promising fabrication methodology widely applied in drug discovery. Its automated, programmable characteristics and high-throughput efficiency make this approach potentially very useful for miniaturizing the design patterns for assays and drug screening. Various custom-made inkjet dispensing systems, as well as specialized bio-inks and substrates, have been developed and applied to fulfill the increasing demands of basic drug discovery studies. The incorporation of other modern technologies has further exploited the potential of inkjet dispensing technology in drug discovery and development. This paper reviews and discusses the recent developments and practical applications of inkjet dispensing technology in several areas of drug discovery and development, including fundamental assays of cells and proteins, microarrays, biosensors, tissue engineering, and basic biological and pharmaceutical studies. Progress in a number of areas of research, including biomaterials, inkjet mechanical systems and modern analytical techniques, as well as the exploration and accumulation of profound biological knowledge, has enabled different inkjet dispensing technologies to be developed and adapted for high-throughput pattern fabrication and miniaturization. This in turn presents a great opportunity to propel inkjet dispensing technology into drug discovery.

  4. Integration of Microfractionation, qNMR and Zebrafish Screening for the In Vivo Bioassay-Guided Isolation and Quantitative Bioactivity Analysis of Natural Products

    PubMed Central

    Maes, Jan; Siverio-Mota, Dany; Marcourt, Laurence; Munck, Sebastian; Kamuhabwa, Appolinary R.; Moshi, Mainen J.; Esguerra, Camila V.; de Witte, Peter A. M.; Crawford, Alexander D.; Wolfender, Jean-Luc

    2013-01-01

    Natural products (NPs) are an attractive source of chemical diversity for small-molecule drug discovery. Several challenges nevertheless persist with respect to NP discovery, including the time and effort required for bioassay-guided isolation of bioactive NPs, and the limited biomedical relevance to date of the in vitro bioassays used in this context. With regard to bioassays, zebrafish have recently emerged as an effective model system for chemical biology, allowing in vivo high-content screens that are compatible with microgram amounts of compound. For the deconvolution of complex extracts into their individual constituents, recent progress has been achieved on several fronts: analytical techniques now enable the rapid microfractionation of extracts, and microflow NMR methods have developed to the point of allowing the identification of microgram amounts of NPs. Here we combine advanced analytical methods with high-content screening in zebrafish to create an integrated platform for microgram-scale, in vivo NP discovery. We use this platform for the bioassay-guided fractionation of an East African medicinal plant, Rhynchosia viscosa, resulting in the identification of both known and novel isoflavone derivatives with anti-angiogenic and anti-inflammatory activity. Quantitative microflow NMR is used both to determine the structure of the bioactive compounds and to quantify them for direct dose-response experiments at the microgram scale. The key advantages of this approach are (1) the microgram scale at which both biological and analytical experiments can be performed, (2) the speed and rationality of the bioassay-guided fractionation (generic for NP extracts of diverse origin), which requires only limited sample-specific optimization, and (3) the use of microflow NMR for quantification, enabling identification and dose-response experiments with only tens of micrograms of each compound.
This study demonstrates that a complete in vivo bioassay-guided fractionation can be performed with only 20 mg of NP extract within a few days. PMID:23700445
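
    Dose-response work at the microgram scale ultimately reduces to estimating potency from a handful of points. A minimal sketch, assuming invented percent-inhibition data, estimates an IC50 by log-linear interpolation between the bracketing doses (a simplification of the sigmoidal curve fitting one would normally do):

```python
import math

# Toy dose-response data for a hypothetical active fraction:
# percent inhibition measured at five doses (µg/mL); values are invented.
doses     = [0.1, 0.3, 1.0, 3.0, 10.0]
responses = [5.0, 18.0, 42.0, 71.0, 90.0]

def ic50(doses, responses, level=50.0):
    """Log-linear interpolation between the doses bracketing the target level."""
    pairs = list(zip(doses, responses))
    for (d0, r0), (d1, r1) in zip(pairs, pairs[1:]):
        if r0 <= level <= r1:
            frac = (level - r0) / (r1 - r0)
            # Interpolate on a log-dose axis, as dose-response curves are
            # conventionally plotted.
            return 10 ** (math.log10(d0)
                          + frac * (math.log10(d1) - math.log10(d0)))
    raise ValueError("target level not bracketed by the measured responses")

estimate = ic50(doses, responses)   # falls between 1.0 and 3.0 µg/mL
```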

  5. Method and system for knowledge discovery using non-linear statistical analysis and a 1st and 2nd tier computer program

    DOEpatents

    Hively, Lee M [Philadelphia, TN

    2011-07-12

    The invention relates to a method and apparatus for simultaneously processing different sources of test data into informational data and then processing different categories of informational data into knowledge-based data. The knowledge-based data can then be communicated between nodes in a system of multiple computers according to rules for a type of complex, hierarchical computer system modeled on a human brain.
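
    A loose sketch of the tiered idea only (not the patented nonlinear statistical method): a first-tier routine reduces raw test data to informational summaries, and a second-tier routine fuses those summaries into knowledge-based data. All signals and rules below are hypothetical.

```python
import statistics

# Hypothetical raw test-data streams from two sources.
raw_streams = {
    "eeg": [0.1, 0.4, 0.35, 0.5, 0.45],
    "ecg": [1.0, 1.1, 0.9, 1.05, 1.0],
}

def tier1(stream):
    """First-tier program: raw data -> informational data (summary features)."""
    return {"mean": statistics.mean(stream), "stdev": statistics.pstdev(stream)}

def tier2(features):
    """Second-tier program: informational data -> knowledge-based data."""
    # Illustrative rule: raise an alert when any channel is highly variable.
    volatile = [name for name, f in features.items() if f["stdev"] > 0.1]
    return {"alert": bool(volatile), "channels": volatile}

info = {name: tier1(s) for name, s in raw_streams.items()}
knowledge = tier2(info)
```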

  6. From Residency to Lifelong Learning.

    PubMed

    Brandt, Keith

    2015-11-01

    The residency training experience is the perfect environment for learning. The university/institution patient population provides a never-ending supply of patients with unique management challenges. Resources abound that allow the discovery of knowledge about similar situations. Senior teachers provide counseling and help direct appropriate care. Periodic testing and evaluations identify deficiencies, which can be corrected with future study. What happens, however, when the resident graduates? Do they possess all the knowledge they'll need for the rest of their career? Will medical discovery stand still, limiting the need for future study? If initial certification establishes that the physician has the skills and knowledge to function as an independent physician and surgeon, how do we assure the public that plastic surgeons will practice lifelong learning and remain safe throughout their careers? Enter Maintenance of Certification (MOC). In an ideal world, MOC would provide many of the same tools as residency training: identification of gaps in knowledge, resources to correct those deficiencies, overall assessment of knowledge, feedback about communication skills and professionalism, and methods to evaluate and improve one's practice. This article discusses the need for education and self-assessment that extend beyond residency training and for a commitment to lifelong learning. The American Board of Plastic Surgery MOC program is described to demonstrate how it helps the diplomate reach the goal of continuous practice improvement.

  7. Semantically-enabled Knowledge Discovery in the Deep Carbon Observatory

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, Y.; Ma, X.; Erickson, J. S.; West, P.; Fox, P. A.

    2013-12-01

    The Deep Carbon Observatory (DCO) is a decadal effort aimed at transforming scientific and public understanding of carbon in the complex deep earth system from the perspectives of Deep Energy, Deep Life, Extreme Physics and Chemistry, and Reservoirs and Fluxes. Over the course of the decade, DCO scientific activities will generate a massive volume of data across a variety of disciplines, presenting significant challenges in terms of data integration, management, analysis and visualization, and ultimately limiting the ability of scientists across disciplines to gain insights and unlock new knowledge. The DCO Data Science Team (DCO-DS) is applying Semantic Web methodologies to construct a knowledge representation focused on the DCO Earth science disciplines, and is using it together with other technologies (e.g. natural language processing and data mining) to create a more expressive representation of the distributed corpus of DCO artifacts, including datasets, metadata, instruments, sensors, platforms, deployments, researchers, organizations, funding agencies, grants and various awards. The embodiment of this knowledge representation is the DCO Data Science Infrastructure, in which unique entities within the DCO domain and the relations between them are recognized and explicitly identified. The DCO-DS Infrastructure will serve as a platform for more efficient and reliable searching, discovery, access, and publication of information and knowledge for the DCO scientific community and beyond.
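
    The knowledge representation described above can be pictured as a store of subject-predicate-object triples. A minimal sketch, with hypothetical DCO-style entity names and a wildcard query in the spirit of a SPARQL triple pattern:

```python
# Minimal triple store: (subject, predicate, object) facts about invented
# entities of the kinds listed above (datasets, instruments, people, ...).
triples = {
    ("dataset:D1", "generatedBy", "instrument:MassSpec-01"),
    ("dataset:D1", "contributedBy", "person:Smith"),
    ("person:Smith", "memberOf", "community:DeepLife"),
    ("instrument:MassSpec-01", "deployedOn", "platform:Ship-A"),
}

def query(s=None, p=None, o=None):
    """Pattern match over the store; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Which facts relate to dataset D1?
d1_facts = query(s="dataset:D1")
```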

  8. Scaffold Repurposing of Old Drugs Towards New Cancer Drug Discovery.

    PubMed

    Chen, Haijun; Wu, Jianlei; Gao, Yu; Chen, Haiying; Zhou, Jia

    2016-01-01

    As commented by the Nobelist James Black that "The most fruitful basis of the discovery of a new drug is to start with an old drug", drug repurposing represents an attractive drug discovery strategy. Despite the success of several repurposed drugs on the market, the ultimate therapeutic potential of a large number of non-cancer drugs is hindered during their repositioning due to various issues including the limited efficacy and intellectual property. With the increasing knowledge about the pharmacological properties and newly identified targets, the scaffolds of the old drugs emerge as a great treasure-trove towards new cancer drug discovery. In this review, we summarize the recent advances in the development of novel small molecules for cancer therapy by scaffold repurposing with highlighted examples. The relevant strategies, advantages, challenges and future research directions associated with this approach are also discussed.

  9. Why Quantify Uncertainty in Ecosystem Studies: Obligation versus Discovery Tool?

    NASA Astrophysics Data System (ADS)

    Harmon, M. E.

    2016-12-01

    There are multiple motivations for quantifying uncertainty in ecosystem studies. One is as an obligation; the other is as a tool useful in moving ecosystem science toward discovery. While reporting uncertainty should become a routine expectation, a more convincing motivation involves discovery. By clarifying what is known and to what degree it is known, uncertainty analyses can point the way toward improvements in measurements, sampling designs, and models. While some of these improvements (e.g., better sampling designs) may lead to incremental gains, those involving models (particularly model selection) may require large gains in knowledge. To be fully harnessed as a discovery tool, attitudes toward uncertainty may have to change: uncertainty should be viewed not as a negative assessment of what was done, but as a positive, helpful assessment of what remains to be done.
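
    One common way to quantify the uncertainty of a derived quantity is Monte Carlo propagation of measurement error. A minimal sketch with an invented ecosystem budget (net flux = inputs - outputs) and assumed Gaussian measurement uncertainties:

```python
import random
import statistics

# Monte Carlo propagation of measurement uncertainty through a toy
# ecosystem calculation; all means and standard deviations are illustrative.
rng = random.Random(42)
N = 10_000
inputs  = [rng.gauss(100.0, 5.0) for _ in range(N)]  # e.g. litterfall C, mean ± sd
outputs = [rng.gauss(80.0, 8.0) for _ in range(N)]   # e.g. respiration C, mean ± sd

net = [i - o for i, o in zip(inputs, outputs)]
net_mean = statistics.mean(net)
net_sd = statistics.stdev(net)  # ≈ sqrt(5² + 8²) ≈ 9.4 for independent errors
```

    The resulting spread on the net flux shows directly which measurement dominates the overall uncertainty, which is the "discovery tool" use of the analysis.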

  10. Identification of PPARgamma Partial Agonists of Natural Origin (II): In Silico Prediction in Natural Extracts with Known Antidiabetic Activity

    PubMed Central

    Guasch, Laura; Sala, Esther; Mulero, Miquel; Valls, Cristina; Salvadó, Maria Josepa; Pujadas, Gerard; Garcia-Vallvé, Santiago

    2013-01-01

    Background Natural extracts have played an important role in the prevention and treatment of diseases and are important sources for drug discovery. However, to be effectively used in these processes, natural extracts must be characterized through the identification of their active compounds and their modes of action. Methodology/Principal Findings From an initial set of 29,779 natural products that are annotated with their natural source and using a previously developed virtual screening procedure (carefully validated experimentally), we have predicted as potential peroxisome proliferator-activated receptor gamma (PPARγ) partial agonists 12 molecules from 11 extracts known to have antidiabetic activity. Six of these molecules are similar to molecules with described antidiabetic activity but whose mechanism of action is unknown. Therefore, it is plausible that these 12 molecules could be the bioactive molecules responsible, at least in part, for the antidiabetic activity of the extracts containing them. In addition, we have also identified as potential PPARγ partial agonists 10 molecules from 16 plants with undescribed antidiabetic activity but that are related (i.e., they are from the same genus) to plants with known antidiabetic properties. None of the 22 molecules that we predict as PPARγ partial agonists show chemical similarity with a group of 211 known PPARγ partial agonists obtained from the literature. Conclusions/Significance Our results provide a new hypothesis about the active molecules of natural extracts with antidiabetic properties and their mode of action. We also suggest plants with undescribed antidiabetic activity that may contain PPARγ partial agonists. These plants represent a new source of potential antidiabetic extracts. 
Consequently, our work opens the door to the discovery of new antidiabetic extracts and molecules that can be of use, for instance, in the design of new antidiabetic drugs or functional foods focused towards the prevention/treatment of type 2 Diabetes Mellitus. PMID:23405231
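
    Chemical similarity comparisons of this kind are typically made with the Tanimoto (Jaccard) coefficient on binary molecular fingerprints; the abstract does not state its measure, so this is an assumption, and the bit positions below are invented.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    a, b = set(fp_a), set(fp_b)
    return len(a & b) / len(a | b)

candidate = {12, 45, 77, 102, 305}       # on-bits of a predicted partial agonist
known     = {12, 45, 88, 211, 305, 410}  # on-bits of a literature PPARγ ligand

similarity = tanimoto(candidate, known)  # 3 shared bits / 8 distinct bits
```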

  11. Metrics and the effective computational scientist: process, quality and communication.

    PubMed

    Baldwin, Eric T

    2012-09-01

    Recent treatments of computational knowledge worker productivity have focused on the value the discipline brings to drug discovery, using positive anecdotes. While this big-picture approach provides important validation of the contributions of these knowledge workers, the impact accounts do not provide the granular detail that can help individuals and teams perform better. I suggest balancing the impact focus with quantitative measures that can inform the development of scientists. Measuring the quality of work, analyzing and improving processes, and critically evaluating communication can provide immediate performance feedback. The introduction of quantitative measures can complement the longer-term reporting of impacts on drug discovery. These metric data can document effectiveness trends and can provide a stronger foundation for the impact dialogue. Copyright © 2012 Elsevier Ltd. All rights reserved.

  12. Mapping the Landscape, Journeying Together: The Gold Foundation's Model for Research-Based Advocacy in Humanism in Medicine.

    PubMed

    Gaufberg, Elizabeth

    2017-12-01

    Mapping the Landscape, Journeying Together (MTL) is an initiative of the Arnold P. Gold Foundation Research Institute. The MTL initiative awards teams with a grant to complete a rigorous review of the literature on a topic related to humanism in health care. Teams may then seek a discovery or advocacy grant to fill in gaps in knowledge or to make or advocate for change. In this Commentary, the author reveals the MTL journey through the metaphor of cartography. She describes the initial development of a road map, as well as the MTL community's experience of navigation, discovery, and exploration. MTL participants are not only incrementally adding to a complex body of knowledge but also actively cultivating a robust community of practice.

  13. Microbial Dark Matter Investigations: How Microbial Studies Transform Biological Knowledge and Empirically Sketch a Logic of Scientific Discovery

    PubMed Central

    Bernard, Guillaume; Pathmanathan, Jananan S; Lannes, Romain; Lopez, Philippe; Bapteste, Eric

    2018-01-01

    Abstract Microbes are the oldest and most widespread, phylogenetically and metabolically diverse life forms on Earth. However, they were discovered only 334 years ago, and their diversity began to be seriously investigated even later. For these reasons, microbial studies that unveil novel microbial lineages and processes affecting or involving microbes deeply (and repeatedly) transform knowledge in biology. Considering the quantitative prevalence of taxonomically and functionally unassigned sequences in environmental genomics data sets, and that of uncultured microbes on the planet, we propose that unraveling the microbial dark matter should be identified as a central priority for biologists. Based on former empirical findings of microbial studies, we sketch a logic of discovery with the potential to further highlight the microbial unknowns. PMID:29420719

  14. Discovery and Development of Therapeutic Drugs Against Lethal Human RNA-Viruses: A Multidisciplinary Assault

    DTIC Science & Technology

    1989-08-21

    extract of Balanites aegyptiaca Del. afforded four new cytostatic saponins named balanitin-4 (I), -5 (II), -6 (III) and -7 (IV). On the basis of enzymatic...the investigator(s) adhered to policies of applicable Federal Law 45 CFR 46. In conducting research utilizing recombinant DNA technology, the...of novel antiviral substances from confirmed active extracts of marine organisms and both higher and lower plants. Maximum effort would be devoted to

  15. A High-Throughput Screening Platform of Microbial Natural Products for the Discovery of Molecules with Antibiofilm Properties against Salmonella

    PubMed Central

    Paytubi, Sonia; de La Cruz, Mercedes; Tormo, Jose R.; Martín, Jesús; González, Ignacio; González-Menendez, Victor; Genilloud, Olga; Reyes, Fernando; Vicente, Francisca; Madrid, Cristina; Balsalobre, Carlos

    2017-01-01

    In this report, we describe a High-Throughput Screening (HTS) approach to identify compounds that inhibit biofilm formation or cause the disintegration of an already formed biofilm, using the Salmonella Enteritidis 3934 strain. Initially, we developed a new methodology for growing Salmonella biofilms suitable for HTS platforms. The biomass associated with biofilm at the solid-liquid interface was quantified by staining both with resazurin and crystal violet, to detect living cells and total biofilm mass, respectively. For a pilot project, a subset of 1120 extracts from the Fundación MEDINA collection was examined to identify molecules with antibiofilm activity. This is the first validated HTS assay of microbial natural product extracts that allows for the detection of four types of activities, which are not mutually exclusive: inhibition of biofilm formation, detachment of the preformed biofilm, and antimicrobial activity against planktonic cells or biofilm-embedded cells. Currently, several extracts have been selected for further fractionation and purification of the active compounds. In one of the natural extracts, patulin has been identified as a potent molecule with antimicrobial activity against both planktonic cells and cells within the biofilm. These findings provide a proof of concept that the developed HTS can lead to the discovery of new natural compounds with antibiofilm activity against Salmonella and their possible use as an alternative to antimicrobial therapies and traditional disinfectants. PMID:28303128
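
    As a sketch of how such plate readouts are quantified, the toy example below computes percent biofilm inhibition from crystal violet OD values, plus the Z'-factor commonly used to validate HTS assays. All OD values are invented, and the Z'-factor is a standard metric rather than one the authors report.

```python
import statistics

# Illustrative crystal violet OD readings from control wells.
positive_ctrl = [0.10, 0.12, 0.11, 0.09]  # no biofilm (maximal inhibition)
negative_ctrl = [1.00, 0.95, 1.05, 0.98]  # untreated biofilm

def percent_inhibition(od, neg_mean, pos_mean):
    """Scale an OD reading to 0% (untreated) .. 100% (fully inhibited)."""
    return 100.0 * (neg_mean - od) / (neg_mean - pos_mean)

neg_mu = statistics.mean(negative_ctrl)
pos_mu = statistics.mean(positive_ctrl)

# Z'-factor: assay window quality; > 0.5 is conventionally "excellent".
zprime = 1 - 3 * (statistics.stdev(positive_ctrl)
                  + statistics.stdev(negative_ctrl)) / abs(neg_mu - pos_mu)

extract_od = 0.30  # a hypothetical extract that removes most of the biofilm
inhibition = percent_inhibition(extract_od, neg_mu, pos_mu)
```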

  16. Automatic differential analysis of NMR experiments in complex samples.

    PubMed

    Margueritte, Laure; Markov, Petar; Chiron, Lionel; Starck, Jean-Philippe; Vonthron-Sénécheau, Catherine; Bourjot, Mélanie; Delsuc, Marc-André

    2018-06-01

    Liquid state nuclear magnetic resonance (NMR) is a powerful tool for the analysis of complex mixtures of unknown molecules. This capacity has been used in many analytical approaches: metabolomics, identification of active compounds in natural extracts, and characterization of species. Such studies require the acquisition of many diverse NMR measurements on series of samples. Although acquisition can easily be performed automatically, the number of NMR experiments involved in these studies increases very rapidly, and this data avalanche makes automatic processing and analysis indispensable. We present here a program that allows the autonomous, unsupervised processing of a large corpus of 1D, 2D, and diffusion-ordered spectroscopy experiments from a series of samples acquired in different conditions. The program provides all the signal processing steps, as well as peak-picking and bucketing of 1D and 2D spectra; the program and its components are fully available. In an experiment mimicking the search for a bioactive species in a natural extract, we use it for the automatic detection of small amounts of artemisinin added to a series of plant extracts and for the generation of the spectral fingerprint of this molecule. This program, called Plasmodesma, is a novel tool that should be useful for deciphering complex mixtures, particularly in the discovery of biologically active natural products from plant extracts, but also in drug discovery or metabolomics studies. Copyright © 2017 John Wiley & Sons, Ltd.
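
    Bucketing, one of the processing steps listed above, integrates a spectrum into fixed-width bins to produce a fingerprint vector comparable across samples. A minimal sketch on a synthetic 1D spectrum (not Plasmodesma's actual implementation):

```python
def bucket(ppm, intensity, width=0.5, lo=0.0, hi=10.0):
    """Sum intensities into fixed-width ppm bins over [lo, hi)."""
    n = int((hi - lo) / width)
    buckets = [0.0] * n
    for x, y in zip(ppm, intensity):
        if lo <= x < hi:
            buckets[int((x - lo) / width)] += y
    return buckets

# Synthetic spectrum: two triangular peaks, near 1.2 ppm and 7.3 ppm.
ppm = [i * 0.01 for i in range(1000)]
intensity = [max(0.0, 1 - abs(x - 1.2) / 0.05)
             + max(0.0, 0.5 - abs(x - 7.3) / 0.1)
             for x in ppm]

fingerprint = bucket(ppm, intensity)  # 20 buckets; peaks land in bins 2 and 14
```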

  17. e-IQ and IQ knowledge mining for generalized LDA

    NASA Astrophysics Data System (ADS)

    Jenkins, Jeffrey; van Bergem, Rutger; Sweet, Charles; Vietsch, Eveline; Szu, Harold

    2015-05-01

    How can the human brain uncover patterns, associations and features in real-time, real-world data? There must be a general strategy used to transform raw signals into useful features, but representing this generalization in the context of our information extraction tool set is lacking. In contrast to Big Data (BD), Large Data Analysis (LDA) has become a reachable multi-disciplinary goal in recent years due in part to high performance computers and algorithm development, as well as the availability of large data sets. However, the experience of Machine Learning (ML) and information communities has not been generalized into an intuitive framework that is useful to researchers across disciplines. The data exploration phase of data mining is a prime example of this unspoken, ad-hoc nature of ML - the Computer Scientist works with a Subject Matter Expert (SME) to understand the data, and then build tools (i.e. classifiers, etc.) which can benefit the SME and the rest of the researchers in that field. We ask, why is there not a tool to represent information in a meaningful way to the researcher asking the question? Meaning is subjective and contextual across disciplines, so to ensure robustness, we draw examples from several disciplines and propose a generalized LDA framework for independent data understanding of heterogeneous sources which contribute to Knowledge Discovery in Databases (KDD). Then, we explore the concept of adaptive Information resolution through a 6W unsupervised learning methodology feedback system. In this paper, we will describe the general process of man-machine interaction in terms of an asymmetric directed graph theory (digging for embedded knowledge), and model the inverse machine-man feedback (digging for tacit knowledge) as an ANN unsupervised learning methodology. 
Finally, we propose a collective learning framework which utilizes a 6W semantic topology to organize heterogeneous knowledge and diffuse information to entities within a society in a personalized way.
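
    A loose sketch of organizing heterogeneous knowledge by a 6W topology: records carry who/what/when/where/why/how slots and are retrieved by slot matching. The records and queries are hypothetical, not the authors' framework.

```python
# Hypothetical knowledge items tagged with the six W slots.
records = [
    {"who": "sensor-7", "what": "temperature spike", "when": "2015-05-01",
     "where": "lab-3", "why": "coolant failure", "how": "threshold alarm"},
    {"who": "analyst-2", "what": "anomaly report", "when": "2015-05-02",
     "where": "lab-3", "why": "incident review", "how": "manual inspection"},
]

def query_6w(records, **pattern):
    """Return records whose 6W slots match every given key/value pair."""
    return [r for r in records
            if all(r.get(k) == v for k, v in pattern.items())]

lab3_events = query_6w(records, where="lab-3")
spikes = query_6w(records, what="temperature spike")
```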

  18. Artisanal Extraction and Traditional Knowledge Associated with Medicinal Use of Crabwood Oil (Carapa guianensis Aublet.) in a Peri-Urban Várzea Environment in the Amazon Estuary.

    PubMed

    Nardi, Mariane; Lira-Guedes, Ana Cláudia; Albuquerque Cunha, Helenilza Ferreira; Guedes, Marcelino Carneiro; Mustin, Karen; Gomes, Suellen Cristina Pantoja

    2016-01-01

    Várzea forests of the Amazon estuary contain species of importance to riverine communities. For example, the oil extracted from the seeds of crabwood trees is traditionally used to combat various illnesses and as such artisanal extraction processes have been maintained. The objectives of this study were to (1) describe the process involved in artisanal extraction of crabwood oil in the Fazendinha Protected Area, in the state of Amapá; (2) characterise the processes of knowledge transfer associated with the extraction and use of crabwood oil within a peri-urban riverine community; and (3) discern medicinal uses of the oil. The data were obtained using semistructured interviews with 13 community members involved in crabwood oil extraction and via direct observation. The process of oil extraction is divided into four stages: seed collection; cooking and resting of the seeds; shelling of the seeds and dough preparation; and oil collection. Oil extraction is carried out within the home for personal use, with surplus marketed within the community. More than 90% of the members of the community involved in extraction of crabwood oil highlighted the use of the oil to combat inflammation of the throat. Knowledge transfer occurs via oral transmission and through direct observation.

  19. Artisanal Extraction and Traditional Knowledge Associated with Medicinal Use of Crabwood Oil (Carapa guianensis Aublet.) in a Peri-Urban Várzea Environment in the Amazon Estuary

    PubMed Central

    Lira-Guedes, Ana Cláudia; Albuquerque Cunha, Helenilza Ferreira; Guedes, Marcelino Carneiro; Mustin, Karen; Gomes, Suellen Cristina Pantoja

    2016-01-01

    Várzea forests of the Amazon estuary contain species of importance to riverine communities. For example, the oil extracted from the seeds of crabwood trees is traditionally used to combat various illnesses and as such artisanal extraction processes have been maintained. The objectives of this study were to (1) describe the process involved in artisanal extraction of crabwood oil in the Fazendinha Protected Area, in the state of Amapá; (2) characterise the processes of knowledge transfer associated with the extraction and use of crabwood oil within a peri-urban riverine community; and (3) discern medicinal uses of the oil. The data were obtained using semistructured interviews with 13 community members involved in crabwood oil extraction and via direct observation. The process of oil extraction is divided into four stages: seed collection; cooking and resting of the seeds; shelling of the seeds and dough preparation; and oil collection. Oil extraction is carried out within the home for personal use, with surplus marketed within the community. More than 90% of the members of the community involved in extraction of crabwood oil highlighted the use of the oil to combat inflammation of the throat. Knowledge transfer occurs via oral transmission and through direct observation. PMID:27478479

  20. A model for indexing medical documents combining statistical and symbolic knowledge.

    PubMed

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-10-11

    To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. The use of several terminologies leads to more precise indexing. The improvement achieved in the model's implementation performance as a result of using semantic relationships is encouraging.
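
    The term-ranking step described above, combining a symbolic score from a terminology with a statistical score from the application domain, can be sketched roughly as follows. The scoring function, its weights, and the example terms are illustrative assumptions, not the paper's actual model:

```python
# Hypothetical sketch: rank candidate index terms for a document by
# combining a symbolic score (is the term in a terminology such as the
# UMLS?) with a statistical score (how distinctive is the term in the
# domain corpus?). Weights and formula are invented for illustration.

import math

def rank_terms(doc_terms, terminology, corpus_term_counts, corpus_size):
    """Return candidate terms sorted by a combined symbolic+statistical score."""
    scored = []
    for term, tf in doc_terms.items():
        symbolic = 1.0 if term in terminology else 0.0   # terminology membership
        df = corpus_term_counts.get(term, 0)
        # IDF-like statistical weight: rarer domain terms score higher
        statistical = math.log((1 + corpus_size) / (1 + df))
        scored.append((term, tf * (0.5 * symbolic + 0.5) * statistical))
    return sorted(scored, key=lambda x: x[1], reverse=True)

terms = rank_terms(
    {"pneumonia": 3, "patient": 5},
    terminology={"pneumonia"},
    corpus_term_counts={"pneumonia": 40, "patient": 9000},
    corpus_size=17079,
)
print(terms[0][0])  # the terminology-backed, domain-specific term ranks first
```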

  1. A Model for Indexing Medical Documents Combining Statistical and Symbolic Knowledge.

    PubMed Central

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-01-01

    OBJECTIVES: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. METHODS: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). RESULTS: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. CONCLUSIONS: The use of several terminologies leads to more precise indexing. The improvement achieved in the model’s implementation performance as a result of using semantic relationships is encouraging. PMID:18693792

  2. Antifungal cyclic peptides from the marine sponge Microscleroderma herdmani

    USDA-ARS?s Scientific Manuscript database

    Screening natural product extracts from National Cancer Institute Open Repository for antifungal discovery afforded hits for bioassay-guided fractionation. Upon LC-MS analysis of column fractions with antifungal activities to generate information on chemical structure, two new cyclic hexapeptides, m...

  3. An Information Extraction Framework for Cohort Identification Using Electronic Health Records

    PubMed Central

    Liu, Hongfang; Bielinski, Suzette J.; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B.; Jonnalagadda, Siddhartha R.; Ravikumar, K.E.; Wu, Stephen T.; Kullo, Iftikhar J.; Chute, Christopher G

    Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework. PMID:24303255
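
    The knowledge-driven design described above, where the extraction logic stays generic and subject-matter experts edit an externalized knowledge resource, might be sketched minimally like this. This is not UIMA itself; the concepts and cue patterns are an invented, expert-editable stand-in:

```python
# Sketch of "externalized knowledge" driving extraction: the code below
# never changes, while domain experts maintain the KNOWLEDGE resource
# (concept -> list of regex cue patterns). All entries are hypothetical.

import re

KNOWLEDGE = {
    "diabetes":     [r"\btype\s+2\s+diabetes\b", r"\bT2DM\b"],
    "hypertension": [r"\bhypertension\b", r"\bhigh blood pressure\b"],
}

def extract_concepts(note, knowledge=KNOWLEDGE):
    """Return the set of concepts whose cue patterns match the clinical note."""
    found = set()
    for concept, patterns in knowledge.items():
        if any(re.search(p, note, re.IGNORECASE) for p in patterns):
            found.add(concept)
    return found

note = "68 y/o male with T2DM and high blood pressure, on metformin."
print(sorted(extract_concepts(note)))  # ['diabetes', 'hypertension']
```

Adding a new cohort criterion then means editing the resource, not the code, which is the point of the framework's knowledge engineering workflow.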

  4. Targeted isolation and identification of bioactive compounds lowering cholesterol in the crude extracts of crabapples using UPLC-DAD-MS-SPE/NMR based on pharmacology-guided PLS-DA.

    PubMed

    Wen, Chao; Wang, Dongshan; Li, Xing; Huang, Tao; Huang, Cheng; Hu, Kaifeng

    2018-02-20

    The anti-hyperlipidemic effects of crude crabapple extracts derived from Malus 'Red jade', Malus hupehensis (Pamp.) Rehd. and Malus prunifolia (Willd.) Borkh. were evaluated on high-fat diet induced obese (HF DIO) mice. The results revealed that some of these extracts could lower serum cholesterol levels in HF DIO mice. The same extracts were also parallelly analyzed by LC-MS in both positive and negative ionization modes. Based on the pharmacological results, 22 LC-MS variables were identified to be correlated with the anti-hyperlipidemic effects using partial least square discriminant analysis (PLS-DA) and independent samples t-test. Further, under the guidance of the bioactivity-correlated LC-MS signals, 10 compounds were targetedly isolated and enriched using UPLC-DAD-MS-SPE and identified/elucidated by NMR together with MS/MS as citric acid(1), p-coumaric acid(2), hyperoside(3), myricetin(4), naringenin(5), quercetin(6), kaempferol(7), gentiopicroside(8), ursolic acid(9) and 8-epiloganic acid(10). Among these 10 compounds, 6 compounds, hyperoside(3), myricetin(4), naringenin(5), quercetin(6), kaempferol(7) and ursolic acid(9), were individually studied and reported to indeed have effects on lowering the serum lipid levels. These results demonstrated the efficiency of this strategy for drug discovery. In contrast to traditional routes to discover bioactive compounds in the plant extracts, targeted isolation and identification of bioactive compounds in the crude plant extracts using UPLC-DAD-MS-SPE/NMR based on pharmacology-guided PLS-DA of LC-MS data brings forward a new efficient dereplicated approach to natural products research for drug discovery. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Clinical Pharmacology & Therapeutics: Past, Present and Future

    PubMed Central

    Waldman, SA; Terzic, A

    2016-01-01

    Clinical Pharmacology & Therapeutics (CPT), the definitive and timely source for advances in human therapeutics, transcends the drug discovery, development, regulation and utilization continuum to catalyze, evolve and disseminate discipline-transformative knowledge. Prioritized themes and multidisciplinary content drive the science and practice of clinical pharmacology, offering a trusted point of reference. An authoritative herald across global communities, CPT is a timeless information vehicle at the vanguard of discovery, translation and application ushering therapeutic innovation into modern health care. PMID:28194770

  6. Order priors for Bayesian network discovery with an application to malware phylogeny

    DOE PAGES

    Oyen, Diane; Anderson, Blake; Sentz, Kari; ...

    2017-09-15

    Here, Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure with a combination of human knowledge about the partial ordering of variables and statistical inference of conditional dependencies from observed data. Our approach leverages complementary information from human knowledge and inference from observed data to produce networks that reflect human beliefs about the system as well as fit the observed data. Applying prior beliefs about partial orderings of variables is an approach distinctly different from existing methods that incorporate prior beliefs about direct dependencies (or edges) in a Bayesian network. We provide an efficient implementation of the partial-order prior in a Bayesian structure discovery learning algorithm, as well as an edge prior, showing that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as well as the edge prior does, even though order priors are more general. Our primary motivation is in characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that are missed by other algorithms.
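
    A partial-order prior of the kind described can be sketched as follows; the penalty form, strength parameter, and variable names are illustrative assumptions, chosen only to show why such a prior factorizes over each node's parent set (the local modularity the abstract mentions):

```python
# Illustrative sketch (not the paper's implementation): a log-prior over
# network structures that penalizes each edge contradicting a known
# partial ordering of variables. Because the penalty is a sum over each
# node's parents, the prior is "local" and fits score-based structure
# search, where candidate parent sets are scored independently.

def order_log_prior(parent_sets, order_rank, strength=2.0):
    """log P(structure) up to a constant; parent_sets maps node -> set of parents."""
    logp = 0.0
    for child, parents in parent_sets.items():
        for p in parents:
            # penalize an edge p -> child when p comes AFTER child in the order
            if order_rank[p] > order_rank[child]:
                logp -= strength
    return logp

rank = {"ancestor": 0, "variant_a": 1, "variant_b": 2}
consistent = {"variant_a": {"ancestor"}, "variant_b": {"variant_a"}}
violating  = {"ancestor": {"variant_b"}, "variant_a": {"ancestor"}}
print(order_log_prior(consistent, rank))  # 0.0: no edge violates the order
print(order_log_prior(violating, rank))   # -2.0: one edge runs against the order
```

A structure learner would add this term to each candidate's data log-likelihood, softly steering search toward order-consistent networks rather than forbidding violations outright.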

  7. Order priors for Bayesian network discovery with an application to malware phylogeny

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oyen, Diane; Anderson, Blake; Sentz, Kari

    Here, Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure with a combination of human knowledge about the partial ordering of variables and statistical inference of conditional dependencies from observed data. Our approach leverages complementary information from human knowledge and inference from observed data to produce networks that reflect human beliefs about the system as well as fit the observed data. Applying prior beliefs about partial orderings of variables is an approach distinctly different from existing methods that incorporate prior beliefs about direct dependencies (or edges) in a Bayesian network. We provide an efficient implementation of the partial-order prior in a Bayesian structure discovery learning algorithm, as well as an edge prior, showing that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as well as the edge prior does, even though order priors are more general. Our primary motivation is in characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that are missed by other algorithms.

  8. Bio-TDS: bioscience query tool discovery system.

    PubMed

    Gnimpieba, Etienne Z; VanDiermen, Menno S; Gustafson, Shayla M; Conn, Bill; Lushbough, Carol M

    2017-01-04

    Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery Systems, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 12 000 analytic tool descriptions integrated from well-established, community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS's scientific impact was evaluated using sample questions posed by researchers retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems, with the Bio-TDS outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
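
    The core retrieval step, matching a free-text question against tool descriptions, can be sketched with plain TF-IDF cosine similarity. The toy tool names and descriptions below are invented; Bio-TDS itself layers ontology annotation and NLP curation on top of retrieval of this kind:

```python
# Minimal free-text retrieval sketch: score each (hypothetical) tool
# description against a natural-language question using TF-IDF weights
# and cosine similarity, then return the best-matching tool.

import math
from collections import Counter

docs = {
    "bwa":      "align short sequencing reads to a reference genome",
    "cellpose": "segment cells in microscopy bio-imaging data",
    "maxquant": "quantify proteins from mass spectrometry proteomic data",
}

def tfidf_vectors(texts):
    tokenized = {k: Counter(v.split()) for k, v in texts.items()}
    df = Counter(w for c in tokenized.values() for w in c)   # document frequency
    n = len(texts)
    vecs = {k: {w: tf * math.log(n / df[w]) for w, tf in c.items()}
            for k, c in tokenized.items()}
    return vecs, df, n

def best_tool(query, texts):
    vecs, df, n = tfidf_vectors(texts)
    q = {w: math.log(n / df[w]) for w in query.split() if w in df}
    def cosine(a, b):
        dot = sum(a.get(w, 0.0) * b[w] for w in b)
        na = math.sqrt(sum(x * x for x in a.values())) or 1.0
        nb = math.sqrt(sum(x * x for x in b.values())) or 1.0
        return dot / (na * nb)
    return max(texts, key=lambda k: cosine(vecs[k], q))

print(best_tool("which tool can segment cells in microscopy images", docs))
```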

  9. Identification of research hypotheses and new knowledge from scientific literature.

    PubMed

    Shardlow, Matthew; Batista-Navarro, Riza; Thompson, Paul; Nawaz, Raheel; McNaught, John; Ananiadou, Sophia

    2018-06-25

    Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author's intended knowledge gain) and New Knowledge (an author's findings). The method incorporates various features, including a combination of simple MK dimensions. We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated. We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method is able to improve upon the previously reported state of the art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-score in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR 0.802) and New Knowledge (GENIA: 0.829, EU-ADR 0.836). We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.
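
    The classification step described above, combining simple MK dimensions with linguistic features in an ensemble of trees, can be caricatured with a tiny bagged ensemble of decision stumps. The feature names and training examples are invented, and the real system uses a full random forest over far richer features:

```python
# Toy bagged-stump ensemble (a scaled-down stand-in for a random forest):
# each event is encoded as simple MK dimensions plus a linguistic cue,
# and bootstrapped one-split trees vote on the label. All data invented.

import random
from collections import Counter

# features: [high_certainty, speculation_cue, hypothesis_verb]  (hypothetical)
DATA = [
    ([1, 0, 0], "NewKnowledge"),
    ([1, 0, 1], "NewKnowledge"),
    ([0, 0, 0], "NewKnowledge"),
    ([0, 1, 1], "ResearchHypothesis"),
    ([1, 1, 0], "ResearchHypothesis"),
    ([0, 1, 1], "ResearchHypothesis"),
]

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(data):
    """Pick the single binary feature that best splits the data."""
    overall = majority([y for _, y in data])
    best_acc, best = -1.0, None
    for f in range(len(data[0][0])):
        ys0 = [y for x, y in data if x[f] == 0] or [overall]
        ys1 = [y for x, y in data if x[f] == 1] or [overall]
        l0, l1 = majority(ys0), majority(ys1)
        acc = sum((l1 if x[f] else l0) == y for x, y in data) / len(data)
        if acc > best_acc:
            best_acc, best = acc, (f, l0, l1)
    return best

def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, x):
    return majority([(l1 if x[f] else l0) for f, l0, l1 in forest])

forest = train_forest(DATA)
print(predict(forest, [0, 1, 0]))  # speculation cue present
```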

  10. The History of the Discovery of Blood Circulation: Unrecognized Contributions of Ayurveda Masters

    ERIC Educational Resources Information Center

    Patwardhan, Kishor

    2012-01-01

    Ayurveda, the native healthcare system of India, is a rich resource of well-documented ancient medical knowledge. Although the roots of this knowledge date back to the Vedic and post-Vedic eras, it is generally believed that a dedicated branch for healthcare was gradually established approximately between 400 BCE and 200 CE. Probably because the…

  11. Creating a Ten-Year Science and Innovation Framework for the UK: A Perspective Based on US Experience

    ERIC Educational Resources Information Center

    Crawley, Edward F.; Greenwald, Suzanne B.

    2006-01-01

    The sustainability of a competitive, national economy depends largely on the ability of companies to deliver innovative knowledge-intensive goods and services to the market. These are the ultimate outputs of a scientific knowledge system. Ideas flow from the critical, identifiable phases of (a) the discovery, (b) the development, (c) the…

  12. Distant Supervision with Transductive Learning for Adverse Drug Reaction Identification from Electronic Medical Records

    PubMed Central

    Ikeda, Mitsuru

    2017-01-01

    Information extraction and knowledge discovery regarding adverse drug reactions (ADRs) from large-scale clinical texts are highly useful and much-needed processes. Two major difficulties of this task are the lack of domain experts for labeling examples and the intractable processing of unstructured clinical texts. Even though most previous work has addressed these issues by applying semisupervised learning to the former and a word-based approach to the latter, such methods face complexity in acquiring initial labeled data and ignore the structured sequences of natural language. In this study, we propose automatic data labeling by distant supervision, where knowledge bases are exploited to assign an entity-level relation label to each drug-event pair in texts, and then we use patterns for characterizing the ADR relation. A multiple-instance learning with expectation-maximization method is employed to estimate model parameters. The method applies transductive learning to iteratively reassign the probability of each unknown drug-event pair at training time. In experiments with 50,998 discharge summaries, we evaluate our method by varying a large number of parameters, that is, pattern types, pattern-weighting models, and initial and iterative weightings of relations for unlabeled data. Based on these evaluations, our proposed method outperforms the word-based feature for NB-EM (iEM), MILR, and TSVM, with F1-score improvements of 11.3%, 9.3%, and 6.5%, respectively. PMID:29090077
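
    The distant-supervision-plus-EM loop described above might be sketched as follows. This is a deliberately simplified caricature (one mention per pair, a flat positive-rate pattern weight) rather than the paper's multiple-instance model, and all drug-event pairs and patterns are invented:

```python
# Sketch: knowledge-base pairs seed the labels; the M-step re-estimates
# each context pattern's weight from current soft labels; the E-step
# (transductive) reassigns each unknown pair's probability from its
# pattern's weight. Iterating pulls unknown pairs that share patterns
# with known ADR pairs toward a positive label.

from collections import defaultdict

KB_POSITIVE = {("drugA", "rash")}                     # known ADR pairs (invented)
MENTIONS = [                                          # (drug-event pair, context pattern)
    (("drugA", "rash"),   "DRUG caused EVENT"),
    (("drugB", "nausea"), "DRUG caused EVENT"),
    (("drugD", "cough"),  "DRUG unrelated to EVENT"),
]

def em_label(mentions, kb, iters=5):
    # initial weighting: KB pairs are positive, unknown pairs get a 0.5 prior
    p = {pair: (1.0 if pair in kb else 0.5) for pair, _ in mentions}
    for _ in range(iters):
        # M-step: pattern weight = expected positive rate over its mentions
        totals, pos = defaultdict(int), defaultdict(float)
        for pair, pat in mentions:
            totals[pat] += 1
            pos[pat] += p[pair]
        w = {pat: pos[pat] / totals[pat] for pat in totals}
        # E-step (transductive): reassign unknown pairs from pattern weights
        for pair, pat in mentions:
            if pair not in kb:
                p[pair] = w[pat]
    return p

p = em_label(MENTIONS, KB_POSITIVE)
print(round(p[("drugB", "nausea")], 3))  # 0.984: pulled up by sharing a KB pattern
print(p[("drugD", "cough")])             # 0.5: no KB evidence, stays at its prior
```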

  13. Lynx: a database and knowledge extraction engine for integrative medicine

    PubMed Central

    Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T. Conrad; Maltsev, Natalia

    2014-01-01

    We have developed Lynx (http://lynx.ci.uchicago.edu)—a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces. PMID:24270788

  14. Anti-plasmodial activity of Norcaesalpin D and extracts of four medicinal plants used traditionally for treatment of malaria.

    PubMed

    Nondo, Ramadhani Selemani Omari; Moshi, Mainen Julius; Erasto, Paul; Masimba, Pax Jessey; Machumi, Francis; Kidukuli, Abdul Waziri; Heydenreich, Matthias; Zofou, Denis

    2017-03-24

    Malaria is an old, life-threatening parasitic disease that still affects many people, mainly children living in sub-Saharan Africa. The availability of effective antimalarial drugs has played a significant role in the treatment and control of malaria. However, recent information on the emergence of P. falciparum parasites resistant to one of the artemisinin-based combination therapies suggests the need for discovery of new drug molecules. Therefore, this study aimed to evaluate the antiplasmodial activity of extracts, fractions and an isolated compound from medicinal plants traditionally used in the treatment of malaria in Tanzania. Dry powdered plant materials were extracted by cold maceration using different solvents. Norcaesalpin D was isolated by column chromatography from the dichloromethane root extract of Caesalpinia bonducella and its structure was assigned based on the spectral data. Crude extracts, fractions and the isolated compound were evaluated for antiplasmodial activity against chloroquine-sensitive P. falciparum (3D7), chloroquine-resistant P. falciparum (Dd2, K1) and artemisinin-resistant P. falciparum (IPC 5202 Battambang, IPC 4912 Mondolkiri) strains using the parasite lactate dehydrogenase assay. The results indicated that extracts of Erythrina schliebenii, Holarrhena pubescens, Dissotis melleri and C. bonducella exhibited antiplasmodial activity against Dd2 parasites. The ethanolic root extract of E. schliebenii had an IC50 of 1.87 μg/mL, while methanolic and ethanolic root extracts of H. pubescens exhibited IC50 values of 2.05 μg/mL and 2.43 μg/mL, respectively. Fractions from H. pubescens and C. bonducella roots were found to be highly active against K1, Dd2 and artemisinin-resistant parasites. Norcaesalpin D from the C. bonducella root extract was active, with IC50 values of 0.98, 1.85 and 2.13 μg/mL against 3D7, Dd2 and IPC 4912-Mondolkiri parasites, respectively. The antiplasmodial activity of norcaesalpin D and of extracts of E. schliebenii, H. pubescens, D. melleri and C. bonducella reported in this study warrants further attention for the discovery of antimalarial lead compounds for future drug development.

  15. [The discovery of blood circulation: revolution or revision?].

    PubMed

    Crignon, Claire

    2011-01-01

    The discovery of the principle of blood circulation by William Harvey is generally considered one of the major events of the "scientific revolution" of the 17th century. This paper reconsiders the question by taking into account the way Harvey's discovery was discussed by some contemporary philosophers and physicians, in particular Fontenelle, who insisted on the necessity of redefining the methods and principles of medical knowledge on the basis of the revival of anatomy and physiology, and on its consequences for how we think about human nature. This reconsideration allows us to weigh the case for replacing the Kuhnian scheme of the "structure of scientific revolutions" with the Bachelardian concept of "refonte".

  16. Antibacterial Drug Discovery: Some Assembly Required.

    PubMed

    Tommasi, Rubén; Iyer, Ramkumar; Miller, Alita A

    2018-05-11

    Our limited understanding of the molecular basis for compound entry into, and efflux out of, Gram-negative bacteria is now recognized as a key bottleneck for the rational discovery of novel antibacterial compounds. Traditional large-scale biochemical or target-agnostic phenotypic antibacterial screening efforts have, as a result, not been very fruitful. A main driver of this knowledge gap has been the historical lack of predictive cellular assays, tools, and models that provide structure-activity relationships to inform optimization of compound accumulation. A variety of approaches has recently been described to address this conundrum. This Perspective explores these approaches and considers ways in which their integration could successfully redirect antibacterial drug discovery efforts.

  17. Advancement into the Arctic Region for Bioactive Sponge Secondary Metabolites

    PubMed Central

    Abbas, Samuel; Kelly, Michelle; Bowling, John; Sims, James; Waters, Amanda; Hamann, Mark

    2011-01-01

    Porifera have long been a reservoir for the discovery of bioactive compounds and drug discovery. Most research in the area has focused on sponges from tropical and temperate waters, but more recently the focus has shifted to the less accessible colder waters of the Antarctic and, to a lesser extent, the Arctic. The Antarctic region in particular has been a more popular location for natural products discovery and has provided promising candidates for drug development. This article reviews groups of bioactive compounds that have been isolated and reported from the southern reaches of the Arctic Circle, surveys the known sponge diversity present in the Arctic waters, and details a recent sponge collection by our group in the Aleutian Islands, Alaska. The collection has yielded previously undescribed sponge species along with primary activity against opportunistic infectious diseases, malaria, and HCV. The discovery of new sponge species and bioactive crude extracts gives optimism for the isolation of new bioactive compounds from a relatively unexplored source. PMID:22163194

  18. A Hybrid Human-Computer Approach to the Extraction of Scientific Facts from the Literature.

    PubMed

    Tchoua, Roselyne B; Chard, Kyle; Audus, Debra; Qin, Jian; de Pablo, Juan; Foster, Ian

    2016-01-01

    A wealth of valuable data is locked within the millions of research articles published each year. Reading and extracting pertinent information from those articles has become an unmanageable task for scientists. This problem hinders scientific progress by making it hard to build on results buried in literature. Moreover, these data are loosely structured, encoded in manuscripts of various formats, embedded in different content types, and are, in general, not machine accessible. We present a hybrid human-computer solution for semi-automatically extracting scientific facts from literature. This solution combines an automated discovery, download, and extraction phase with a semi-expert crowd assembled from students to extract specific scientific facts. To evaluate our approach we apply it to a challenging molecular engineering scenario, extraction of a polymer property: the Flory-Huggins interaction parameter. We demonstrate useful contributions to a comprehensive database of polymer properties.

  19. Knowledge discovery in traditional Chinese medicine: state of the art and perspectives.

    PubMed

    Feng, Yi; Wu, Zhaohui; Zhou, Xuezhong; Zhou, Zhongmei; Fan, Weiyu

    2006-11-01

    As a complementary medical system to Western medicine, traditional Chinese medicine (TCM) has provided a unique theoretical and practical approach to the treatment of diseases over thousands of years. Confronted with the increasing popularity of TCM and the huge volume of TCM data, both historically accumulated and recently obtained, there is an urgent need to explore these resources effectively using the techniques of knowledge discovery in databases (KDD). This paper aims at providing an overview of recent KDD studies in the TCM field. A literature search was conducted in both English and Chinese publications, and major studies of knowledge discovery in TCM (KDTCM) reported in these materials were identified. Based on an introduction to the state of the art of TCM data resources, a review of four subfields of KDTCM research is presented: KDD for the research of Chinese medical formulae, KDD for the research of Chinese herbal medicine, KDD for TCM syndrome research, and KDD for TCM clinical diagnosis. Furthermore, the current state and main problems in each subfield are summarized based on a discussion of existing studies, and future directions for each subfield are proposed accordingly. A range of KDD methods is used in existing KDTCM research, from conventional frequent-itemset mining to state-of-the-art latent structure models. Many interesting discoveries have been obtained by these methods, such as novel TCM paired drugs discovered by frequent-itemset analysis, a functional community of related genes discovered from a syndrome perspective by text mining, the high proportion of toxic plants in the botanical family Ranunculaceae disclosed by statistical analysis, and the association between an M-cholinoceptor blocking drug and Solanaceae revealed by association rule mining. It is particularly inspiring to see some studies connecting TCM with biomedicine, which provide a novel top-down view for functional genomics research. However, further development of KDD methods is still expected to better adapt to the features of TCM. Existing studies demonstrate that KDTCM is effective in obtaining medical discoveries, but much more work needs to be done in order to discover real diamonds from the TCM domain. The usage and development of KDTCM in the future will substantially contribute to the TCM community, as well as to modern life science.

  20. 75 FR 71005 - American Education Week, 2010

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-22

    ... maintain our Nation's role as the world's engine of discovery and innovation, my Administration is.... Our Nation's schools can give students the tools, skills, and knowledge to participate fully in our...
