The center for causal discovery of biomedical knowledge from big data.
Cooper, Gregory F; Bahar, Ivet; Becich, Michael J; Benos, Panayiotis V; Berg, Jeremy; Espino, Jeremy U; Glymour, Clark; Jacobson, Rebecca Crowley; Kienholz, Michelle; Lee, Adrian V; Lu, Xinghua; Scheines, Richard
2015-11-01
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers. PMID:26138794
Bioenergy Knowledge Discovery Framework Fact Sheet
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
The Bioenergy Knowledge Discovery Framework (KDF) supports the development of a sustainable bioenergy industry by providing access to a variety of data sets, publications, and collaboration and mapping tools that support bioenergy research, analysis, and decision making. In the KDF, users can search for information, contribute data, and use the tools and map interface to synthesize, analyze, and visualize information in a spatially integrated manner.
'Big Data' Collaboration: Exploring, Recording and Sharing Enterprise Knowledge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R; Ferrell, Regina Kay
2013-01-01
As data sources and data size proliferate, knowledge discovery from "Big Data" is starting to pose several challenges. In this paper, we address a specific challenge in the practice of enterprise knowledge management while extracting actionable nuggets from diverse data sources of seemingly-related information. In particular, we address the challenge of archiving knowledge gained through collaboration, dissemination and visualization as part of the data analysis, inference and decision-making lifecycle. We motivate the implementation of an enterprise data-discovery and knowledge recorder tool, called SEEKER, based on a real-world case study. We demonstrate SEEKER capturing schema and data-element relationships, tracking the data elements of value based on the queries and the analytical artifacts that are being created by analysts as they use the data. We show how the tool serves as a digital record of institutional domain knowledge and documentation of the evolution of data elements, queries and schemas over time. As a knowledge management service, a tool like SEEKER saves enterprise resources and time by avoiding analytic silos, expediting the process of multi-source data integration and intelligently documenting discoveries from fellow analysts.
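The abstract gives no implementation detail, but the core bookkeeping it describes (tracking which data elements analysts actually touch) can be illustrated with a minimal, hypothetical Python sketch; the query strings, table names, and regex heuristic below are invented for illustration and are not SEEKER's actual design:

```python
import re
from collections import Counter

# Hypothetical analyst query log; a real deployment would harvest these
# from the enterprise database's audit trail.
query_log = [
    "SELECT patient_id, dx_code FROM encounters WHERE year > 2010",
    "SELECT e.patient_id, l.loinc FROM encounters e "
    "JOIN labs l ON e.patient_id = l.patient_id",
]

table_usage = Counter()
for sql in query_log:
    # Crude extraction of table names that follow FROM/JOIN keywords.
    for match in re.finditer(r"\b(?:FROM|JOIN)\s+(\w+)", sql, re.IGNORECASE):
        table_usage[match.group(1)] += 1

# The most frequently queried tables approximate the 'data elements of
# value' that a tool like SEEKER would record over time.
print(table_usage.most_common())
```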
Empirical study using network of semantically related associations in bridging the knowledge gap.
Abedi, Vida; Yeasin, Mohammed; Zand, Ramin
2014-11-27
The data overload has created a new set of challenges in finding meaningful and relevant information with minimal cognitive effort. However, designing robust and scalable knowledge discovery systems remains a challenge. Recent innovations in (biological) literature mining tools have opened new avenues to understand the confluence of various diseases, genes, risk factors and biological processes, bridging the gaps between the massive amounts of scientific data and harvesting useful knowledge. In this paper, we highlight some of the findings using a text analytics tool called ARIANA (Adaptive Robust and Integrative Analysis for finding Novel Associations). Empirical study using ARIANA reveals knowledge discovery instances that illustrate the efficacy of such a tool. For example, ARIANA can capture the connection between the drug hexamethonium and pulmonary inflammation and fibrosis that caused the tragic death of a healthy volunteer in a 2001 Johns Hopkins asthma study, even though the abstract of the study was not part of the semantic model. An integrated system, such as ARIANA, could assist the human expert in exploratory literature search by bringing forward hidden associations, promoting data reuse and knowledge discovery as well as stimulating interdisciplinary projects by connecting information across the disciplines.
Text-based discovery in biomedicine: the architecture of the DAD-system.
Weeber, M; Klein, H; Aronson, A R; Mork, J G; de Jong-van den Berg, L T; Vos, R
2000-01-01
Current scientific research takes place in highly specialized contexts with poor communication between disciplines as a likely consequence. Knowledge from one discipline may be useful for the other without researchers knowing it. As scientific publications are a condensation of this knowledge, literature-based discovery tools may help the individual scientist to explore new useful domains. We report on the development of the DAD-system, a concept-based Natural Language Processing system for PubMed citations that provides the biomedical researcher such a tool. We describe the general architecture and illustrate its operation by a simulation of a well-known text-based discovery: The favorable effects of fish oil on patients suffering from Raynaud's disease [1].
A biological compression model and its applications.
Cao, Minh Duc; Dix, Trevor I; Allison, Lloyd
2011-01-01
A biological compression model, the expert model, is presented that is superior to existing compression algorithms in both compression performance and speed. The model is able to compress whole eukaryotic genomes. Most importantly, the model provides a framework for knowledge discovery from biological data. It can be used for repeat element discovery, sequence alignment and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences where conventional knowledge discovery tools often fail.
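The abstract does not spell out the expert-model algorithm; as a generic illustration of compression-based sequence comparison (the family of techniques this work belongs to), the sketch below computes the normalized compression distance with zlib. NCD is a standard stand-in, not the authors' expert model:

```python
import zlib

def clen(s: bytes) -> int:
    """Compressed length in bytes (zlib at maximum compression)."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: smaller means more related."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

seq_a = b"ACGTACGTACGTTTGACA" * 20
seq_b = b"ACGTACGTACGTTTGACA" * 19 + b"ACGTACGAAGGTTTGACA"  # one mutated copy
seq_c = b"TTAGGCATCCGA" * 30                                 # unrelated repeat

print(ncd(seq_a, seq_b))  # low distance: nearly identical sequences
print(ncd(seq_a, seq_c))  # higher distance: different composition
```

Pairwise NCD values can feed a standard clustering routine to produce a rough phylogenetic grouping, echoing the applications listed above.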
ERIC Educational Resources Information Center
Tsantis, Linda; Castellani, John
2001-01-01
This article explores how knowledge-discovery applications can empower educators with the information they need to provide anticipatory guidance for teaching and learning, forecast school and district needs, and find critical markers for making the best program decisions for children and youth with disabilities. Data mining for schools is…
Knowledge Discovery Process: Case Study of RNAV Adherence of Radar Track Data
NASA Technical Reports Server (NTRS)
Matthews, Bryan
2018-01-01
This talk is an introduction to the knowledge discovery process, beginning with identifying the problem, then choosing data sources, matching the appropriate machine learning tools, and reviewing the results. The overview will be given in the context of an ongoing study that is assessing RNAV adherence of commercial aircraft in the national airspace.
Augmented Reality-Based Simulators as Discovery Learning Tools: An Empirical Study
ERIC Educational Resources Information Center
Ibáñez, María-Blanca; Di-Serio, Ángela; Villarán-Molina, Diego; Delgado-Kloos, Carlos
2015-01-01
This paper reports empirical evidence on having students use AR-SaBEr, a simulation tool based on augmented reality (AR), to discover the basic principles of electricity through a series of experiments. AR-SaBEr was enhanced with knowledge-based support and inquiry-based scaffolding mechanisms, which proved useful for discovery learning in…
Distributed data mining on grids: services, tools, and applications.
Cannataro, Mario; Congiusta, Antonio; Pugliese, Andrea; Talia, Domenico; Trunfio, Paolo
2004-12-01
Data mining algorithms are widely used today for the analysis of large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce fields often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed and parallel systems. The grid can play a significant role in providing an effective computational support for distributed knowledge discovery applications. For the development of data mining applications on grids we designed a system called Knowledge Grid. This paper describes the Knowledge Grid framework and presents the toolset provided by the Knowledge Grid for implementing distributed knowledge discovery. The paper discusses how to design and implement data mining applications by using the Knowledge Grid tools starting from searching grid resources, composing software and data components, and executing the resulting data mining process on a grid. Some performance results are also discussed.
Bio-TDS: bioscience query tool discovery system.
Gnimpieba, Etienne Z; VanDiermen, Menno S; Gustafson, Shayla M; Conn, Bill; Lushbough, Carol M
2017-01-04
Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery Systems, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 12 000 analytic tool descriptions integrated from well-established, community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS's scientific impact was evaluated using sample questions posed by researchers retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems, with the Bio-TDS outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process.
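Bio-TDS combines ontologies with an NLP workflow; as a much-reduced sketch of the underlying free-text retrieval step, the snippet below ranks tool descriptions against a question using TF-IDF cosine similarity (scikit-learn). The tool names and descriptions are invented examples, not entries from the Bio-TDS corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tool_descriptions = {
    "SAMtools": "Utilities for manipulating alignments in SAM/BAM format.",
    "MACS2": "Model-based analysis of ChIP-Seq for peak calling.",
    "STAR": "Ultrafast universal RNA-seq aligner.",
}

question = "How do I call peaks from my ChIP-seq alignments?"

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(list(tool_descriptions.values()))
query_vec = vectorizer.transform([question])

scores = cosine_similarity(query_vec, doc_matrix).ravel()
ranked = sorted(zip(tool_descriptions, scores), key=lambda t: -t[1])
print(ranked)  # MACS2 is expected to rank first for this question
```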
Hassani-Pak, Keywan; Rawlings, Christopher
2017-06-13
Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
Classification and assessment tools for structural motif discovery algorithms.
Badr, Ghada; Al-Turaiki, Isra; Mathkour, Hassan
2013-01-01
Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.
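The paper's own benchmark and measurement tool are not reproduced in the abstract; the sketch below shows one common way motif-discovery accuracy is scored, as position-level precision and recall between reference and predicted motif sites (a generic measure, not the authors' tool):

```python
def site_level_scores(true_sites, predicted_sites):
    """Position-level precision/recall between two sets of (start, end) intervals.

    A generic accuracy measure for motif discovery benchmarks; not the
    measurement tool proposed in the paper.
    """
    def positions(sites):
        covered = set()
        for start, end in sites:          # half-open intervals [start, end)
            covered.update(range(start, end))
        return covered

    truth, pred = positions(true_sites), positions(predicted_sites)
    tp = len(truth & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# Reference motifs at 10-20 and 50-60; predictor reports 12-22 and 80-90.
print(site_level_scores([(10, 20), (50, 60)], [(12, 22), (80, 90)]))
```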
Gibert, Karina; García-Rudolph, Alejandro; Curcoll, Lluïsa; Soler, Dolors; Pla, Laura; Tormos, José María
2009-01-01
In this paper, an integral Knowledge Discovery Methodology, named Clustering based on rules by States, which incorporates artificial intelligence (AI) and statistical methods as well as interpretation-oriented tools, is used for extracting knowledge patterns about the evolution over time of the Quality of Life (QoL) of patients with Spinal Cord Injury. The methodology incorporates the interaction with experts as a crucial element with the clustering methodology to guarantee usefulness of the results. Four typical patterns are discovered by taking into account prior expert knowledge. Several hypotheses are elaborated about the reasons for psychological distress or decreases in QoL of patients over time. The knowledge discovery from data (KDD) approach turns out, once again, to be a suitable formal framework for handling multidimensional complexity of the health domains.
Why Quantify Uncertainty in Ecosystem Studies: Obligation versus Discovery Tool?
NASA Astrophysics Data System (ADS)
Harmon, M. E.
2016-12-01
There are multiple motivations for quantifying uncertainty in ecosystem studies. One is as an obligation; the other is as a tool useful in moving ecosystem science toward discovery. While reporting uncertainty should become a routine expectation, a more convincing motivation involves discovery. By clarifying what is known and to what degree it is known, uncertainty analyses can point the way toward improvements in measurements, sampling designs, and models. While some of these improvements (e.g., better sampling designs) may lead to incremental gains, those involving models (particularly model selection) may require large gains in knowledge. To be fully harnessed as a discovery tool, attitudes toward uncertainty may have to change: rather than viewing uncertainty as a negative assessment of what was done, it should be viewed as a positive, helpful assessment of what remains to be done.
Salvador-Carulla, L; Lukersmith, S; Sullivan, W
2017-04-01
Guideline methods to develop recommendations dedicate most effort to organising discovery and corroboration knowledge following the evidence-based medicine (EBM) framework. Guidelines typically use a single dimension of information, and generally discard contextual evidence, formal expert knowledge and consumers' experiences in the process. In recognition of the limitations of guidelines in complex cases, complex interventions and systems research, there has been significant effort to develop new tools, guides, resources and structures to use alongside EBM methods of guideline development. In addition to these advances, a new framework based on the philosophy of science is required. Guidelines should be defined as implementation decision support tools for improving the decision-making process in real-world practice and not only as a procedure to optimise the knowledge base of scientific discovery and corroboration. A shift from the model of the EBM pyramid of corroboration of evidence to a broader multi-domain perspective, graphically depicted as a 'Greek temple', could be considered. This model takes into account the different stages of scientific knowledge (discovery, corroboration and implementation); the sources of knowledge relevant to guideline development (experimental, observational, contextual, expert-based and experiential); their underlying inference mechanisms (deduction, induction, abduction, means-end inferences); and a more precise definition of evidence and related terms. The applicability of this broader approach is presented for the development of the Canadian Consensus Guidelines for the Primary Care of People with Developmental Disabilities.
Computational methods in drug discovery
Leelananda, Sumudu P; Lindert, Steffen
2016-01-01
The process for drug discovery and development is challenging, time consuming and expensive. Computer-aided drug discovery (CADD) tools can act as a virtual shortcut, assisting in the expedition of this long process and potentially reducing the cost of research and development. Today CADD has become an effective and indispensable tool in therapeutic development. The human genome project has made available a substantial amount of sequence data that can be used in various drug discovery projects. Additionally, increasing knowledge of biological structures, as well as increasing computer power have made it possible to use computational methods effectively in various phases of the drug discovery and development pipeline. The importance of in silico tools is greater than ever before and has advanced pharmaceutical research. Here we present an overview of computational methods used in different facets of drug discovery and highlight some of the recent successes. In this review, both structure-based and ligand-based drug discovery methods are discussed. Advances in virtual high-throughput screening, protein structure prediction methods, protein–ligand docking, pharmacophore modeling and QSAR techniques are reviewed. PMID:28144341
Providing the Missing Link: the Exposure Science Ontology ExO
Although knowledge-discovery tools are new to the exposure science community, these tools are critical for leveraging exposure information to design health studies and interpret results for improved public health decisions. Standardized ontologies define relationships, allow for ...
Concept Formation in Scientific Knowledge Discovery from a Constructivist View
NASA Astrophysics Data System (ADS)
Peng, Wei; Gero, John S.
The central goal of scientific knowledge discovery is to learn cause-effect relationships among natural phenomena presented as variables and the consequences of their interactions. Scientific knowledge is normally expressed as scientific taxonomies and qualitative and quantitative laws [1]. This type of knowledge represents intrinsic regularities of the observed phenomena that can be used to explain and predict behaviors of the phenomena. It is a generalization that is abstracted and externalized from a set of contexts and applicable to a broader scope. Scientific knowledge is a type of third-person knowledge, i.e., knowledge that is independent of a specific enquirer. Artificial intelligence approaches, particularly data mining algorithms that are used to identify meaningful patterns from large data sets, aim to facilitate the knowledge discovery process [2]. A broad spectrum of algorithms has been developed to address classification, associative learning, and clustering problems. However, their linkages to the people who use them have not been adequately explored. Issues in relation to supporting the interpretation of the patterns, applying prior knowledge to the data mining process and addressing user interactions remain challenges for building knowledge discovery tools [3]. As a consequence, scientists rely on their experience to formulate problems, evaluate hypotheses, reason about untraceable factors and derive new problems. This type of knowledge, which they have developed during their career, is called “first-person” knowledge. The formation of scientific knowledge (third-person knowledge) is highly influenced by the enquirer’s first-person knowledge construct, which is a result of his or her interactions with the environment. There have been attempts to craft automatic knowledge discovery tools, but these systems are limited in their capabilities to handle the dynamics of personal experience. There are now trends in developing approaches to assist scientists in applying their expertise to model formation, simulation, and prediction in various domains [4], [5]. On the other hand, first-person knowledge becomes third-person theory only if it proves general by evidence and is acknowledged by a scientific community. Researchers have started to focus on building interactive cooperation platforms [1] to accommodate different views in the knowledge discovery process. There are some fundamental questions in relation to scientific knowledge development. What are the major components of knowledge construction, and how do people construct their knowledge? How is this personal construct assimilated and accommodated into a scientific paradigm? How can one design a computational system to facilitate these processes? This chapter does not attempt to answer all these questions but serves as a basis to foster thinking along this line. A brief literature review of how people develop their knowledge is carried out through a constructivist view. A hydrological modeling scenario is presented to elucidate the approach.
How can knowledge discovery methods uncover spatio-temporal patterns in environmental data?
NASA Astrophysics Data System (ADS)
Wachowicz, Monica
2000-04-01
This paper proposes the integration of KDD, GVis and STDB as a long-term strategy, which will allow users to apply knowledge discovery methods for uncovering spatio-temporal patterns in environmental data. The main goal is to combine innovative techniques and associated tools for exploring very large environmental data sets in order to arrive at valid, novel, potentially useful, and ultimately understandable spatio-temporal patterns. The GeoInsight approach is described using the principles and key developments in the research domains of KDD, GVis, and STDB. The GeoInsight approach aims at the integration of these research domains in order to provide tools for performing information retrieval, exploration, analysis, and visualization. The result is a knowledge-based design, which involves visual thinking (perceptual-cognitive process) and automated information processing (computer-analytical process).
An integrative model for in-silico clinical-genomics discovery science.
Lussier, Yves A; Sarkar, Indra Neil; Cantor, Michael
2002-01-01
Human Genome discovery research has set the pace for post-genomic discovery research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems with current bioinformatics genomic discovery science. This paper presents an original model enabling novel "in-silico" clinical-genomic discovery science and demonstrates its feasibility. The model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical decision-support-system driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.
A collaborative filtering-based approach to biomedical knowledge discovery.
Lever, Jake; Gakkhar, Sitanshu; Gottlieb, Michael; Rashnavadi, Tahereh; Lin, Santina; Siu, Celia; Smith, Maia; Jones, Martin R; Krzywinski, Martin; Jones, Steven J M; Wren, Jonathan
2018-02-15
The increase in publication rates makes it challenging for an individual researcher to stay abreast of all relevant research in order to find novel research hypotheses. Literature-based discovery methods make use of knowledge graphs built using text mining and can infer future associations between biomedical concepts that will likely occur in new publications. These predictions are a valuable resource for researchers to explore a research topic. Current methods for prediction are based on the local structure of the knowledge graph. A method that uses global knowledge from across the knowledge graph needs to be developed in order to make knowledge discovery a frequently used tool by researchers. We propose an approach based on the singular value decomposition (SVD) that is able to combine data from across the knowledge graph through a reduced representation. Using co-occurrence data extracted from published literature, we show that SVD performs better than the leading methods for scoring discoveries. We also show the diminishing predictive power of knowledge discovery as we compare our predictions with real associations that appear further into the future. Finally, we examine the strengths and weaknesses of the SVD approach against another well-performing system using several predicted associations. All code and results files for this analysis can be accessed at https://github.com/jakelever/knowledgediscovery. Contact: sjones@bcgsc.ca. Supplementary data are available at Bioinformatics online.
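As a toy illustration of the SVD idea described above: factor a concept co-occurrence matrix at reduced rank and read candidate discoveries off the reconstructed entries for concept pairs never observed together. The matrix below is fabricated; the paper's actual data and scoring live in the linked repository:

```python
import numpy as np

# Toy concept-concept co-occurrence counts (rows/cols = biomedical concepts).
concepts = ["geneA", "diseaseX", "drugB", "pathwayP"]
M = np.array([
    [0, 4, 0, 3],
    [4, 0, 2, 0],
    [0, 2, 0, 5],
    [3, 0, 5, 0],
], dtype=float)

k = 2  # reduced rank
U, s, Vt = np.linalg.svd(M)
M_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Score a pair never seen together: a high reconstructed value suggests
# a plausible future association, which is the gist of the SVD approach.
i, j = concepts.index("geneA"), concepts.index("drugB")
print(f"observed={M[i, j]:.0f}  svd_score={M_hat[i, j]:.2f}")
```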
Database systems for knowledge-based discovery.
Jagarlapudi, Sarma A R P; Kishan, K V Radha
2009-01-01
Several database systems have been developed to provide valuable information, from the bench chemist to the biologist, from the medical practitioner to the pharmaceutical scientist, in a structured format. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database where one could do compilation, searching, archiving, analysis, and finally knowledge derivation. Although data are of variable types, the tools used for database creation, searching and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity, so that the structured databases containing vast data could be used in several areas of research. These databases were classified as reference-centric or compound-centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery.
Concept of operations for knowledge discovery from Big Data across enterprise data warehouses
NASA Astrophysics Data System (ADS)
Sukumar, Sreenivas R.; Olama, Mohammed M.; McNair, Allen W.; Nutaro, James J.
2013-05-01
The success of data-driven business in government, science, and private industry is driving the need for seamless integration of intra and inter-enterprise data sources to extract knowledge nuggets in the form of correlations, trends, patterns and behaviors previously not discovered due to physical and logical separation of datasets. Today, as volume, velocity, variety and complexity of enterprise data keeps increasing, the next generation analysts are facing several challenges in the knowledge extraction process. Towards addressing these challenges, data-driven organizations that rely on the success of their analysts have to make investment decisions for sustainable data/information systems and knowledge discovery. Options that organizations are considering are newer storage/analysis architectures, better analysis machines, redesigned analysis algorithms, collaborative knowledge management tools, and query builders amongst many others. In this paper, we present a concept of operations for enabling knowledge discovery that data-driven organizations can leverage towards making their investment decisions. We base our recommendations on the experience gained from integrating multi-agency enterprise data warehouses at the Oak Ridge National Laboratory to design the foundation of future knowledge nurturing data-system architectures.
Data Mining and Knowledge Discovery tools for exploiting big Earth-Observation data
NASA Astrophysics Data System (ADS)
Espinoza Molina, D.; Datcu, M.
2015-04-01
The continuous increase in the size of the archives and in the variety and complexity of Earth-Observation (EO) sensors requires new methodologies and tools that allow the end-user to access a large image repository, to extract and to infer knowledge about the patterns hidden in the images, to retrieve dynamically a collection of relevant images, and to support the creation of emerging applications (e.g., change detection, global monitoring, disaster and risk management, image time series). In this context, we are concerned with providing a platform for data mining and knowledge discovery from the content of EO archives. The platform's goal is to implement a communication channel between Payload Ground Segments and the end-user, who receives the content of the data coded in an understandable format associated with semantics that is ready for immediate exploitation. It will provide the user with automated tools to explore and understand the content of highly complex image archives. The challenge lies in the extraction of meaningful information and understanding observations of large extended areas, over long periods of time, with a broad variety of EO imaging sensors in synergy with other related measurements and data. The platform is composed of several components: (1) ingestion of EO images and related data, providing basic features for image analysis; (2) a query engine based on metadata, semantics and image content; (3) data mining and knowledge discovery tools for supporting the interpretation and understanding of image content; (4) semantic definition of the image content via machine learning methods. All these components are integrated and supported by a relational database management system, ensuring the integrity and consistency of Terabytes of Earth Observation data.
ERIC Educational Resources Information Center
Carter, Sunshine; Traill, Stacie
2017-01-01
Electronic resource access troubleshooting is familiar work in most libraries. The added complexity introduced when a library implements a web-scale discovery service, however, creates a strong need for well-organized, rigorous training to enable troubleshooting staff to provide the best service possible. This article outlines strategies, tools,…
Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases
Frijters, Raoul; van Vugt, Marianne; Smeets, Ruben; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand
2010-01-01
The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs. PMID:20885778
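A minimal sketch of the ABC principle described above: rank candidate C-terms by how many B-intermediates they share with a query term A, excluding terms already co-mentioned with A. The toy co-occurrence sets below are hand-made (they echo Swanson's classic fish-oil/Raynaud example rather than CoPub Discovery's thesauri):

```python
# Toy literature co-occurrence sets: concept -> concepts it appears with.
cooccurs = {
    "Raynaud disease": {"blood viscosity", "platelet aggregation"},
    "fish oil":        {"blood viscosity", "platelet aggregation"},
    "aspirin":         {"platelet aggregation"},
}

def hidden_links(a, graph):
    """Rank C-terms sharing B-intermediates with A but never co-mentioned with A."""
    scores = {}
    for c, bs in graph.items():
        if c == a or c in graph[a]:
            continue  # skip A itself and direct (non-hidden) associations
        shared = graph[a] & bs
        if shared:
            scores[c] = len(shared)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(hidden_links("Raynaud disease", cooccurs))
# [('fish oil', 2), ('aspirin', 1)] -- A-C links inferred via shared B's
```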
Danchin, Antoine; Ouzounis, Christos; Tokuyasu, Taku; Zucker, Jean-Daniel
2018-07-01
Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from 'the sequence tells the structure tells the function' fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader. © 2018 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.
Temporal data mining for the quality assessment of hemodialysis services.
Bellazzi, Riccardo; Larizza, Cristiana; Magni, Paolo; Bellazzi, Roberto
2005-05-01
This paper describes the temporal data mining aspects of a research project that deals with the definition of methods and tools for the assessment of the clinical performance of hemodialysis (HD) services, on the basis of the time series automatically collected during hemodialysis sessions. Intelligent data analysis and temporal data mining techniques are applied to gain insight and to discover knowledge on the causes of unsatisfactory clinical results. In particular, two new methods for association rule discovery and temporal rule discovery are applied to the time series. Such methods exploit several pre-processing techniques, comprising data reduction, multi-scale filtering and temporal abstractions. We have analyzed the data of more than 5800 dialysis sessions coming from 43 different patients monitored for 19 months. The qualitative rules associating the outcome parameters and the measured variables were examined by the domain experts, which were able to distinguish between rules confirming available background knowledge and unexpected but plausible rules. The new methods proposed in the paper are suitable tools for knowledge discovery in clinical time series. Their use in the context of an auditing system for dialysis management helped clinicians to improve their understanding of the patients' behavior.
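As a compact illustration of the two preprocessing-plus-mining steps named above (temporal abstraction of a time series into qualitative states, then rule assessment over sessions), here is a hypothetical sketch; the variable, thresholds, and outcomes are invented, not the project's dialysis data:

```python
def abstract_states(series, low, high):
    """Map a numeric time series to qualitative states (temporal abstraction)."""
    return ["low" if v < low else "high" if v > high else "normal" for v in series]

def rule_confidence(states, outcomes, state, outcome):
    """Confidence of the rule 'session state == state -> outcome' over sessions."""
    hits = [o for s, o in zip(states, outcomes) if s == state]
    return sum(1 for o in hits if o == outcome) / len(hits) if hits else 0.0

# One abstracted value per dialysis session (toy data), plus session outcome.
bp = [92, 131, 150, 88, 149, 152]          # e.g. a per-session pressure summary
states = abstract_states(bp, low=90, high=140)
outcomes = ["poor", "good", "poor", "poor", "poor", "poor"]

print(states)
print(rule_confidence(states, outcomes, "high", "poor"))  # 'high -> poor' rule
```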
Dewey: How to Make It Work for You
ERIC Educational Resources Information Center
Panzer, Michael
2013-01-01
As knowledge brokers, librarians are living in interesting times for themselves and libraries. It causes them to wonder sometimes if the traditional tools like the Dewey Decimal Classification (DDC) system can cope with the onslaught of information. The categories provided do not always seem adequate for the knowledge-discovery habits of…
An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature.
ERIC Educational Resources Information Center
Trybula, Walter J.; Wyllys, Ronald E.
2000-01-01
Addresses an approach to the discovery of scientific knowledge through an examination of data mining and text mining techniques. Presents the results of experiments that investigated knowledge acquisition from a selected set of technical documents by domain experts. (Contains 15 references.) (Author/LRW)
NASA Astrophysics Data System (ADS)
Dabiru, L.; O'Hara, C. G.; Shaw, D.; Katragadda, S.; Anderson, D.; Kim, S.; Shrestha, B.; Aanstoos, J.; Frisbie, T.; Policelli, F.; Keblawi, N.
2006-12-01
The Research Project Knowledge Base (RPKB) is currently being designed and will be implemented in a manner that is fully compatible and interoperable with enterprise architecture tools developed to support NASA's Applied Sciences Program. Through user needs assessment and collaboration with Stennis Space Center, Goddard Space Flight Center, and NASA's DEVELOP staff, insight into information needs for the RPKB was gathered from across NASA scientific communities of practice. To enable efficient, consistent, standard, structured, and managed data entry and compilation of research results, a prototype RPKB has been designed and fully integrated with the existing NASA Earth Science Systems Components database. The RPKB will compile research project and keyword information of relevance to the six major science focus areas, 12 national applications, and the Global Change Master Directory (GCMD). The RPKB will include information about projects awarded from NASA research solicitations, project investigator information, research publications, NASA data products employed, and model or decision support tools used or developed, as well as new data product information. The RPKB will be developed in a multi-tier architecture that will include a SQL Server relational database backend, middleware, and front-end client interfaces for data entry. The purpose of this project is to intelligently harvest the results of research sponsored by the NASA Applied Sciences Program and related research programs. We present various approaches for a wide spectrum of knowledge discovery of research results, publications, projects, etc., from the NASA Systems Components database and global information systems, and show how this is implemented in a SQL Server database. The application of knowledge discovery is useful for intelligent query answering and multiple-layered database construction. Using advanced EA tools such as the Earth Science Architecture Tool (ESAT), the RPKB will enable NASA and partner agencies to efficiently identify significant results for new experiment directions and enable principal investigators to formulate experiment directions for new proposals.
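For illustration only, a much-simplified relational core of the kind of multi-tier design described above can be sketched with Python's built-in sqlite3 (the production backend is SQL Server, and all table and column names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (
    project_id  INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    focus_area  TEXT              -- one of the six science focus areas
);
CREATE TABLE publication (
    pub_id      INTEGER PRIMARY KEY,
    project_id  INTEGER REFERENCES project(project_id),
    citation    TEXT NOT NULL
);
CREATE TABLE data_product (
    product_id  INTEGER PRIMARY KEY,
    project_id  INTEGER REFERENCES project(project_id),
    name        TEXT NOT NULL     -- NASA data product used or produced
);
""")
conn.execute("INSERT INTO project VALUES (1, 'Flood DSS evaluation', 'Water Resources')")
conn.execute("INSERT INTO publication VALUES (1, 1, 'Example et al. 2006')")

# Harvest-style query: which publications came out of which projects?
for row in conn.execute("""
    SELECT p.title, pub.citation FROM project p
    JOIN publication pub ON pub.project_id = p.project_id
"""):
    print(row)
```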
75 FR 71005 - American Education Week, 2010
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-22
... maintain our Nation's role as the world's engine of discovery and innovation, my Administration is.... Our Nation's schools can give students the tools, skills, and knowledge to participate fully in our...
Beginning to manage drug discovery and development knowledge.
Sumner-Smith, M
2001-05-01
Knowledge management approaches and technologies are beginning to be implemented by the pharmaceutical industry in support of new drug discovery and development processes aimed at greater efficiencies and effectiveness. This trend coincides with moves to reduce paper, coordinate larger teams with more diverse skills that are distributed around the globe, and to comply with regulatory requirements for electronic submissions and the associated maintenance of electronic records. Concurrently, the available technologies have implemented web-based architectures with a greater range of collaborative tools and personalization through portal approaches. However, successful application of knowledge management methods depends on effective cultural change management, as well as proper architectural design to match the organizational and work processes within a company.
USDA-ARS?s Scientific Manuscript database
Valuable information on the location and context of ecological studies is locked up in publications in myriad formats that are not easily machine readable. This presents significant challenges to building geographic-based tools to search for and visualize sources of ecological knowledge. JournalMap...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-02-02
... planned outputs are expected to contribute to advances in knowledge, improvements in policy and practice... of accomplishments (e.g., new or improved tools, methods, discoveries, standards, interventions...
Chen, Yi-An; Tripathi, Lokesh P.; Mizuguchi, Kenji
2016-01-01
Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org PMID:26989145
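TargetMine is built on the InterMine framework, so programmatic queries like the hedged sketch below should be possible through the standard InterMine Python client; the service path, class name, and field paths are assumptions to verify against the live schema:

```python
# A hedged sketch: TargetMine follows the InterMine web-service conventions,
# so the InterMine Python client (pip install intermine) should work against
# it. The URL, class, and fields below are assumptions, not confirmed values.
from intermine.webservice import Service

service = Service("https://targetmine.mizuguchilab.org/targetmine/service")

query = service.new_query("Gene")
query.add_view("symbol", "name", "organism.name")
query.add_constraint("symbol", "=", "TP53")

for row in query.rows():
    print(row["symbol"], row["name"], row["organism.name"])
```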
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDermott, Jason E.; Wang, Jing; Mitchell, Hugh D.
2013-01-01
The advent of high-throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for both purely statistical and expert knowledge-based approaches and would benefit from improved integration of the two. Areas covered: In this review we present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered. We then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to biomarker discovery and characterization are key to future success in the biomarker field. We describe our recommendations of possible approaches to this problem, including metrics for the evaluation of biomarkers.
ERIC Educational Resources Information Center
Polavaram, Sridevi
2016-01-01
Neuroscience can greatly benefit from using novel methods in computer science and informatics, which enable knowledge discovery in unexpected ways. Currently one of the biggest challenges in Neuroscience is to map the functional circuitry of the brain. The applications of this goal range from understanding structural reorganization of neurons to…
Analytics for Cyber Network Defense
DOE Office of Scientific and Technical Information (OSTI.GOV)
Plantenga, Todd.; Kolda, Tamara Gibson
2011-06-01
This report provides a brief survey of analytics tools considered relevant to cyber network defense (CND). Ideas and tools come from fields such as statistics, data mining, and knowledge discovery. Some analytics are considered standard mathematical or statistical techniques, while others reflect current research directions. In all cases the report attempts to explain the relevance to CND with brief examples.
Antisense oligonucleotide technologies in drug discovery.
Aboul-Fadl, Tarek
2006-09-01
The principle of antisense oligonucleotide (AS-OD) technologies is based on the specific inhibition of unwanted gene expression by blocking mRNA activity. It has long appeared to be an ideal strategy to leverage new genomic knowledge for drug discovery and development. In recent years, AS-OD technologies have been widely used as potent and promising tools for this purpose. There is a rapid increase in the number of antisense molecules progressing in clinical trials. AS-OD technologies provide a simple and efficient approach for drug discovery and development and are expected to become a reality in the near future. This editorial describes the established and emerging AS-OD technologies in drug discovery.
Research to knowledge: promoting the training of physician-scientists in the biology of pregnancy.
Sadovsky, Yoel; Caughey, Aaron B; DiVito, Michelle; D'Alton, Mary E; Murtha, Amy P
2018-01-01
Common disorders of pregnancy, such as preeclampsia, preterm birth, and fetal growth abnormalities, continue to challenge perinatal biologists seeking insights into disease pathogenesis that will result in better diagnosis, therapy, and disease prevention. These challenges have recently been intensified with discoveries that associate gestational diseases with long-term maternal and neonatal outcomes. Whereas modern high-throughput investigative tools enable scientists and clinicians to noninvasively probe the maternal-fetal genome, epigenome, and other analytes, their implications for clinical medicine remain uncertain. Bridging these knowledge gaps depends on strengthening the existing pool of scientists with expertise in basic, translational, and clinical tools to address pertinent questions in the biology of pregnancy. Although PhD researchers are critical in this quest, physician-scientists would facilitate the inquiry by bringing together clinical challenges and investigative tools, promoting a culture of intellectual curiosity among clinical providers, and helping transform discoveries into relevant knowledge and clinical solutions. Uncertainties related to future administration of health care, federal support for research, attrition of physician-scientists, and an inadequate supply of new scholars may jeopardize our ability to address these challenges. New initiatives are necessary to attract current scholars and future generations of researchers seeking expertise in the scientific method and to support them, through mentorship and guidance, in pursuing a career that combines scientific investigation with clinical medicine. These efforts will promote breadth and depth of inquiry into the biology of pregnancy and enhance the pace of translation of scientific discoveries into better medicine and disease prevention.
Create your own science planning tool in 3 days with SOA
NASA Technical Reports Server (NTRS)
Streiffert, Barbara A.; Polanskey, Carol A.; O'Reilly, Taifun
2003-01-01
Scientific discovery and the advancement of knowledge have been, and continue to be, the goal of space missions at the Jet Propulsion Laboratory. Scientists must plan their observations/experiments to get the maximum data return in order to make those discoveries. However, each mission has different science objectives, a different spacecraft and different instrument payloads, as well as different routes to different destinations with different spacecraft restrictions and characteristics. In the current reduced-cost environment, manageable cost for mission planning software is a must. The Science Opportunity Analyzer (SOA), a planning tool for scientists and mission planners, utilizes a simple approach to reduce cost and promote reusability.
Gerlt, John A
2017-08-22
The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.
2017-01-01
The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of “genomic enzymology” web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence–function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems. PMID:28826221
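At its core, the sequence similarity network generation that EFI-EST performs is thresholded graph construction over pairwise alignment scores. The sketch below illustrates the idea only; the proteins, scores and cutoff are invented, and real inputs would come from all-by-all BLAST alignments.

```python
# Sketch of a sequence similarity network (SSN): proteins are nodes, and an
# edge joins two proteins whose pairwise alignment score clears a threshold.
import networkx as nx

pairwise_scores = [            # (protein_a, protein_b, score) -- hypothetical
    ("P1", "P2", 180.0),
    ("P1", "P3", 45.0),
    ("P2", "P3", 52.0),
    ("P3", "P4", 210.0),
]
THRESHOLD = 100.0              # assumed cutoff; EFI-EST lets users tune this

ssn = nx.Graph()
ssn.add_nodes_from({p for a, b, _ in pairwise_scores for p in (a, b)})
ssn.add_edges_from((a, b, {"score": s}) for a, b, s in pairwise_scores if s >= THRESHOLD)

# Connected components approximate putative isofunctional clusters.
for cluster in nx.connected_components(ssn):
    print(sorted(cluster))     # ['P1', 'P2'] and ['P3', 'P4']
```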
Analytical considerations for mass spectrometry profiling in serum biomarker discovery.
Whiteley, Gordon R; Colantonio, Simona; Sacconi, Andrea; Saul, Richard G
2009-03-01
The potential of using mass spectrometry profiling as a diagnostic tool has been demonstrated for a wide variety of diseases. Various cancers and cancer-related diseases have been the focus of much of this work because of both the paucity of good diagnostic markers and the knowledge that early diagnosis is the most powerful weapon in treating cancer. The implementation of mass spectrometry as a routine diagnostic tool has proved to be difficult, however, primarily because of the stringent controls that are required for the method to be reproducible. The method is evolving as a powerful guide to the discovery of biomarkers that could, in turn, be used either individually or in an array or panel of tests for early disease detection. Using proteomic patterns to guide biomarker discovery, with possible deployment in the clinical laboratory environment on current instrumentation or in a hybrid technology, offers the possibility of providing the early diagnosis tool that is needed.
McDermott, Jason E.; Wang, Jing; Mitchell, Hugh; Webb-Robertson, Bobbie-Jo; Hafen, Ryan; Ramey, John; Rodland, Karin D.
2012-01-01
Introduction: The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful molecular signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for more sophisticated approaches to integrating purely statistical and expert knowledge-based approaches. Areas covered: In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered in deriving valid and useful signatures of disease. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to identify predictive signatures of disease are key to future success in the biomarker field. We will describe our recommendations for possible approaches to this problem including metrics for the evaluation of biomarkers. PMID:23335946
Building Knowledge Graphs for NASA's Earth Science Enterprise
NASA Astrophysics Data System (ADS)
Zhang, J.; Lee, T. J.; Ramachandran, R.; Shi, R.; Bao, Q.; Gatlin, P. N.; Weigel, A. M.; Maskey, M.; Miller, J. J.
2016-12-01
Inspired by Google Knowledge Graph, we have been building a prototype Knowledge Graph for Earth scientists, connecting information and data in NASA's Earth science enterprise. Our primary goal is to advance the state-of-the-art NASA knowledge extraction capability by going beyond traditional catalog search and linking different distributed information (such as data, publications, services, tools and people). This will enable a more efficient pathway to knowledge discovery. While Google Knowledge Graph provides impressive semantic-search and aggregation capabilities, it is limited to search topics for the general public. We use a similar knowledge graph approach to semantically link information gathered from a wide variety of sources within the NASA Earth Science enterprise. Our prototype serves as a proof of concept on the viability of building an operational "knowledge base" system for NASA Earth science. Information is pulled from structured sources (such as NASA CMR catalog, GCMD, and Climate and Forecast Conventions) and unstructured sources (such as research papers). Leveraging modern techniques of machine learning, information retrieval, and deep learning, we provide an integrated data mining and information discovery environment to help Earth scientists use the best data, tools, methodologies, and models available to answer a hypothesis. Our knowledge graph would be able to answer questions like: Which articles discuss topics investigating similar hypotheses? How have these methods been tested for accuracy? Which approaches have been highly cited within the scientific community? What variables were used for this method and what datasets were used to represent them? What processing was necessary to use this data? These questions then lead researchers and citizen scientists to investigate the sources where data can be found, available user guides, information on how the data was acquired, and available tools and models to use with this data. As a proof of concept, we focus on a well-defined domain - hurricane science - linking research articles and their findings, data, people and tools/services. Modern information retrieval, natural language processing, machine learning and deep learning techniques are applied to build the knowledge network.
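The graph structure sketched below mirrors the idea described above: heterogeneous nodes (papers, datasets, variables, tools) joined by typed edges, so that a question such as "which datasets were used by papers on hurricane intensity?" becomes a traversal. All node names and edge types are hypothetical, not NASA's actual schema.

```python
# Toy knowledge graph with typed edges and a two-hop query.
import networkx as nx

kg = nx.MultiDiGraph()
kg.add_edge("paper:Smith2015", "topic:hurricane_intensity", type="discusses")
kg.add_edge("paper:Smith2015", "dataset:TRMM_precip", type="uses")
kg.add_edge("dataset:TRMM_precip", "variable:rain_rate", type="measures")
kg.add_edge("tool:py_tracker", "dataset:TRMM_precip", type="reads")

def datasets_for_topic(graph, topic):
    """Datasets used by papers that discuss the given topic."""
    papers = [u for u, v, d in graph.edges(data=True)
              if v == topic and d["type"] == "discusses"]
    return {v for p in papers
            for _, v, d in graph.out_edges(p, data=True) if d["type"] == "uses"}

print(datasets_for_topic(kg, "topic:hurricane_intensity"))  # {'dataset:TRMM_precip'}
```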
Payne, Philip R O; Kwok, Alan; Dhaval, Rakesh; Borlawsky, Tara B
2009-03-01
The conduct of large-scale translational studies presents significant challenges related to the storage, management and analysis of integrative data sets. Ideally, the application of methodologies such as conceptual knowledge discovery in databases (CKDD) provides a means for moving beyond intuitive hypothesis discovery and testing in such data sets, and towards the high-throughput generation and evaluation of knowledge-anchored relationships between complex bio-molecular and phenotypic variables. However, the induction of such high-throughput hypotheses is non-trivial, and requires correspondingly high-throughput validation methodologies. In this manuscript, we describe an evaluation of the efficacy of a natural language processing-based approach to validating such hypotheses. As part of this evaluation, we will examine a phenomenon that we have labeled as "Conceptual Dissonance" in which conceptual knowledge derived from two or more sources of comparable scope and granularity cannot be readily integrated or compared using conventional methods and automated tools.
Literature Mining and Knowledge Discovery Tools for Virtual Tissues
Virtual Tissues (VTs) are in silico models that simulate the cellular fabric of tissues to analyze complex relationships and predict multicellular behaviors in specific biological systems such as the mature liver (v-Liver™) or developing embryo (v-Embryo™). VT models require inpu...
Lipidomics from an analytical perspective.
Sandra, Koen; Sandra, Pat
2013-10-01
The global non-targeted analysis of various biomolecules in a variety of sample sources has gained momentum in recent years. Defined as the study of the full lipid complement of cells, tissues and organisms, lipidomics is currently evolving out of the shadow of the more established omics sciences, including genomics, transcriptomics, proteomics and metabolomics. In analogy to the latter, lipidomics has the potential to impact biomarker discovery, drug discovery/development and system knowledge, amongst others. The tools developed by lipid researchers in the past, complemented by the enormous advancements made in recent years in mass spectrometry and chromatography and by the implementation of sophisticated (bio)informatics tools, form the basis of current lipidomics technologies. Copyright © 2013 Elsevier Ltd. All rights reserved.
Therapeutic Potential of Foldamers: From Chemical Biology Tools To Drug Candidates?
Gopalakrishnan, Ranganath; Frolov, Andrey I; Knerr, Laurent; Drury, William J; Valeur, Eric
2016-11-10
Over the past decade, foldamers have progressively emerged as useful architectures to mimic secondary structures of proteins. Peptidic foldamers, consisting of various amino acid based backbones, have been the most studied from a therapeutic perspective, while polyaromatic foldamers have barely evolved from their nascency and remain perplexing for medicinal chemists due to their poor drug-like nature. Despite these limitations, this compound class may still offer opportunities to study challenging targets or provide chemical biology tools. The prospect of foldamer drug candidates reaching the clinic remains a stretch. Nevertheless, advances in the field have demonstrated their potential for the discovery of next-generation therapeutics. In this perspective, the current knowledge of foldamers is reviewed in a drug discovery context. Recent advances in the early phases of drug discovery, including hit finding, target validation, optimization and molecular modeling, are discussed. In addition, challenges and focus areas are debated and gaps highlighted.
Yi, Ming; Zhao, Yongmei; Jia, Li; He, Mei; Kebebew, Electron; Stephens, Robert M.
2014-01-01
To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios (family pedigree information and SNP array data for the same samples) permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities, using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way; the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest. PMID:24831545
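The Mendelian inheritance check used as one validation scenario above has a compact statement: at a biallelic site, each of the child's two alleles must be attributable to a different parent. A minimal sketch with an invented trio:

```python
# Unphased Mendelian-consistency check for a biallelic site in a trio.
def mendelian_consistent(child, mother, father):
    """True if the child's genotype can draw one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

print(mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))  # True
print(mendelian_consistent(("G", "G"), ("A", "A"), ("G", "G")))  # False
```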
Developing integrated crop knowledge networks to advance candidate gene discovery.
Hassani-Pak, Keywan; Castellote, Martin; Esch, Maria; Hindle, Matthew; Lysenko, Artem; Taubert, Jan; Rawlings, Christopher
2016-12-01
The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpin traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer to having the basic information, at the gene level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and, with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.
Berler, Alexander; Pavlopoulos, Sotiris; Koutsouris, Dimitris
2005-06-01
The advantages of introducing information and communication technologies into the complex health-care sector are well known and have been well stated in the past. It is, nevertheless, paradoxical that although the medical community has embraced with satisfaction most of the technological discoveries allowing the improvement of patient care, this has not happened with health-care informatics. With this concern in mind, our work proposes an information model for knowledge management (KM) based upon the use of key performance indicators (KPIs) in health-care systems. Based upon the balanced scorecard (BSC) framework (Kaplan/Norton) and quality assurance techniques in health care (Donabedian), this paper proposes a patient-journey-centered approach that drives information flow at all levels of the day-to-day process of delivering effective and managed care, toward information assessment and knowledge discovery. In order to persuade health-care decision-makers to assess the added value of KM tools, those tools should be used to propose new performance measurement and performance management techniques at all levels of a health-care system. The proposed KPIs form a complete set of metrics that enable the performance management of a regional health-care system. In addition, the performance framework established is technically applied through state-of-the-art KM tools such as data warehouses and business intelligence information systems. In that sense, the proposed infrastructure is, technologically speaking, an important KM tool that enables knowledge sharing amongst various health-care stakeholders and between different health-care groups. The use of the BSC is an enabling framework toward a KM strategy in health care.
Big data analytics in immunology: a knowledge-based approach.
Zhang, Guang Lan; Sun, Jing; Chitkushev, Lou; Brusic, Vladimir
2014-01-01
With the vast amount of immunological data available, immunology research is entering the big data era. These data vary in granularity, quality, and complexity and are stored in various formats, including publications, technical reports, and databases. The challenge is to make the transition from data to actionable knowledge and wisdom and bridge the knowledge gap and application gap. We report a knowledge-based approach based on a framework called KB-builder that facilitates data mining by enabling fast development and deployment of web-accessible immunological data knowledge warehouses. Immunological knowledge discovery relies heavily on both the availability of accurate, up-to-date, and well-organized data and the proper analytics tools. We propose the use of knowledge-based approaches by developing knowledgebases combining well-annotated data with specialized analytical tools and integrating them into analytical workflows. A set of well-defined workflow types with rich summarization and visualization capacity facilitates the transformation from data to critical information and knowledge. By using KB-builder, we enabled streamlining of the normally time-consuming process of database development. The knowledgebases built using KB-builder will speed up rational vaccine design by providing accurate and well-annotated data coupled with tailored computational analysis tools and workflow.
Modeling & Informatics at Vertex Pharmaceuticals Incorporated: our philosophy for sustained impact
NASA Astrophysics Data System (ADS)
McGaughey, Georgia; Patrick Walters, W.
2017-03-01
Molecular modelers and informaticians have the unique opportunity to integrate cross-functional data using a myriad of tools, methods and visuals to generate information. Using their drug discovery expertise, they transform information into knowledge that impacts drug discovery. These insights are oftentimes formulated locally and then applied more broadly, influencing the discovery of new medicines. This is particularly true in an organization whose members are exposed to projects throughout the organization, as in the case of the global Modeling & Informatics group at Vertex Pharmaceuticals. From its inception, Vertex has been a leader in the development and use of computational methods for drug discovery. In this paper, we describe the Modeling & Informatics group at Vertex and the underlying philosophy which has driven this team to sustain impact on the discovery of first-in-class transformative medicines.
Genome wide association studies on yield components using a lentil genetic diversity panel
USDA-ARS?s Scientific Manuscript database
The cool season food legume research community are now at the threshold of deploying the cutting-edge molecular genetics and genomics tools that have led to significant and rapid expansion of gene discovery, knowledge of gene function (including tolerance to biotic and abiotic stresses) and genetic ...
Knowledge Discovery and Data Mining in Iran's Climatic Researches
NASA Astrophysics Data System (ADS)
Karimi, Mostafa
2013-04-01
Advances in measurement technology and data collection mean that databases keep getting larger, and large databases require powerful analysis tools. The iterative process of acquiring knowledge from information obtained by data processing takes place, in various forms, in all scientific fields. When data volumes grow large, however, many problems arise that traditional methods cannot address. In recent years the use of databases has expanded in various scientific fields, especially atmospheric databases in climatology. In addition, the increasing amount of data generated by climate models poses a challenge for analyses that aim to extract hidden patterns and knowledge. The approach to this problem taken in recent years uses the knowledge discovery process and data mining techniques, drawing on concepts from machine learning, artificial intelligence and expert systems. Data mining is an analytical process for mining massive volumes of data; its ultimate goal is access to information and, finally, knowledge. Climatology is a science that uses varied and massive data, and the goal of climate data mining is to obtain information from diverse and massive atmospheric and non-atmospheric data. In effect, knowledge discovery performs these activities in a logical, predetermined and almost automatic process. The goal of this research is to study the use of knowledge discovery and data mining techniques in Iranian climate research. To achieve this goal, a descriptive content analysis was carried out, classifying studies by method and issue. The results show that in Iranian climate research clustering, notably k-means and Ward's method, is applied most often, and that precipitation and atmospheric circulation patterns are the issues most frequently addressed. Although several studies in geography and climate have applied statistical techniques such as clustering and pattern extraction, given the distinction between statistics and data mining one cannot yet say that Iranian climate studies have used data mining and knowledge discovery techniques proper. It is therefore necessary to bring the KDD approach and data mining techniques into climatic studies, in particular for interpreting climate modeling results.
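The review above reports k-means as the most commonly applied technique; as a concrete anchor, here is a minimal sketch of k-means over station-level climate records. The precipitation matrix is synthetic; real inputs would be station-by-month observations.

```python
# Cluster synthetic "stations" into precipitation regimes with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
wet = rng.normal(80, 15, size=(10, 12))   # 10 stations x 12 monthly values (mm)
dry = rng.normal(15, 5, size=(10, 12))
X = np.vstack([wet, dry])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # stations grouped into two precipitation regimes
```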
Scientific Knowledge Discovery in Complex Semantic Networks of Geophysical Systems
NASA Astrophysics Data System (ADS)
Fox, P.
2012-04-01
The vast majority of explorations of the Earth's systems are limited in their ability to effectively explore the most important (often most difficult) problems because they are forced to interconnect at the data-element, or syntactic, level rather than at a higher scientific, or semantic, level. Recent successes in the application of complex network theory and algorithms to climate data raise expectations that more general graph-based approaches offer the opportunity for new discoveries. In the past ~5 years in the natural sciences there has been substantial progress in providing both specialists and non-specialists the ability to describe, in machine-readable form, geophysical quantities and relations among them in meaningful and natural ways, effectively breaking the prior syntax barrier. The corresponding open-world semantics and reasoning provide higher-level interconnections: semantics are provided around the data structures, using semantically equipped tools and semantically aware interfaces between science application components, allowing for discovery at the knowledge level. More recently, formal semantic approaches to continuous and aggregate physical processes are beginning to show promise and are soon likely to be ready to apply to geoscientific systems. To illustrate these opportunities, this presentation presents two application examples featuring domain vocabulary (ontology) and property relations (named and typed edges in the graphs). First, a climate knowledge discovery pilot encoding and exploring CMIP5 catalog information, with the eventual goal of encoding and exploring CMIP5 data. Second, a multi-stakeholder knowledge network for integrated assessments in marine ecosystems, where the data is highly interdisciplinary.
ERIC Educational Resources Information Center
Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.
2000-01-01
These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)
Antibacterial Drug Discovery: Some Assembly Required.
Tommasi, Rubén; Iyer, Ramkumar; Miller, Alita A
2018-05-11
Our limited understanding of the molecular basis for compound entry into and efflux out of Gram-negative bacteria is now recognized as a key bottleneck for the rational discovery of novel antibacterial compounds. Traditional, large-scale biochemical or target-agnostic phenotypic antibacterial screening efforts have, as a result, not been very fruitful. A main driver of this knowledge gap has been the historical lack of predictive cellular assays, tools, and models that provide structure-activity relationships to inform optimization of compound accumulation. A variety of approaches has recently been described to address this conundrum. This Perspective explores these approaches and considers ways in which their integration could successfully redirect antibacterial drug discovery efforts.
2001-12-01
Group 1999, Davenport and Prusak 1998). Although differences do exist, the four models are similar. In the amalgamated model, the phases of the KMLC… Phase 1, create, is the discovery and development of new knowledge (Despres and Chavel 1999, Gartner Group 1999). Phase 2, organize, involves… This generally entails modeling and analysis that results in one or more (re)designs for the process in question. The process, along with
Collaborative Web-Enabled GeoAnalytics Applied to OECD Regional Data
NASA Astrophysics Data System (ADS)
Jern, Mikael
Recent advances in web-enabled graphics technologies have the potential to make a dramatic impact on developing collaborative geovisual analytics (GeoAnalytics). In this paper, tools are introduced that help establish progress initiatives at international and sub-national levels aimed at measuring, through statistical indicators, economic, social and environmental developments, and at engaging both statisticians and the public in such activities. Given the global dimension of such a task, the “dream” of building a repository of progress indicators, where experts and public users can use collaborative GeoAnalytics tools to compare situations for two or more countries, regions or local communities, could be accomplished. While the benefits of GeoAnalytics tools are many, it remains a challenge to adapt these dynamic visual tools to the Internet. One example is dynamic web-enabled animation that enables statisticians to explore temporal, spatial and multivariate demographic data from multiple perspectives, discover interesting relationships, share their incremental discoveries with colleagues and finally communicate selected relevant knowledge to the public. These discoveries often emerge through the diverse backgrounds and experiences of expert domains and are precious in a creative analytics reasoning process. In this context, we introduce a demonstrator, “OECD eXplorer”, a customized tool for interactively analyzing, and collaborating on, gained insights and discoveries based on a novel story mechanism that captures, re-uses and shares task-related explorative events.
Ahmed, Wamiq M; Lenz, Dominik; Liu, Jia; Paul Robinson, J; Ghafoor, Arif
2008-03-01
High-throughput biological imaging uses automated imaging devices to collect a large number of microscopic images for analysis of biological systems and validation of scientific hypotheses. Efficient manipulation of these datasets for knowledge discovery requires high-performance computational resources, efficient storage, and automated tools for extracting and sharing such knowledge among different research sites. Newly emerging grid technologies provide powerful means for exploiting the full potential of these imaging techniques. Efficient utilization of grid resources requires the development of knowledge-based tools and services that combine domain knowledge with analysis algorithms. In this paper, we first investigate how grid infrastructure can facilitate high-throughput biological imaging research, and present an architecture for providing knowledge-based grid services for this field. We identify two levels of knowledge-based services. The first level provides tools for extracting spatiotemporal knowledge from image sets and the second level provides high-level knowledge management and reasoning services. We then present cellular imaging markup language, an extensible markup language-based language for modeling of biological images and representation of spatiotemporal knowledge. This scheme can be used for spatiotemporal event composition, matching, and automated knowledge extraction and representation for large biological imaging datasets. We demonstrate the expressive power of this formalism by means of different examples and extensive experimental results.
Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi
2018-01-01
Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625
Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses.
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V; Ma'ayan, Avi
2018-02-27
Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.
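The abstract above states that Datasets2Tools exposes an API but does not document it here, so the client sketch below runs against a hypothetical route: the "/api/search" path and the "q" parameter are assumptions for illustration, not the published interface.

```python
# Hedged sketch of querying the Datasets2Tools API (endpoint is assumed).
import requests

BASE = "http://amp.pharm.mssm.edu/datasets2tools"

def search_canned_analyses(keyword):
    # "/api/search" and the "q" parameter are hypothetical placeholders.
    resp = requests.get(f"{BASE}/api/search", params={"q": keyword}, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Example (hypothetical): look for canned analyses mentioning a GEO accession.
# print(search_canned_analyses("GSE11352"))
```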
ERIC Educational Resources Information Center
Stern, David
2003-01-01
Discusses questions to consider as chemistry libraries develop new information storage and retrieval systems. Addresses new integrated tools for data manipulation that will guarantee access to information; differential pricing and package plans and effects on libraries' budgeting; and the changing role of the librarian. (LRW)
Healthcare applications of knowledge discovery in databases.
DeGruy, K B
2000-01-01
Many healthcare leaders find themselves overwhelmed with data, but lack the information they need to make informed decisions. Knowledge discovery in databases (KDD) can help organizations turn their data into information. KDD is the process of finding complex patterns and relationships in data. The tools and techniques of KDD have achieved impressive results in other industries, and healthcare needs to take advantage of advances in this exciting field. Recent advances in the KDD field have brought it from the realm of research institutions and large corporations to many smaller companies. Software and hardware advances enable small organizations to tap the power of KDD using desktop PCs. KDD has been used extensively for fraud detection and focused marketing. There is a wealth of data available within the healthcare industry that would benefit from the application of KDD tools and techniques. Providers and payers have a vast quantity of data (such as charges and claims), but no effective way to analyze the data to accurately determine relationships and trends. Organizations that take advantage of KDD techniques will find that they offer valuable assistance in the quest to lower healthcare costs while improving healthcare quality.
Introduction to fragment-based drug discovery.
Erlanson, Daniel A
2012-01-01
Fragment-based drug discovery (FBDD) has emerged in the past decade as a powerful tool for discovering drug leads. The approach first identifies starting points: very small molecules (fragments) that are about half the size of typical drugs. These fragments are then expanded or linked together to generate drug leads. Although the origins of the technique date back some 30 years, it was only in the mid-1990s that experimental techniques became sufficiently sensitive and rapid for the concept to become practical. Since that time, the field has exploded: FBDD has played a role in discovery of at least 18 drugs that have entered the clinic, and practitioners of FBDD can be found throughout the world in both academia and industry. Literally dozens of reviews have been published on various aspects of FBDD or on the field as a whole, as have three books (Jahnke and Erlanson, Fragment-based approaches in drug discovery, 2006; Zartler and Shapiro, Fragment-based drug discovery: a practical approach, 2008; Kuo, Fragment based drug design: tools, practical approaches, and examples, 2011). However, this chapter will assume that the reader is approaching the field with little prior knowledge. It will introduce some of the key concepts, set the stage for the chapters to follow, and demonstrate how X-ray crystallography plays a central role in fragment identification and advancement.
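Because fragments are "about half the size of typical drugs", fragment libraries are often pre-filtered by simple property rules. A minimal sketch using the commonly cited "rule of three" (MW <= 300, cLogP <= 3, at most 3 H-bond donors), which the chapter itself does not prescribe; it requires RDKit, and the SMILES strings are arbitrary examples.

```python
# Filter a toy fragment library by rule-of-three properties with RDKit.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def passes_rule_of_three(smiles: str) -> bool:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    return (Descriptors.MolWt(mol) <= 300
            and Descriptors.MolLogP(mol) <= 3
            and Lipinski.NumHDonors(mol) <= 3)

library = ["c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCCCCCCCCCCCCCCCCC(=O)O"]
print([s for s in library if passes_rule_of_three(s)])  # long fatty acid fails
```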
Conceptual Tools for Understanding Nature - Proceedings of the 3rd International Symposium
NASA Astrophysics Data System (ADS)
Costa, G.; Calucci, M.
1997-04-01
The Table of Contents for the full book PDF is as follows: * Foreword * Some Limits of Science and Scientists * Three Limits of Scientific Knowledge * On Features and Meaning of Scientific Knowledge * How Science Approaches the World: Risky Truths versus Misleading Certitudes * On Discovery and Justification * Thought Experiments: A Philosophical Analysis * Causality: Epistemological Questions and Cognitive Answers * Scientific Inquiry via Rational Hypothesis Revision * Probabilistic Epistemology * The Transferable Belief Model for Uncertainty Representation * Chemistry and Complexity * The Difficult Epistemology of Medicine * Epidemiology, Causality and Medical Anthropology * Conceptual Tools for Transdisciplinary Unified Theory * Evolution and Learning in Economic Organizations * The Possible Role of Symmetry in Physics and Cosmology * Observational Cosmology and/or other Imaginable Models of the Universe
Ontology-guided data preparation for discovering genotype-phenotype relationships.
Coulet, Adrien; Smaïl-Tabbone, Malika; Benlian, Pascale; Napoli, Amedeo; Devignes, Marie-Dominique
2008-04-25
Complexity and amount of post-genomic data constitute two major factors limiting the application of Knowledge Discovery in Databases (KDD) methods in the life sciences. Bio-ontologies may nowadays play key roles in knowledge discovery in the life sciences, providing semantics to data and to extracted units by taking advantage of the progress of Semantic Web technologies concerning the understanding and availability of tools for knowledge representation, extraction, and reasoning. This paper presents a method that exploits bio-ontologies for guiding data selection within the preparation step of the KDD process. We propose three scenarios in which domain knowledge and ontology elements such as subsumption, properties, and class descriptions are taken into account for data selection, before the data mining step. Each of these scenarios is illustrated within a case study relating to the search for genotype-phenotype relationships in a familial hypercholesterolemia dataset. The guiding of data selection based on domain knowledge is analysed and shows a direct influence on the volume and significance of the data mining results. The method proposed in this paper is an efficient alternative to numerical methods for data selection based on domain knowledge. In turn, the results of this study may be reused in ontology modelling and data integration.
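The ontology-guided selection step can be made concrete with a toy subsumption hierarchy: keep only records whose annotation class falls under an analyst-chosen class before mining begins. The "is-a" table and records below are invented stand-ins for a bio-ontology and a patient dataset.

```python
# Ontology-guided data selection: filter records by subsumption closure.
IS_A = {  # child -> parent (toy hierarchy)
    "familial_hypercholesterolemia": "lipid_disorder",
    "hypertriglyceridemia": "lipid_disorder",
    "lipid_disorder": "metabolic_disease",
}

def descendants_and_self(cls):
    found, changed = {cls}, True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

records = [
    {"id": 1, "annotation": "familial_hypercholesterolemia"},
    {"id": 2, "annotation": "type_2_diabetes"},
]
keep = descendants_and_self("lipid_disorder")
print([r for r in records if r["annotation"] in keep])  # record 1 only
```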
Handling knowledge via Concept Maps: a space weather use case
NASA Astrophysics Data System (ADS)
Messerotti, Mauro; Fox, Peter
Concept Maps (Cmaps) are powerful means for knowledge coding in graphical form. As flexible software tools exist to manipulate the knowledge embedded in Cmaps in machine-readable form, such complex entities are suitable candidates not only for the representation of ontologies and semantics in Virtual Observatory (VO) architectures, but also for knowledge handling and knowledge discovery. In this work, we present a use case relevant to space weather applications and we elaborate on its possible implementation and advanced use in Semantic Virtual Observatories dedicated to Sun-Earth Connections. This analysis was carried out in the framework of the Electronic Geophysical Year (eGY) and represents an achievement synergized by the eGY Virtual Observatories Working Group.
Freely Accessible Chemical Database Resources of Compounds for in Silico Drug Discovery.
Yang, JingFang; Wang, Di; Jia, Chenyang; Wang, Mengyao; Hao, GeFei; Yang, GuangFu
2018-05-07
In silico drug discovery has proved to be a solidly established key component of early drug discovery. However, this task is hampered by limitations in the quantity and quality of the compound databases available for screening. To overcome these obstacles, freely accessible database resources of compounds have bloomed in recent years. Nevertheless, how to choose appropriate tools to treat these freely accessible databases is crucial. To the best of our knowledge, this is the first systematic review on this issue. The advantages and drawbacks of chemical databases were analyzed and summarized in this review, based on six categories of freely accessible chemical databases collected from the literature. Suggestions on how, and under which conditions, the usage of these databases could be reasonable were provided. Tools and procedures for building 3D structure chemical libraries were also introduced. In this review, we described the freely accessible chemical database resources for in silico drug discovery. In particular, the chemical information available for building chemical databases appears as an attractive resource for drug design to alleviate experimental pressure. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Anguera, A; Barreiro, J M; Lara, J A; Lizcano, D
2016-01-01
One of the major challenges in the medical domain today is how to exploit the huge amount of data that this field generates. To do this, approaches are required that are capable of discovering knowledge that is useful for decision making in the medical field. Time series are data types that are common in the medical domain and require specialized analysis techniques and tools, especially if the information of interest to specialists is concentrated within particular time series regions, known as events. This research followed the steps specified by the so-called knowledge discovery in databases (KDD) process to discover knowledge from medical time series derived from stabilometric (396 series) and electroencephalographic (200) patient electronic health records (EHR). The view offered in the paper is based on the experience gathered as part of the VIIP project. Knowledge discovery in medical time series has a number of difficulties and implications that are highlighted by illustrating the application of several techniques that cover the entire KDD process through two case studies. This paper illustrates the application of different knowledge discovery techniques for the purposes of classification within the above domains. The accuracy of this application for the two classes considered in each case is 99.86% and 98.11% for epilepsy diagnosis in the electroencephalography (EEG) domain and 99.4% and 99.1% for early-age sports talent classification in the stabilometry domain. The KDD techniques achieve better results than other traditional neural network-based classification techniques.
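As an illustration of the event-centred classification idea described above, the sketch below summarises each event region of a time series with simple features and trains an off-the-shelf classifier. The signals, labels and features are synthetic stand-ins; the published work used domain-specific events and purpose-built techniques to reach the accuracies quoted.

```python
# Feature-based classification of time-series "events" (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

def event_features(signal):
    return [signal.mean(), signal.std(), signal.max() - signal.min()]

class_a = [rng.normal(0.0, 1.0, 128) for _ in range(50)]   # synthetic events
class_b = [rng.normal(0.5, 2.0, 128) for _ in range(50)]
X = np.array([event_features(s) for s in class_a + class_b])
y = np.array([0] * 50 + [1] * 50)

print(cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean())
```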
ERIC Educational Resources Information Center
Mohamed, Fahim; Abdeslam, Jakimi; Lahcen, El Bermi
2017-01-01
Virtual Environments for Training (VET) are useful tools for visualization and discovery as well as for training. VETs are based on virtual reality techniques that put learners in training situations emulating genuine ones. VETs have proven to be advantageous in putting learners into varied training situations to acquire knowledge and…
ERIC Educational Resources Information Center
National Academies Press, 2013
2013-01-01
Spurred on by new discoveries and rapid technological advances, the capacity for life science research is expanding across the globe-and with it comes concerns about the unintended impacts of research on the physical and biological environment, human well-being, or the deliberate misuse of knowledge, tools, and techniques to cause harm. This…
Key Relation Extraction from Biomedical Publications.
Huang, Lan; Wang, Ye; Gong, Leiguang; Kulikowski, Casimir; Bai, Tian
2017-01-01
Within the large body of biomedical knowledge, recent findings and discoveries are most often presented as research articles. Their number has been increasing sharply since the turn of the century, presenting ever-growing challenges for search and discovery of knowledge and information related to specific topics of interest, even with the help of advanced online search tools. This is especially true when the goal of a search is to find or discover key relations between important concepts or topic words. We have developed an innovative method for extracting key relations between concepts from abstracts of articles. The method focuses on relations between keywords or topic words in the articles. Early experiments with the method on PubMed publications have shown promising results in searching and discovering keywords and their relationships that are strongly related to the main topic of an article.
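The simplest backbone of the keyword relation extraction described above is pairwise co-occurrence counting across abstracts, sketched below; the real method layers NLP on top of this. The keyword sets here are invented.

```python
# Count how often keyword pairs co-occur in the same abstract, then rank.
from itertools import combinations
from collections import Counter

abstracts = [                      # invented keyword sets, one per abstract
    {"asthma", "obesity", "children"},
    {"asthma", "obesity"},
    {"asthma", "air pollution"},
]

pair_counts = Counter()
for keywords in abstracts:
    pair_counts.update(combinations(sorted(keywords), 2))

for pair, n in pair_counts.most_common(3):
    print(pair, n)                 # ('asthma', 'obesity') 2, ...
```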
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruebel, Oliver
2009-11-20
Knowledge discovery from large and complex collections of today's scientific datasets is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the increasing number of data dimensions and data objects is presenting tremendous challenges for data analysis and effective data exploration methods and tools. Researchers are overwhelmed with data and standard tools are often insufficient to enable effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities to accelerate scientific knowledge discovery from large, complex, and multivariate scientific data. The research covered in this thesis addresses these scientific challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated via applications in two distinct scientific research fields, namely developmental biology and high-energy physics. Advances in microscopy, image analysis, and embryo registration enable for the first time measurement of gene expression at cellular resolution for entire organisms. Analysis of high-dimensional spatial gene expression datasets is a challenging task. By integrating data clustering and visualization, analysis of complex, time-varying, spatial gene expression patterns and their formation becomes possible. The analysis framework has been integrated with MATLAB and the visualization, making advanced analysis tools accessible to biologists and enabling bioinformatics researchers to directly integrate their analysis with the visualization. Laser wakefield particle accelerators (LWFAs) promise to be a new compact source of high-energy particles and radiation, with wide applications ranging from medicine to physics. To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, making analysis and knowledge discovery from complex LWFA simulation data a challenging task. To address these challenges, this thesis describes the integration of the visualization system VisIt and the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in beams of high-energy particles formed during the course of a simulation. This thesis describes novel methods for automatic detection and analysis of particle beams, enabling a more accurate and efficient data analysis process. By integrating these automated analysis methods with visualization, this research enables more accurate, efficient, and effective analysis of LWFA simulation data than previously possible.
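The index/query capability described above boils down to fast range queries over particle attributes. As a stand-in for FastBit's bitmap indexes, the sketch below runs the same kind of threshold selection with NumPy boolean masks on synthetic data; the momentum cutoff is invented.

```python
# Select candidate beam particles: those whose momentum clears a threshold.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
px = rng.exponential(scale=1.0, size=n)   # momenta, arbitrary units
x = rng.uniform(0.0, 100.0, size=n)       # longitudinal positions

beam = px > 8.0                 # invented cutoff for "high-energy" particles
print(beam.sum(), x[beam][:5])  # number of candidates, a few positions
```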
BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Pafilis, Evangelos; Theodosiou, Theodosios; Schneider, Reinhard; Satagopam, Venkata P; Ouzounis, Christos A; Eliopoulos, Aristides G; Promponas, Vasilis J; Iliopoulos, Ioannis
2014-11-15
The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed(®) and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. Contact: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
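Among the capabilities listed above, per-subject document clustering is the easiest to make concrete. A minimal sketch, assuming toy documents rather than real PubMed abstracts: TF-IDF vectors, k-means clusters, and the top-weighted terms per cluster (the raw material for a tag cloud).

```python
# Cluster toy "abstracts" and print each cluster's top TF-IDF terms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "obesity adipose tissue inflammation insulin",
    "insulin resistance obesity metabolic syndrome",
    "telomere shortening ageing senescence",
    "cellular senescence ageing longevity telomere",
]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = np.array(vec.get_feature_names_out())
for c in range(2):
    top = np.argsort(km.cluster_centers_[c])[::-1][:3]
    print(c, terms[top])   # three highest-weighted terms per cluster
```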
Prediction of intracellular exposure bridges the gap between target- and cell-based drug discovery
Gordon, Laurie J.; Wayne, Gareth J.; Almqvist, Helena; Axelsson, Hanna; Seashore-Ludlow, Brinton; Treyer, Andrea; Lundbäck, Thomas; West, Andy; Hann, Michael M.; Artursson, Per
2017-01-01
Inadequate target exposure is a major cause of high attrition in drug discovery. Here, we show that a label-free method for quantifying the intracellular bioavailability (Fic) of drug molecules predicts drug access to intracellular targets and hence, pharmacological effect. We determined Fic in multiple cellular assays and cell types representing different targets from a number of therapeutic areas, including cancer, inflammation, and dementia. Both cytosolic targets and targets localized in subcellular compartments were investigated. Fic gives insights on membrane-permeable compounds in terms of cellular potency and intracellular target engagement, compared with biochemical potency measurements alone. Knowledge of the amount of drug that is locally available to bind intracellular targets provides a powerful tool for compound selection in early drug discovery. PMID:28701380
Krumholz, Harlan M
2014-07-01
Big data in medicine--massive quantities of health care data accumulating from patients and populations and the advanced analytics that can give those data meaning--hold the prospect of becoming an engine for the knowledge generation that is necessary to address the extensive unmet information needs of patients, clinicians, administrators, researchers, and health policy makers. This article explores the ways in which big data can be harnessed to advance prediction, performance, discovery, and comparative effectiveness research to address the complexity of patients, populations, and organizations. Incorporating big data and next-generation analytics into clinical and population health research and practice will require not only new data sources but also new thinking, training, and tools. Adequately utilized, these reservoirs of data can be a practically inexhaustible source of knowledge to fuel a learning health care system. Project HOPE—The People-to-People Health Foundation, Inc.
Krumholz, Harlan M.
2017-01-01
Big data in medicine--massive quantities of health care data accumulating from patients and populations and the advanced analytics that can give it meaning--hold the prospect of becoming an engine for the knowledge generation that is necessary to address the extensive unmet information needs of patients, clinicians, administrators, researchers, and health policy makers. This paper explores the ways in which big data can be harnessed to advance prediction, performance, discovery, and comparative effectiveness research to address the complexity of patients, populations, and organizations. Incorporating big data and next-generation analytics into clinical and population health research and practice will require not only new data sources but also new thinking, training, and tools. Adequately used, these reservoirs of data can be a practically inexhaustible source of knowledge to fuel a learning health care system. PMID:25006142
PubMedMiner: Mining and Visualizing MeSH-based Associations in PubMed.
Zhang, Yucan; Sarkar, Indra Neil; Chen, Elizabeth S
2014-01-01
The exponential growth of biomedical literature provides the opportunity to develop approaches for facilitating the identification of possible relationships between biomedical concepts. Indexing by Medical Subject Headings (MeSH) provides high-quality summaries of much of this literature that can be used to support hypothesis generation and knowledge discovery tasks using techniques such as association rule mining. Based on a survey of literature mining tools, a tool implemented using Ruby and R, PubMedMiner, was developed in this study for mining and visualizing MeSH-based associations for a set of MEDLINE articles. To demonstrate PubMedMiner's functionality, a case study was conducted that focused on identifying and comparing comorbidities for asthma in children and adults. Relative to the tools surveyed, the initial results suggest that PubMedMiner provides complementary functionality for summarizing and comparing topics as well as identifying potentially new knowledge.
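The association rule mining that PubMedMiner applies to MeSH headings can be sketched by treating each article's MeSH set as a transaction. PubMedMiner itself was written in Ruby and R; the Python version below, with invented MeSH sets, only illustrates the technique.

```python
# Mine frequent MeSH co-occurrences and rules with mlxtend's apriori.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

articles = [                       # invented MeSH heading sets
    ["Asthma", "Obesity", "Child"],
    ["Asthma", "Obesity", "Adult"],
    ["Asthma", "Rhinitis", "Child"],
    ["Asthma", "Obesity"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(articles).transform(articles), columns=te.columns_)
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```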
A knowledgebase system to enhance scientific discovery: Telemakus
Fuller, Sherrilynne S; Revere, Debra; Bugni, Paul F; Martin, George M
2004-01-01
Background With the rapid expansion of scientific research, the ability to effectively find or integrate new domain knowledge in the sciences is proving increasingly difficult. Efforts to improve and speed up scientific discovery are being explored on a number of fronts. However, much of this work is based on traditional search and retrieval approaches and the bibliographic citation presentation format remains unchanged. Methods Case study. Results The Telemakus KnowledgeBase System provides flexible new tools for creating knowledgebases to facilitate retrieval and review of scientific research reports. In formalizing the representation of the research methods and results of scientific reports, Telemakus offers a potential strategy to enhance the scientific discovery process. While other research has demonstrated that aggregating and analyzing research findings across domains augments knowledge discovery, the Telemakus system is unique in combining document surrogates with interactive concept maps of linked relationships across groups of research reports. Conclusion Based on how scientists conduct research and read the literature, the Telemakus KnowledgeBase System brings together three innovations in analyzing, displaying and summarizing research reports across a domain: (1) research report schema, a document surrogate of extracted research methods and findings presented in a consistent and structured schema format which mimics the research process itself and provides a high-level surrogate to facilitate searching and rapid review of retrieved documents; (2) research findings, used to index the documents, allowing searchers to request, for example, research studies which have studied the relationship between neoplasms and vitamin E; and (3) visual exploration interface of linked relationships for interactive querying of research findings across the knowledgebase and graphical displays of what is known as well as, through gaps in the map, what is yet to be tested. The rationale and system architecture are described and plans for the future are discussed. PMID:15507158
NASA Astrophysics Data System (ADS)
Cook, R.; Michener, W.; Vieglais, D.; Budden, A.; Koskela, R.
2012-04-01
Addressing grand environmental science challenges requires unprecedented access to easily understood data that cross the breadth of temporal, spatial, and thematic scales. Tools are needed to plan management of the data, discover the relevant data, integrate heterogeneous and diverse data, and convert the data to information and knowledge. Addressing these challenges requires new approaches for the full data life cycle of managing, preserving, sharing, and analyzing data. DataONE (Observation Network for Earth) represents a virtual organization that enables new science and knowledge creation through preservation and access to data about life on Earth and the environment that sustains it. The DataONE approach is to improve data collection and management techniques; facilitate easy, secure, and persistent storage of data; continue to increase access to data and tools that improve data interoperability; disseminate integrated and user-friendly tools for data discovery and novel analyses; work with researchers to build intuitive data exploration and visualization tools; and support communities of practice via education, outreach, and stakeholder engagement.
Open-source tools for data mining.
Zupan, Blaz; Demsar, Janez
2008-03-01
With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.
Knowledge Discovery from Databases: An Introductory Review.
ERIC Educational Resources Information Center
Vickery, Brian
1997-01-01
Introduces new procedures being used to extract knowledge from databases and discusses rationales for developing knowledge discovery methods. Methods are described for such techniques as classification, clustering, and the detection of deviations from pre-established norms. Examines potential uses of knowledge discovery in the information field.…
ERIC Educational Resources Information Center
Fawley, Nancy; Krysak, Nikki
2014-01-01
Some librarians embrace discovery tools while others refuse to use them. This lack of consensus can have consequences for student learning when there is inconsistent use, especially in large-scale instruction programs. The authors surveyed academic librarians whose institutions have a discovery tool and who teach information literacy classes in…
Promising Practices in Instruction of Discovery Tools
ERIC Educational Resources Information Center
Buck, Stefanie; Steffy, Christina
2013-01-01
Libraries are continually changing to meet the needs of users; this includes implementing discovery tools, also referred to as web-scale discovery tools, to make searching library resources easier. Because these tools are so new, it is difficult to establish definitive best practices for teaching these tools; however, promising practices are…
From Residency to Lifelong Learning.
Brandt, Keith
2015-11-01
The residency training experience is the perfect environment for learning. The university/institution patient population provides a never-ending supply of patients with unique management challenges. Resources abound that allow the discovery of knowledge about similar situations. Senior teachers provide counseling and help direct appropriate care. Periodic testing and evaluations identify deficiencies, which can be corrected with future study. What happens, however, when the resident graduates? Do they possess all the knowledge they'll need for the rest of their career? Will medical discovery stand still, limiting the need for future study? If initial certification establishes that the physician has the skills and knowledge to function as an independent physician and surgeon, how do we assure the public that plastic surgeons will practice lifelong learning and remain safe throughout their career? Enter Maintenance of Certification (MOC). In an ideal world, MOC would provide many of the same tools as residency training: identification of gaps in knowledge, resources to correct those deficiencies, overall assessment of knowledge, feedback about communication skills and professionalism, and methods to evaluate and improve one's practice. This article discusses the need for education and self-assessment that extends beyond residency training and a commitment to lifelong learning. The American Board of Plastic Surgery MOC program is described to demonstrate how it helps the diplomate reach the goal of continuous practice improvement.
Estiri, Hossein; Lovins, Terri; Afzalan, Nader; Stephens, Kari A.
2016-01-01
We applied a participatory design approach to define the objectives, characteristics, and features of a “data profiling” tool for primary care Electronic Health Data (EHD). Through three participatory design workshops, we collected input from potential tool users who had experience working with EHD. We present 15 recommended features and characteristics for the data profiling tool. From these recommendations we derived three overarching objectives and five properties for the tool. In biomedical informatics, a data profiling tool is a visual, clear, usable, interactive, and smart tool designed to inform clinical and biomedical researchers of data utility and to let them explore the data, while conveniently orienting users to the tool’s functionalities. We suggest that developing scalable data profiling tools will provide new capacities to disseminate knowledge about clinical data that will foster translational research and accelerate new discoveries. PMID:27570651
DesAutels, Spencer J; Fox, Zachary E; Giuse, Dario A; Williams, Annette M; Kou, Qing-Hua; Weitkamp, Asli; Patel, Neal R; Bettinsoli Giuse, Nunzia
2016-01-01
Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. To have a holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems.
The relation between prior knowledge and students' collaborative discovery learning processes
NASA Astrophysics Data System (ADS)
Gijlers, Hannie; de Jong, Ton
2005-03-01
In this study we investigate how prior knowledge influences knowledge development during collaborative discovery learning. Fifteen dyads of students (pre-university education, 15-16 years old) worked on a discovery learning task in the physics field of kinematics. The (face-to-face) communication between students was recorded and the interaction with the environment was logged. Based on students' individual judgments of the truth-value and testability of a series of domain-specific propositions, a detailed description of the knowledge configuration for each dyad was created before they entered the learning environment. Qualitative analyses of two dialogues illustrated that prior knowledge influences the discovery learning processes, and knowledge development in a pair of students. Assessments of student and dyad definitional (domain-specific) knowledge, generic (mathematical and graph) knowledge, and generic (discovery) skills were related to the students' dialogue in different discovery learning processes. Results show that a high level of definitional prior knowledge is positively related to the proportion of communication regarding the interpretation of results. Heterogeneity with respect to generic prior knowledge was positively related to the number of utterances made in the discovery process categories hypotheses generation and experimentation. Results of the qualitative analyses indicated that collaboration between extremely heterogeneous dyads is difficult when the high achiever is not willing to scaffold information and work in the low achiever's zone of proximal development.
Biomedical discovery acceleration, with applications to craniofacial development.
Leach, Sonia M; Tipney, Hannah; Feng, Weiguo; Baumgartner, William A; Kasliwal, Priyanka; Schuyler, Ronald P; Williams, Trevor; Spritz, Richard A; Hunter, Lawrence
2009-03-01
The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
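One simple way to combine evidence from multiple sources into a single reliability score for a gene pair is a noisy-OR aggregation, sketched below in Python (gene names and per-source scores are invented; the Hanalyzer's actual scoring scheme is described in the paper and is not reproduced here):

    # Each gene pair maps to reliability scores from independent evidence sources,
    # e.g., shared ontology annotation, co-citation, database co-membership.
    evidence = {
        ("TFAP2A", "MSX1"): [0.6, 0.3],
        ("TFAP2A", "BMP4"): [0.8],
    }

    def combined_reliability(scores):
        # Noisy-OR: the pair is supported unless every source fails.
        p_all_fail = 1.0
        for s in scores:
            p_all_fail *= (1.0 - s)
        return 1.0 - p_all_fail

    for pair, scores in evidence.items():
        print(pair, round(combined_reliability(scores), 3))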
Cancer biology and implications for practice.
Rieger, Paula Trahan
2006-08-01
The media seem to announce a new scientific discovery related to cancer daily. Oncology nurses are challenged to keep up with the explosion of new knowledge and to understand how it ultimately relates to the care of patients with cancer. A framework for classifying new knowledge can be useful as nurses seek to understand the biology of cancer and its related implications for practice. To understand the molecular roots of cancer, healthcare practitioners specializing in cancer care require insight into genes, their messages, and the proteins produced from those messages, as well as the new tools of molecular biology.
Databases and Web Tools for Cancer Genomics Study
Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong
2015-01-01
Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the usage of these resources in the cancer research community. PMID:25707591
Collection, Culturing, and Genome Analyses of Tropical Marine Filamentous Benthic Cyanobacteria.
Moss, Nathan A; Leao, Tiago; Glukhov, Evgenia; Gerwick, Lena; Gerwick, William H
2018-01-01
Decreasing sequencing costs have sparked widespread investigation of the use of microbial genomics to accelerate the discovery and development of natural products for therapeutic uses. Tropical marine filamentous cyanobacteria have historically produced many structurally novel natural products, and therefore present an excellent opportunity for the systematic discovery of new metabolites via the information derived from genomics and molecular genetics. Adequate knowledge transfer and institutional know-how are important to maintain the capability for studying filamentous cyanobacteria due to their unusual microbial morphology and characteristics. Here, we describe workflows, procedures, and commentary on sample collection, cultivation, genomic DNA generation, bioinformatics tools, and biosynthetic pathway analysis concerning filamentous cyanobacteria. © 2018 Elsevier Inc. All rights reserved.
Assessment of cardiovascular risk based on a data-driven knowledge discovery approach.
Mendes, D; Paredes, S; Rocha, T; Carvalho, P; Henriques, J; Cabiddu, R; Morais, J
2015-01-01
The cardioRisk project addresses the development of personalized risk assessment tools for patients who have been admitted to the hospital with acute myocardial infarction. Although there are models available that assess the short-term risk of death/new events for such patients, these models were established in circumstances that do not take into account present clinical interventions and, in some cases, the risk factors used by such models are not easily available in clinical practice. The integration of the existing risk tools (applied in clinicians' daily practice) with data-driven knowledge discovery mechanisms based on data routinely collected during hospitalizations will be a breakthrough in overcoming some of these difficulties. In this context, the development of simple and interpretable models (based on recent datasets) will unquestionably facilitate and introduce confidence in this integration process. In this work, a simple and interpretable model based on a real dataset is proposed. It consists of a decision tree model structure that uses a reduced set of six binary risk factors. The validation is performed using a recent dataset provided by the Portuguese Society of Cardiology (11113 patients), which originally comprised 77 risk factors. A sensitivity, specificity and accuracy of, respectively, 80.42%, 77.25% and 78.80% were achieved, showing the effectiveness of the approach.
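For reference, the reported validation metrics are straightforward to compute from a confusion matrix, as in this small Python sketch (labels and predictions are invented; this is not the cardioRisk decision tree itself):

    # 1 = event (death/new event), 0 = no event; toy data only.
    y_true = [1, 1, 0, 0, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 1, 1, 0, 1]

    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    print(f"sens={sensitivity:.2%} spec={specificity:.2%} acc={accuracy:.2%}")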
Information Fusion for Natural and Man-Made Disasters
2007-01-31
comprehensively large, and metaphysically accurate model of situations, through which specific tasks such as situation assessment, knowledge discovery, or the... “significance” is always context specific. Event discovery is a very important element of the HLF process, which can lead to knowledge discovery about... expected, given the current state of knowledge. Examples of such behavior may include discovery of a new aggregate or situation, a specific pattern of...
A concept for performance management for Federal science programs
Whalen, Kevin G.
2017-11-06
The demonstration of clear linkages between planning, funding, outcomes, and performance management has created unique challenges for U.S. Federal science programs. An approach is presented here that characterizes science program strategic objectives by one of five “activity types”: (1) knowledge discovery, (2) knowledge development and delivery, (3) science support, (4) inventory and monitoring, and (5) knowledge synthesis and assessment. The activity types relate to performance measurement tools for tracking outcomes of research funded under the objective. The result is a multi-time scale, integrated performance measure that tracks individual performance metrics synthetically while also measuring progress toward long-term outcomes. Tracking performance on individual metrics provides explicit linkages to root causes of potentially suboptimal performance and captures both internal and external program drivers, such as customer relations and science support for managers. Functionally connecting strategic planning objectives with performance measurement tools is a practical approach for publicly funded science agencies that links planning, outcomes, and performance management—an enterprise that has created unique challenges for public-sector research and development programs.
Margolis, Ronald; Derr, Leslie; Dunn, Michelle; Huerta, Michael; Larkin, Jennie; Sheehan, Jerry; Guyer, Mark; Green, Eric D
2014-01-01
Biomedical research has generated and will continue to generate large amounts of data (termed 'big data') in many formats and at all levels. Consequently, there is an increasing need to better understand and mine the data to further knowledge and foster new discovery. The National Institutes of Health (NIH) has initiated a Big Data to Knowledge (BD2K) initiative to maximize the use of biomedical big data. BD2K seeks to better define how to extract value from the data, both for the individual investigator and the overall research community, to create the analytic tools needed to enhance the utility of the data, to provide the next generation of trained personnel, and to develop data science concepts and tools that can be made available to all stakeholders. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Granularity refined by knowledge: contingency tables and rough sets as tools of discovery
NASA Astrophysics Data System (ADS)
Zytkow, Jan M.
2000-04-01
Contingency tables represent data in a granular way and are a well-established tool for inductive generalization of knowledge from data. We show that the basic concepts of rough sets, such as concept approximation, indiscernibility, and reduct, can be expressed in the language of contingency tables. We further demonstrate the relevance to rough set theory of the additional probabilistic information available in contingency tables, in particular statistical tests of significance and predictive strength applied to contingency tables. Tests of both types can help the evaluation mechanisms used in inductive generalization based on rough sets. Granularity of attributes can be improved in feedback with knowledge discovered in data. We demonstrate how 49er's facilities for (1) contingency table refinement, (2) column and row grouping based on correspondence analysis, and (3) searching for equivalence relations between attributes improve both the granularization of attributes and the quality of knowledge. Finally, we demonstrate the limitations of knowledge viewed as concept approximation, which is the focus of rough sets. Transcending that focus and reorienting towards predictive knowledge and the related distinction between possible and impossible (or statistically improbable) situations will be very useful in expanding the rough sets approach to more expressive forms of knowledge.
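For example, a statistical test of significance on a contingency table can be computed directly, as in this generic Python illustration (not 49er's implementation; the counts are invented):

    from scipy.stats import chi2_contingency

    # Hypothetical 2x2 contingency table: rows are values of attribute A,
    # columns are values of attribute B, cells are co-occurrence counts.
    table = [[30, 10],
             [12, 28]]

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
    # A small p-value suggests the two attributes are dependent, i.e., this
    # granularization carries predictive signal rather than noise.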
TrawlerWeb: an online de novo motif discovery tool for next-generation sequencing datasets.
Dang, Louis T; Tondl, Markus; Chiu, Man Ho H; Revote, Jerico; Paten, Benedict; Tano, Vincent; Tokolyi, Alex; Besse, Florence; Quaife-Ryan, Greg; Cumming, Helen; Drvodelic, Mark J; Eichenlaub, Michael P; Hallab, Jeannette C; Stolper, Julian S; Rossello, Fernando J; Bogoyevitch, Marie A; Jans, David A; Nim, Hieu T; Porrello, Enzo R; Hudson, James E; Ramialison, Mirana
2018-04-05
A strong focus of the post-genomic era is mining of the non-coding regulatory genome in order to unravel the function of regulatory elements that coordinate gene expression (Nat 489:57-74, 2012; Nat 507:462-70, 2014; Nat 507:455-61, 2014; Nat 518:317-30, 2015). Whole-genome approaches based on next-generation sequencing (NGS) have provided insight into the genomic location of regulatory elements throughout different cell types, organs and organisms. These technologies are now widespread and commonly used in laboratories from various fields of research. This highlights the need for fast and user-friendly software tools dedicated to extracting cis-regulatory information contained in these regulatory regions; for instance transcription factor binding site (TFBS) composition. Ideally, such tools should not require prior programming knowledge to ensure they are accessible for all users. We present TrawlerWeb, a web-based version of the Trawler_standalone tool (Nat Methods 4:563-5, 2007; Nat Protoc 5:323-34, 2010), to allow for the identification of enriched motifs in DNA sequences obtained from next-generation sequencing experiments in order to predict their TFBS composition. TrawlerWeb is designed for online queries with standard options common to web-based motif discovery tools. In addition, TrawlerWeb provides three unique new features: 1) TrawlerWeb allows the input of BED files directly generated from NGS experiments, 2) it automatically generates an input-matched biologically relevant background, and 3) it displays resulting conservation scores for each instance of the motif found in the input sequences, which assists the researcher in prioritising the motifs to validate experimentally. Finally, to date, this web-based version of Trawler_standalone remains the fastest online de novo motif discovery tool compared to other popular web-based software, while generating predictions with high accuracy. TrawlerWeb provides users with a fast, simple and easy-to-use web interface for de novo motif discovery. This will assist in rapidly analysing NGS datasets that are now being routinely generated. TrawlerWeb is freely available and accessible at: http://trawler.erc.monash.edu.au .
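At its simplest, de novo motif discovery by enrichment compares k-mer frequencies between target and background sequences, as in the toy Python sketch below (Trawler's actual algorithm and scoring are considerably more sophisticated; the sequences here are invented):

    from collections import Counter

    def kmer_counts(seqs, k):
        counts = Counter()
        for s in seqs:
            s = s.upper()
            for i in range(len(s) - k + 1):
                counts[s[i:i + k]] += 1
        return counts

    # In practice, foreground comes from BED-derived peak sequences and the
    # background is matched to it; these short strings are placeholders.
    foreground = ["ACGTGACGTG", "TTACGTGAAT", "GACGTGCCAA"]
    background = ["ATATATATAT", "GGCCGGCCGG", "ACGTACGTAC"]

    k = 6
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_total, bg_total = sum(fg.values()) or 1, sum(bg.values()) or 1
    enrichment = {m: (fg[m] / fg_total) / ((bg[m] + 1) / bg_total)  # +1 pseudocount
                  for m in fg}
    for motif, score in sorted(enrichment.items(), key=lambda x: -x[1])[:3]:
        print(motif, round(score, 2))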
Streptomyces species: Ideal chassis for natural product discovery and overproduction.
Liu, Ran; Deng, Zixin; Liu, Tiangang
2018-05-28
There is considerable interest in mining organisms for new natural products (NPs) and in improving methods to overproduce valuable NPs. Because of the rapid development of tools and strategies for metabolic engineering and the markedly increased knowledge of the biosynthetic pathways and genetics of NP-producing organisms, genome mining and overproduction of NPs can be dramatically accelerated. In particular, Streptomyces species have been proposed as suitable chassis organisms for NP discovery and overproduction because of their many unique characteristics not shared with yeast, Escherichia coli, or other microorganisms. In this review, we summarize the methods for genome sequencing, gene cluster prediction, and gene editing in Streptomyces, as well as metabolic engineering strategies for NP overproduction and approaches for generating new products. Finally, two strategies for utilizing Streptomyces as the chassis for NP discovery and overproduction are emphasized. Copyright © 2018 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.
A New System To Support Knowledge Discovery: Telemakus.
ERIC Educational Resources Information Center
Revere, Debra; Fuller, Sherrilynne S.; Bugni, Paul F.; Martin, George M.
2003-01-01
The Telemakus System builds on the areas of concept representation, schema theory, and information visualization to enhance knowledge discovery from scientific literature. This article describes the underlying theories and an overview of a working implementation designed to enhance the knowledge discovery process through retrieval, visual and…
NASA Astrophysics Data System (ADS)
Berkman, P. A.
2005-12-01
The World Data Center system emerged in 1957-58 with the International Geophysical Year (which was renamed from the 3rd International Polar Year) to preserve and provide access to scientific data collected from observational programs throughout the Earth system. Fast forward a half century ... access to diverse digital information has become effectively infinite and instantaneous with nearly 20,000 petabytes of information produced and stored on print, optical and magnetic media each year; microprocessor speeds that have increased 5 orders of magnitude since 1972; existence of the Internet; increasing global capacity to collect and transmit information via satellites; availability of powerful search engines; and proliferation of data warehouses like the World Data Centers. The problem is that we already have reached the threshold in our world information society when accessing more information does not equate with generating more knowledge. In 2007-08, the International Council of Science and World Meteorological Organization will convene the next International Polar Year to accelerate our understanding of how the polar regions respond to, amplify and drive changes elsewhere in the Earth system (http://www.ipy.org). Beyond Earth system science, strategies and tools for integrating digital information to discover meaningful relationships among the disparate data would have societal benefits from boardrooms to classrooms. In the same sense that human-launched satellites became a strategic focus that justified national investments in the International Geophysical Year, developing the next generation of knowledge discovery tools is an opportunity for the International Polar Year 2007-08 and its affiliated programs to contribute in an area that is critical to the future of our global community. As H.E. Mr. Adama Samassekou, President of the World Summit on the Information Society, put it: "Knowledge is the common wealth of humanity."
Knowledge Discovery as an Aid to Organizational Creativity.
ERIC Educational Resources Information Center
Siau, Keng
2000-01-01
This article presents the concept of knowledge discovery, a process of searching for associations in large volumes of computer data, as an aid to creativity. It then discusses the various techniques in knowledge discovery. Mednick's associative theory of creative thought serves as the theoretical foundation for this research. (Contains…
2017-06-27
Advances in Knowledge Discovery and... (Contract FA2386-17-1-0102; final report, 17-03-2017 to 15-03-2018; Springer, Switzerland). The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) is a leading international conference... in the areas of knowledge discovery and data mining (KDD). We had three keynote speeches, delivered by Sang Cha from Seoul National University...
atBioNet--an integrated network analysis tool for genomics and biomarker discovery.
Ding, Yijun; Chen, Minjun; Liu, Zhichao; Ding, Don; Ye, Yanbin; Zhang, Min; Kelly, Reagan; Guo, Li; Su, Zhenqiang; Harris, Stephen C; Qian, Feng; Ge, Weigong; Fang, Hong; Xu, Xiaowei; Tong, Weida
2012-07-20
Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user supplied proteins/genes list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but also that this information can be used to hypothesize novel biomarkers for future analysis. atBioNet is a free web-based network analysis tool that provides a systematic insight into proteins/genes interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
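For illustration, modularity-based community detection on a toy interaction network can be sketched with networkx, as below; note that atBioNet uses the SCAN structural clustering algorithm, which is not reproduced here, and the gene names and edges are invented:

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Toy PPI network seeded from a hypothetical user-supplied gene list.
    edges = [("TP53", "MDM2"), ("TP53", "ATM"), ("MDM2", "ATM"),
             ("BRCA1", "BARD1"), ("BRCA1", "RAD51"), ("BARD1", "RAD51")]
    g = nx.Graph(edges)

    # Each detected community is a candidate functional module.
    for i, module in enumerate(greedy_modularity_communities(g), start=1):
        print(f"module {i}: {sorted(module)}")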
Distribution and licensing of drug discovery tools – NIH perspectives
Kim, J. P.
2009-01-01
Now, more than ever, drug discovery conducted at industrial or academic facilities requires rapid access to state-of-the-art research tools. Unreasonable restrictions or delays in the distribution or use of such tools can stifle new discoveries, thus limiting the development of future biomedical products. In its grants and its own research programs, the National Institutes of Health (NIH) is implementing its new policy to facilitate the exchange of these tools for research discoveries and product development. PMID:12546842
NASA Astrophysics Data System (ADS)
Berres, A.; Karthik, R.; Nugent, P.; Sorokine, A.; Myers, A.; Pang, H.
2017-12-01
Building an integrated data infrastructure that can meet the needs of sustainable energy-water resource management requires a robust data management and geovisual analytics platform, capable of cross-domain scientific discovery and knowledge generation. Such a platform can facilitate the investigation of diverse complex research and policy questions for emerging priorities in Energy-Water Nexus (EWN) science areas. Using advanced data analytics, machine learning techniques, multi-dimensional statistical tools, and interactive geovisualization components, such a multi-layered federated platform, the Energy-Water Nexus Knowledge Discovery Framework (EWN-KDF), is being developed. This platform utilizes several enterprise-grade software design concepts and standards such as extensible service-oriented architecture, open standard protocols, an event-driven programming model, an enterprise service bus, and adaptive user interfaces to provide strategic value to the integrative computational and data infrastructure. EWN-KDF is built on the Compute and Data Environment for Science (CADES) environment at Oak Ridge National Laboratory (ORNL).
An Integrative Bioinformatics Approach for Knowledge Discovery
NASA Astrophysics Data System (ADS)
Peña-Castillo, Lourdes; Phan, Sieu; Famili, Fazel
The vast amount of data being generated by large scale omics projects and the computational approaches developed to deal with this data have the potential to accelerate the advancement of our understanding of the molecular basis of genetic diseases. This better understanding may have profound clinical implications and transform medical practice; for instance, therapeutic management could be prescribed based on the patient’s genetic profile instead of being based on aggregate data. Current efforts have established the feasibility and utility of integrating and analysing heterogeneous genomic data to identify molecular associations to pathogenesis. However, since these initiatives are data-centric, they either restrict the research community to specific data sets or to a certain application domain, or force researchers to develop their own analysis tools. To fully exploit the potential of omics technologies, robust computational approaches need to be developed and made available to the community. This research addresses this challenge and proposes an integrative approach to facilitate knowledge discovery from diverse datasets and contribute to the advancement of genomic medicine.
Nanotechnology applications in hematological malignancies (Review).
Samir, Ahmed; Elgamal, Basma M; Gabr, Hala; Sabaawy, Hatem E
2015-09-01
A major limitation to current cancer therapies is the development of therapy-related side-effects and dose-limiting complications. Moreover, a better understanding of the biology of cancer cells and the mechanisms of resistance to therapy is rapidly developing. The translation of advanced knowledge and discoveries achieved at the molecular level must be supported by advanced diagnostic, therapeutic and delivery technologies to translate these discoveries into useful tools that are essential in achieving progress in the war against cancer. Nanotechnology can play an essential role in this aspect, providing a transforming technology that can translate the basic and clinical findings into novel diagnostic, therapeutic and preventive tools useful in different types of cancer. Hematological malignancies represent a specific class of cancer, which attracts special attention in the applications of nanotechnology for cancer diagnosis and treatment. The aim of the present review is to elucidate the emerging applications of nanotechnology in cancer management and describe the potential of nanotechnology in changing the key fundamental aspects of hematological malignancy diagnosis, treatment and follow-up.
Nanotechnology applications in hematological malignancies (Review)
SAMIR, AHMED; ELGAMAL, BASMA M; GABR, HALA; SABAAWY, HATEM E
2015-01-01
A major limitation to current cancer therapies is the development of therapy-related side-effects and dose-limiting complications. Moreover, a better understanding of the biology of cancer cells and the mechanisms of resistance to therapy is rapidly developing. The translation of advanced knowledge and discoveries achieved at the molecular level must be supported by advanced diagnostic, therapeutic and delivery technologies to translate these discoveries into useful tools that are essential in achieving progress in the war against cancer. Nanotechnology can play an essential role in this aspect, providing a transforming technology that can translate the basic and clinical findings into novel diagnostic, therapeutic and preventive tools useful in different types of cancer. Hematological malignancies represent a specific class of cancer, which attracts special attention in the applications of nanotechnology for cancer diagnosis and treatment. The aim of the present review is to elucidate the emerging applications of nanotechnology in cancer management and describe the potential of nanotechnology in changing the key fundamental aspects of hematological malignancy diagnosis, treatment and follow-up. PMID:26134389
Generation of transgenic mouse model using PTTG as an oncogene.
Kakar, Sham S; Kakar, Cohin
2015-01-01
The close physiological similarity between the mouse and human has provided tools for understanding the biological function of particular genes in vivo by introduction or deletion of a gene of interest. Using a mouse as a model has provided a wealth of resources, knowledge, and technology, helping scientists to understand the biological functions, translocation, trafficking, and interaction of a candidate gene with other intracellular molecules, transcriptional regulation, posttranslational modification, and discovery of novel signaling pathways for a particular gene. Most importantly, the generation of the mouse model for a specific human disease has provided a powerful tool to understand the etiology of a disease and discovery of novel therapeutics. This chapter describes in detail the step-by-step generation of the transgenic mouse model, which can be helpful in guiding new investigators in developing successful models. For practical purposes, we will describe the generation of a mouse model using pituitary tumor transforming gene (PTTG) as the candidate gene of interest.
Brancaccio, Rosario N; Robitaille, Alexis; Dutta, Sankhadeep; Cuenin, Cyrille; Santare, Daiga; Skenders, Girts; Leja, Marcis; Fischer, Nicole; Giuliano, Anna R; Rollison, Dana E; Grundhoff, Adam; Tommasino, Massimo; Gheit, Tarik
2018-05-07
With the advent of new molecular tools, the discovery of new papillomaviruses (PVs) has accelerated during the past decade, enabling the expansion of knowledge about the viral populations that inhabit the human body. Human PVs (HPVs) are etiologically linked to benign or malignant lesions of the skin and mucosa. The detection of HPV types can vary widely, depending mainly on the methodology and the quality of the biological sample. Next-generation sequencing is one of the most powerful tools, enabling the discovery of novel viruses in a wide range of biological material. Here, we report a novel protocol for the detection of known and unknown HPV types in human skin and oral gargle samples using improved PCR protocols combined with next-generation sequencing. We identified 105 putative new PV types in addition to 296 known types, thus providing important information about the viral distribution in the oral cavity and skin. Copyright © 2018. Published by Elsevier Inc.
The Relation between Prior Knowledge and Students' Collaborative Discovery Learning Processes
ERIC Educational Resources Information Center
Gijlers, Hannie; de Jong, Ton
2005-01-01
In this study we investigate how prior knowledge influences knowledge development during collaborative discovery learning. Fifteen dyads of students (pre-university education, 15-16 years old) worked on a discovery learning task in the physics field of kinematics. The (face-to-face) communication between students was recorded and the interaction…
Enhancing knowledge discovery from cancer genomics data with Galaxy
Albuquerque, Marco A.; Grande, Bruno M.; Ritch, Elie J.; Pararajalingam, Prasath; Jessa, Selin; Krzywinski, Martin; Grewal, Jasleen K.; Shah, Sohrab P.; Boutros, Paul C.
2017-01-01
The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker. PMID:28327945
Enhancing knowledge discovery from cancer genomics data with Galaxy.
Albuquerque, Marco A; Grande, Bruno M; Ritch, Elie J; Pararajalingam, Prasath; Jessa, Selin; Krzywinski, Martin; Grewal, Jasleen K; Shah, Sohrab P; Boutros, Paul C; Morin, Ryan D
2017-05-01
The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker. © The Author 2017. Published by Oxford University Press.
Eyal-Altman, Noah; Last, Mark; Rubin, Eitan
2017-01-17
Numerous publications attempt to predict cancer survival outcome from gene expression data using machine-learning methods. A direct comparison of these works is challenging for the following reasons: (1) inconsistent measures used to evaluate the performance of different models, and (2) incomplete specification of critical stages in the process of knowledge discovery. There is a need for a platform that would allow researchers to replicate previous works and to test the impact of changes in the knowledge discovery process on the accuracy of the induced models. We developed the PCM-SABRE platform, which supports the entire knowledge discovery process for cancer outcome analysis. PCM-SABRE was developed using KNIME. By using PCM-SABRE to reproduce the results of previously published works on breast cancer survival, we define a baseline for evaluating future attempts to predict cancer outcome with machine learning. We used PCM-SABRE to replicate previous works that describe predictive models of breast cancer recurrence, and tested the performance of all possible combinations of the feature selection methods and data mining algorithms that were used in either of the works. We reconstructed the work of Chou et al., observing similar trends - superior performance of Probabilistic Neural Network (PNN) and logistic regression (LR) algorithms and inconclusive impact of feature pre-selection with the decision tree algorithm on subsequent analysis. PCM-SABRE is a software tool that provides an intuitive environment for rapid development of predictive models in cancer precision medicine.
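The kind of feature-selection/classifier grid that PCM-SABRE evaluates can be sketched with scikit-learn (PCM-SABRE itself is a KNIME workflow; the synthetic data and parameter choices below are purely illustrative):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for a gene expression matrix with binary outcome labels.
    X, y = make_classification(n_samples=120, n_features=200, n_informative=10,
                               random_state=0)

    # Pair one feature-selection step with each classifier and cross-validate.
    for name, clf in [("logistic_regression", LogisticRegression(max_iter=1000)),
                      ("decision_tree", DecisionTreeClassifier(random_state=0))]:
        pipe = Pipeline([("select", SelectKBest(f_classif, k=20)), ("clf", clf)])
        scores = cross_val_score(pipe, X, y, cv=5)
        print(name, round(scores.mean(), 3))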
The modern search for the Holy Grail: is neuroscience a solution?
Naor, Navot; Ben-Ze'ev, Aaron; Okon-Singer, Hadas
2014-01-01
Neuroscience has become prevalent in recent years; nevertheless, its value in the examination of psychological and philosophical phenomena is still a matter of debate. The examples reviewed here suggest that neuroscientific tools can be significant in the investigation of such complex phenomena. In this article, we argue that it is important to study concepts that do not have a clear characterization and emphasize the role of neuroscience in this quest for knowledge. The data reviewed here suggest that neuroscience may (1) enrich our knowledge; (2) outline the nature of an explanation; and (3) lead to substantial empirical and theoretical discoveries. To that end, we review work on hedonia and eudaimonia in the fields of neuroscience, psychology, and philosophy. These studies demonstrate the importance of neuroscientific tools in the investigation of phenomena that are difficult to define using other methods. PMID:24926246
Active Storage with Analytics Capabilities and I/O Runtime System for Petascale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choudhary, Alok
Computational scientists must understand results from experimental, observational and computational simulation generated data to gain insights and perform knowledge discovery. As systems approach the petascale range, problems that were unimaginable a few years ago are within reach. With the increasing volume and complexity of data produced by ultra-scale simulations and high-throughput experiments, understanding the science is largely hampered by the lack of comprehensive I/O, storage, acceleration of data manipulation, analysis, and mining tools. Scientists require techniques, tools and infrastructure to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis, statistical analysis and knowledge discovery. The goal of this work is to enable more effective analysis of scientific datasets through the integration of enhancements in the I/O stack, from active storage support at the file system layer to MPI-IO and high-level I/O library layers. We propose to provide software components to accelerate data analytics, mining, I/O, and knowledge discovery for large-scale scientific applications, thereby increasing productivity of both scientists and the systems. Our approaches include 1) design the interfaces in high-level I/O libraries, such as parallel netCDF, for applications to activate data mining operations at the lower I/O layers; 2) enhance MPI-IO runtime systems to incorporate the functionality developed as a part of the runtime system design; 3) develop parallel data mining programs as part of runtime library for server-side file system in PVFS file system; and 4) prototype an active storage cluster, which will utilize multicore CPUs, GPUs, and FPGAs to carry out the data mining workload.
DesAutels, Spencer J.; Fox, Zachary E.; Giuse, Dario A.; Williams, Annette M.; Kou, Qing-hua; Weitkamp, Asli; Patel, Neal R.; Bettinsoli Giuse, Nunzia
2016-01-01
Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. To have a holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems. PMID:28269846
SemaTyP: a knowledge graph based literature mining method for drug discovery.
Sang, Shengtian; Yang, Zhihao; Wang, Lei; Liu, Xiaoxia; Lin, Hongfei; Wang, Jian
2018-05-30
Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are the two main drug discovery methods for now, which have successfully discovered a series of drugs. However, development of new drugs is still an extremely time-consuming and expensive process. Biomedical literature contains important clues for the identification of potential treatments. It could support experts in biomedicine on their way towards new discoveries. Here, we propose a biomedical knowledge graph-based drug discovery method called SemaTyP, which discovers candidate drugs for diseases by mining published biomedical literature. We first construct a biomedical knowledge graph with the relations extracted from biomedical abstracts, then a logistic regression model is trained by learning the semantic types of paths of known drug therapies existing in the biomedical knowledge graph, and finally the learned model is used to discover drug therapies for new diseases. The experimental results show that our method could not only effectively discover new drug therapies for new diseases but could also provide the potential mechanism of action of the candidate drugs. In this paper we propose a novel knowledge graph based literature mining method for drug discovery. It could complement current drug discovery methods.
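A loose sketch of the path-featurization idea: represent the knowledge graph as subject-relation-object triples and enumerate the relation/semantic-type sequences along paths starting from a drug (entity names, relations, and types below are invented; SemaTyP trains a logistic regression over such features, which is not shown here):

    # Toy knowledge graph built from invented literature-derived triples.
    triples = [
        ("drugA", "INHIBITS", "geneX"),
        ("geneX", "ASSOCIATED_WITH", "diseaseY"),
        ("drugB", "STIMULATES", "geneZ"),
    ]
    sem_type = {"drugA": "Drug", "drugB": "Drug", "geneX": "Gene",
                "geneZ": "Gene", "diseaseY": "Disease"}

    graph = {}
    for s, r, o in triples:
        graph.setdefault(s, []).append((r, o))

    def typed_paths(start, max_hops=2):
        # Depth-first enumeration of relation/semantic-type sequences.
        stack = [(start, [sem_type[start]])]
        while stack:
            node, path = stack.pop()
            if len(path) // 2 >= max_hops:
                continue
            for rel, nxt in graph.get(node, []):
                new_path = path + [rel, sem_type[nxt]]
                yield tuple(new_path)
                stack.append((nxt, new_path))

    for p in typed_paths("drugA"):
        print(" -> ".join(p))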
PopED lite: An optimal design software for preclinical pharmacokinetic and pharmacodynamic studies.
Aoki, Yasunori; Sundqvist, Monika; Hooker, Andrew C; Gennemark, Peter
2016-04-01
Optimal experimental design approaches are seldom used in preclinical drug discovery. The objective is to develop an optimal design software tool specifically designed for preclinical applications in order to increase the efficiency of in vivo studies in drug discovery. Several realistic experimental design case studies were collected and many preclinical experimental teams were consulted to determine the design goal of the software tool. The tool obtains an optimized experimental design by solving a constrained optimization problem, where each experimental design is evaluated using some function of the Fisher Information Matrix. The software was implemented in C++ using the Qt framework to assure a responsive user-software interaction through a rich graphical user interface, and at the same time, achieving the desired computational speed. In addition, a discrete global optimization algorithm was developed and implemented. The software design goals were simplicity, speed and intuition. Based on these design goals, we have developed the publicly available software PopED lite (http://www.bluetree.me/PopED_lite). Optimization computation was, on average over 14 test problems, 30 times faster in PopED lite than in an existing optimal design software tool. PopED lite is now used in real drug discovery projects and a few of these case studies are presented in this paper. PopED lite is designed to be simple, fast and intuitive. Simple, to give many users access to basic optimal design calculations. Fast, to fit a short design-execution cycle and allow interactive experimental design (test one design, discuss proposed design, test another design, etc.). Intuitive, so that the input to and output from the software tool can easily be understood by users without knowledge of the theory of optimal design. In this way, PopED lite is highly useful in practice and complements existing tools. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
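As an illustration of the underlying computation, the Python sketch below builds a Fisher Information Matrix for a one-compartment oral PK model by numerical differentiation and compares two hypothetical sampling-time designs by D-optimality (log-determinant of the FIM); PopED lite itself is a C++/Qt application, so this only mirrors the general idea:

    import numpy as np

    def model(t, theta):
        # One-compartment oral absorption model (unit dose/volume); hypothetical.
        ka, ke = theta
        return (ka / (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

    def fim(times, theta, sigma=0.1, eps=1e-6):
        times = np.asarray(times, dtype=float)
        J = np.empty((len(times), len(theta)))  # sensitivity matrix
        for j in range(len(theta)):
            up, dn = np.array(theta), np.array(theta)
            up[j] += eps
            dn[j] -= eps
            J[:, j] = (model(times, up) - model(times, dn)) / (2 * eps)
        return J.T @ J / sigma**2  # FIM for additive Gaussian error

    theta = [1.2, 0.3]  # ka, ke (invented values)
    for design in ([0.5, 1.0, 2.0], [0.5, 4.0, 12.0]):
        sign, logdet = np.linalg.slogdet(fim(design, theta))
        print(design, "log det FIM =", round(logdet, 2))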
Semantic biomedical resource discovery: a Natural Language Processing framework.
Sfakianaki, Pepi; Koumakis, Lefteris; Sfakianakis, Stelios; Iatraki, Galatia; Zacharioudakis, Giorgos; Graf, Norbert; Marias, Kostas; Tsiknakis, Manolis
2015-09-30
A plethora of publicly available biomedical resources currently exist and are constantly increasing at a fast rate. In parallel, specialized repositories are being developed, indexing numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty in locating appropriate resources for a clinical or biomedical decision task, especially for non-Information Technology expert users. In parallel, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for annotating biomedical resources with domain-specific ontologies and to exploit Natural Language Processing methods to empower non-Information Technology expert users to efficiently search for biomedical resources using natural language. A Natural Language Processing engine which can "translate" free text into targeted queries, automatically transforming a clinical research question into a request description that contains only terms of ontologies, has been implemented. The implementation is based on information extraction techniques for text in natural language, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions into suitable domain ontologies in order to ensure that the biomedical resource descriptions are domain oriented and to enhance the accuracy of services discovery. The framework is freely available as a web application at ( http://calchas.ics.forth.gr/ ). For our experiments, a range of clinical questions were established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians. Domain experts manually identified the available tools in a tools repository which are suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. The results were compared with those obtained from an automated discovery of candidate biomedical tools. For the evaluation of the results, precision and recall measurements were used. Our results indicate that the proposed framework has high precision and low recall, implying that the system returns essentially more relevant results than irrelevant ones. Adequate biomedical ontologies are already available, and existing NLP tools and biomedical annotation systems are sufficient for the implementation of a biomedical resource discovery framework based on the semantic annotation of resources and the use of NLP techniques. The results of the present study demonstrate the clinical utility of the proposed framework, which aims to bridge the gap between a clinical question in natural language and efficient dynamic biomedical resource discovery.
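A toy illustration of the "free text to ontology terms" step in Python (the framework's actual pipeline uses integrated domain ontologies and text mining; the lexicon and term IDs below are placeholders):

    import re

    # Placeholder lexicon mapping phrases to illustrative ontology term IDs.
    lexicon = {
        "breast cancer": "DOID:1612",
        "gene expression": "TOPIC:0203",
        "survival analysis": "OPERATION:2495",
    }

    def to_ontology_terms(question):
        # Longest-first phrase matching: a crude stand-in for NLP-based mapping.
        text = question.lower()
        hits = []
        for phrase in sorted(lexicon, key=len, reverse=True):
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                hits.append((phrase, lexicon[phrase]))
        return hits

    print(to_ontology_terms(
        "Which tools run survival analysis on breast cancer gene expression data?"))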
A New Student Performance Analysing System Using Knowledge Discovery in Higher Educational Databases
ERIC Educational Resources Information Center
Guruler, Huseyin; Istanbullu, Ayhan; Karahasan, Mehmet
2010-01-01
Knowledge discovery is a wide ranged process including data mining, which is used to find out meaningful and useful patterns in large amounts of data. In order to explore the factors having impact on the success of university students, knowledge discovery software, called MUSKUP, has been developed and tested on student data. In this system a…
Knowledge Discovery in Databases.
ERIC Educational Resources Information Center
Norton, M. Jay
1999-01-01
Knowledge discovery in databases (KDD) revolves around the investigation and creation of knowledge, processes, algorithms, and mechanisms for retrieving knowledge from data collections. The article is an introductory overview of KDD. The rationale and environment of its development and applications are discussed. Issues related to database design…
Agrafiotis, Dimitris K; Alex, Simson; Dai, Heng; Derkinderen, An; Farnum, Michael; Gates, Peter; Izrailev, Sergei; Jaeger, Edward P; Konstant, Paul; Leung, Albert; Lobanov, Victor S; Marichal, Patrick; Martin, Douglas; Rassokhin, Dmitrii N; Shemanarev, Maxim; Skalkin, Andrew; Stong, John; Tabruyn, Tom; Vermeiren, Marleen; Wan, Jackson; Xu, Xiang Yang; Yao, Xiang
2007-01-01
We present ABCD, an integrated drug discovery informatics platform developed at Johnson & Johnson Pharmaceutical Research & Development, L.L.C. ABCD is an attempt to bridge multiple continents, data systems, and cultures using modern information technology and to provide scientists with tools that allow them to analyze multifactorial SAR and make informed, data-driven decisions. The system consists of three major components: (1) a data warehouse, which combines data from multiple chemical and pharmacological transactional databases, designed for supreme query performance; (2) a state-of-the-art application suite, which facilitates data upload, retrieval, mining, and reporting, and (3) a workspace, which facilitates collaboration and data sharing by allowing users to share queries, templates, results, and reports across project teams, campuses, and other organizational units. Chemical intelligence, performance, and analytical sophistication lie at the heart of the new system, which was developed entirely in-house. ABCD is used routinely by more than 1000 scientists around the world and is rapidly expanding into other functional areas within the J&J organization.
A Knowledge Discovery framework for Planetary Defense
NASA Astrophysics Data System (ADS)
Jiang, Y.; Yang, C. P.; Li, Y.; Yu, M.; Bambacus, M.; Seery, B.; Barbee, B.
2016-12-01
Planetary Defense, a project funded by NASA Goddard and the NSF, is a multi-faceted effort focused on the mitigation of Near Earth Object (NEO) threats to our planet. Currently, information concerning NEOs is dispersed among different organizations and scientists, leaving no coherent system of information to be used for efficient NEO mitigation. In this paper, a planetary defense knowledge discovery engine is proposed to better assist the development and integration of a NEO response system. Specifically, we have implemented an organized information framework by two means: 1) the development of a semantic knowledge base, which provides a structure for relevant information; it has been developed using web crawling and natural language processing techniques, which allow us to collect and store the most relevant structured information on a regular basis; and 2) the development of a knowledge discovery engine, which allows for the efficient retrieval of information from our knowledge base. The knowledge discovery engine has been built on top of Elasticsearch, an open source full-text search engine, as well as cutting-edge machine learning ranking and recommendation algorithms. The proposed framework is expected to advance knowledge discovery and innovation in the planetary science domain.
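As a hedged sketch of what a retrieval call against such an Elasticsearch-backed knowledge base might look like, using the official Python client (v8+); the index name "neo_knowledge" and the field names are assumptions for illustration, not details from the paper:

```python
from elasticsearch import Elasticsearch

# Connect to a local Elasticsearch node (assumed deployment)
es = Elasticsearch("http://localhost:9200")

# Full-text query against an assumed "abstract" field of an assumed index
resp = es.search(
    index="neo_knowledge",
    query={"match": {"abstract": "kinetic impactor deflection"}},
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```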
Translational Research 2.0: a framework for accelerating collaborative discovery.
Asakiewicz, Chris
2014-05-01
The world wide web has revolutionized the conduct of global, cross-disciplinary research. In the life sciences, interdisciplinary approaches to problem solving and collaboration are becoming increasingly important in facilitating knowledge discovery and integration. Web 2.0 technologies promise to have a profound impact, enabling reproducibility, aiding in discovery, and accelerating and transforming medical and healthcare research across the healthcare ecosystem. However, knowledge integration and discovery require a consistent foundation upon which to operate, one capable of addressing some of the critical issues associated with how research is conducted within the ecosystem today and how it should be conducted in the future. This article discusses a framework for enhancing collaborative knowledge discovery across the medical and healthcare research ecosystem: a foundation upon which ecosystem stakeholders can enhance the way data, information, and knowledge are created, shared, and used to accelerate the translation of knowledge from one area of the ecosystem to another.
ERIC Educational Resources Information Center
Weeber, Marc; Klein, Henny; de Jong-van den Berg, Lolkje T. W.; Vos, Rein
2001-01-01
Proposes a two-step model of discovery in which new scientific hypotheses can be generated and subsequently tested. Applying advanced natural language processing techniques to find biomedical concepts in text, the model is implemented in a versatile interactive discovery support tool. This tool is used to successfully simulate Don R. Swanson's…
NASA Astrophysics Data System (ADS)
Narock, T.; Arko, R. A.; Carbotte, S. M.; Chandler, C. L.; Cheatham, M.; Finin, T.; Hitzler, P.; Krisnadhi, A.; Raymond, L. M.; Shepherd, A.; Wiebe, P. H.
2014-12-01
A wide spectrum of maturing methods and tools, collectively characterized as the Semantic Web, is helping to vastly improve the dissemination of scientific research. Creating semantic integration requires input from both domain and cyberinfrastructure scientists. OceanLink, an NSF EarthCube Building Block, is demonstrating semantic technologies through the integration of geoscience data repositories, library holdings, conference abstracts, and funded research awards. Meeting project objectives involves applying semantic technologies to support data representation, discovery, sharing and integration. Our semantic cyberinfrastructure components include ontology design patterns, Linked Data collections, semantic provenance, and associated services to enhance data and knowledge discovery, interoperation, and integration. We discuss how these components are integrated, the continued automated and semi-automated creation of semantic metadata, and techniques we have developed to integrate ontologies, link resources, and preserve provenance and attribution.
Citation Discovery Tools for Conducting Adaptive Meta-analyses to Update Systematic Reviews.
Bae, Jong-Myon; Kim, Eun Hee
2016-03-01
The systematic review (SR) is a research methodology that aims to synthesize related evidence. Updating previously conducted SRs is necessary when new evidence has been produced, but no consensus has yet emerged on the appropriate update methodology. The authors have developed a new SR update method called 'adaptive meta-analysis' (AMA) using the 'cited by', 'similar articles', and 'related articles' citation discovery tools in the PubMed and Scopus databases. This study evaluates the usefulness of these citation discovery tools for updating SRs. Lists were constructed by applying the citation discovery tools in the two databases to the articles analyzed by a published SR. The degree of overlap between the lists and distribution of excluded results were evaluated. The articles ultimately selected for the SR update meta-analysis were found in the lists obtained from the 'cited by' and 'similar' tools in PubMed. Most of the selected articles appeared in both the 'cited by' lists in Scopus and PubMed. The Scopus 'related' tool did not identify the appropriate articles. The AMA, which involves using both citation discovery tools in PubMed, and optionally, the 'related' tool in Scopus, was found to be useful for updating an SR.
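The 'cited by' and 'similar articles' lists that drive the AMA can be fetched programmatically through NCBI E-utilities. A minimal sketch with an arbitrary, hypothetical seed PMID; the linknames pubmed_pubmed and pubmed_pubmed_citedin are the standard E-utilities identifiers for the 'similar articles' and 'cited by' relations:

```python
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def pubmed_links(pmid, linkname):
    """Fetch PMIDs linked to a seed article via NCBI E-utilities elink.
    linkname='pubmed_pubmed' gives 'similar articles';
    linkname='pubmed_pubmed_citedin' gives 'cited by'."""
    params = {"dbfrom": "pubmed", "db": "pubmed", "id": pmid,
              "linkname": linkname, "retmode": "json"}
    data = requests.get(EUTILS, params=params, timeout=30).json()
    linksetdbs = data["linksets"][0].get("linksetdbs", [])
    return [int(x) for db in linksetdbs for x in db["links"]]

seed = 12345678  # hypothetical seed PMID from the original SR's included studies
candidates = (set(pubmed_links(seed, "pubmed_pubmed_citedin"))
              | set(pubmed_links(seed, "pubmed_pubmed")))
print(len(candidates), "update candidates to screen")
```

Running this over every article included in the original SR and taking the union yields the candidate list that the AMA then screens.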
García-Peñalvo, Francisco J.; Pérez-Blanco, Jonás Samuel; Martín-Suárez, Ana
2014-01-01
This paper discusses how cloud-based architectures can extend and enhance the functionality of training environments based on virtual worlds and how, from this cloud perspective, we can support the analysis of training processes in the area of health, specifically in the field of training processes in quality assurance for pharmaceutical laboratories, presenting a tool for data retrieval and analysis that enables knowledge discovery from the events occurring inside the virtual worlds. PMID:24778593
Krystkowiak, Izabella; Manguy, Jean; Davey, Norman E
2018-06-05
There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
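To illustrate the core idea of a position-specific scoring matrix built from aligned peptides and scanned across a sequence (a toy sketch, not PSSMSearch's scoring methods or its statistical significance framework; the peptides, background frequency, and query sequence are invented):

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}

def build_pssm(peptides, pseudocount=1.0):
    """Log-odds PSSM from aligned, equal-length motif-containing peptides."""
    L = len(peptides[0])
    counts = np.full((L, 20), pseudocount)
    for pep in peptides:
        for pos, aa in enumerate(pep):
            counts[pos, IDX[aa]] += 1
    freqs = counts / counts.sum(axis=1, keepdims=True)
    return np.log2(freqs / 0.05)  # 0.05 = uniform background; real tools use proteome frequencies

def scan(sequence, pssm):
    """Score every window of a protein sequence against the PSSM."""
    L = pssm.shape[0]
    return [(i, sum(pssm[j, IDX[sequence[i + j]]] for j in range(L)))
            for i in range(len(sequence) - L + 1)]

pssm = build_pssm(["RKLPDA", "RRLPEA", "KKLPDS"])   # toy aligned peptides
best = max(scan("MSRKLPDAGKTQ", pssm), key=lambda t: t[1])
print("best window start and score:", best)
```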
Wilkins, JJ; Chan, PLS; Chard, J; Smith, G; Smith, MK; Beer, M; Dunn, A; Flandorfer, C; Franklin, C; Gomeni, R; Harnisch, L; Kaye, R; Moodie, S; Sardu, ML; Wang, E; Watson, E; Wolstencroft, K
2017-01-01
Pharmacometric analyses are complex and multifactorial. It is essential to check, track, and document the vast amounts of data and metadata that are generated during these analyses (and the relationships between them) in order to comply with regulations, support quality control, auditing, and reporting. It is, however, challenging, tedious, error‐prone, and time‐consuming, and diverts pharmacometricians from the more useful business of doing science. Automating this process would save time, reduce transcriptional errors, support the retention and transfer of knowledge, encourage good practice, and help ensure that pharmacometric analyses appropriately impact decisions. The ability to document, communicate, and reconstruct a complete pharmacometric analysis using an open standard would have considerable benefits. In this article, the Innovative Medicines Initiative (IMI) Drug Disease Model Resources (DDMoRe) consortium proposes a set of standards to facilitate the capture, storage, and reporting of knowledge (including assumptions and decisions) in the context of model‐informed drug discovery and development (MID3), as well as to support reproducibility: “Thoughtflow.” A prototype software implementation is provided. PMID:28504472
Knowledge Discovery from Biomedical Ontologies in Cross Domains.
Shen, Feichen; Lee, Yugyung
2016-01-01
In recent years, there has been an increasing demand for the sharing and integration of medical data in biomedical research. To improve a health care system, it is necessary to support the integration of data by facilitating semantically interoperable systems and practices. Semantic interoperability is difficult to achieve in these systems, as the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and support semantic interoperability between different ontologies. For this purpose, we focus on the discovery of semantic patterns about the association of relations in the heterogeneous information network representing different types of objects and relationships in multiple biological ontologies, and on the creation of a topic hierarchy through the analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner, and then to conduct cross-domain knowledge discovery from the multiple biological ontologies; the patterns thus make a substantial contribution to knowledge discovery across multiple ontologies. We have demonstrated cross-domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF and compared it with SLAP, a cross-domain query processing approach. We have confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies.
Knowledge Discovery from Biomedical Ontologies in Cross Domains
Shen, Feichen; Lee, Yugyung
2016-01-01
In recent years, there has been an increasing demand for the sharing and integration of medical data in biomedical research. To improve a health care system, it is necessary to support the integration of data by facilitating semantically interoperable systems and practices. Semantic interoperability is difficult to achieve in these systems, as the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and support semantic interoperability between different ontologies. For this purpose, we focus on the discovery of semantic patterns about the association of relations in the heterogeneous information network representing different types of objects and relationships in multiple biological ontologies, and on the creation of a topic hierarchy through the analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner, and then to conduct cross-domain knowledge discovery from the multiple biological ontologies; the patterns thus make a substantial contribution to knowledge discovery across multiple ontologies. We have demonstrated cross-domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF and compared it with SLAP, a cross-domain query processing approach. We have confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies. PMID:27548262
Knowledge discovery with classification rules in a cardiovascular dataset.
Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan
2005-12-01
In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.
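AREX itself uses evolutionary induction, but the flavour of rule induction from a classified dataset can be illustrated with an ordinary decision tree, whose root-to-leaf paths read as IF-THEN classification rules. A sketch on synthetic stand-in data (the actual cardiovascular dataset is not public, and the attribute names below are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for a labelled clinical dataset
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each root-to-leaf path is an IF-THEN rule a medical expert could inspect,
# mirroring the expert-assessment step in the knowledge discovery loop.
print(export_text(tree, feature_names=[f"attr_{i}" for i in range(6)]))
```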
Berthold, Michael R.; Hedrick, Michael P.; Gilson, Michael K.
2015-01-01
Today’s large, public databases of protein–small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org PMID:26384374
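The chemical-similarity notion underlying such target-fishing workflows is typically a Tanimoto comparison of molecular fingerprints. A minimal RDKit sketch, independent of KNIME and BindingDB (the two SMILES strings are aspirin and salicylic acid, used purely as examples):

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto(smiles_a, smiles_b, radius=2, n_bits=2048):
    """Morgan-fingerprint Tanimoto similarity between two compounds."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s),
                                                 radius, nBits=n_bits)
           for s in (smiles_a, smiles_b)]
    return DataStructs.TanimotoSimilarity(fps[0], fps[1])

# Aspirin vs. salicylic acid; a high score would suggest shared protein targets.
print(tanimoto("CC(=O)Oc1ccccc1C(=O)O", "OC(=O)c1ccccc1O"))
```

In a similarity-based target hypothesis, a query compound inherits candidate targets from database ligands whose fingerprints score above a chosen Tanimoto threshold.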
Progress in Biomedical Knowledge Discovery: A 25-year Retrospective
Sacchi, L.
2016-01-01
Summary. Objectives: We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992 and as it exists now, 25 years later, focusing mainly on supervised learning. Methods: We performed a rigorous systematic search of PubMed and used latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery within and between time periods, and to compare these trends. We restricted the result set using a bracket of the five previous years, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set to articles published between 2011 and 2015. This was to reflect the literature available to researchers and others at the target dates of 1992 and 2015. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated. Results: A total of 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery, and the refinement and application of machine learning approaches that were nascent or unknown in 1992. Conclusions: Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains. Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources and the evolution of new algorithmic approaches to knowledge discovery, and we consider from legal, social, and political perspectives possible explanations of the growth of the field. Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods that could be, and are now being, developed for the discovery of new knowledge in biomedical data. PMID:27488403
Progress in Biomedical Knowledge Discovery: A 25-year Retrospective.
Sacchi, L; Holmes, J H
2016-08-02
We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992 and as it exists now, 25 years later, focusing mainly on supervised learning. We performed a rigorous systematic search of PubMed and used latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery within and between time periods, and to compare these trends. We restricted the result set using a bracket of the five previous years, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set to articles published between 2011 and 2015. This was to reflect the literature available to researchers and others at the target dates of 1992 and 2015. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated. A total of 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery, and the refinement and application of machine learning approaches that were nascent or unknown in 1992. Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains. Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources and the evolution of new algorithmic approaches to knowledge discovery, and we consider from legal, social, and political perspectives possible explanations of the growth of the field. Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods that could be, and are now being, developed for the discovery of new knowledge in biomedical data.
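As a hedged sketch of the latent Dirichlet allocation step on a toy stand-in corpus (the study's actual corpora are the retrieved PubMed document sets; the snippets and topic count below are invented for illustration):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in corpus; the real inputs are the 1987-1992 and 2011-2015 result sets
abstracts = [
    "pattern recognition methods for clinical databases",
    "data mining of electronic health record data",
    "automated pattern discovery in genomic data",
    "machine learning for clinical decision support",
]

vec = CountVectorizer(stop_words="english")
dtm = vec.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(dtm)

# Print the top words of each inferred theme
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[::-1][:4]]
    print(f"topic {k}: {', '.join(top)}")
```

Comparing the topics fitted to each period's corpus is one way to surface the thematic trends the review describes.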
Communication in Collaborative Discovery Learning
ERIC Educational Resources Information Center
Saab, Nadira; van Joolingen, Wouter R.; van Hout-Wolters, Bernadette H. A. M.
2005-01-01
Background: Constructivist approaches to learning focus on learning environments in which students have the opportunity to construct knowledge themselves, and negotiate this knowledge with others. "Discovery learning" and "collaborative learning" are examples of learning contexts that cater for knowledge construction processes. We introduce a…
Practice-Based Knowledge Discovery for Comparative Effectiveness Research: An Organizing Framework
Lucero, Robert J.; Bakken, Suzanne
2014-01-01
Electronic health information systems can increase the ability of health-care organizations to investigate the effects of clinical interventions. The authors present an organizing framework that integrates outcomes and informatics research paradigms to guide knowledge discovery in electronic clinical databases. They illustrate its application using the example of hospital-acquired pressure ulcers (HAPU). The Knowledge Discovery through Informatics for Comparative Effectiveness Research (KDI-CER) framework was conceived as a heuristic to conceptualize study designs and address potential methodological limitations imposed by using a single research perspective. Advances in informatics research can play a complementary role in advancing the field of outcomes research, including CER. The KDI-CER framework can be used to facilitate knowledge discovery from routinely collected electronic clinical data. PMID:25278645
The emergence of translational epidemiology: from scientific discovery to population health impact.
Khoury, Muin J; Gwinn, Marta; Ioannidis, John P A
2010-09-01
Recent emphasis on translational research (TR) is highlighting the role of epidemiology in translating scientific discoveries into population health impact. The authors present applications of epidemiology in TR through 4 phases designated T1-T4, illustrated by examples from human genomics. In T1, epidemiology explores the role of a basic scientific discovery (e.g., a disease risk factor or biomarker) in developing a "candidate application" for use in practice (e.g., a test used to guide interventions). In T2, epidemiology can help to evaluate the efficacy of a candidate application by using observational studies and randomized controlled trials. In T3, epidemiology can help to assess facilitators and barriers for uptake and implementation of candidate applications in practice. In T4, epidemiology can help to assess the impact of using candidate applications on population health outcomes. Epidemiology also has a leading role in knowledge synthesis, especially using quantitative methods (e.g., meta-analysis). To explore the emergence of TR in epidemiology, the authors compared articles published in selected issues of the Journal in 1999 and 2009. The proportion of articles identified as translational doubled from 16% (11/69) in 1999 to 33% (22/66) in 2009 (P = 0.02). Epidemiology is increasingly recognized as an important component of TR. By quantifying and integrating knowledge across disciplines, epidemiology provides crucial methods and tools for TR.
The Emergence of Translational Epidemiology: From Scientific Discovery to Population Health Impact
Khoury, Muin J.; Gwinn, Marta; Ioannidis, John P. A.
2010-01-01
Recent emphasis on translational research (TR) is highlighting the role of epidemiology in translating scientific discoveries into population health impact. The authors present applications of epidemiology in TR through 4 phases designated T1–T4, illustrated by examples from human genomics. In T1, epidemiology explores the role of a basic scientific discovery (e.g., a disease risk factor or biomarker) in developing a “candidate application” for use in practice (e.g., a test used to guide interventions). In T2, epidemiology can help to evaluate the efficacy of a candidate application by using observational studies and randomized controlled trials. In T3, epidemiology can help to assess facilitators and barriers for uptake and implementation of candidate applications in practice. In T4, epidemiology can help to assess the impact of using candidate applications on population health outcomes. Epidemiology also has a leading role in knowledge synthesis, especially using quantitative methods (e.g., meta-analysis). To explore the emergence of TR in epidemiology, the authors compared articles published in selected issues of the Journal in 1999 and 2009. The proportion of articles identified as translational doubled from 16% (11/69) in 1999 to 33% (22/66) in 2009 (P = 0.02). Epidemiology is increasingly recognized as an important component of TR. By quantifying and integrating knowledge across disciplines, epidemiology provides crucial methods and tools for TR. PMID:20688899
Integration of cardiac proteome biology and medicine by a specialized knowledgebase.
Zong, Nobel C; Li, Haomin; Li, Hua; Lam, Maggie P Y; Jimenez, Rafael C; Kim, Christina S; Deng, Ning; Kim, Allen K; Choi, Jeong Ho; Zelaya, Ivette; Liem, David; Meyer, David; Odeberg, Jacob; Fang, Caiyun; Lu, Hao-Jie; Xu, Tao; Weiss, James; Duan, Huilong; Uhlen, Mathias; Yates, John R; Apweiler, Rolf; Ge, Junbo; Hermjakob, Henning; Ping, Peipei
2013-10-12
Omics sciences enable a systems-level perspective in characterizing cardiovascular biology. Integration of diverse proteomics data via a computational strategy will catalyze the assembly of contextualized knowledge, foster discoveries through multidisciplinary investigations, and minimize unnecessary redundancy in research efforts. The goal of this project is to develop a consolidated cardiac proteome knowledgebase with a novel bioinformatics pipeline and Web portals, thereby serving as a new resource to advance cardiovascular biology and medicine. We created the Cardiac Organellar Protein Atlas Knowledgebase (COPaKB; www.HeartProteome.org), a centralized platform of high-quality cardiac proteomic data, bioinformatics tools, and relevant cardiovascular phenotypes. Currently, COPaKB features 8 organellar modules, comprising 4203 LC-MS/MS experiments from human, mouse, Drosophila, and Caenorhabditis elegans, as well as expression images of 10,924 proteins in human myocardium. In addition, the Java-coded bioinformatics tools provided by COPaKB enable cardiovascular investigators in all disciplines to retrieve and analyze pertinent organellar protein properties of interest. COPaKB provides an innovative and interactive resource that connects research interests with the new biological discoveries in protein sciences. With an array of intuitive tools in this unified Web server, nonproteomics investigators can conveniently collaborate with proteomics specialists to dissect the molecular signatures of cardiovascular phenotypes.
Simulating the drug discovery pipeline: a Monte Carlo approach
2012-01-01
Background: The early drug discovery phase in pharmaceutical research and development marks the beginning of a long, complex and costly process of bringing a new molecular entity to market. As such, it plays a critical role in helping to maintain a robust downstream clinical development pipeline. Despite its importance, however, to our knowledge there are no published in silico models to simulate the progression of discrete virtual projects through a discovery milestone system. Results: Multiple variables were tested and their impact on productivity metrics examined. Simulations predict that there is an optimum number of scientists for a given drug discovery portfolio, beyond which output in the form of preclinical candidates per year will remain flat. The model further predicts that the frequency of compounds successfully passing the candidate selection milestone as a function of time will be irregular, with projects entering preclinical development in clusters marked by periods of low apparent productivity. Conclusions: The model may be useful as a tool to facilitate analysis of historical growth and achievement over time, help gauge current working group progress against future performance expectations, and provide the basis for dialogue regarding working group best practices and resource deployment strategies. PMID:23186040
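A toy Monte Carlo sketch of the idea with invented parameters (the published model is considerably richer, e.g. it accounts for scientist head count and stage-specific attrition); even this simple version shows candidates arriving irregularly rather than at a steady rate:

```python
import random

def simulate_pipeline(n_projects=30, years=10, p_advance=0.25, stages=4):
    """Toy Monte Carlo: each project attempts one milestone transition per year;
    a project that clears all stages delivers a preclinical candidate."""
    random.seed(0)
    candidates_per_year = [0] * years
    progress = [0] * n_projects          # current stage of each project
    for year in range(years):
        for i in range(n_projects):
            if progress[i] < stages and random.random() < p_advance:
                progress[i] += 1
                if progress[i] == stages:
                    candidates_per_year[year] += 1
    return candidates_per_year

# Output years cluster; intervening years show low apparent productivity
print(simulate_pipeline())
```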
Large scale analysis of the mutational landscape in HT-SELEX improves aptamer discovery
Hoinka, Jan; Berezhnoy, Alexey; Dao, Phuong; Sauna, Zuben E.; Gilboa, Eli; Przytycka, Teresa M.
2015-01-01
High-Throughput (HT) SELEX combines SELEX (Systematic Evolution of Ligands by EXponential Enrichment), a method for aptamer discovery, with massively parallel sequencing technologies. This emerging technology provides data for a global analysis of the selection process and for simultaneous discovery of a large number of candidates but currently lacks dedicated computational approaches for their analysis. To close this gap, we developed novel in-silico methods to analyze HT-SELEX data and utilized them to study the emergence of polymerase errors during HT-SELEX. Rather than considering these errors as a nuisance, we demonstrated their utility for guiding aptamer discovery. Our approach builds on two main advancements in aptamer analysis: AptaMut—a novel technique allowing for the identification of polymerase errors conferring an improved binding affinity relative to the ‘parent’ sequence and AptaCluster—an aptamer clustering algorithm which is to our best knowledge, the only currently available tool capable of efficiently clustering entire aptamer pools. We applied these methods to an HT-SELEX experiment developing aptamers against Interleukin 10 receptor alpha chain (IL-10RA) and experimentally confirmed our predictions thus validating our computational methods. PMID:25870409
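The AptaMut intuition, greatly simplified: compare round-over-round frequency ratios of single-base variants of a parent aptamer, since a variant enriching faster than its parent hints at a beneficial polymerase error. A toy sketch with invented sequencing pools:

```python
from collections import Counter

def mutant_enrichment(parent, round_k, round_k1):
    """Round-over-round frequency ratios for single-base variants of a parent
    aptamer between two consecutive SELEX rounds (a drastic simplification of
    the AptaMut analysis)."""
    fk, fk1 = Counter(round_k), Counter(round_k1)
    nk, nk1 = len(round_k), len(round_k1)
    out = {}
    for seq, cnt in fk1.items():
        is_single_mutant = (len(seq) == len(parent)
                            and sum(a != b for a, b in zip(seq, parent)) == 1)
        if is_single_mutant and fk[seq]:
            out[seq] = (cnt / nk1) / (fk[seq] / nk)
    return out

pool_k  = ["ACGTACGT"] * 90 + ["ACGAACGT"] * 10   # invented round k pool
pool_k1 = ["ACGTACGT"] * 60 + ["ACGAACGT"] * 40   # invented round k+1 pool
print(mutant_enrichment("ACGTACGT", pool_k, pool_k1))  # {'ACGAACGT': 4.0}
```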
1994-09-30
relational versus object oriented DBMS, knowledge discovery, data models, metadata, data filtering, clustering techniques, and synthetic data. A secondary...The first was the investigation of AI/ES applications (knowledge discovery, data mining, and clustering). Here CAST collaborated with Dr. Fred Petry...knowledge discovery system based on clustering techniques; implemented an on-line data browser to the DBMS; completed preliminary efforts to apply object
Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita
2015-07-14
In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world. Copyright © 2015 Hadjithomas et al.
Taking Open Innovation to the Molecular Level - Strengths and Limitations.
Zdrazil, Barbara; Blomberg, Niklas; Ecker, Gerhard F
2012-08-01
The ever-growing availability of large-scale open data and its maturation is having a significant impact on industrial drug discovery, as well as on academic and non-profit research. As industry shifts to an 'open innovation' business concept, precompetitive initiatives and strong public-private partnerships that include academic research cooperation partners are gaining more and more importance. The bioinformatics and cheminformatics communities are now seeking web tools that allow the integration of the large volume of life science datasets available in the public domain. Such a data exploitation tool would ideally be able to answer complex biological questions by formulating only one search query. In this short review/perspective, we outline the use of semantic web approaches for data and knowledge integration. Further, we discuss the strengths and current limitations of publicly available data retrieval tools and integrated platforms.
Learning motion concepts using real-time microcomputer-based laboratory tools
NASA Astrophysics Data System (ADS)
Thornton, Ronald K.; Sokoloff, David R.
1990-09-01
Microcomputer-based laboratory (MBL) tools have been developed which interface to Apple II and Macintosh computers. Students use these tools to collect physical data that are graphed in real time and then can be manipulated and analyzed. The MBL tools have made possible discovery-based laboratory curricula that embody results from educational research. These curricula allow students to take an active role in their learning and encourage them to construct physical knowledge from observation of the physical world. The curricula encourage collaborative learning by taking advantage of the fact that MBL tools present data in an immediately understandable graphical form. This article describes one of the tools—the motion detector (hardware and software)—and the kinematics curriculum. The effectiveness of this curriculum compared to traditional college and university methods for helping students learn basic kinematics concepts has been evaluated by pre- and post-testing and by observation. There is strong evidence for significantly improved learning and retention by students who used the MBL materials, compared to those taught in lecture.
Fang, Hai; Knezevic, Bogdan; Burnham, Katie L; Knight, Julian C
2016-12-13
Biological interpretation of genomic summary data such as those resulting from genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is one of the major bottlenecks in medical genomics research, calling for efficient and integrative tools to resolve this problem. We introduce eXploring Genomic Relations (XGR), an open source tool designed for enhanced interpretation of genomic summary data enabling downstream knowledge discovery. Targeting users of varying computational skills, XGR utilises prior biological knowledge and relationships in a highly integrated but easily accessible way to make user-input genomic summary datasets more interpretable. We show how by incorporating ontology, annotation, and systems biology network-driven approaches, XGR generates more informative results than conventional analyses. We apply XGR to GWAS and eQTL summary data to explore the genomic landscape of the activated innate immune response and common immunological diseases. We provide genomic evidence for a disease taxonomy supporting the concept of a disease spectrum from autoimmune to autoinflammatory disorders. We also show how XGR can define SNP-modulated gene networks and pathways that are shared and distinct between diseases, how it achieves functional, phenotypic and epigenomic annotations of genes and variants, and how it enables exploring annotation-based relationships between genetic variants. XGR provides a single integrated solution to enhance interpretation of genomic summary data for downstream biological discovery. XGR is released as both an R package and a web-app, freely available at http://galahad.well.ox.ac.uk/XGR .
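Annotation enrichment of the kind XGR performs usually reduces to a hypergeometric test: how surprising is the overlap between a gene list and an annotation term? A sketch of the generic test with invented counts (this is not XGR's specific implementation, which is an R package with its own API):

```python
from scipy.stats import hypergeom

def enrichment_p(genome_size, annotated, hits, annotated_hits):
    """P(X >= annotated_hits): the chance of seeing at least that many
    term-annotated genes in the hit list if hits were drawn at random."""
    return hypergeom.sf(annotated_hits - 1, genome_size, annotated, hits)

# Invented numbers: 20,000 genes in the background, 300 carry a given GO term,
# 150 GWAS-implicated genes, 12 of which carry the term.
print(enrichment_p(20000, 300, 150, 12))
```

A small p-value here is the statistical signal behind statements like "the GWAS hits are enriched for this pathway".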
QA-driven Guidelines Generation for Bacteriotherapy
Pasche, Emilie; Teodoro, Douglas; Gobeill, Julien; Ruch, Patrick; Lovis, Christian
2009-01-01
PURPOSE: We propose a question-answering (QA) driven generation approach for automatic acquisition of structured rules that can be used in a knowledge authoring tool for antibiotic prescription guidelines management. METHODS: The rule generation is seen as a question-answering problem, where the parameters of the questions are known items of the rule (e.g. an infectious disease, caused by a given bacterium) and answers (e.g. some antibiotics) are obtained by a question-answering engine. RESULTS: When looking for a drug given a pathogen and a disease, top-precision of 0.55 is obtained by the combination of the Boolean engine (PubMed) and the relevance-driven engine (easyIR), which means that for more than half of our evaluation benchmark at least one of the recommended antibiotics was automatically acquired by the rule generation method. CONCLUSION: These results suggest that such an automatic text mining approach could provide a useful tool for guidelines management, by improving knowledge update and discovery. PMID:20351908
Girardi, Dominic; Küng, Josef; Kleiser, Raimund; Sonnberger, Michael; Csillag, Doris; Trenkler, Johannes; Holzinger, Andreas
2016-09-01
Established process models for knowledge discovery find the domain-expert in a customer-like and supervising role. In the field of biomedical research, it is necessary to move the domain-experts into the center of this process with far-reaching consequences for both their research output and the process itself. In this paper, we revise the established process models for knowledge discovery and propose a new process model for domain-expert-driven interactive knowledge discovery. Furthermore, we present a research infrastructure which is adapted to this new process model and demonstrate how the domain-expert can be deeply integrated even into the highly complex data-mining process and data-exploration tasks. We evaluated this approach in the medical domain for the case of cerebral aneurysms research.
Visualising nursing data using correspondence analysis.
Kokol, Peter; Blažun Vošner, Helena; Železnik, Danica
2016-09-01
Digitally stored, large healthcare datasets enable nurses to use 'big data' techniques and tools in nursing research. Big data is complex and multi-dimensional, so visualisation may be a preferable approach to analysing and understanding it. The aim of the study was to demonstrate the use of visualisation of big data in a technique called correspondence analysis. In the authors' study, relations among data in a nursing dataset were shown visually in graphs using correspondence analysis. The case presented demonstrates that correspondence analysis is easy to use, shows relations between data visually in a form that is simple to interpret, and can reveal hidden associations between data. Correspondence analysis supports the discovery of new knowledge. Implications for practice: knowledge obtained using correspondence analysis can be transferred immediately into practice or used to foster further research.
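Correspondence analysis itself is a short computation: an SVD of the standardized residuals of a contingency table, yielding row and column coordinates that can be plotted together. A sketch with an invented nursing contingency table (e.g., wards by documented diagnoses):

```python
import numpy as np

def correspondence_analysis(table):
    """Row/column principal coordinates from the SVD of standardized residuals."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                            # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)        # row and column masses
    S = np.diag(r ** -0.5) @ (P - np.outer(r, c)) @ np.diag(c ** -0.5)
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    rows = np.diag(r ** -0.5) @ U * d          # principal row coordinates
    cols = np.diag(c ** -0.5) @ Vt.T * d       # principal column coordinates
    return rows[:, :2], cols[:, :2]            # first two dimensions for plotting

# Invented 3x3 contingency table
rows, cols = correspondence_analysis([[25, 5, 10], [5, 30, 8], [12, 7, 28]])
print(rows, cols, sep="\n")
```

Plotting the two coordinate sets on the same axes gives the visual map in which nearby row and column points suggest associations.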
Knowledge Discovery in Textual Documentation: Qualitative and Quantitative Analyses.
ERIC Educational Resources Information Center
Loh, Stanley; De Oliveira, Jose Palazzo M.; Gastal, Fabio Leite
2001-01-01
Presents an application of knowledge discovery in texts (KDT) concerning medical records of a psychiatric hospital. The approach helps physicians to extract knowledge about patients and diseases that may be used for epidemiological studies, for training professionals, and to support physicians to diagnose and evaluate diseases. (Author/AEF)
Knowledge discovery from data as a framework to decision support in medical domains
Gibert, Karina
2009-01-01
Introduction: Knowledge discovery from data (KDD) is a multidisciplinary field which emerged in 1996 to denote the "non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data". Pre-treatment of data and post-processing are as important as the data exploitation (data mining) itself. Different analysis techniques can be properly combined to produce explicit knowledge from data. Methods: Hybrid KDD methodologies combining artificial intelligence with statistics and visualization have been used to identify patterns in complex medical phenomena: experts provide prior knowledge (pK); it biases the search for distinguishable groups of homogeneous objects; support-interpretation tools (CPG) assisted experts in the conceptualization and labelling of discovered patterns, consistently with the pK. Results: Patterns of dependency in mental disabilities supported decision-making on the legislation of the Spanish Dependency Law in Catalonia. Relationships between the type of neurorehabilitation treatment and patterns of response for brain damage are assessed. Patterns of perceived QOL over time are used in spinal cord lesions to improve social inclusion. Conclusion: Reality is more and more complex, and classical data analyses are not powerful enough to model it. New methodologies are required, embracing multidisciplinarity and stressing the production of understandable models. Interaction with experts is critical to generate meaningful results which can really support decision-making; it is particularly important to transfer the pK to the system, as well as to interpret results in close interaction with the experts. KDD is a valuable paradigm, particularly when facing very complex domains that are not yet well understood, like many medical phenomena.
Place of International Congresses in the Diffusion of Knowledge in Infectious Diseases.
Lassmann, Britta; Cornaglia, Giuseppe
2017-08-15
Through digital resources, physicians, microbiologists, and researchers around the world can stay up-to-date with the newest developments in their field and are therefore less dependent on medical congresses as a provider of knowledge and education. The role of the medical congress in spreading knowledge in the face of this changing environment needs to be reexamined. The result is a new paradigm that thinks about the dissemination of medical knowledge and discovery as ongoing conversations between professionals and their extended networks, rather than activities that happen only during the congress. Even though the tools we use to deliver information and knowledge are rapidly evolving, there is confidence in the lasting value of meetings for medical professionals. Medical congresses are environments uniquely conducive to generating new ideas and solutions to problems. As organizers explore new ways of sharing knowledge globally, it is crucial that the high quality of medical congresses be maintained. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.
Current Advances on Virus Discovery and Diagnostic Role of Viral Metagenomics in Aquatic Organisms
Munang'andu, Hetron M.; Mugimba, Kizito K.; Byarugaba, Denis K.; Mutoloki, Stephen; Evensen, Øystein
2017-01-01
The global expansion of the aquaculture industry has brought with it a corresponding increase in novel viruses infecting different aquatic organisms. These emerging viral pathogens have proved to be a challenge for the use of traditional cell cultures and immunoassays for the identification of new viruses, especially in situations where the novel viruses are unculturable and no antibodies exist for their identification. Viral metagenomics has the potential to identify novel viruses without prior knowledge of their genomic sequence data and may provide a solution for the study of unculturable viruses. This review provides a synopsis of the contribution of viral metagenomics to the discovery of viruses infecting different aquatic organisms, as well as its potential role in viral diagnostics. High-throughput next-generation sequencing (NGS) and the library construction used in metagenomic projects have simplified the task of generating complete viral genomes, in contrast to the challenge faced by traditional methods, which use multiple primers targeted at different segments and VPs to generate the entire genome of a novel virus. In terms of diagnostics, studies carried out thus far show that viral metagenomics has the potential to serve as a multifaceted tool able to study and identify etiological agents of single infections and co-infections, tissue tropism, profiles of viral infections of different aquatic organisms, epidemiological monitoring of disease prevalence, evolutionary phylogenetic analyses, and the study of genomic diversity in quasispecies viruses. With sequencing technologies and bioinformatics analytical tools becoming cheaper and easier to use, we anticipate that metagenomics will soon become a routine tool for the discovery, study, and identification of novel pathogens, including viruses, to enable timely disease control for emerging diseases in aquaculture. PMID:28382024
Automated discovery systems and the inductivist controversy
NASA Astrophysics Data System (ADS)
Giza, Piotr
2017-09-01
The paper explores possible influences that developments in the branches of AI known as automated discovery and machine learning might have upon aspects of the old debate between Francis Bacon's inductivism and Karl Popper's falsificationism. Donald Gillies facetiously calls this controversy 'the duel of two English knights' and claims, after some analysis of historical cases of discovery, that Baconian induction had been used in science very rarely, or not at all, although he argues that the situation has changed with the advent of machine learning systems. (Some clarification of the terms 'machine learning' and 'automated discovery' is required here. The key idea of machine learning is that, given data with associated outcomes, software can be trained to make those associations in future cases, which typically amounts to inducing rules from individual cases classified by experts. Automated discovery (also called machine discovery) deals with uncovering new knowledge that is valuable for human beings; its key idea is that discovery is like other intellectual tasks, and that the general idea of heuristic search in problem spaces applies to discovery tasks as well. However, since machine learning systems discover (very low-level) regularities in data, throughout this paper I use the generic term automated discovery for both kinds of systems. I will elaborate on this later on.) Gillies's line of argument can be generalised: thanks to automated discovery systems, philosophers of science have at their disposal a new tool for empirically testing their philosophical hypotheses. Accordingly, in the paper I address the question of which of the two philosophical conceptions of scientific method is better vindicated in view of the successes and failures of systems developed within three major research programmes in the field: machine learning systems in the Turing tradition, the normative theory of scientific discovery formulated by Herbert Simon's group, and the programme called HHNT, proposed by J. Holland, K. Holyoak, R. Nisbett and P. Thagard.
Deng, Michelle; Zollanvari, Amin; Alterovitz, Gil
2012-01-01
The immense corpus of biomedical literature existing today poses challenges in information search and integration. Many links between pieces of knowledge occur or are significant only under certain contexts, rather than under the entire corpus. This study proposes using networks of ontology concepts, linked based on their co-occurrences in annotations of abstracts of biomedical literature and descriptions of experiments, to draw conclusions based on context-specific queries and to better integrate existing knowledge. In particular, a Bayesian network framework is constructed to allow for the linking of related terms from two biomedical ontologies under the queried context concept. Edges in such a Bayesian network allow associations between biomedical concepts to be quantified and inference to be made about the existence of some concepts given prior information about others. This approach could potentially be a powerful inferential tool for context-specific queries, applicable to ontologies in other fields as well.
Deng, Michelle; Zollanvari, Amin; Alterovitz, Gil
2012-01-01
The immense corpus of biomedical literature existing today poses challenges in information search and integration. Many links between pieces of knowledge occur or are significant only under certain contexts—rather than under the entire corpus. This study proposes using networks of ontology concepts, linked based on their co-occurrences in annotations of abstracts of biomedical literature and descriptions of experiments, to draw conclusions based on context-specific queries and to better integrate existing knowledge. In particular, a Bayesian network framework is constructed to allow for the linking of related terms from two biomedical ontologies under the queried context concept. Edges in such a Bayesian network allow associations between biomedical concepts to be quantified and inference to be made about the existence of some concepts given prior information about others. This approach could potentially be a powerful inferential tool for context-specific queries, applicable to ontologies in other fields as well. PMID:22779044
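The framework above is a Bayesian network, but the raw ingredient of its edges, association between ontology terms estimated from co-annotation, can be sketched with a simple pointwise mutual information score. The annotation sets and term IDs below are arbitrary examples, not data from the study:

```python
import math
from collections import Counter
from itertools import combinations

# Each set holds the ontology terms annotating one abstract (invented IDs)
annotations = [
    {"GO:0006915", "HP:0001658", "GO:0008283"},
    {"GO:0006915", "HP:0001658"},
    {"GO:0008283", "HP:0003002"},
    {"GO:0006915", "HP:0003002"},
]

n = len(annotations)
single = Counter(t for s in annotations for t in s)
pair = Counter(frozenset(p) for s in annotations for p in combinations(sorted(s), 2))

def pmi(a, b):
    """Pointwise mutual information: >0 means co-annotation above chance."""
    p_ab = pair[frozenset((a, b))] / n
    if not p_ab:
        return float("-inf")
    return math.log2(p_ab / ((single[a] / n) * (single[b] / n)))

print(pmi("GO:0006915", "HP:0001658"))  # positive: the two terms co-occur often
```

Conditioning such associations on a context term is what distinguishes the paper's Bayesian approach from this flat co-occurrence score.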
Building Faculty Capacity through the Learning Sciences
ERIC Educational Resources Information Center
Moy, Elizabeth; O'Sullivan, Gerard; Terlecki, Melissa; Jernstedt, Christian
2014-01-01
Discoveries in the learning sciences (especially in neuroscience) have yielded a rich and growing body of knowledge about how students learn, yet this knowledge is only half of the story. The other half is "know how," i.e. the application of this knowledge. For faculty members, that means applying the discoveries of the learning sciences…
Nim, Hieu T; Furtado, Milena B; Costa, Mauro W; Rosenthal, Nadia A; Kitano, Hiroaki; Boyd, Sarah E
2015-05-01
Existing de novo software platforms have largely overlooked a valuable resource: the expertise of the intended biologist users. Typical data representations, such as long gene lists or highly dense and overlapping transcription factor networks, often hinder biologists from relating these results to their expertise. VISIONET, a streamlined visualisation tool built from experimental needs, enables biologists to transform large, dense, overlapping transcription factor networks into sparse human-readable graphs via numerical filtering. The VISIONET interface allows users without a computing background to interactively explore and filter their data, and empowers them to apply their specialist knowledge to far more complex and substantial datasets than is currently possible. Applying VISIONET to the Tbx20-Gata4 transcription factor network led to the discovery and validation of Aldh1a2, an essential developmental gene associated with various important cardiac disorders, as a healthy adult cardiac fibroblast gene co-regulated by the cardiogenic transcription factors Gata4 and Tbx20. We demonstrate, with experimental validations, the utility of VISIONET for expertise-driven gene discovery that opens new experimental directions that would not otherwise have been identified.
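A hedged sketch of VISIONET-style numeric edge filtering using networkx (not VISIONET's own code); Gata4, Tbx20 and Aldh1a2 come from the abstract, while the extra target genes and all edge weights are invented:

```python
import networkx as nx

# Hypothetical TF-target network: edge weights are binding/expression scores
g = nx.Graph()
g.add_weighted_edges_from([
    ("Gata4", "Aldh1a2", 0.92), ("Tbx20", "Aldh1a2", 0.88),
    ("Gata4", "Myh6", 0.31), ("Tbx20", "Nppa", 0.12),
])

# Numeric filtering: keep only edges above a threshold so the remaining
# graph is sparse enough for a biologist to read
threshold = 0.5
sparse = g.edge_subgraph((u, v) for u, v, w in g.edges(data="weight")
                         if w >= threshold)
print(list(sparse.edges))  # only the strongly supported Aldh1a2 edges survive
```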
Recent development in software and automation tools for high-throughput discovery bioanalysis.
Shou, Wilson Z; Zhang, Jun
2012-05-01
Bioanalysis with LC-MS/MS has been established as the method of choice for the quantitative determination of drug candidates in biological matrices in drug discovery and development. LC-MS/MS bioanalytical support for drug discovery, especially early discovery, often requires high-throughput (HT) analysis of large numbers of samples (hundreds to thousands per day) generated from many structurally diverse compounds (tens to hundreds per day) with a very quick turnaround time, in order to provide the activity and liability data needed to move discovery projects forward. Another important consideration for discovery bioanalysis is its fit-for-purpose quality requirement, which depends on the particular experiments being conducted at this stage and is usually not as stringent as the requirements for bioanalysis supporting drug development. These attributes make HT discovery bioanalysis an ideal candidate for using software and automation tools to eliminate manual steps, remove bottlenecks, improve efficiency and reduce turnaround time while maintaining adequate quality. In this article we will review various recent developments that facilitate automation of individual bioanalytical procedures, such as sample preparation, MS/MS method development, sample analysis and data review, as well as fully integrated software tools that manage the entire bioanalytical workflow in HT discovery bioanalysis. In addition, software tools supporting the emerging high-resolution accurate-mass bioanalytical approach are also discussed.
NASA Astrophysics Data System (ADS)
Stone, S.; Parker, M. S.; Howe, B.; Lazowska, E.
2015-12-01
Rapid advances in technology are transforming nearly every field from "data-poor" to "data-rich." The ability to extract knowledge from this abundance of data is the cornerstone of 21st century discovery. At the University of Washington eScience Institute, our mission is to engage researchers across disciplines in developing and applying advanced computational methods and tools to real world problems in data-intensive discovery. Our research team consists of individuals with diverse backgrounds in domain sciences such as astronomy, oceanography and geology, with complementary expertise in advanced statistical and computational techniques such as data management, visualization, and machine learning. Two key elements are necessary to foster careers in data science: individuals with cross-disciplinary training in both method and domain sciences, and career paths emphasizing alternative metrics for advancement. We see persistent and deep-rooted challenges for the career paths of people whose skills, activities and work patterns don't fit neatly into the traditional roles and success metrics of academia. To address these challenges the eScience Institute has developed training programs and established new career opportunities for data-intensive research in academia. Our graduate students and post-docs have mentors in both a methodology and an application field. They also participate in coursework and tutorials to advance technical skill and foster community. Professional Data Scientist positions were created to support research independence while encouraging the development and adoption of domain-specific tools and techniques. The eScience Institute also supports the appointment of faculty who are innovators in developing and applying data science methodologies to advance their field of discovery. Our ultimate goal is to create a supportive environment for data science in academia and to establish global recognition for data-intensive discovery across all fields.
Development of a Suite of Analytical Tools for Energy and Water Infrastructure Knowledge Discovery
NASA Astrophysics Data System (ADS)
Morton, A.; Piburn, J.; Stewart, R.; Chandola, V.
2017-12-01
Energy and water generation and delivery systems are inherently interconnected. With demand for energy growing, the energy sector is experiencing increasing competition for water. With increasing population and changing environmental, socioeconomic, and demographic scenarios, new technology and investment decisions must be made for optimized and sustainable energy-water resource management. This also requires novel scientific insights into the complex interdependencies of energy-water infrastructures across multiple space and time scales. To address this need, we've developed a suite of analytical tools to support an integrated data driven modeling, analysis, and visualization capability for understanding, designing, and developing efficient local and regional practices related to the energy-water nexus. This work reviews the analytical capabilities available along with a series of case studies designed to demonstrate the potential of these tools for illuminating energy-water nexus solutions and supporting strategic (federal) policy decisions.
Wilkins, J J; Chan, Pls; Chard, J; Smith, G; Smith, M K; Beer, M; Dunn, A; Flandorfer, C; Franklin, C; Gomeni, R; Harnisch, L; Kaye, R; Moodie, S; Sardu, M L; Wang, E; Watson, E; Wolstencroft, K; Cheung, Sya
2017-05-01
Pharmacometric analyses are complex and multifactorial. It is essential to check, track, and document the vast amounts of data and metadata that are generated during these analyses (and the relationships between them) in order to comply with regulations, support quality control, auditing, and reporting. It is, however, challenging, tedious, error-prone, and time-consuming, and diverts pharmacometricians from the more useful business of doing science. Automating this process would save time, reduce transcriptional errors, support the retention and transfer of knowledge, encourage good practice, and help ensure that pharmacometric analyses appropriately impact decisions. The ability to document, communicate, and reconstruct a complete pharmacometric analysis using an open standard would have considerable benefits. In this article, the Innovative Medicines Initiative (IMI) Drug Disease Model Resources (DDMoRe) consortium proposes a set of standards to facilitate the capture, storage, and reporting of knowledge (including assumptions and decisions) in the context of model-informed drug discovery and development (MID3), as well as to support reproducibility: "Thoughtflow." A prototype software implementation is provided. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
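To make the "Thoughtflow" idea concrete, here is a minimal sketch of provenance capture for one analysis step, assuming a simple JSON-lines log; the field names and helper function are illustrative, not the DDMoRe standard itself.

```python
import hashlib
import json
from datetime import datetime, timezone

def _digest(path):
    """Hash file contents so a step's inputs/outputs can be verified later."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def record_step(log_path, activity, inputs, outputs, assumptions=None):
    """Append one analysis step, with a timestamp and file hashes, to a JSON-lines log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "activity": activity,                       # e.g. "fit base PK model"
        "inputs": {p: _digest(p) for p in inputs},
        "outputs": {p: _digest(p) for p in outputs},
        "assumptions": assumptions or [],           # decisions worth auditing
    }
    with open(log_path, "a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Replaying such a log yields an auditable trail of what was run, on which data, and under which assumptions, which is the core of the reproducibility argument above.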
Computational approaches to predict bacteriophage–host relationships
Edwards, Robert A.; McNair, Katelyn; Faust, Karoline; Raes, Jeroen; Dutilh, Bas E.
2015-01-01
Metagenomics has changed the face of virus discovery by enabling the accurate identification of viral genome sequences without requiring isolation of the viruses. As a result, metagenomic virus discovery leaves the first and most fundamental question about any novel virus unanswered: What host does the virus infect? The diversity of the global virosphere and the volumes of data obtained in metagenomic sequencing projects demand computational tools for virus–host prediction. We focus on bacteriophages (phages, viruses that infect bacteria), the most abundant and diverse group of viruses found in environmental metagenomes. By analyzing 820 phages with annotated hosts, we review and assess the predictive power of in silico phage–host signals. Sequence homology approaches are the most effective at identifying known phage–host pairs. Compositional and abundance-based methods contain significant signal for phage–host classification, providing opportunities for analyzing the unknowns in viral metagenomes. Together, these computational approaches further our knowledge of the interactions between phages and their hosts. Importantly, we find that all reviewed signals significantly link phages to their hosts, illustrating how current knowledge and insights about the interaction mechanisms and ecology of coevolving phages and bacteria can be exploited to predict phage–host relationships, with potential relevance for medical and industrial applications. PMID:26657537
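As an illustration of the compositional signal assessed above, the sketch below ranks candidate hosts by similarity of tetranucleotide usage between phage and host genomes; it is a minimal stand-in, not one of the benchmarked tools.

```python
from collections import Counter
from itertools import product

def kmer_profile(seq, k=4):
    """Normalized k-mer frequency vector over a fixed ACGT alphabet ordering."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]

def composition_distance(phage_seq, host_seq, k=4):
    """Mean absolute difference of k-mer profiles; smaller suggests a likelier host."""
    p, h = kmer_profile(phage_seq, k), kmer_profile(host_seq, k)
    return sum(abs(a - b) for a, b in zip(p, h)) / len(p)

# Rank candidate hosts by oligonucleotide-usage similarity to the phage genome:
# hosts = {"host_A": seq_a, "host_B": seq_b}
# best = min(hosts, key=lambda name: composition_distance(phage_genome, hosts[name]))
```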
The Effect of Rules and Discovery in the Retention and Retrieval of Braille Inkprint Letter Pairs.
ERIC Educational Resources Information Center
Nagengast, Daniel L.; And Others
The effects of rule knowledge were investigated using Braille inkprint pairs. Both recognition and recall were studied in three groups of subjects: rule knowledge, rule discovery, and no rule. Two hypotheses were tested: (1) that the group exposed to the rule would score better than would a discovery group and a control group; and (2) that all…
Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.
Niu, Zhenxing; Hua, Gang; Wang, Le; Gao, Xinbo
Unsupervised object discovery and localization aims to discover dominant object classes and localize all object instances in a given image collection without any supervision. Previous work has attempted to tackle this problem with vanilla topic models, such as latent Dirichlet allocation (LDA). However, in those methods no prior knowledge for the given image collection is exploited to facilitate object discovery. On the other hand, the topic models used in those methods suffer from the topic coherence issue: some inferred topics do not have clear meaning, which limits the final performance of object discovery. In this paper, prior knowledge in terms of so-called must-links is exploited from Web images on the Internet. Furthermore, a novel knowledge-based topic model, called LDA with mixture of Dirichlet trees, is proposed to incorporate the must-links into topic modeling for object discovery. In particular, to better deal with the polysemy phenomenon of visual words, the must-link is re-defined such that one must-link constrains only one or some topic(s) instead of all topics, which leads to significantly improved topic coherence. Moreover, the must-links are built and grouped with respect to specific object classes; thus the must-links in our approach are semantic-specific, which allows discriminative prior knowledge from Web images to be exploited more efficiently. Extensive experiments validated the efficiency of our proposed approach on several data sets. It is shown that our method significantly improves topic coherence and outperforms the unsupervised methods for object discovery and localization. In addition, compared with discriminative methods, the naturally existing object classes in the given image collection can be subtly discovered, which makes our approach well suited for realistic applications of unsupervised object discovery.
Pharmacokinetic de-risking tools for selection of monoclonal antibody lead candidates
Dostalek, Miroslav; Prueksaritanont, Thomayant; Kelley, Robert F.
2017-01-01
Pharmacokinetic studies play an important role in all stages of drug discovery and development. Recent advancements in the tools for discovery and optimization of therapeutic proteins have created an abundance of candidates that may fulfill target product profile criteria. Implementing a set of in silico, small scale in vitro and in vivo tools can help to identify a clinical lead molecule with promising properties at the early stages of drug discovery, thus reducing the labor and cost in advancing multiple candidates toward clinical development. In this review, we describe tools that should be considered during drug discovery, and discuss approaches that could be included in the pharmacokinetic screening part of the lead candidate generation process to de-risk unexpected pharmacokinetic behaviors of Fc-based therapeutic proteins, with an emphasis on monoclonal antibodies. PMID:28463063
Of possible cheminformatics futures.
Oprea, Tudor I; Taboureau, Olivier; Bologa, Cristian G
2012-01-01
For over a decade, cheminformatics has contributed to a wide array of scientific tasks from analytical chemistry and biochemistry to pharmacology and drug discovery; and although its contributions to decision making are recognized, the challenge is how it would contribute to faster development of novel, better products. Here we address the future of cheminformatics with primary focus on innovation. Cheminformatics developers often need to choose between "mainstream" (i.e., accepted, expected) and novel, leading-edge tools, with an increasing trend for open science. Possible futures for cheminformatics include the worst case scenario (lack of funding, no creative usage), as well as the best case scenario (complete integration, from systems biology to virtual physiology). As "-omics" technologies advance, and computer hardware improves, compounds will no longer be profiled at the molecular level, but also in terms of genetic and clinical effects. Among potentially novel tools, we anticipate machine learning models based on free text processing, an increased performance in environmental cheminformatics, significant decision-making support, as well as the emergence of robot scientists conducting automated drug discovery research. Furthermore, cheminformatics is anticipated to expand the frontiers of knowledge and evolve in an open-ended, extensible manner, allowing us to explore multiple research scenarios in order to avoid epistemological "local information minimum trap".
discovery toolset for Emulytics v. 1.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fritz, David; Crussell, Jonathan
The discovery toolset for Emulytics enables the construction of high-fidelity emulation models of systems. The toolset consists of a set of tools and techniques to automatically go from network discovery of operational systems to emulating those complex systems. Our toolset combines data from host discovery and network mapping tools into an intermediate representation that can then be further refined. Once the intermediate representation reaches the desired state, our toolset supports emitting the Emulytics models with varying levels of specificity based on experiment needs.
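A minimal sketch of the intermediate-representation idea described above, assuming hypothetical field names for parsed host-discovery and network-mapping output; the real toolset's schema and emitted model format are not given in the abstract.

```python
def build_ir(host_records, links):
    """Merge host-discovery records and network-mapping links into one
    intermediate representation that can be refined before model emission."""
    ir = {"nodes": {}, "edges": []}
    for rec in host_records:                  # e.g. parsed scan output (hypothetical keys)
        ir["nodes"][rec["ip"]] = {
            "os": rec.get("os", "unknown"),
            "services": rec.get("services", []),
        }
    for src, dst in links:                    # e.g. parsed topology data
        ir["edges"].append({"src": src, "dst": dst})
    return ir

def emit_model(ir, fidelity="high"):
    """Emit an emulation model; how much per-node detail is kept depends on
    the level of specificity the experiment needs."""
    nodes = ir["nodes"] if fidelity == "high" else {
        ip: {"os": "generic"} for ip in ir["nodes"]
    }
    return {"fidelity": fidelity, "nodes": nodes, "edges": ir["edges"]}

hosts = [{"ip": "10.0.0.2", "os": "linux", "services": ["ssh"]},
         {"ip": "10.0.0.3"}]
print(emit_model(build_ir(hosts, [("10.0.0.2", "10.0.0.3")]), fidelity="low"))
```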
Seltmann, Katja C.; Pénzes, Zsolt; Yoder, Matthew J.; Bertone, Matthew A.; Deans, Andrew R.
2013-01-01
Hymenoptera, the insect order that includes sawflies, bees, wasps, and ants, exhibits an incredible diversity of phenotypes, with over 145,000 species described in a corpus of textual knowledge since Carolus Linnaeus. In the absence of specialized training, often spanning decades, however, these articles can be challenging to decipher. Much of the vocabulary is domain-specific (e.g., Hymenoptera biology), historically without a comprehensive glossary, and contains much homonymous and synonymous terminology. The Hymenoptera Anatomy Ontology was developed to surmount this challenge and to aid future communication related to hymenopteran anatomy, as well as provide support for domain experts so they may actively benefit from the anatomy ontology development. As part of HAO development, an active learning, dictionary-based, natural language recognition tool was implemented to facilitate Hymenoptera anatomy term discovery in literature. We present this tool, referred to as the ‘Proofer’, as part of an iterative approach to growing phenotype-relevant ontologies, regardless of domain. The process of ontology development results in a critical mass of terms that is applied as a filter to the source collection of articles in order to reveal term occurrence and biases in natural language species descriptions. Our results indicate that taxonomists use domain-specific terminology that follows taxonomic specialization, particularly at superfamily and family level groupings and that the developed Proofer tool is effective for term discovery, facilitating ontology construction. PMID:23441153
Seltmann, Katja C; Pénzes, Zsolt; Yoder, Matthew J; Bertone, Matthew A; Deans, Andrew R
2013-01-01
Hymenoptera, the insect order that includes sawflies, bees, wasps, and ants, exhibits an incredible diversity of phenotypes, with over 145,000 species described in a corpus of textual knowledge since Carolus Linnaeus. In the absence of specialized training, often spanning decades, however, these articles can be challenging to decipher. Much of the vocabulary is domain-specific (e.g., Hymenoptera biology), historically without a comprehensive glossary, and contains much homonymous and synonymous terminology. The Hymenoptera Anatomy Ontology was developed to surmount this challenge and to aid future communication related to hymenopteran anatomy, as well as provide support for domain experts so they may actively benefit from the anatomy ontology development. As part of HAO development, an active learning, dictionary-based, natural language recognition tool was implemented to facilitate Hymenoptera anatomy term discovery in literature. We present this tool, referred to as the 'Proofer', as part of an iterative approach to growing phenotype-relevant ontologies, regardless of domain. The process of ontology development results in a critical mass of terms that is applied as a filter to the source collection of articles in order to reveal term occurrence and biases in natural language species descriptions. Our results indicate that taxonomists use domain-specific terminology that follows taxonomic specialization, particularly at superfamily and family level groupings and that the developed Proofer tool is effective for term discovery, facilitating ontology construction.
Open reading frames associated with cancer in the dark matter of the human genome.
Delgado, Ana Paula; Brandao, Pamela; Chapado, Maria Julia; Hamid, Sheilin; Narayanan, Ramaswamy
2014-01-01
The uncharacterized proteins (open reading frames, ORFs) in the human genome offer an opportunity to discover novel targets for cancer. A systematic analysis of the dark matter of the human proteome for druggability and biomarker discovery is crucial to mining the genome. Numerous data mining tools are available to mine these ORFs to develop a comprehensive knowledge base for future target discovery and validation. Using the Genetic Association Database, the ORFs of the human dark matter proteome were screened for evidence of association with neoplasms. The Phenome-Genome Integrator tool was used to establish phenotypic association with disease traits including cancer. Batch analysis of the tools for protein expression analysis, gene ontology and motifs and domains was used to characterize the ORFs. Sixty-two ORFs were identified for neoplasm association. The expression Quantitative Trait Loci (eQTL) analysis identified thirteen ORFs related to cancer traits. Protein expression, motifs and domain analysis and genome-wide association studies verified the relevance of these OncoORFs in diverse tumors. The OncoORFs are also associated with a wide variety of human diseases and disorders. Our results link the OncoORFs to diverse diseases and disorders. This suggests a complex landscape of the uncharacterized proteome in human diseases. These results open the dark matter of the proteome to novel cancer target research. Copyright© 2014, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.
Integrated Approaches to Drug Discovery for Oxidative Stress-Related Retinal Diseases.
Nishimura, Yuhei; Hara, Hideaki
2016-01-01
Excessive oxidative stress induces dysregulation of functional networks in the retina, resulting in retinal diseases such as glaucoma, age-related macular degeneration, and diabetic retinopathy. Although various therapies have been developed to reduce oxidative stress in retinal diseases, most have failed to show efficacy in clinical trials. This may be due to oversimplification of target selection for such a complex network as oxidative stress. Recent advances in high-throughput technologies have facilitated the collection of multilevel omics data, which has driven growth in public databases and in the development of bioinformatics tools. Integration of the knowledge gained from omics databases can be used to generate disease-related biological networks and to identify potential therapeutic targets within the networks. Here, we provide an overview of integrative approaches in the drug discovery process and provide simple examples of how the approaches can be exploited to identify oxidative stress-related targets for retinal diseases.
Integrated Approaches to Drug Discovery for Oxidative Stress-Related Retinal Diseases
Nishimura, Yuhei; Hara, Hideaki
2016-01-01
Excessive oxidative stress induces dysregulation of functional networks in the retina, resulting in retinal diseases such as glaucoma, age-related macular degeneration, and diabetic retinopathy. Although various therapies have been developed to reduce oxidative stress in retinal diseases, most have failed to show efficacy in clinical trials. This may be due to oversimplification of target selection for such a complex network as oxidative stress. Recent advances in high-throughput technologies have facilitated the collection of multilevel omics data, which has driven growth in public databases and in the development of bioinformatics tools. Integration of the knowledge gained from omics databases can be used to generate disease-related biological networks and to identify potential therapeutic targets within the networks. Here, we provide an overview of integrative approaches in the drug discovery process and provide simple examples of how the approaches can be exploited to identify oxidative stress-related targets for retinal diseases. PMID:28053689
Getting the Word Out: New Approaches for Disseminating Public Health Science
Eyler, Amy A.; Harris, Jenine K.; Moore, Justin B.; Tabak, Rachel G.
2018-01-01
The gap between discovery of public health knowledge and application in practice settings and policy development is due in part to ineffective dissemination. This article describes (1) lessons related to dissemination from related disciplines (eg, communication, agriculture, social marketing, political science), (2) current practices among researchers, (3) key audience characteristics, (4) available tools for dissemination, and (5) measures of impact. Dissemination efforts need to take into account the message, source, audience, and channel. Practitioners and policy makers can be more effectively reached via news media, social media, issue or policy briefs, one-on-one meetings, and workshops and seminars. Numerous “upstream” and “midstream” indicators of impact include changes in public perception or awareness, greater use of evidence-based interventions, and changes in policy. By employing ideas outlined in this article, scientific discoveries are more likely to be applied in public health agencies and policy-making bodies. PMID:28885319
Blood-based diagnostics of traumatic brain injuries
Mondello, Stefania; Muller, Uwe; Jeromin, Andreas; Streeter, Jackson; Hayes, Ronald L; Wang, Kevin KW
2011-01-01
Traumatic brain injury is a major health and socioeconomic problem that affects all societies. However, traditional approaches to the classification of clinical severity are the subject of debate and are being supplemented with structural and functional neuroimaging, as the need for biomarkers that reflect elements of the pathogenetic process is widely recognized. Basic science research and developments in the field of proteomics have greatly advanced our knowledge of the mechanisms involved in damage and have led to the discovery and rapid detection of new biomarkers that were not available previously. However, translating this research for patients' benefits remains a challenge. In this article, we summarize new developments, current knowledge and controversies, focusing on the potential role of these biomarkers as diagnostic, prognostic and monitoring tools of brain-injured patients. PMID:21171922
Knowledge Discovery and Data Mining: An Overview
NASA Technical Reports Server (NTRS)
Fayyad, U.
1995-01-01
The process of knowledge discovery and data mining is the process of information extraction from very large databases. Its importance is described along with several techniques and considerations for selecting the most appropriate technique for extracting information from a particular data set.
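A toy end-to-end pass over the steps this overview names (selection, cleaning, mining, interpretation); the two-standard-deviation outlier rule is purely illustrative.

```python
import statistics

def kdd_pipeline(records):
    """Toy knowledge-discovery pass: select, clean, mine, interpret."""
    # Selection/cleaning: keep numeric, non-missing values only.
    values = [r for r in records if isinstance(r, (int, float))]
    # Mining: flag values more than 2 standard deviations from the mean.
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values) or 1.0
    outliers = [v for v in values if abs(v - mu) > 2 * sigma]
    # Interpretation: package the finding for a human to assess.
    return {"n": len(values), "mean": round(mu, 2), "outliers": outliers}

print(kdd_pipeline([1.0, 1.2, 0.9, None, 42.0, 1.1, "bad", 1.05, 0.95, 1.0, 1.1]))
```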
12 CFR 263.53 - Discovery depositions.
Code of Federal Regulations, 2014 CFR
2014-01-01
12 Banks and Banking 4 (2014-01-01). Section 263.53 Discovery depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts...
12 CFR 263.53 - Discovery depositions.
Code of Federal Regulations, 2012 CFR
2012-01-01
12 Banks and Banking 4 (2012-01-01). Section 263.53 Discovery depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts...
High-throughput strategies for the discovery and engineering of enzymes for biocatalysis.
Jacques, Philippe; Béchet, Max; Bigan, Muriel; Caly, Delphine; Chataigné, Gabrielle; Coutte, François; Flahaut, Christophe; Heuson, Egon; Leclère, Valérie; Lecouturier, Didier; Phalip, Vincent; Ravallec, Rozenn; Dhulster, Pascal; Froidevaux, Rénato
2017-02-01
Innovations in novel enzyme discoveries impact a wide range of industries for which biocatalysis and biotransformations represent a great challenge, e.g., the food, polymer, and chemical industries. Key tools and technologies, such as bioinformatics tools to guide mutant library design, molecular biology tools to create mutant libraries, microfluidics/microplates, parallel miniscale bioreactors and mass spectrometry technologies to create high-throughput screening methods, and experimental design tools for screening and optimization, allow the discovery, development and implementation of enzymes and whole cells in (bio)processes to evolve. These technological innovations are accompanied by the development and implementation of clean and sustainable integrated processes to meet the growing needs of the chemical, pharmaceutical, environmental and biorefinery industries. This review gives an overview of the benefits of the high-throughput screening approach, from the discovery and engineering of biocatalysts to cell culture for optimizing their production in integrated processes and their extraction/purification.
ERIC Educational Resources Information Center
Benoit, Gerald
2002-01-01
Discusses data mining (DM) and knowledge discovery in databases (KDD), taking the view that KDD is the larger view of the entire process, with DM emphasizing the cleaning, warehousing, mining, and visualization of knowledge discovery in databases. Highlights include algorithms; users; the Internet; text mining; and information extraction.…
Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo
2018-03-15
RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible for the proteomics community by developing a graphical user interface (GUI) is our main goal here. We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes easy executions of RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays and allows the users to download the analyses results. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html .
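RAId's E-value statistics are analytic and considerably more rigorous than this; the sketch below only illustrates the generic idea of an E-value as the number of scored candidates times a tail probability, here estimated empirically. The score and variable names are hypothetical.

```python
def evalue(score, null_scores, n_candidates):
    """E-value ~ (number of scored candidates) x P(null score >= observed),
    with a pseudocount so the estimate is never exactly zero."""
    tail = sum(1 for s in null_scores if s >= score)
    p = (tail + 1) / (len(null_scores) + 1)
    return n_candidates * p

# e.g. evalue(87.5, null_scores=decoy_peptide_scores, n_candidates=120000),
# where decoy_peptide_scores might come from searching a shuffled database.
```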
Chronic Pancreatitis in the 21st Century - Research Challenges and Opportunities
Uc, Aliye; Andersen, Dana K.; Bellin, Melena D.; Bruce, Jason I.; Drewes, Asbjørn M.; Engelhardt, John F.; Forsmark, Christopher E.; Lerch, Markus M.; Lowe, Mark E.; Neuschwander-Tetri, Brent A.; O’Keefe, Stephen J.; Palermo, Tonya M.; Pasricha, Pankaj; Saluja, Ashok K.; Singh, Vikesh K.; Szigethy, Eva M.; Whitcomb, David C.; Yadav, Dhiraj; Conwell, Darwin L.
2016-01-01
A workshop was sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) to focus on research gaps and opportunities in chronic pancreatitis (CP) and its sequelae. This conference marked the 20th anniversary of the discovery of the cationic trypsinogen (PRSS1) gene mutation for hereditary pancreatitis. The event was held on July 27, 2016, and structured into 4 sessions: (1) pathophysiology; (2) exocrine complications; (3) endocrine complications; and (4) pain. The current state of knowledge was reviewed; many knowledge gaps and research needs were identified that require further investigation. Common themes included the need to design better tools to diagnose CP and its sequelae early and reliably, identify predisposing risk factors for disease progression, develop standardized protocols to distinguish type 3c diabetes mellitus from other types of diabetes, and design effective therapeutic strategies through novel cell culture technologies, animal models mimicking human disease, and pain management tools. Gene therapy and cystic fibrosis transmembrane conductance regulator (CFTR) potentiators as possible treatments for CP were discussed. Importantly, the need for chronic pancreatitis endpoints and intermediate targets for future drug trials was emphasized. PMID:27748719
An interactive visualization tool for mobile objects
NASA Astrophysics Data System (ADS)
Kobayashi, Tetsuo
Recent advancements in mobile devices---such as Global Positioning System (GPS), cellular phones, car navigation system, and radio-frequency identification (RFID)---have greatly influenced the nature and volume of data about individual-based movement in space and time. Due to the prevalence of mobile devices, vast amounts of mobile objects data are being produced and stored in databases, overwhelming the capacity of traditional spatial analytical methods. There is a growing need for discovering unexpected patterns, trends, and relationships that are hidden in the massive mobile objects data. Geographic visualization (GVis) and knowledge discovery in databases (KDD) are two major research fields that are associated with knowledge discovery and construction. Their major research challenges are the integration of GVis and KDD, enhancing the ability to handle large volume mobile objects data, and high interactivity between the computer and users of GVis and KDD tools. This dissertation proposes a visualization toolkit to enable highly interactive visual data exploration for mobile objects datasets. Vector algebraic representation and online analytical processing (OLAP) are utilized for managing and querying the mobile object data to accomplish high interactivity of the visualization tool. In addition, reconstructing trajectories at user-defined levels of temporal granularity with time aggregation methods allows exploration of the individual objects at different levels of movement generality. At a given level of generality, individual paths can be combined into synthetic summary paths based on three similarity measures, namely, locational similarity, directional similarity, and geometric similarity functions. A visualization toolkit based on the space-time cube concept exploits these functionalities to create a user-interactive environment for exploring mobile objects data. Furthermore, the characteristics of visualized trajectories are exported to be utilized for data mining, which leads to the integration of GVis and KDD. Case studies using three movement datasets (personal travel data survey in Lexington, Kentucky, wild chicken movement data in Thailand, and self-tracking data in Utah) demonstrate the potential of the system to extract meaningful patterns from the otherwise difficult to comprehend collections of space-time trajectories.
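Of the three similarity measures named, directional similarity is the easiest to sketch: compare the headings of corresponding displacement vectors. The definition below (mean cosine of headings) is an illustrative guess at such a measure, not the dissertation's exact formula.

```python
import math

def headings(path):
    """Unit direction vectors between consecutive (x, y) points."""
    dirs = []
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy) or 1.0
        dirs.append((dx / norm, dy / norm))
    return dirs

def directional_similarity(path_a, path_b):
    """Mean cosine of corresponding headings, in [-1, 1]; 1 means parallel movement."""
    pairs = list(zip(headings(path_a), headings(path_b)))
    return sum(ax * bx + ay * by for (ax, ay), (bx, by) in pairs) / len(pairs)

a = [(0, 0), (1, 0), (2, 1)]
b = [(0, 1), (1, 1), (2, 2)]
print(directional_similarity(a, b))   # 1.0 for parallel trajectories
```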
Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; ...
2015-07-14
In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world.
Semantic e-Science in Space Physics - A Case Study
NASA Astrophysics Data System (ADS)
Narock, T.; Yoon, V.; Merka, J.; Szabo, A.
2009-05-01
Several search and retrieval systems for space physics data are currently under development in NASA's heliophysics data environment. We present a case study of two such systems, and describe our efforts in implementing an ontology to aid in data discovery. In doing so we highlight the various aspects of knowledge representation and show how they led to our ontology design, creation, and implementation. We discuss advantages that scientific reasoning allows, as well as difficulties encountered in current tools and standards. Finally, we present a space physics research project conducted with and without e-Science and contrast the two approaches.
Workflow based framework for life science informatics.
Tiwari, Abhishek; Sekhar, Arvind K T
2007-10-01
Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services) and thereby facilitate knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and systems biology. In this article, we discuss the existing workflow systems and the trends in applications of workflow-based systems.
On the brink of extinction: the future of translational physician-scientists in the United States.
Furuya, Hideki; Brenner, Dean; Rosser, Charles J
2017-05-01
Over the past decade, we have seen unparalleled growth in our knowledge of cancer biology and the translation of this biology into a new generation of therapeutic tools that are changing cancer treatment outcomes. With the continued explosion of new biologic discoveries, we find ourselves with a limited number of trained and engaged translational physician-scientists capable of bridging the chasm between basic science and clinical science. Here, we discuss the current state in which translational physician-scientists find themselves and offer solutions for navigating this difficult time.
The role of collaborative ontology development in the knowledge negotiation process
NASA Astrophysics Data System (ADS)
Rivera, Norma
Interdisciplinary research (IDR) collaboration can be defined as the process of integrating experts' knowledge, perspectives, and resources to advance scientific discovery. The flourishing of more complex research problems, together with the growth of scientific and technical knowledge has resulted in the need for researchers from diverse fields to provide different expertise and points of view to tackle these problems. These collaborations, however, introduce a new set of "culture" barriers as participating experts are trained to communicate in discipline-specific languages, theories, and research practices. We propose that building a common knowledge base for research using ontology development techniques can provide a starting point for interdisciplinary knowledge exchange, negotiation, and integration. The goal of this work is to extend ontology development techniques to support the knowledge negotiation process in IDR groups. Towards this goal, this work presents a methodology that extends previous work in collaborative ontology development and integrates learning strategies and tools to enhance interdisciplinary research practices. We evaluate the effectiveness of applying such methodology in three different scenarios that cover educational and research settings. The results of this evaluation confirm that integrating learning strategies can, in fact, be advantageous to overall collaborative practices in IDR groups.
This procedure is designed to support the collection of potentially responsive information using automated E-Discovery tools that rely on keywords, key phrases, index queries, or other technological assistance to retrieve Electronically Stored Information.
Resource Discovery within the Networked "Hybrid" Library.
ERIC Educational Resources Information Center
Leigh, Sally-Anne
This paper focuses on the development, adoption, and integration of resource discovery, knowledge management, and/or knowledge sharing interfaces such as interactive portals, and the use of the library's World Wide Web presence to increase the availability and usability of information services. The introduction addresses changes in library…
A new approach to the rationale discovery of polymeric biomaterials
Kohn, Joachim; Welsh, William J.; Knight, Doyle
2007-01-01
This paper attempts to illustrate both the need for new approaches to biomaterials discovery as well as the significant promise inherent in the use of combinatorial and computational design strategies. The key observation of this Leading Opinion Paper is that the biomaterials community has been slow to embrace advanced biomaterials discovery tools such as combinatorial methods, high throughput experimentation, and computational modeling in spite of the significant promise shown by these discovery tools in materials science, medicinal chemistry and the pharmaceutical industry. It seems that the complexity of living cells and their interactions with biomaterials has been a conceptual as well as a practical barrier to the use of advanced discovery tools in biomaterials science. However, with the continued increase in computer power, the goal of predicting the biological response of cells in contact with biomaterials surfaces is within reach. Once combinatorial synthesis, high throughput experimentation, and computational modeling are integrated into the biomaterials discovery process, a significant acceleration is possible in the pace of development of improved medical implants, tissue regeneration scaffolds, and gene/drug delivery systems. PMID:17644176
2013-01-01
Background: Professionals in the biomedical domain are confronted with an increasing mass of data. Developing methods to assist professional end users in the field of Knowledge Discovery to identify, extract, visualize and understand useful information from these huge amounts of data is a major challenge. However, there are so many diverse methods and methodologies available that, for biomedical researchers who are inexperienced in the use of even relatively popular knowledge discovery methods, it can be very difficult to select the most appropriate method for their particular research problem. Results: A web application, called KNODWAT (KNOwledge Discovery With Advanced Techniques), has been developed using Java on the Spring Framework 3.1 and following a user-centered approach. The software runs on Java 1.6 and above and requires a web server such as Apache Tomcat and a database server such as the MySQL Server. For frontend functionality and styling, Twitter Bootstrap was used, as well as jQuery for interactive user interface operations. Conclusions: The framework presented is user-centric, highly extensible and flexible. Since it enables methods for testing using existing data to assess suitability and performance, it is especially suitable for inexperienced biomedical researchers, new to the field of knowledge discovery and data mining. For testing purposes two algorithms, CART and C4.5, were implemented using the WEKA data mining framework. PMID:23763826
Holzinger, Andreas; Zupan, Mario
2013-06-13
Professionals in the biomedical domain are confronted with an increasing mass of data. Developing methods to assist professional end users in the field of Knowledge Discovery to identify, extract, visualize and understand useful information from these huge amounts of data is a major challenge. However, there are so many diverse methods and methodologies available that, for biomedical researchers who are inexperienced in the use of even relatively popular knowledge discovery methods, it can be very difficult to select the most appropriate method for their particular research problem. A web application, called KNODWAT (KNOwledge Discovery With Advanced Techniques), has been developed using Java on the Spring Framework 3.1 and following a user-centered approach. The software runs on Java 1.6 and above and requires a web server such as Apache Tomcat and a database server such as the MySQL Server. For frontend functionality and styling, Twitter Bootstrap was used, as well as jQuery for interactive user interface operations. The framework presented is user-centric, highly extensible and flexible. Since it enables methods for testing using existing data to assess suitability and performance, it is especially suitable for inexperienced biomedical researchers, new to the field of knowledge discovery and data mining. For testing purposes two algorithms, CART and C4.5, were implemented using the WEKA data mining framework.
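The paper wires CART and C4.5 in via the WEKA framework (Java); as a language-neutral stand-in, here is a CART-style decision tree trained on a public dataset with scikit-learn. This illustrates the classifier the paper wraps, not KNODWAT's own code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# CART-style decision tree (Gini impurity) on a standard biomedical dataset.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)
tree.fit(X_tr, y_tr)
print("held-out accuracy:", round(tree.score(X_te, y_te), 3))
```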
OWLing Clinical Data Repositories With the Ontology Web Language
Lozano-Rubí, Raimundo; Pastor, Xavier; Lozano, Esther
2014-01-01
Background: The health sciences are based upon information. Clinical information is usually stored and managed by physicians with precarious tools, such as spreadsheets. The biomedical domain is more complex than other domains that have adopted information and communication technologies as pervasive business tools. Moreover, medicine continuously changes its corpus of knowledge because of new discoveries and the rearrangements in the relationships among concepts. This scenario makes it especially difficult to offer good tools to answer the professional needs of researchers and constitutes a barrier that needs innovation to discover useful solutions. Objective: The objective was to design and implement a framework for the development of clinical data repositories, capable of facing the continuous change in the biomedicine domain and minimizing the technical knowledge required from final users. Methods: We combined knowledge management tools and methodologies with relational technology. We present an ontology-based approach that is flexible and efficient for dealing with complexity and change, integrated with a solid relational storage and a Web graphical user interface. Results: Onto Clinical Research Forms (OntoCRF) is a framework for the definition, modeling, and instantiation of data repositories. It does not need any database design or programming. All required information to define a new project is explicitly stated in ontologies. Moreover, the user interface is built automatically on the fly as Web pages, whereas data are stored in a generic repository. This allows for immediate deployment and population of the database as well as instant online availability of any modification. Conclusions: OntoCRF is a complete framework to build data repositories with a solid relational storage. Driven by ontologies, OntoCRF is more flexible and efficient in dealing with complexity and change than traditional systems, and it does not require highly skilled technical staff, which facilitates the engineering of clinical software systems. PMID:25599697
OWLing Clinical Data Repositories With the Ontology Web Language.
Lozano-Rubí, Raimundo; Pastor, Xavier; Lozano, Esther
2014-08-01
The health sciences are based upon information. Clinical information is usually stored and managed by physicians with precarious tools, such as spreadsheets. The biomedical domain is more complex than other domains that have adopted information and communication technologies as pervasive business tools. Moreover, medicine continuously changes its corpus of knowledge because of new discoveries and the rearrangements in the relationships among concepts. This scenario makes it especially difficult to offer good tools to answer the professional needs of researchers and constitutes a barrier that needs innovation to discover useful solutions. The objective was to design and implement a framework for the development of clinical data repositories, capable of facing the continuous change in the biomedicine domain and minimizing the technical knowledge required from final users. We combined knowledge management tools and methodologies with relational technology. We present an ontology-based approach that is flexible and efficient for dealing with complexity and change, integrated with a solid relational storage and a Web graphical user interface. Onto Clinical Research Forms (OntoCRF) is a framework for the definition, modeling, and instantiation of data repositories. It does not need any database design or programming. All required information to define a new project is explicitly stated in ontologies. Moreover, the user interface is built automatically on the fly as Web pages, whereas data are stored in a generic repository. This allows for immediate deployment and population of the database as well as instant online availability of any modification. OntoCRF is a complete framework to build data repositories with a solid relational storage. Driven by ontologies, OntoCRF is more flexible and efficient in dealing with complexity and change than traditional systems, and it does not require highly skilled technical staff, which facilitates the engineering of clinical software systems.
2010-01-01
Background: The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results: In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion: High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245
Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong
2010-01-18
The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.
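A hedged sketch of the general recipe, not the paper's algorithm: score each TF-gene pair by blending an expression-based term (correlation across arrays) with a binary prior from a knowledge base. The 50/50 weighting and the gene names are illustrative.

```python
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation between two expression profiles."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def regulatory_score(tf_expr, gene_expr, tf, gene, knowledge_pairs, w=0.5):
    """Blend an expression-derived term with a knowledge-base prior for the pair."""
    data_term = abs(pearson(tf_expr, gene_expr))
    prior = 1.0 if (tf, gene) in knowledge_pairs else 0.0
    return (1 - w) * data_term + w * prior

kb = {("GAL4", "GAL1")}   # known interaction from a curated knowledge base
print(regulatory_score([1, 2, 3, 4], [2, 4, 5, 9], "GAL4", "GAL1", kb))
```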
Form-Focused Discovery Activities in English Classes
ERIC Educational Resources Information Center
Ogeyik, Muhlise Cosgun
2011-01-01
Form-focused discovery activities allow language learners to grasp various aspects of a target language by building implicit knowledge through the use of discovered explicit knowledge. Moreover, such activities can assist learners to perceive and discover the features of their language input. In foreign language teaching environments, they can be used…
Brodney, Marian D; Brosius, Arthur D; Gregory, Tracy; Heck, Steven D; Klug-McLeod, Jacquelyn L; Poss, Christopher S
2009-12-01
Advances in the field of drug discovery have brought an explosion in the quantity of data available to medicinal chemists and other project team members. New strategies and systems are needed to help these scientists to efficiently gather, organize, analyze, annotate, and share data about potential new drug molecules of interest to their project teams. Herein we describe a suite of integrated services and end-user applications that facilitate these activities throughout the medicinal chemistry design cycle. The Automated Data Presentation (ADP) and Virtual Compound Profiler (VCP) processes automate the gathering, organization, and storage of real and virtual molecules, respectively, and associated data. The Project-Focused Activity and Knowledge Tracker (PFAKT) provides a unified data analysis and collaboration environment, enhancing decision-making, improving team communication, and increasing efficiency.
User Driven Development of Software Tools for Open Data Discovery and Exploration
NASA Astrophysics Data System (ADS)
Schlobinski, Sascha; Keppel, Frank; Dihe, Pascal; Boot, Gerben; Falkenroth, Esa
2016-04-01
The use of open data in research faces challenges that are not restricted to inherent properties of the data, such as quality and resolution. Open data is often catalogued insufficiently or in fragmented form. Software tools that support effective discovery, including the assessment of the data's appropriateness for research, have shortcomings such as the lack of essential functionalities like support for data provenance. We believe that one of the reasons is the neglect of real end-user requirements in the development process of the aforementioned software tools. In the context of the FP7 Switch-On project we have proactively engaged the relevant user community to collaboratively develop a means to publish, find and bind open data relevant for hydrologic research. Implementing key concepts of data discovery and exploration, we have used state-of-the-art web technologies to provide an interactive software tool that is easy to use yet powerful enough to satisfy the data discovery and access requirements of the hydrological research community.
75 FR 66766 - NIAID Blue Ribbon Panel Meeting on Adjuvant Discovery and Development
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-29
DEPARTMENT OF HEALTH AND HUMAN SERVICES. NIAID Blue Ribbon Panel Meeting on Adjuvant Discovery and Development. ... identifies gaps in knowledge and capabilities, and defines NIAID's goals for the continued discovery... agenda for the discovery, development and clinical evaluation of adjuvants for use with preventive...
12 CFR 263.53 - Discovery depositions.
Code of Federal Regulations, 2011 CFR
2011-01-01
12 Banks and Banking 3 (2011-01-01). Section 263.53 Discovery depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts material to the...
12 CFR 19.170 - Discovery depositions.
Code of Federal Regulations, 2010 CFR
2010-01-01
12 Banks and Banking 1 (2010-01-01). Section 19.170 Discovery depositions (Discovery Depositions and Subpoenas). (a) General rule. In any... deposition of an expert, or of a person, including another party, who has direct knowledge of matters that...
12 CFR 19.170 - Discovery depositions.
Code of Federal Regulations, 2011 CFR
2011-01-01
12 Banks and Banking 1 (2011-01-01). Section 19.170 Discovery depositions (Discovery Depositions and Subpoenas). (a) General rule. In any... deposition of an expert, or of a person, including another party, who has direct knowledge of matters that...
12 CFR 263.53 - Discovery depositions.
Code of Federal Regulations, 2010 CFR
2010-01-01
12 Banks and Banking 3 (2010-01-01). Section 263.53 Discovery depositions. (a) In general. In addition to the discovery permitted in subpart A of this part, limited discovery by means of depositions shall be allowed for individuals with knowledge of facts material to the...
BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation
2011-01-01
We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be. PMID:21696594
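BioGraph's ranking method is not detailed in the abstract; a common way to prioritize candidate genes over an integrated network is a random walk with restart from disease-associated seed nodes, sketched here on a toy graph as one plausible approach.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, iters=100):
    """Stationary visiting probabilities of a walk that restarts at seed nodes;
    higher-ranked nodes are better connected to the disease seeds."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {}
        for n in nodes:
            flow = sum(p[m] / len(adj[m]) for m in nodes if n in adj[m])
            nxt[n] = (1 - restart) * flow + restart * p0[n]
        p = nxt
    return sorted(p.items(), key=lambda kv: -kv[1])

net = {"disease": {"geneA", "geneB"}, "geneA": {"disease", "geneC"},
       "geneB": {"disease"}, "geneC": {"geneA"}}
print(random_walk_with_restart(net, seeds={"disease"}))
```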
Computational Tools for Allosteric Drug Discovery: Site Identification and Focus Library Design.
Huang, Wenkang; Nussinov, Ruth; Zhang, Jian
2017-01-01
Allostery is an intrinsic phenomenon of biological macromolecules involving regulation and/or signal transduction induced by a ligand binding to an allosteric site distinct from a molecule's active site. Allosteric drugs are currently receiving increased attention in drug discovery because drugs that target allosteric sites can provide important advantages over the corresponding orthosteric drugs including specific subtype selectivity within receptor families. Consequently, targeting allosteric sites, instead of orthosteric sites, can reduce drug-related side effects and toxicity. On the down side, allosteric drug discovery can be more challenging than traditional orthosteric drug discovery due to difficulties associated with determining the locations of allosteric sites and designing drugs based on these sites and the need for the allosteric effects to propagate through the structure, reach the ligand binding site and elicit a conformational change. In this study, we present computational tools ranging from the identification of potential allosteric sites to the design of "allosteric-like" modulator libraries. These tools may be particularly useful for allosteric drug discovery.
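To give a flavor of the "allosteric-like" library design step, the sketch below filters a compound library on simple physicochemical properties with RDKit; the thresholds are illustrative placeholders, not rules published by this study.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

def allosteric_like(smiles, logp_range=(2.0, 5.0), max_rot_bonds=6, mw_max=500.0):
    """Crude physicochemical filter for 'allosteric-like' character;
    the cutoffs here are assumptions for illustration only."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    return (logp_range[0] <= Descriptors.MolLogP(mol) <= logp_range[1]
            and Descriptors.NumRotatableBonds(mol) <= max_rot_bonds
            and Descriptors.MolWt(mol) <= mw_max)

library = ["CC(=O)Oc1ccccc1C(=O)O", "c1ccc2ccccc2c1"]   # aspirin, naphthalene
print([s for s in library if allosteric_like(s)])        # aspirin is too polar
```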
Study of Tools for Network Discovery and Network Mapping
2003-11-01
connected to the switch. (iv) Accessibility of historical data and event data: in general, network discovery tools keep a history of the collected... has the following software dependencies: Java Virtual Machine, Perl modules, RRD Tool, Tomcat, PostgreSQL. Strengths and... systems: provide a simple view of the current network status, generate alarms on status change, and generate a history of status changes. Visual map...
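The surveyed tools do far more than this, but a minimal host-discovery pass can be sketched as a TCP connect sweep over a subnet; the probe port and subnet below are placeholders.

```python
import socket
from ipaddress import ip_network

def discover_hosts(cidr, port=22, timeout=0.3):
    """Tiny host-discovery pass: a host counts as 'up' if a TCP connection
    to the probe port succeeds or is actively refused."""
    live = []
    for addr in ip_network(cidr).hosts():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect((str(addr), port))
            live.append(str(addr))
        except ConnectionRefusedError:
            live.append(str(addr))   # port closed but the host answered
        except OSError:
            pass                     # filtered, unreachable, or down
        finally:
            s.close()
    return live

# print(discover_hosts("192.168.1.0/30"))
```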
Advancements in Large-Scale Data/Metadata Management for Scientific Data.
NASA Astrophysics Data System (ADS)
Guntupally, K.; Devarakonda, R.; Palanisamy, G.; Frame, M. T.
2017-12-01
Scientific data often comes with complex and diverse metadata which are critical for data discovery and users. The Online Metadata Editor (OME) tool, which was developed by an Oak Ridge National Laboratory team, effectively manages diverse scientific datasets across several federal data centers, such as DOE's Atmospheric Radiation Measurement (ARM) Data Center and USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L) project. This presentation will focus mainly on recent developments and future strategies for refining OME tool within these centers. The ARM OME is a standard based tool (https://www.archive.arm.gov/armome) that allows scientists to create and maintain metadata about their data products. The tool has been improved with new workflows that help metadata coordinators and submitting investigators to submit and review their data more efficiently. The ARM Data Center's newly upgraded Data Discovery Tool (http://www.archive.arm.gov/discovery) uses rich metadata generated by the OME to enable search and discovery of thousands of datasets, while also providing a citation generator and modern order-delivery techniques like Globus (using GridFTP), Dropbox and THREDDS. The Data Discovery Tool also supports incremental indexing, which allows users to find new data as and when they are added. The USGS CSAS&L search catalog employs a custom version of the OME (https://www1.usgs.gov/csas/ome), which has been upgraded with high-level Federal Geographic Data Committee (FGDC) validations and the ability to reserve and mint Digital Object Identifiers (DOIs). The USGS's Science Data Catalog (SDC) (https://data.usgs.gov/datacatalog) allows users to discover a myriad of science data holdings through a web portal. Recent major upgrades to the SDC and ARM Data Discovery Tool include improved harvesting performance and migration using new search software, such as Apache Solr 6.0 for serving up data/metadata to scientific communities. Our presentation will highlight the future enhancements of these tools which enable users to retrieve fast search results, along with parallelizing the retrieval process from online and High Performance Storage Systems. In addition, these improvements to the tools will support additional metadata formats like the Large-Eddy Simulation (LES) ARM Symbiotic and Observation (LASSO) bundle data.
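Since these catalogs serve metadata through Apache Solr, a minimal index-and-query round trip against Solr's standard HTTP API looks like the sketch below; the core name and record fields are hypothetical, not the ARM or USGS schemas.

```python
import requests

SOLR = "http://localhost:8983/solr/metadata"   # hypothetical Solr core

def index_records(docs):
    """Add or replace documents via Solr's JSON update handler, then commit."""
    r = requests.post(f"{SOLR}/update?commit=true", json=docs, timeout=10)
    r.raise_for_status()

def search(query, rows=10):
    """Standard /select query returning matching metadata records."""
    r = requests.get(f"{SOLR}/select",
                     params={"q": query, "rows": rows, "wt": "json"}, timeout=10)
    r.raise_for_status()
    return r.json()["response"]["docs"]

index_records([{"id": "ds-001", "title": "LASSO bundle sample", "site": "SGP"}])
print(search("title:LASSO"))
```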
Ma, Xiao H; Jia, Jia; Zhu, Feng; Xue, Ying; Li, Ze R; Chen, Yu Z
2009-05-01
Machine learning methods have been explored as ligand-based virtual screening tools for facilitating drug lead discovery. These methods predict compounds of specific pharmacodynamic, pharmacokinetic, or toxicological properties based on their structure-derived structural and physicochemical properties. Increasing attention has been directed at these methods because of their capability to predict compounds of diverse structures and complex structure-activity relationships without requiring knowledge of the target 3D structure. This article reviews current progress in using machine learning methods for virtual screening of pharmacodynamically active compounds from large compound libraries, and analyzes and compares the reported performances of machine learning tools with those of structure-based and other ligand-based (such as pharmacophore and clustering) virtual screening methods. The feasibility of improving the performance of machine learning methods in screening large libraries is discussed.
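As a concrete, hedged illustration of the ligand-based screening workflow reviewed here, the sketch below trains a random forest on synthetic descriptor vectors standing in for structure-derived physicochemical properties, then ranks an unscreened library by predicted activity. The descriptors, labels, and thresholds are invented; a real pipeline would derive features from actual compound structures.

```python
# Sketch of ligand-based virtual screening: fit a classifier on known
# actives/inactives, then rank a large library by predicted activity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))                        # descriptor vectors of known compounds
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)   # 1 = active (toy labeling rule)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

library = rng.normal(size=(10000, 16))       # descriptors of the screening library
scores = model.predict_proba(library)[:, 1]  # predicted probability of activity
top_hits = np.argsort(scores)[::-1][:50]     # top 50 candidates for follow-up
print(top_hits[:5], scores[top_hits[:5]])
```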
Orland, Barbara
2012-06-01
This paper investigates the theory of nutrition of Herman Boerhaave, the famous professor of medicine and chemistry at the university of Leyden. Boerhaave's work, which systematized and synthesized the knowledge of the time, represents a shift from a humoral to a hydraulic model of the body in medicine and culture around 1700. This epistemological reconfiguration of early modern physiological thinking is exemplified with respect to the changing meanings of milk. While over centuries the analogy between blood and milk played an essential role in understanding the hidden workings of the nutritional faculties, following the discovery of the blood circulation the blood-milk analogy was transformed into a chyle-milk analogy. Yet Boerhaave's interpretations show that the use of new knowledge tools did not simply displace the old ways of reasoning. Instead, analogies continued to serve as epistemic instruments. Old theories and new insights overlapped, and contemporary knowledge assimilated past ideas. Copyright © 2011. Published by Elsevier Ltd.
Nursing Routine Data as a Basis for Association Analysis in the Domain of Nursing Knowledge
Sellemann, Björn; Stausberg, Jürgen; Hübner, Ursula
2012-01-01
This paper describes the data mining method of association analysis within the framework of Knowledge Discovery in Databases (KDD), with the aim of identifying standard patterns of nursing care. The approach is application-oriented and is used on nursing routine data recorded with the method LEP Nursing 2. The increasing use of information technology in hospitals, especially of nursing information systems, requires the storage of large data sets, which hitherto have not always been analyzed adequately. Three association analyses, for the days of admission, surgery, and discharge, were performed. The results of almost 1.5 million generated association rules indicate that it is valid to apply association analysis to nursing routine data. All rules are semantically trivial, since they reflect existing knowledge from the domain of nursing. This may be due either to the method LEP Nursing 2 or to the nursing activities themselves. Nonetheless, association analysis may in future become a useful analytical tool on the basis of structured nursing routine data. PMID:24199122
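The association analysis step itself can be sketched briefly. The example below (toy patient-day activity lists, not LEP Nursing 2 data) assumes the pandas and mlxtend packages and mirrors the apriori-plus-rules workflow typical of this kind of KDD study.

```python
# Sketch of association-rule mining over nursing routine data.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# One list of documented activities per patient-day (toy data).
days = [
    ["admission assessment", "vital signs", "mobility support"],
    ["vital signs", "wound care", "mobility support"],
    ["admission assessment", "vital signs"],
    ["vital signs", "wound care"],
]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(days).transform(days), columns=te.columns_)

frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```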
Medical knowledge discovery and management.
Prior, Fred
2009-05-01
Although the volume of medical information is growing rapidly, the ability to convert these data into "actionable insights" and new medical knowledge is lagging far behind. The first step in the knowledge discovery process is data management and integration, which logically can be accomplished through the application of data warehouse technologies. A key insight that arises from efforts in biosurveillance and the global scope of military medicine is that information must be integrated over both time (longitudinal health records) and space (spatial localization of health-related events). Once data are compiled and integrated, it is essential to encode the semantics and relationships among data elements through the use of ontologies and semantic web technologies to convert data into knowledge. Medical images form a special class of health-related information. Traditionally, knowledge has been extracted from images by human observation and encoded via controlled terminologies. This approach is rapidly being replaced by quantitative analyses that more reliably support knowledge extraction. The goals of knowledge discovery are the improvement of both the timeliness and accuracy of medical decision making and the identification of new procedures and therapies.
Newton, Mandi S; Scott-Findlay, Shannon
2007-01-01
Background: In the past 15 years, knowledge translation in healthcare has emerged as a multifaceted and complex agenda. Theoretical and polemical discussions, the development of a science to study and measure the effects of translating research evidence into healthcare, and the role of key stakeholders including academe, healthcare decision-makers, the public, and government funding bodies have brought scholarly, organizational, social, and political dimensions to the agenda. Objective: This paper discusses the current knowledge translation agenda in Canadian healthcare and how elements in this agenda shape the discovery and translation of health knowledge. Discussion: The current knowledge translation agenda in Canadian healthcare involves the influence of values, priorities, and people; stakes that greatly shape the discovery of research knowledge and how it is or is not instituted in healthcare delivery. As this agenda continues to take shape and direction, ensuring that it is accountable for its influences is essential and should be at the forefront of concern for the Canadian public and healthcare community. This transparency will allow for scrutiny, debate, and improvements in health knowledge discovery and health services delivery. PMID:17916256
An, Gary C
2010-01-01
The greatest challenge facing the biomedical research community is the effective translation of basic mechanistic knowledge into clinically effective therapeutics. This challenge is most evident in attempts to understand and modulate "systems" processes/disorders, such as sepsis, cancer, and wound healing. Formulating an investigatory strategy for these issues requires the recognition that these are dynamic processes. Representation of the dynamic behavior of biological systems can aid in the investigation of complex pathophysiological processes by augmenting existing discovery procedures by integrating disparate information sources and knowledge. This approach is termed Translational Systems Biology. Focusing on the development of computational models capturing the behavior of mechanistic hypotheses provides a tool that bridges gaps in the understanding of a disease process by visualizing "thought experiments" to fill those gaps. Agent-based modeling is a computational method particularly well suited to the translation of mechanistic knowledge into a computational framework. Utilizing agent-based models as a means of dynamic hypothesis representation will be a vital means of describing, communicating, and integrating community-wide knowledge. The transparent representation of hypotheses in this dynamic fashion can form the basis of "knowledge ecologies," where selection between competing hypotheses will apply an evolutionary paradigm to the development of community knowledge.
Faded-example as a Tool to Acquire and Automate Mathematics Knowledge
NASA Astrophysics Data System (ADS)
Retnowati, E.
2017-04-01
Knowledge acquisition and automation are accomplished by the students themselves. The teacher plays the role of facilitator by creating mathematics tasks that help students build knowledge efficiently and effectively. The cognitive load imposed by the learning material presented by teachers should be considered a critical factor. While intrinsic cognitive load is related to the inherent complexity of the learning material, extraneous cognitive load is directly caused by how the material is presented. Strategies for presenting learning material in computational domains such as mathematics include the worked example (a fully guided task) and problem solving (a discovery task with no guidance). According to the empirical evidence, learning based on problem solving may cause high extraneous cognitive load for students who have limited prior knowledge; conversely, learning based on worked examples may cause high extraneous cognitive load for students who have already mastered the knowledge base. An alternative is the faded example, consisting of a partly completed task. Learning from faded examples can support students who have already acquired some knowledge about the to-be-learned material but still need more practice to further automate that knowledge. This instructional strategy provides a smooth transition from fully guided practice to independent problem solving. Designs of faded examples for learning trigonometry are discussed.
Cellular automata and its applications in protein bioinformatics.
Xiao, Xuan; Wang, Pu; Chou, Kuo-Chen
2011-09-01
With the explosion of protein sequences generated in the postgenomic era, it is highly desirable to develop high-throughput tools for rapidly and reliably identifying various attributes of uncharacterized proteins based on their sequence information alone. The knowledge thus obtained can help us timely utilize these newly found protein sequences for both basic research and drug discovery. Many bioinformatics tools have been developed by means of machine learning methods. This review is focused on the applications of a new kind of science (cellular automata) in protein bioinformatics. A cellular automaton (CA) is an open, flexible and discrete dynamic model that holds enormous potentials in modeling complex systems, in spite of the simplicity of the model itself. Researchers, scientists and practitioners from different fields have utilized cellular automata for visualizing protein sequences, investigating their evolution processes, and predicting their various attributes. Owing to its impressive power, intuitiveness and relative simplicity, the CA approach has great potential for use as a tool for bioinformatics.
The Real Time Display Builder (RTDB)
NASA Technical Reports Server (NTRS)
Kindred, Erick D.; Bailey, Samuel A., Jr.
1989-01-01
The Real Time Display Builder (RTDB) is a prototype interactive graphics tool that builds logic-driven displays. These displays reflect current system status, implement fault detection algorithms in real time, and incorporate the operational knowledge of experienced flight controllers. RTDB utilizes an object-oriented approach that integrates the display symbols with the underlying operational logic. This approach allows the user to specify the screen layout and the driving logic as the display is being built. RTDB is being developed under UNIX in C utilizing the MASSCOMP graphics environment with appropriate functional separation to ease portability to other graphics environments. RTDB grew from the need to develop customized real-time data-driven Space Shuttle systems displays. One display, using initial functionality of the tool, was operational during the orbit phase of STS-26 Discovery. RTDB is being used to produce subsequent displays for the Real Time Data System project currently under development within the Mission Operations Directorate at NASA/JSC. The features of the tool, its current state of development, and its applications are discussed.
Ask-the-expert: Active Learning Based Knowledge Discovery Using the Expert
NASA Technical Reports Server (NTRS)
Das, Kamalika; Avrekh, Ilya; Matthews, Bryan; Sharma, Manali; Oza, Nikunj
2017-01-01
Often the manual review of large data sets, either for purposes of labeling unlabeled instances or for classifying meaningful results from uninteresting (but statistically significant) ones, is extremely resource intensive, especially in terms of subject matter expert (SME) time. Use of active learning has been shown to diminish this review time significantly. However, since active learning is an iterative process of learning a classifier based on a small number of SME-provided labels at each iteration, the lack of an enabling tool can hinder the adoption of these technologies in real life, in spite of their labor-saving potential. In this demo we present ASK-the-Expert, an interactive tool that allows SMEs to review instances from a data set and provide labels within a single framework. ASK-the-Expert is powered by an active learning algorithm for training a classifier in the backend. We demonstrate this system in the context of an aviation safety application, but the tool can be adapted to work as a simple review and labeling tool as well, without the use of active learning.
Ask-the-Expert: Active Learning Based Knowledge Discovery Using the Expert
NASA Technical Reports Server (NTRS)
Das, Kamalika
2017-01-01
Often the manual review of large data sets, either for purposes of labeling unlabeled instances or for classifying meaningful results from uninteresting (but statistically significant) ones, is extremely resource intensive, especially in terms of subject matter expert (SME) time. Use of active learning has been shown to diminish this review time significantly. However, since active learning is an iterative process of learning a classifier based on a small number of SME-provided labels at each iteration, the lack of an enabling tool can hinder the adoption of these technologies in real life, in spite of their labor-saving potential. In this demo we present ASK-the-Expert, an interactive tool that allows SMEs to review instances from a data set and provide labels within a single framework. ASK-the-Expert is powered by an active learning algorithm for training a classifier in the back end. We demonstrate this system in the context of an aviation safety application, but the tool can be adapted to work as a simple review and labeling tool as well, without the use of active learning.
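A minimal sketch of the query loop behind tools of this kind follows. The specific query strategy used by ASK-the-Expert is not detailed in the abstract, so pool-based uncertainty sampling is assumed, and the SME is simulated by a synthetic oracle.

```python
# Uncertainty-sampling active learning loop: at each iteration the
# classifier asks the (simulated) SME to label the instance it is
# least certain about.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
oracle = (X[:, 0] - X[:, 3] > 0).astype(int)   # stand-in for SME labels

labeled = list(range(10))                      # small seed set of SME labels
unlabeled = [i for i in range(len(X)) if i not in labeled]

clf = LogisticRegression()
for _ in range(20):                            # 20 rounds of expert queries
    clf.fit(X[labeled], oracle[labeled])
    proba = clf.predict_proba(X[unlabeled])[:, 1]
    query = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]  # most uncertain
    labeled.append(query)                      # the SME provides this label
    unlabeled.remove(query)

print("accuracy:", clf.score(X, oracle))
```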
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery
Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo
2012-01-01
Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer, and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases our tool was invariably able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data. PMID:22570408
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery.
Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo
2012-09-01
Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer, and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases our tool was invariably able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2-ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.
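The core signal exploited by fusion-discovery tools of this kind, read pairs whose mates align to two different genes, can be sketched in a few lines. The gene assignments below are toy inputs; FusionAnalyser itself adds alignment, filtering, annotation, and a graphical interface on top of this idea.

```python
# Discordant read pairs vote for candidate fusions.
from collections import Counter

# (gene of mate 1, gene of mate 2) for each aligned read pair (toy data)
read_pairs = [
    ("ETS2", "ERG"), ("ETS2", "ERG"), ("ETS2", "ERG"),
    ("BCR", "ABL1"), ("TP53", "TP53"), ("ETS2", "ETS2"),
]

candidates = Counter(
    tuple(sorted(pair)) for pair in read_pairs if pair[0] != pair[1]
)
for fusion, support in candidates.most_common():
    if support >= 2:  # require multiple supporting pairs
        print("candidate fusion:", "-".join(fusion), "support:", support)
```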
User needs analysis and usability assessment of DataMed - a biomedical data discovery index.
Dixit, Ram; Rogith, Deevakar; Narayana, Vidya; Salimi, Mandana; Gururaj, Anupama; Ohno-Machado, Lucila; Xu, Hua; Johnson, Todd R
2017-11-30
To present user needs and usability evaluations of DataMed, a Data Discovery Index (DDI) that allows searching for biomedical data from multiple sources. We conducted 2 phases of user studies. Phase 1 was a user needs analysis conducted before the development of DataMed, consisting of interviews with researchers. Phase 2 involved iterative usability evaluations of DataMed prototypes. We analyzed data qualitatively to document researchers' information and user interface needs. Biomedical researchers' information needs in data discovery are complex, multidimensional, and shaped by their context, domain knowledge, and technical experience. User needs analyses validate the need for a DDI, while usability evaluations of DataMed show that even though aggregating metadata into a common search engine and applying traditional information retrieval tools are promising first steps, there remain challenges for DataMed due to incomplete metadata and the complexity of data discovery. Biomedical data poses distinct problems for search when compared to websites or publications. Making data available is not enough to facilitate biomedical data discovery: new retrieval techniques and user interfaces are necessary for dataset exploration. Consistent, complete, and high-quality metadata are vital to enable this process. While available data and researchers' information needs are complex and heterogeneous, a successful DDI must meet those needs and fit into the processes of biomedical researchers. Research directions include formalizing researchers' information needs, standardizing overviews of data to facilitate relevance judgments, implementing user interfaces for concept-based searching, and developing evaluation methods for open-ended discovery systems such as DDIs. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
NASA Astrophysics Data System (ADS)
McGovern, Mary Francis
Non-formal environmental education provides students the opportunity to learn in ways that would not be possible in a traditional classroom setting. Outdoor learning allows students to make connections to their environment and helps to foster an appreciation for nature. This type of education can be interdisciplinary: students develop skills not only in science, but also in mathematics, social studies, technology, and critical thinking. This case study focuses on a non-formal marine education program, the South Carolina Department of Natural Resources' (SCDNR) Discovery vessel-based program. The Discovery curriculum was evaluated to determine its impact on student knowledge about and attitude toward the estuary. Students from two South Carolina coastal counties who attended the boat program during fall 2014 were asked to complete a brief survey before, immediately after, and two weeks following the program. The results of this study indicate that both student knowledge about and attitude toward the estuary significantly improved after completion of the Discovery vessel-based program. Knowledge and attitude scores demonstrated a positive correlation.
Advances in the genetic dissection of plant cell walls: tools and resources available in Miscanthus
Slavov, Gancho; Allison, Gordon; Bosch, Maurice
2013-01-01
Tropical C4 grasses from the genus Miscanthus are believed to have great potential as biomass crops. However, Miscanthus species are essentially undomesticated, and genetic, molecular and bioinformatics tools are in very early stages of development. Furthermore, similar to other crops targeted as lignocellulosic feedstocks, the efficient utilization of biomass is hampered by our limited knowledge of the structural organization of the plant cell wall and the underlying genetic components that control this organization. The Institute of Biological, Environmental and Rural Sciences (IBERS) has assembled an extensive collection of germplasm for several species of Miscanthus. In addition, an integrated, multidisciplinary research programme at IBERS aims to inform accelerated breeding for biomass productivity and composition, while also generating fundamental knowledge. Here we review recent advances with respect to the genetic characterization of the cell wall in Miscanthus. First, we present a summary of recent and on-going biochemical studies, including prospects and limitations for the development of powerful phenotyping approaches. Second, we review current knowledge about genetic variation for cell wall characteristics of Miscanthus and illustrate how phenotypic data, combined with high-density arrays of single-nucleotide polymorphisms, are being used in genome-wide association studies to generate testable hypotheses and guide biological discovery. Finally, we provide an overview of the current knowledge about the molecular biology of cell wall biosynthesis in Miscanthus and closely related grasses, discuss the key conceptual and technological bottlenecks, and outline the short-term prospects for progress in this field. PMID:23847628
Spotlight on Fluorescent Biosensors—Tools for Diagnostics and Drug Discovery
2013-01-01
Fluorescent biosensors constitute potent tools for probing biomolecules in their natural environment and for visualizing dynamic processes in complex biological samples, living cells, and organisms. They are well suited for highlighting molecular alterations associated with pathological disorders, thereby offering means of implementing sensitive and alternative technologies for diagnostic purposes. They constitute attractive tools for drug discovery programs, from high throughput screening assays to preclinical studies. PMID:24900780
e-IQ and IQ knowledge mining for generalized LDA
NASA Astrophysics Data System (ADS)
Jenkins, Jeffrey; van Bergem, Rutger; Sweet, Charles; Vietsch, Eveline; Szu, Harold
2015-05-01
How can the human brain uncover patterns, associations and features in real-time, real-world data? There must be a general strategy used to transform raw signals into useful features, but representing this generalization in the context of our information extraction tool set is lacking. In contrast to Big Data (BD), Large Data Analysis (LDA) has become a reachable multi-disciplinary goal in recent years due in part to high performance computers and algorithm development, as well as the availability of large data sets. However, the experience of Machine Learning (ML) and information communities has not been generalized into an intuitive framework that is useful to researchers across disciplines. The data exploration phase of data mining is a prime example of this unspoken, ad-hoc nature of ML - the Computer Scientist works with a Subject Matter Expert (SME) to understand the data, and then build tools (i.e. classifiers, etc.) which can benefit the SME and the rest of the researchers in that field. We ask, why is there not a tool to represent information in a meaningful way to the researcher asking the question? Meaning is subjective and contextual across disciplines, so to ensure robustness, we draw examples from several disciplines and propose a generalized LDA framework for independent data understanding of heterogeneous sources which contribute to Knowledge Discovery in Databases (KDD). Then, we explore the concept of adaptive Information resolution through a 6W unsupervised learning methodology feedback system. In this paper, we will describe the general process of man-machine interaction in terms of an asymmetric directed graph theory (digging for embedded knowledge), and model the inverse machine-man feedback (digging for tacit knowledge) as an ANN unsupervised learning methodology. Finally, we propose a collective learning framework which utilizes a 6W semantic topology to organize heterogeneous knowledge and diffuse information to entities within a society in a personalized way.
Virtual Screening with AutoDock: Theory and Practice
Cosconati, Sandro; Forli, Stefano; Perryman, Alex L.; Harris, Rodney; Goodsell, David S.; Olson, Arthur J.
2011-01-01
Importance to the field: Virtual screening is a computer-based technique for identifying promising compounds to bind to a target molecule of known structure. Given the rapidly increasing number of protein and nucleic acid structures, virtual screening continues to grow as an effective method for the discovery of new inhibitors and drug molecules. Areas covered in this review: We describe virtual screening methods that are available in the AutoDock suite of programs, and several of our successes in using AutoDock virtual screening in pharmaceutical lead discovery. What the reader will gain: A general overview of the challenges of virtual screening is presented, along with the tools available in the AutoDock suite of programs for addressing these challenges. Take home message: Virtual screening is an effective tool for the discovery of compounds for use as leads in drug discovery, and the free, open source program AutoDock is an effective tool for virtual screening. PMID:21532931
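A screening campaign of the kind described can be sketched as a loop over a ligand library, assuming an AutoDock Vina-style command line and pre-prepared PDBQT files; the paths and docking-box parameters below are hypothetical.

```python
# Sketch of a virtual screening loop: dock each library ligand, parse
# the best predicted affinity, and rank. Assumes a Vina 1.x binary on
# PATH and prepared receptor/ligand PDBQT files (paths are hypothetical).
import glob
import os
import subprocess

os.makedirs("docked", exist_ok=True)
scores = {}
for ligand in glob.glob("library/*.pdbqt"):
    out = os.path.join("docked", os.path.basename(ligand))
    subprocess.run(
        ["vina", "--receptor", "receptor.pdbqt", "--ligand", ligand,
         "--center_x", "11.5", "--center_y", "90.0", "--center_z", "57.5",
         "--size_x", "22", "--size_y", "24", "--size_z", "28",
         "--out", out],
        check=True,
    )
    with open(out) as fh:                # the best pose is written first
        for line in fh:
            if line.startswith("REMARK VINA RESULT"):
                scores[ligand] = float(line.split()[3])  # affinity, kcal/mol
                break

# Rank the library: the most negative predicted affinity comes first.
for ligand, score in sorted(scores.items(), key=lambda kv: kv[1])[:10]:
    print(f"{score:8.2f}  {ligand}")
```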
A bioinformatics knowledge discovery in text application for grid computing
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-01-01
Background: A fundamental activity in biomedical research is Knowledge Discovery, which involves searching through large amounts of biomedical information such as documents and data. High-performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of information and communication resources in the life sciences. The goal of this work was to develop a software middleware solution that exploits the many knowledge discovery applications on scalable and distributed computing systems, making intensive use of ICT resources. Methods: The development of a grid application for Knowledge Discovery in Text, using a middleware-solution-based methodology, is presented. The system must be able to accept a user application model and process jobs so as to create many parallel jobs for distribution across the computational nodes. Finally, the system must be aware of the available computational resources and their status, and must be able to monitor the execution of parallel jobs. These operational requirements led to the design of a middleware that is specialized using user application modules. It includes a graphical user interface providing access to a node search system, a load balancing system, and a transfer optimizer to reduce communication costs. Results: A middleware solution prototype and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion: In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Databases computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749
A bioinformatics knowledge discovery in text application for grid computing.
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-06-16
A fundamental activity in biomedical research is Knowledge Discovery, which involves searching through large amounts of biomedical information such as documents and data. High-performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of information and communication resources in the life sciences. The goal of this work was to develop a software middleware solution that exploits the many knowledge discovery applications on scalable and distributed computing systems, making intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text, using a middleware-solution-based methodology, is presented. The system must be able to accept a user application model and process jobs so as to create many parallel jobs for distribution across the computational nodes. Finally, the system must be aware of the available computational resources and their status, and must be able to monitor the execution of parallel jobs. These operational requirements led to the design of a middleware that is specialized using user application modules. It includes a graphical user interface providing access to a node search system, a load balancing system, and a transfer optimizer to reduce communication costs. A middleware solution prototype and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Databases computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.
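The speed-up measurement central to this evaluation can be sketched on a single machine, with Python's multiprocessing standing in for grid nodes; the documents and the keyword-scan "entity recognition" below are toy stand-ins for the real NER module.

```python
# Split a document collection into parallel jobs and measure speed-up.
import time
from multiprocessing import Pool

DOCS = ["patient reports fever and cough"] * 5000 + ["normal findings"] * 5000
TERMS = {"fever", "cough", "nausea"}

def extract(doc):
    # Stand-in for named entity recognition of symptoms/pathologies.
    return [t for t in TERMS if t in doc]

if __name__ == "__main__":
    t0 = time.perf_counter()
    serial = [extract(d) for d in DOCS]
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool(4) as pool:                  # 4 workers playing "grid nodes"
        parallel = pool.map(extract, DOCS, chunksize=500)
    t_parallel = time.perf_counter() - t0

    print("speed-up factor:", round(t_serial / t_parallel, 2))
```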
ERIC Educational Resources Information Center
Molina, Otilia Alejandro; Ratté, Sylvie
2017-01-01
This research introduces a method to construct a unified representation of teachers and students perspectives based on the actionable knowledge discovery (AKD) and delivery framework. The representation is constructed using two models: one obtained from student evaluations and the other obtained from teachers' reflections about their teaching…
ERIC Educational Resources Information Center
Taft, Laritza M.
2010-01-01
In its report "To Err is Human", The Institute of Medicine recommended the implementation of internal and external voluntary and mandatory automatic reporting systems to increase detection of adverse events. Knowledge Discovery in Databases (KDD) allows the detection of patterns and trends that would be hidden or less detectable if analyzed by…
Using Perilog to Explore "Decision Making at NASA"
NASA Technical Reports Server (NTRS)
McGreevy, Michael W.
2005-01-01
Perilog, a context intensive text mining system, is used as a discovery tool to explore topics and concerns in "Decision Making at NASA," chapter 6 of the Columbia Accident Investigation Board (CAIB) Report, Volume I. Two examples illustrate how Perilog can be used to discover highly significant safety-related information in the text without prior knowledge of the contents of the document. A third example illustrates how "if-then" statements found by Perilog can be used in logical analysis of decision making. In addition, in order to serve as a guide for future work, the technical details of preparing a PDF document for input to Perilog are included in an appendix.
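Perilog's internal algorithms are not detailed in this abstract; as a generic illustration of the third example, the sketch below pulls candidate "if-then" statements out of free text with a regular expression. The sample sentences are invented.

```python
# Generic extraction of "if-then" statements for logical analysis
# (an illustration, not Perilog's algorithm; sample text is invented).
import re

text = (
    "If the strike is deemed an in-family event, then no imagery is "
    "requested. Managers discussed schedules. If imagery had been "
    "requested, then the damage might have been assessed before entry."
)

pattern = re.compile(r"\bIf\b[^.]*?\bthen\b[^.]*\.", re.IGNORECASE)
for stmt in pattern.findall(text):
    print("-", stmt.strip())
```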
Policy forum. Data, privacy, and the greater good.
Horvitz, Eric; Mulligan, Deirdre
2015-07-17
Large-scale aggregate analyses of anonymized data can yield valuable results and insights that address public health challenges and provide new avenues for scientific discovery. These methods can extend our knowledge and provide new tools for enhancing health and wellbeing. However, they raise questions about how best to address potential threats to privacy while reaping benefits for individuals and for society as a whole. The use of machine learning to make leaps across informational and social contexts to infer health conditions and risks from nonmedical data provides representative scenarios for reflection on directions for balancing innovation and regulation. Copyright © 2015, American Association for the Advancement of Science.
A Virtual Bioinformatics Knowledge Environment for Early Cancer Detection
NASA Technical Reports Server (NTRS)
Crichton, Daniel; Srivastava, Sudhir; Johnsey, Donald
2003-01-01
Discovery of disease biomarkers for cancer is a leading focus of early detection. The National Cancer Institute created a network of collaborating institutions focused on the discovery and validation of cancer biomarkers called the Early Detection Research Network (EDRN). Informatics plays a key role in enabling a virtual knowledge environment that provides scientists real time access to distributed data sets located at research institutions across the nation. The distributed and heterogeneous nature of the collaboration makes data sharing across institutions very difficult. EDRN has developed a comprehensive informatics effort focused on developing a national infrastructure enabling seamless access, sharing and discovery of science data resources across all EDRN sites. This paper will discuss the EDRN knowledge system architecture, its objectives and its accomplishments.
ERIC Educational Resources Information Center
Harmon, Glynn
2013-01-01
The term discovery applies herein to the successful outcome of inquiry in which a significant personal, professional or scholarly breakthrough or insight occurs, and which is individually or socially acknowledged as a key contribution to knowledge. Since discoveries culminate at fixed points in time, discoveries can serve as an outcome metric for…
Jiang, Guoqian; Wang, Chen; Zhu, Qian; Chute, Christopher G
2013-01-01
Knowledge-driven text mining is becoming an important research area for identifying pharmacogenomics target genes. However, few such studies have focused on the pharmacogenomics targets of adverse drug events (ADEs). The objective of the present study is to build a framework of knowledge integration and discovery that aims to support pharmacogenomics target prediction for ADEs. We integrate a semantically annotated literature corpus, Semantic MEDLINE, with a semantically coded ADE knowledge base known as ADEpedia using a semantic web based framework. We developed a knowledge discovery approach combining a network analysis of a protein-protein interaction (PPI) network and a gene functional classification approach. We performed a case study of drug-induced long QT syndrome to demonstrate the usefulness of the framework in predicting potential pharmacogenomics targets of ADEs.
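The network-analysis half of this approach can be sketched with networkx: build a PPI graph, take the neighborhood of seed genes already linked to the ADE, and rank candidates by centrality. The gene symbols below are real long QT-associated genes, but the interactions shown are illustrative toy data only.

```python
# Rank genes near known ADE-associated seeds in a toy PPI network.
import networkx as nx

ppi = nx.Graph()
ppi.add_edges_from([
    ("KCNH2", "KCNE2"), ("KCNH2", "ALG10"), ("KCNQ1", "KCNE1"),
    ("KCNQ1", "AKAP9"), ("SCN5A", "SNTA1"), ("KCNH2", "KCNQ1"),
])

seeds = {"KCNH2", "KCNQ1"}                   # known long QT genes (toy seeds)
neighborhood = set(seeds)
for s in seeds:
    neighborhood.update(ppi.neighbors(s))

sub = ppi.subgraph(neighborhood)
ranking = sorted(nx.degree_centrality(sub).items(), key=lambda kv: -kv[1])
for gene, centrality in ranking:
    role = "seed" if gene in seeds else "candidate"
    print(f"{gene:8s} {centrality:.2f}  {role}")
```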
Regulatory sequence analysis tools.
van Helden, Jacques
2003-07-01
The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
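One of the matrix-based pattern-matching styles RSAT supports, scanning a sequence with a position-specific scoring matrix, can be sketched directly; the matrix weights, sequence, and threshold below are toy values, not RSAT's implementation.

```python
# Sliding-window PSSM scan of a DNA sequence (toy matrix and sequence).
import numpy as np

BASES = "ACGT"
pssm = np.array([            # rows = A, C, G, T; columns = motif positions
    [1.2, -0.5, -1.0, 1.0],
    [-0.7, 1.1, -0.8, -0.9],
    [-0.6, -0.4, 1.3, -0.7],
    [-0.9, -0.8, -0.9, 0.8],
])
motif_len = pssm.shape[1]

seq = "TTACGTACGATACGT"
for i in range(len(seq) - motif_len + 1):
    window = seq[i:i + motif_len]
    score = sum(pssm[BASES.index(b), j] for j, b in enumerate(window))
    if score > 2.0:          # arbitrary reporting threshold
        print(f"predicted site at {i}: {window} (score {score:.2f})")
```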
Uc, Aliye; Andersen, Dana K; Bellin, Melena D; Bruce, Jason I; Drewes, Asbjørn M; Engelhardt, John F; Forsmark, Christopher E; Lerch, Markus M; Lowe, Mark E; Neuschwander-Tetri, Brent A; OʼKeefe, Stephen J; Palermo, Tonya M; Pasricha, Pankaj; Saluja, Ashok K; Singh, Vikesh K; Szigethy, Eva M; Whitcomb, David C; Yadav, Dhiraj; Conwell, Darwin L
2016-11-01
A workshop was sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases to focus on research gaps and opportunities in chronic pancreatitis (CP) and its sequelae. This conference marked the 20th anniversary of the discovery of the cationic trypsinogen (PRSS1) gene mutation for hereditary pancreatitis. The event was held on July 27, 2016, and structured into 4 sessions: (1) pathophysiology, (2) exocrine complications, (3) endocrine complications, and (4) pain. The current state of knowledge was reviewed; many knowledge gaps and research needs were identified that require further investigation. Common themes included the need to design better tools to diagnose CP and its sequelae early and reliably, identify predisposing risk factors for disease progression, develop standardized protocols to distinguish type 3c diabetes mellitus from other types of diabetes, and design effective therapeutic strategies through novel cell culture technologies, animal models mimicking human disease, and pain management tools. Gene therapy and cystic fibrosis transmembrane conductance regulator (CFTR) potentiators as possible treatments of CP were discussed. Importantly, the need for CP end points and intermediate targets for future drug trials was emphasized.
The universe unveiled : instruments and images through history
NASA Astrophysics Data System (ADS)
Stephenson, Bruce; Bolt, Marvin; Friedman, Anna Felicity
2000-11-01
The search for understanding creates more than answers; it also produces instruments, books, maps, and other tools made and used by those seeking knowledge. The Universe Unveiled uniquely focuses on these artifacts and devices resulting from the attempts to decipher the Universe from the late fifteenth to the early twentieth century. Beautiful, full-color photographs capture these extremely rare and sometimes unusual curios. Beginning with the discovery of ways to keep time, The Universe Unveiled depicts the shift from an Earth-centered understanding of the Universe to a Sun-centered view, the mapping of the stars, and the ever-expanding knowledge of the heavens using telescopes. It also examines the developing technologies of navigation and of the measuring and mapping of the Earth. In addition to rare European curios, the book is illustrated with non-Western and American works. With more than 250 full-color images, this unique volume will delight the inventive as well as the curious.
Linking Publications to Instruments, Field Campaigns, Sites and Working Groups: The ARM Experience
NASA Astrophysics Data System (ADS)
Lehnert, K.; Parsons, M. A.; Ramachandran, R.; Fils, D.; Narock, T.; Fox, P. A.; Troyan, D.; Cialella, A. T.; Gregory, L.; Lazar, K.; Liang, M.; Ma, L.; Tilp, A.; Wagener, R.
2017-12-01
For the past 25 years, the ARM Climate Research Facility - a US Department of Energy scientific user facility - has been collecting atmospheric data in different climatic regimes using both in situ and remote instrumentation. The configuration of the facility's components has been designed to improve the understanding and representation, in climate and earth system models, of clouds and aerosols. Placing a premium on long-term continuous data collection has resulted in terabytes of data being collected, stored, and made accessible to any interested person. All data are accessible via the ARM.gov website and the ARM Data Discovery Tool. A team of metadata professionals assigns appropriate tags to help facilitate searching the databases for desired data. The knowledge organization tools and concepts used to create connections between data, instruments, field campaigns, sites, and measurements are familiar to informatics professionals. Ontology, taxonomy, classification, and thesauri are among the concepts customized and put into practice for ARM's purposes. In addition to the multitude of data available, approximately 3,000 journal articles that utilize ARM data have been linked to specific ARM web pages. Searches of the complete ARM publication database can be done using a separate interface. This presentation describes how ARM data are linked to instruments, sites, field campaigns, and publications through the application of standard knowledge organization tools and concepts.
Visualization of Multi-mission Astronomical Data with ESASky
NASA Astrophysics Data System (ADS)
Baines, Deborah; Giordano, Fabrizio; Racero, Elena; Salgado, Jesús; López Martí, Belén; Merín, Bruno; Sarmiento, María-Henar; Gutiérrez, Raúl; Ortiz de Landaluce, Iñaki; León, Ignacio; de Teodoro, Pilar; González, Juan; Nieto, Sara; Segovia, Juan Carlos; Pollock, Andy; Rosa, Michael; Arviset, Christophe; Lennon, Daniel; O'Mullane, William; de Marchi, Guido
2017-02-01
ESASky is a science-driven discovery portal to explore the multi-wavelength sky and visualize and access multiple astronomical archive holdings. The tool is a web application that requires no prior knowledge of any of the missions involved and gives users world-wide simplified access to the highest-level science data products from multiple astronomical space-based astronomy missions plus a number of ESA source catalogs. The first public release of ESASky features interfaces for the visualization of the sky in multiple wavelengths, the visualization of query results summaries, and the visualization of observations and catalog sources for single and multiple targets. This paper describes these features within ESASky, developed to address use cases from the scientific community. The decisions regarding the visualization of large amounts of data and the technologies used were made to maximize the responsiveness of the application and to keep the tool as useful and intuitive as possible.
McClay, David R
2016-01-01
In the sea urchin, morphogenesis follows extensive molecular specification. The specification controls the many morphogenetic events, and these, in turn, precede the patterning steps that establish the larval body plan. To understand how the embryo is built, it was necessary to understand that series of molecular steps. Here an example of the historical sequence of those discoveries is presented as it unfolded over the last 50 years, the years during which major progress in understanding the development of many animals and plants was documented by CTDB. In sea urchin development, a rich series of experimental studies first established many of the phenomenological components of skeletal morphogenesis and patterning without knowledge of the molecular components. The many discoveries of transcription factors, signals, and structural proteins that contribute to the shape of the endoskeleton of the sea urchin larva then followed as molecular tools became available. A number of transcription factors and signals were discovered that were necessary for specification, morphogenesis, and patterning. Perturbation of the transcription factors and signals provided the means for assembling models of the gene regulatory networks used for specification and for control of the subsequent morphogenetic events. The earlier experimental information informed perturbation experiments that asked how patterning works. As a consequence, it was learned that the ectoderm provides a series of patterning signals to the skeletogenic cells, and as a consequence the skeletogenic cells secrete a highly patterned skeleton based on their ability to genotypically decode the localized reception of several signals. We still do not understand the complexity of the signals received by the skeletogenic cells, nor do we understand in detail how the genotypic information shapes the secreted skeletal biomineral, but the current knowledge at least outlines the sequence of events and provides a useful template for future discoveries. © 2016 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Tilton, James C.; Cook, Diane J.
2008-01-01
Under a project recently selected for funding by NASA's Science Mission Directorate under the Applied Information Systems Research (AISR) program, Tilton and Cook will design and implement the integration of the Subdue graph based knowledge discovery system, developed at the University of Texas Arlington and Washington State University, with image segmentation hierarchies produced by the RHSEG software, developed at NASA GSFC, and perform pilot demonstration studies of data analysis, mining and knowledge discovery on NASA data. Subdue represents a method for discovering substructures in structural databases. Subdue is devised for general-purpose automated discovery, concept learning, and hierarchical clustering, with or without domain knowledge. Subdue was developed by Cook and her colleague, Lawrence B. Holder. For Subdue to be effective in finding patterns in imagery data, the data must be abstracted up from the pixel domain. An appropriate abstraction of imagery data is a segmentation hierarchy: a set of several segmentations of the same image at different levels of detail in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. The RHSEG program, a recursive approximation to a Hierarchical Segmentation approach (HSEG), can produce segmentation hierarchies quickly and effectively for a wide variety of images. RHSEG and HSEG were developed at NASA GSFC by Tilton. In this presentation we provide background on the RHSEG and Subdue technologies and present a preliminary analysis on how RHSEG and Subdue may be combined to enhance image data analysis, mining and knowledge discovery.
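The segmentation-hierarchy idea is easy to illustrate: start from fine regions and repeatedly merge the most similar adjacent pair, recording each level so that coarser segmentations are simple merges of finer ones. The sketch below uses 1-D intervals with mean intensities and is only a conceptual stand-in for RHSEG.

```python
# Build a toy segmentation hierarchy by greedy merging of the most
# similar adjacent regions. Each region is (start, end, mean intensity).
regions = [(0, 2, 10.0), (2, 4, 11.0), (4, 6, 40.0), (6, 8, 42.0), (8, 10, 25.0)]
hierarchy = [list(regions)]

while len(regions) > 1:
    # Find the adjacent pair with the most similar mean intensity.
    i = min(range(len(regions) - 1),
            key=lambda k: abs(regions[k][2] - regions[k + 1][2]))
    a, b = regions[i], regions[i + 1]
    size_a, size_b = a[1] - a[0], b[1] - b[0]
    merged_mean = (a[2] * size_a + b[2] * size_b) / (size_a + size_b)
    regions = regions[:i] + [(a[0], b[1], merged_mean)] + regions[i + 2:]
    hierarchy.append(list(regions))    # record one level of the hierarchy

for level, segs in enumerate(hierarchy):
    print(f"level {level}: {[(s[0], s[1]) for s in segs]}")
```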
Predicting future discoveries from current scientific literature.
Petrič, Ingrid; Cestnik, Bojan
2014-01-01
Knowledge discovery in biomedicine is a time-consuming process, starting from basic research, through preclinical testing, towards possible clinical applications. Crossing conceptual boundaries is often needed for groundbreaking biomedical research that generates highly inventive discoveries. We demonstrate the ability of a creative literature mining method to advance valuable new discoveries based on rare ideas from the existing literature. When emerging ideas from the scientific literature are put together as fragments of knowledge in a systematic way, they may lead to original, sometimes surprising, research findings. If enough scientific evidence has already been published for the association of such findings, they can be considered scientific hypotheses. In this chapter, we describe a method for the computer-aided generation of such hypotheses based on the existing scientific literature. Our literature-based discovery of NF-kappaB, with its possible connections to autism, was recently confirmed by the scientific community, which demonstrates the ability of our literature mining methodology to accelerate future discoveries based on rare ideas from the existing literature.
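A classic scheme in literature-based discovery is Swanson's ABC model; assuming (without confirmation from the abstract) that the method follows this general pattern, the sketch below links a concept A to a concept C through shared bridging terms B. The term sets are toy data echoing the NF-kappaB/autism example.

```python
# ABC-style literature-based discovery: A and C never co-occur directly,
# but shared B terms suggest a testable A-C hypothesis (toy term sets).
cooccurs_with_A = {"NF-kappaB": {"inflammation", "oxidative stress", "cytokines"}}
cooccurs_with_C = {"autism": {"oxidative stress", "cytokines", "synaptic pruning"}}

bridges = cooccurs_with_A["NF-kappaB"] & cooccurs_with_C["autism"]
if bridges:
    print("hypothesis: NF-kappaB <-> autism via", sorted(bridges))
```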
Teachers' Journal Club: Bridging between the Dynamics of Biological Discoveries and Biology Teachers
ERIC Educational Resources Information Center
Brill, Gilat; Falk, Hedda; Yarden, Anat
2003-01-01
Since biology is one of the most dynamic research fields within the natural sciences, the gap between the accumulated knowledge in biology and the knowledge that is taught in schools, increases rapidly with time. Our long-term objective is to develop means to bridge between the dynamics of biological discoveries and the biology teachers and…
DisEpi: Compact Visualization as a Tool for Applied Epidemiological Research.
Benis, Arriel; Hoshen, Moshe
2017-01-01
Outcomes research and evidence-based medical practice are being positively impacted by the proliferation of healthcare databases. Modern epidemiologic studies require complex data comprehension. A new tool, DisEpi, facilitates visual exploration of epidemiological data, supporting Public Health Knowledge Discovery. It provides domain experts with a compact visualization of information at the population level. In this study, DisEpi is applied to Attention-Deficit/Hyperactivity Disorder (ADHD) patients within Clalit Health Services, analyzing the socio-demographic and ADHD filled-prescription data between 2006 and 2016 of 1,605,800 children aged 6 to 17 years. DisEpi's goals are to facilitate the identification of (1) links between attributes and/or events, (2) changes in these relationships over time, and (3) clusters of population attributes with similar trends. DisEpi combines hierarchical clustering graphics and a heatmap where color shades reflect disease time-trends. In the ADHD context, DisEpi allowed the domain expert to visually analyze a snapshot summary of the data mining results. Accordingly, the domain expert was able to efficiently identify that (1) relatively younger children, and particularly the youngest children in a class, are treated more often; (2) medication incidence increased between 2006 and 2011 but then stabilized; and (3) the progression rates of medication incidence differ for each of the 3 main discovered clusters (aka profiles) of treated children. DisEpi delivered results similar to those previously published using classical statistical approaches. DisEpi requires minimal preparation and fewer iterations, generating results in a user-friendly format for the domain expert. DisEpi will be wrapped as a package containing the end-to-end discovery process. Optionally, it may provide automated annotation using calendar events (such as policy changes or media interest), which can improve discovery efficiency, interpretation, and policy implementation.
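A compact visualization in the DisEpi spirit, though not its implementation, can be sketched with a clustered heatmap: rows are age groups, columns are years, and the row dendrogram groups similar incidence trajectories. The incidence values below are synthetic; the sketch requires pandas and seaborn.

```python
# Clustered heatmap of yearly treatment incidence per age group
# (synthetic data; a DisEpi-style view, not DisEpi itself).
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
years = [str(y) for y in range(2006, 2017)]
ages = [f"age {a}" for a in range(6, 18)]
incidence = pd.DataFrame(
    rng.uniform(0.5, 3.5, size=(len(ages), len(years))),
    index=ages, columns=years,
)

g = sns.clustermap(incidence, col_cluster=False, cmap="YlOrRd")
g.savefig("disepi_sketch.png")
```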
Choosing Discovery: A Literature Review on the Selection and Evaluation of Discovery Layers
ERIC Educational Resources Information Center
Moore, Kate B.; Greene, Courtney
2012-01-01
Within the next few years, traditional online public access catalogs will be replaced by more robust and interconnected discovery layers that can serve as primary public interfaces to simultaneously search many separate collections of resources. Librarians have envisioned this type of discovery tool since the 1980s, and research shows that…
ERIC Educational Resources Information Center
Kalathaki, Maria
2015-01-01
The Greek school community emphasizes the discovery direction of teaching methodology in school Environmental Education (EE) in order to promote Education for Sustainable Development (ESD). In ESD school projects, the methodology used is experiential teamwork for inquiry-based learning. The proposed tool checks whether and how a school…
Computational functional genomics-based approaches in analgesic drug discovery and repurposing.
Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn
2018-06-01
Persistent pain is a major healthcare problem, affecting a fifth of adults worldwide, with still limited treatment options. The search for new analgesics increasingly includes the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression, or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between the genome and the phenotype. Its use in drug discovery and repurposing for analgesic indications has so far involved knowledge discovery in gene function and drug-target-related databases, next-generation sequencing, and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing, and highlight the potential of computational functional genomics in this field, including a demonstration of the workflow using a novel R library, 'dbtORA'.
From Ambiguities to Insights: Query-based Comparisons of High-Dimensional Data
NASA Astrophysics Data System (ADS)
Kowalski, Jeanne; Talbot, Conover; Tsai, Hua L.; Prasad, Nijaguna; Umbricht, Christopher; Zeiger, Martha A.
2007-11-01
Genomic technologies will revolutionize drug discovery and development; that much is universally agreed upon. The high dimension of data from such technologies has challenged available data analytic methods; that much is apparent. To date, large-scale data repositories have not been utilized in ways that permit their wealth of information to be efficiently processed for knowledge, presumably due in large part to inadequate analytical tools to address numerous comparisons of high-dimensional data. In candidate gene discovery, expression comparisons are often made between two features (e.g., cancerous versus normal), such that the enumeration of outcomes is manageable. With multiple features, the setting becomes more complex, in terms of comparing expression levels of tens of thousands of transcripts across hundreds of features. In this case, the number of outcomes, while enumerable, becomes rapidly large and unmanageable, and scientific inquiries become more abstract, such as "which one of these (compounds, stimuli, etc.) is not like the others?" We develop analytical tools that promote more extensive, efficient, and rigorous utilization of the public data resources generated by the massive support of genomic studies. Our work innovates by enabling access to such metadata with logically formulated scientific inquiries that define, compare, and integrate query-comparison pair relations for analysis. We demonstrate our computational tool's potential to address an outstanding biomedical informatics issue: identifying reliable molecular markers in thyroid cancer. Our proposed query-based comparison (QBC) facilitates access to and efficient utilization of metadata through logically formed inquiries expressed as query-based comparisons, by organizing and comparing results from biotechnologies to address applications in biomedicine.
Model-driven discovery of underground metabolic functions in Escherichia coli.
Guzmán, Gabriela I; Utrilla, José; Nurk, Sergey; Brunk, Elizabeth; Monk, Jonathan M; Ebrahim, Ali; Palsson, Bernhard O; Feist, Adam M
2015-01-20
Enzyme promiscuity toward substrates has been discussed in evolutionary terms as providing the flexibility to adapt to novel environments. In the present work, we describe an approach toward exploring such enzyme promiscuity in the space of a metabolic network. This approach leverages genome-scale models, which have been widely used for predicting growth phenotypes in various environments or following a genetic perturbation; however, these predictions occasionally fail. Failed predictions of gene essentiality offer an opportunity for targeting biological discovery, suggesting the presence of unknown underground pathways stemming from enzymatic cross-reactivity. We demonstrate a workflow that couples constraint-based modeling and bioinformatic tools with KO strain analysis and adaptive laboratory evolution for the purpose of predicting promiscuity at the genome scale. Three cases of genes that are incorrectly predicted as essential in Escherichia coli--aspC, argD, and gltA--are examined, and isozyme functions are uncovered for each to a different extent. Seven isozyme functions based on genetic and transcriptional evidence are suggested between the genes aspC and tyrB, argD and astC, gabT and puuE, and gltA and prpC. This study demonstrates how a targeted model-driven approach to discovery can systematically fill knowledge gaps, characterize underground metabolism, and elucidate regulatory mechanisms of adaptation in response to gene KO perturbations.
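The failed-essentiality predictions that seed this workflow come from genome-scale knockout simulations, which can be sketched with cobrapy; the bundled "textbook" E. coli core model and the 5% growth threshold below are assumptions for illustration, not the paper's exact protocol.

```python
# Genome-scale single-gene-deletion screen with cobrapy (sketch).
from cobra.io import load_model
from cobra.flux_analysis import single_gene_deletion

model = load_model("textbook")                 # bundled E. coli core model
wild_type = model.optimize().objective_value

results = single_gene_deletion(model)          # simulate every single KO
essential = results[results["growth"] < 0.05 * wild_type]
print(f"{len(essential)} genes predicted essential out of {len(results)}")

# Genes predicted essential but dispensable in the lab (or vice versa)
# are the candidates for unknown isozymes and underground pathways.
```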
Ethnobotany and Medicinal Plant Biotechnology: From Tradition to Modern Aspects of Drug Development.
Kayser, Oliver
2018-05-24
Secondary natural products from plants are important drug leads for the development of new drug candidates for rational clinical therapy; they exhibit a variety of biological activities in experimental pharmacology and serve as structural templates in medicinal chemistry. The exploration of plants and the discovery of natural compounds based on ethnopharmacology, in combination with highly sophisticated analytics, remain important drug discovery approaches for characterizing and validating potential leads. Because of structural complexity, low abundance in biological material, and the high cost of chemical synthesis, alternative production routes such as plant cell cultures, heterologous biosynthesis, and synthetic biotechnology are applied. The basis for any biotechnological process is deep knowledge of the genetic regulation of pathways and of protein expression, with regard to today's "omics" technologies. The large number of available genetic techniques has allowed the implementation of combinatorial biosynthesis and wide genome sequencing. Consequently, genetics has allowed the functional expression of biosynthetic cascades from plants and the reconstitution of low-performing pathways in more productive heterologous microorganisms. Thus, de novo biosynthesis in heterologous hosts requires a fundamental understanding of pathway reconstruction and of the multitude of genes in a foreign organism. Here, current concepts and strategies for pathway reconstruction, genome sequencing techniques, and cloning tools are discussed to bridge the gap between ethnopharmacological drug discovery and industrial biotechnology. Georg Thieme Verlag KG Stuttgart · New York.
Arthropods as a source of new RNA viruses.
Bichaud, L; de Lamballerie, X; Alkan, C; Izri, A; Gould, E A; Charrel, R N
2014-12-01
The discovery and development of methods for isolation, characterisation and taxonomy of viruses represent an important milestone in the study, treatment and control of virus diseases during the 20th century. Indeed, by the late 1950s, it was becoming common belief that most human and veterinary pathogenic viruses had been discovered. However, at that time, knowledge of the impact of improved commercial transportation, urbanisation and deforestation on disease emergence was in its infancy. From the late 1960s onwards, viruses such as the hepatitis viruses (A, B and C), hantaviruses, HIV, Marburg virus, Ebola virus and many others began to emerge, and it became apparent that the world was changing, at least in terms of virus epidemiology, largely due to the influence of anthropological activities. Subsequently, with the improvement of molecular biotechnologies for amplification of viral RNA, genome sequencing and proteomic analysis, the arsenal of available tools for virus discovery and genetic characterization opened up new and exciting possibilities for virological discovery. Many recently identified but "unclassified" viruses are now being allocated to existing genera or families based on whole genome sequencing, bioinformatic and phylogenetic analysis. New species, genera and families are also being created following the guidelines of the International Committee for the Taxonomy of Viruses. Many of these newly discovered viruses are vectored by arthropods (arboviruses) and possess an RNA genome. This brief review will focus largely on the discovery of new arthropod-borne viruses. Copyright © 2014 Elsevier Ltd. All rights reserved.
Biomimicry as a basis for drug discovery.
Kolb, V M
1998-01-01
Selected works are discussed which clearly demonstrate that mimicking various aspects of the process by which natural products evolved is becoming a powerful tool in contemporary drug discovery. Natural products are an established and rich source of drugs. The term "natural product" is often used synonymously with "secondary metabolite." Knowledge of genetics and molecular evolution helps us understand how biosynthesis of many classes of secondary metabolites evolved. One proposed hypothesis is termed "inventive evolution." It invokes duplication of genes, and mutation of the gene copies, among other genetic events. The modified duplicate genes, per se or in conjunction with other genetic events, may give rise to new enzymes, which, in turn, may generate new products, some of which may be selected for. Steps of inventive evolution can be mimicked in several ways for the purpose of drug discovery. For example, libraries of chemical compounds of any imaginable structure may be produced by combinatorial synthesis, and new active compounds can be selected out of these libraries. In another example, genetic systems can be manipulated to produce modified natural products ("unnatural natural products"), from which new drugs can be selected. In some instances, similar natural products turn up in species that are not direct descendants of each other, presumably due to horizontal gene transfer. The mechanism of this inter-species gene transfer can be mimicked in therapeutic gene delivery. Mimicking specifics or principles of chemical evolution, including experimental and test-tube evolution, also provides leads for new drug discovery.
Data mining in pharma sector: benefits.
Ranjan, Jayanthi
2009-01-01
The amount of data generated in any sector at present is enormous, and the information flow in the pharma industry is huge. Pharma firms are progressing into increasingly technology-enabled products and services. Data mining, which is knowledge discovery from large sets of data, helps pharma firms discover patterns that improve the quality of drug discovery and delivery methods. The paper aims to present how data mining is useful in the pharma industry, how its techniques can yield good results in the pharma sector, and how data mining can genuinely enhance decision making using pharmaceutical data. This conceptual paper is written based on secondary study, research and observations from magazines, reports and notes. The author has listed the types of patterns that can be discovered using data mining in pharma data. The paper shows how data mining is useful in the pharma industry and how its techniques can yield good results in the pharma sector. Although much work can be done on discovering knowledge in pharma data using data mining, the paper is limited to conceptualizing the ideas and viewpoints at this stage; future work may include applying data mining techniques to pharma data, based on primary research, using well-established data mining tools. Research papers and conceptual papers related to data mining in the pharma industry are rare; this is the motivation for the paper.
Knowledge Discovery from Posts in Online Health Communities Using Unified Medical Language System.
Chen, Donghua; Zhang, Runtong; Liu, Kecheng; Hou, Lei
2018-06-19
Patient-reported posts in Online Health Communities (OHCs) contain a variety of valuable information that can help establish knowledge-based online support for patients. However, utilizing these reports to improve online patient services in the absence of appropriate medical and healthcare expert knowledge is difficult. Thus, we propose a comprehensive knowledge discovery method that is based on the Unified Medical Language System for the analysis of narrative posts in OHCs. First, we propose a domain-knowledge support framework for OHCs to provide a basis for post analysis. Second, we develop a Knowledge-Involved Topic Modeling (KI-TM) method to extract and expand explicit knowledge within the text. We propose four metrics, namely, explicit knowledge rate, latent knowledge rate, knowledge correlation rate, and perplexity, for the evaluation of the KI-TM method. Our experimental results indicate that our proposed method outperforms existing methods in terms of providing knowledge support. Our method enhances knowledge support for online patients and can help develop intelligent OHCs in the future.
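For readers unfamiliar with the perplexity metric named above, the following minimal sketch fits a plain LDA topic model with the gensim package and reports its log-perplexity; the toy posts are invented, and this is ordinary LDA rather than the authors' KI-TM method.

    from gensim import corpora, models

    # Toy tokenized posts (invented); real input would be OHC narratives.
    posts = [["headache", "nausea", "migraine"],
             ["insulin", "glucose", "diabetes"],
             ["migraine", "aura", "headache"]]
    dictionary = corpora.Dictionary(posts)
    corpus = [dictionary.doc2bow(p) for p in posts]

    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)
    # log_perplexity returns a per-word likelihood bound; a lower perplexity
    # (exp of the negated bound) indicates a better-fitting topic model.
    print(lda.log_perplexity(corpus))
    for topic in lda.print_topics():
        print(topic)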
Fattore, Matteo; Arrigo, Patrizio
2005-01-01
The possibility of studying an organism in terms of systems theory has been proposed in the past, but only the advancement of molecular biology techniques allows us to investigate the dynamical properties of a biological system in a more quantitative and rational way than before. These new techniques can give only a basic-level view of an organism's functionality; comprehension of its dynamical behaviour depends on the possibility of performing a multiple-level analysis. Functional genomics has stimulated interest in investigating the dynamical behaviour of an organism as a whole. These activities are commonly known as Systems Biology, and their interests range from molecules to organs. One of the most promising applications is 'disease modeling'. The use of experimental models is a common procedure in pharmacological and clinical research; today this approach is supported by 'in silico' predictive methods, and the investigation can be improved by a combination of experimental and computational tools. Machine Learning (ML) tools are able to process heterogeneous data sources; taking this peculiarity into account, they could be fruitfully applied to support the multilevel data processing (molecular, cellular and morphological) that is the prerequisite for formal model design, allowing us to extract the knowledge needed for mathematical model development. The aim of our work is the development and implementation of a system that combines ML and dynamical model simulation. The program is addressed to the virtual analysis of the pathways involved in neurodegenerative diseases. These pathologies are multifactorial diseases, and the relevance of the different factors has not yet been well elucidated. This is a very complex task; in order to test the integrative approach, our program has been limited to the analysis of the effects of a specific protein, the cyclin-dependent kinase 5 (CDK5), which is involved in the induction of neuronal apoptosis. The system has a modular structure centred on a textual knowledge discovery approach; text mining is the only way to enhance the capability to extract, from multiple data sources, the information required for the dynamical simulator. The user may access the publicly available modules through the following site: http://biocomp.ge.ismac.cnr.it.
Learning in the context of distribution drift
2017-05-09
published in the leading data mining journal, Data Mining and Knowledge Discovery (Webb et al., 2016). We have shown that the previous qualitative... [figure residue; recoverable caption, Figure 7: "Architecture for learning from streaming data in the context of variable or unknown..."; labeled components: low-bias learner, aggregated classifier] ...Learning limited dependence Bayesian classifiers, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD
A Bioinformatic Approach to Inter Functional Interactions within Protein Sequences
2009-02-23
AFOSR/AOARD Reference Number: USAFAOGA07: FA4869-07-1-4050. AFOSR/AOARD Program Manager: Hiroshi Motoda, Ph.D. Period of... Conference on Knowledge Discovery and Data Mining.) In a separate study we have applied our approaches to the problem of whole genome alignment. We have... SIGKDD Conference on Knowledge Discovery and Data Mining. Attached. Interactions: Please list: (a) Participation/presentations at meetings
Xiang, Yang; Lu, Kewei; James, Stephen L.; Borlawsky, Tara B.; Huang, Kun; Payne, Philip R.O.
2011-01-01
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous works have shown that knowledge constructs comprised of transitively-associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, and the corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications. PMID:22154838
Xiang, Yang; Lu, Kewei; James, Stephen L; Borlawsky, Tara B; Huang, Kun; Payne, Philip R O
2012-04-01
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous works have shown that knowledge constructs comprised of transitively-associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, and the corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications. Copyright © 2011 Elsevier Inc. All rights reserved.
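The idea of indexing a large concept graph by bounded neighborhoods can be sketched in a few lines. The following toy example, using networkx, precomputes k-hop neighborhoods for each node; it is a drastic simplification for illustration, not the kDLS algorithm itself, and the miniature concept graph is invented.

    import networkx as nx

    # Miniature stand-in for a UMLS-like concept graph (invented).
    G = nx.Graph()
    G.add_edges_from([("ConceptA", "ConceptB"), ("ConceptB", "GeneX"),
                      ("ConceptA", "DiseaseY"), ("DiseaseY", "GeneX")])

    def k_neighborhood_index(graph, k):
        """Precompute, for every node, all nodes reachable within k hops."""
        return {n: nx.single_source_shortest_path_length(graph, n, cutoff=k)
                for n in graph.nodes}

    index = k_neighborhood_index(G, k=2)
    # Transitive association: ConceptB lies within 2 hops of DiseaseY.
    print("ConceptB" in index["DiseaseY"])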
Learning from the Mars Rover Mission: Scientific Discovery, Learning and Memory
NASA Technical Reports Server (NTRS)
Linde, Charlotte
2005-01-01
Purpose: Knowledge management for space exploration is part of a multi-generational effort. Each mission builds on knowledge from prior missions, and learning is the first step in knowledge production. This paper uses the Mars Exploration Rover mission as a site to explore this process. Approach: Observational study and analysis of the work of the MER science and engineering team during rover operations, to investigate how learning occurs, how it is recorded, and how these representations might be made available for subsequent missions. Findings: Learning occurred in many areas: planning science strategy; using instruments within the constraints of the Martian environment, the Deep Space Network, and the mission requirements; using software tools effectively; and running two teams on Mars time for three months. This learning is preserved in many ways. Primarily it resides in individuals' memories. It is also encoded in stories, procedures, programming sequences, published reports, and lessons-learned databases. Research implications: Shows the earliest stages of knowledge creation in a scientific mission, and demonstrates that knowledge management must begin with an understanding of knowledge creation. Practical implications: Shows that studying learning and knowledge creation suggests proactive ways to capture and use knowledge across multiple missions and generations. Value: This paper provides a unique analysis of the learning process of a scientific space mission, relevant for knowledge management researchers and designers, as well as demonstrating in detail how new learning occurs in a learning organization.
Discovery Bottles: A Unique Inexpensive Tool for the K-2 Science Classroom
ERIC Educational Resources Information Center
Watson, Sandy
2008-01-01
Discover discovery bottles! These wide-mouth plastic containers of any size filled with objects of different kinds can be terrific tools for science explorations and a great way to cultivate science minds in a K-2 science classroom. In addition, the author has found them to be a useful, inexpensive, and engaging way to help students develop skills…
Knowledge Retrieval Solutions.
ERIC Educational Resources Information Center
Khan, Kamran
1998-01-01
Excalibur RetrievalWare offers true knowledge retrieval solutions. Its fundamental technologies, Adaptive Pattern Recognition Processing and Semantic Networks, have capabilities for knowledge discovery and knowledge management of full-text, structured and visual information. The software delivers a combination of accuracy, extensibility,…
Knowledge extraction from evolving spiking neural networks with rank order population coding.
Soltic, Snjezana; Kasabov, Nikola
2010-12-01
This paper demonstrates how knowledge can be extracted from evolving spiking neural networks with rank order population coding. Knowledge discovery is a very important feature of intelligent systems. Yet, a disproportionately small amount of research is centered on the issue of knowledge extraction from spiking neural networks, which are considered to be the third generation of artificial neural networks. The lack of knowledge representation compatibility is becoming a major detriment to end users of these networks. We show that high-level knowledge can be obtained from evolving spiking neural networks. More specifically, we propose a method for fuzzy rule extraction from an evolving spiking neural network with rank order population coding. The proposed method was used for knowledge discovery on two benchmark taste recognition problems, where the knowledge learnt by an evolving spiking neural network was extracted in the form of zero-order Takagi-Sugeno fuzzy IF-THEN rules.
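The target representation, zero-order Takagi-Sugeno IF-THEN rules, is simple to state in code. The sketch below evaluates two invented fuzzy rules with triangular membership functions; in the authors' method the rules would be extracted from the trained spiking network rather than written by hand.

    def triangular(x, a, b, c):
        """Triangular membership function on [a, c], peaking at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    # Rule i: IF input is A_i THEN output = c_i (a constant, hence zero-order).
    rules = [((0.0, 0.3, 0.6), 1.0),   # "low" membership  -> consequent 1.0
             ((0.4, 0.7, 1.0), 2.0)]   # "high" membership -> consequent 2.0

    def infer(x):
        weights = [triangular(x, *mf) for mf, _ in rules]
        total = sum(weights)
        # Weighted average of rule consequents (defuzzification).
        return sum(w * c for w, (_, c) in zip(weights, rules)) / total if total else None

    print(infer(0.5))  # both rules fire equally -> 1.5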
Ratnam, Joseline; Zdrazil, Barbara; Digles, Daniela; Cuadrado-Rodriguez, Emiliano; Neefs, Jean-Marc; Tipney, Hannah; Siebes, Ronald; Waagmeester, Andra; Bradley, Glyn; Chau, Chau Han; Richter, Lars; Brea, Jose; Evelo, Chris T.; Jacoby, Edgar; Senger, Stefan; Loza, Maria Isabel; Ecker, Gerhard F.; Chichester, Christine
2014-01-01
Integration of open access, curated, high-quality information from multiple disciplines in the Life and Biomedical Sciences provides a holistic understanding of the domain. Additionally, the effective linking of diverse data sources can unearth hidden relationships and guide potential research strategies. However, given the lack of consistency between descriptors and identifiers used in different resources and the absence of a simple mechanism to link them, gathering and combining relevant, comprehensive information from diverse databases remains a challenge. The Open Pharmacological Concepts Triple Store (Open PHACTS) is an Innovative Medicines Initiative project that uses semantic web technology approaches to enable scientists to easily access and process data from multiple sources to solve real-world drug discovery problems. The project draws together sources of publicly-available pharmacological, physicochemical and biomolecular data, represents it in a stable infrastructure and provides well-defined information exploration and retrieval methods. Here, we highlight the utility of this platform in conjunction with workflow tools to solve pharmacological research questions that require interoperability between target, compound, and pathway data. Use cases presented herein cover 1) the comprehensive identification of chemical matter for a dopamine receptor drug discovery program; 2) the identification of compounds active against all targets in the Epidermal growth factor receptor (ErbB) signaling pathway that have a relevance to disease; and 3) the evaluation of established targets in the Vitamin D metabolism pathway to aid novel Vitamin D analogue design. The example workflows presented illustrate how the Open PHACTS Discovery Platform can be used to exploit existing knowledge and generate new hypotheses in the process of drug discovery. PMID:25522365
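The platform's underpinnings are semantic web queries across linked data. As a hedged illustration of that general access pattern (not the actual Open PHACTS API), the following sketch issues a SPARQL query with the SPARQLWrapper package against a hypothetical endpoint; the endpoint URL and predicate are placeholders.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://example.org/sparql")  # hypothetical endpoint
    sparql.setQuery("""
        SELECT ?compound ?target WHERE {
            ?compound <http://example.org/interactsWith> ?target .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Each binding row links a compound URI to a target URI.
    for row in results["results"]["bindings"]:
        print(row["compound"]["value"], row["target"]["value"])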
Flood AI: An Intelligent System for Discovery and Communication of Disaster Knowledge
NASA Astrophysics Data System (ADS)
Demir, I.; Sermet, M. Y.
2017-12-01
Communities are not immune from extreme events or natural disasters that can lead to large-scale consequences for the nation and the public. Improving resilience to better prepare for, plan for, recover from, and adapt to disasters is critical to reduce the impacts of extreme events. The National Research Council (NRC) report discusses how to increase resilience to extreme events through a vision of a resilient nation in the year 2030. The report highlights the importance of data and information, identifies gaps and knowledge challenges that need to be addressed, and suggests that every individual have access to risk and vulnerability information to make their communities more resilient. This project presents an intelligent system, Flood AI, designed to improve societal preparedness for flooding by providing a knowledge engine using voice recognition, artificial intelligence, and natural language processing based on a generalized ontology for disasters, with a primary focus on flooding. The knowledge engine utilizes the flood ontology and concepts to connect user input to relevant knowledge discovery channels on flooding through a data acquisition and processing framework utilizing environmental observations, forecast models, and knowledge bases. Communication channels of the framework include web-based systems, agent-based chat bots, smartphone applications, automated web workflows, and smart home devices, opening knowledge discovery for flooding to many unique use cases.
Duncan, Dean F; Kum, Hye-Chung; Weigensberg, Elizabeth Caplick; Flair, Kimberly A; Stewart, C Joy
2008-11-01
Proper management and implementation of an effective child welfare agency requires the constant use of information about the experiences and outcomes of children involved in the system, emphasizing the need for comprehensive, timely, and accurate data. In the past 20 years, there have been many advances in technology that can maximize the potential of administrative data to promote better evaluation and management in the field of child welfare. Specifically, this article discusses the use of knowledge discovery and data mining (KDD), which makes it possible to create longitudinal data files from administrative data sources, extract valuable knowledge, and make the information available via a user-friendly public Web site. This article demonstrates a successful project in North Carolina where knowledge discovery and data mining technology was used to develop a comprehensive set of child welfare outcomes available through a public Web site to facilitate information sharing of child welfare data to improve policy and practice.
WebArray: an online platform for microarray data analysis
Xia, Xiaoqin; McClelland, Michael; Wang, Yipeng
2005-01-01
Background: Many cutting-edge microarray analysis tools and algorithms, including the commonly used limma and affy packages in Bioconductor, require sophisticated knowledge of mathematics, statistics and computing for implementation. Commercially available software can provide a user-friendly interface at considerable cost. To facilitate the use of these tools for microarray data analysis on an open platform, we developed an online microarray data analysis platform, WebArray, for bench biologists to explore data from single/dual color microarray experiments. Results: The currently implemented functions are based on the limma and affy packages from Bioconductor, the spacings LOESS histogram (SPLOSH) method, a PCA-assisted normalization method and a genome mapping method. WebArray incorporates these packages and provides a user-friendly interface for accessing a wide range of key functions of limma and the others, such as spot quality weighting, background correction, graphical plotting, normalization, linear modeling, empirical Bayes statistical analysis, false discovery rate (FDR) estimation, and chromosomal mapping for genome comparison. Conclusion: WebArray offers a convenient platform for bench biologists to access several cutting-edge microarray data analysis tools. The website is freely available at . It runs on a Linux server with Apache and MySQL. PMID:16371165
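One of the listed analysis steps, false discovery rate estimation, is easy to demonstrate outside the platform. The sketch below applies Benjamini-Hochberg correction via statsmodels to a handful of invented p-values; in a real analysis these would come from the per-gene linear-model tests.

    from statsmodels.stats.multitest import multipletests

    # Invented per-gene p-values; real ones would come from limma-style tests.
    pvals = [0.001, 0.009, 0.04, 0.2, 0.5]
    reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
    for p, q, r in zip(pvals, qvals, reject):
        print(f"p={p:.3f}  q={q:.3f}  significant={r}")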
The Virtual Learning Commons (VLC): Enabling Co-Innovation Across Disciplines
NASA Astrophysics Data System (ADS)
Pennington, D. D.; Gandara, A.; Del Rio, N.
2014-12-01
A key challenge for scientists addressing grand-challenge problems is identifying, understanding, and integrating potentially relevant methods, models and tools that are rapidly evolving in the informatics community. Such tools are essential for effectively integrating data and models in complex research projects, yet it is often difficult to know what tools are available and it is not easy to understand or evaluate how they might be used in a given research context. The goal of the National Science Foundation-funded Virtual Learning Commons (VLC) is to improve awareness and understanding of emerging methodologies and technologies, facilitate individual and group evaluation of these, and trace the impact of innovations within and across teams, disciplines, and communities. The VLC is a Web-based social bookmarking site designed specifically to support knowledge exchange in research communities. It is founded on well-developed models of technology adoption, diffusion of innovation, and experiential learning. The VLC makes use of Web 2.0 (Social Web) and Web 3.0 (Semantic Web) approaches. Semantic Web approaches enable discovery of potentially relevant methods, models, and tools, while Social Web approaches enable collaborative learning about their function. The VLC is under development and the first release is expected Fall 2014.
Fragment-Based Drug Discovery Using NMR Spectroscopy
Harner, Mary J.; Frank, Andreas O.; Fesik, Stephen W.
2013-01-01
Nuclear magnetic resonance (NMR) spectroscopy has evolved into a powerful tool for fragment-based drug discovery over the last two decades. While NMR has been traditionally used to elucidate the three-dimensional structures and dynamics of biomacromolecules and their interactions, it can also be a very valuable tool for the reliable identification of small molecules that bind to proteins and for hit-to-lead optimization. Here, we describe the use of NMR spectroscopy as a method for fragment-based drug discovery and how to most effectively utilize this approach for discovering novel therapeutics based on our experience. PMID:23686385
Teach-Discover-Treat (TDT): Collaborative Computational Drug Discovery for Neglected Diseases
Jansen, Johanna M.; Cornell, Wendy; Tseng, Y. Jane; Amaro, Rommie E.
2012-01-01
Teach – Discover – Treat (TDT) is an initiative to promote the development and sharing of computational tools solicited through a competition with the aim to impact education and collaborative drug discovery for neglected diseases. Collaboration, multidisciplinary integration, and innovation are essential for successful drug discovery. This requires a workforce that is trained in state-of-the-art workflows and equipped with the ability to collaborate on platforms that are accessible and free. The TDT competition solicits high quality computational workflows for neglected disease targets, using freely available, open access tools. PMID:23085175
Bellen, Hugo J; Tong, Chao; Tsuda, Hiroshi
2010-07-01
Discoveries in fruit flies have greatly contributed to our understanding of neuroscience. The use of an unparalleled wealth of tools, many of which originated between 1910–1960, has enabled milestone discoveries in nervous system development and function. Such findings have triggered and guided many research efforts in vertebrate neuroscience. After 100 years, fruit flies continue to be the choice model system for many neuroscientists. The combined use of powerful research tools will ensure that this model organism will continue to lead to key discoveries that will impact vertebrate neuroscience.
Bellen, Hugo J; Tong, Chao; Tsuda, Hiroshi
2014-01-01
Discoveries in fruit flies have greatly contributed to our understanding of neuroscience. The use of an unparalleled wealth of tools, many of which originated between 1910–1960, has enabled milestone discoveries in nervous system development and function. Such findings have triggered and guided many research efforts in vertebrate neuroscience. After 100 years, fruit flies continue to be the choice model system for many neuroscientists. The combined use of powerful research tools will ensure that this model organism will continue to lead to key discoveries that will impact vertebrate neuroscience. PMID:20383202
Astronomy education and the Astrophysics Source Code Library
NASA Astrophysics Data System (ADS)
Allen, Alice; Nemiroff, Robert J.
2016-01-01
The Astrophysics Source Code Library (ASCL) is an online registry of source codes used in refereed astrophysics research. It currently lists nearly 1,200 codes and covers all aspects of computational astrophysics. How can this resource be of use to educators and to the graduate students they mentor? The ASCL serves as a discovery tool for codes that can be used for one's own research. Graduate students can also investigate existing codes to see how common astronomical problems are approached numerically in practice, and use these codes as benchmarks for their own solutions to these problems. Further, they can deepen their knowledge of software practices and techniques through examination of others' codes.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
Keseler, Ingrid M; Mackie, Amanda; Santos-Zavaleta, Alberto; Billington, Richard; Bonavides-Martínez, César; Caspi, Ron; Fulcher, Carol; Gama-Castro, Socorro; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Muñiz-Rascado, Luis; Ong, Quang; Paley, Suzanne; Peralta-Gil, Martin; Subhraveti, Pallavi; Velázquez-Ramírez, David A; Weaver, Daniel; Collado-Vides, Julio; Paulsen, Ian; Karp, Peter D
2017-01-04
EcoCyc (EcoCyc.org) is a freely accessible, comprehensive database that collects and summarizes experimental data for Escherichia coli K-12, the best-studied bacterial model organism. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. New SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc now supports running and modifying E. coli metabolic models directly on the EcoCyc website. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
NASA EOSDIS: Enabling Science by Improving User Knowledge
NASA Technical Reports Server (NTRS)
Lindsay, Francis; Brennan, Jennifer; Blumenfeld, Joshua
2016-01-01
Lessons learned and the impacts of applying these newer methods are explained, with several examples from our current efforts: interactive online webinars focusing on data discovery and access, including tool usage; informal and informative data chats with data experts across our EOSDIS community; data user profile interviews with scientists actively using EOSDIS data in their research; and improved conference and meeting interactions via EOSDIS data used interactively during hyper-wall talks and the Worldview application. This suite of internet-based, interactive capabilities and technologies has allowed our project to expand our user community by making the data and applications from numerous Earth science missions more engaging, approachable and meaningful.
2013-01-01
Parasitic nematodes (roundworms) of small ruminants and other livestock have major economic impacts worldwide. Despite the impact of the diseases caused by these nematodes and the discovery of new therapeutic agents (anthelmintics), there has been relatively limited progress in the development of practical molecular tools to study the epidemiology of these nematodes. Specific diagnosis underpins parasite control, and the detection and monitoring of anthelmintic resistance in livestock parasites, presently a major concern around the world. The purpose of the present article is to provide a concise account of the biology and knowledge of the epidemiology of the gastrointestinal nematodes (order Strongylida), from an Australian perspective, and to emphasize the importance of utilizing advanced molecular tools for the specific diagnosis of nematode infections for refined investigations of parasite epidemiology and drug resistance detection in combination with conventional methods. It also gives a perspective on the possibility of harnessing genetic, genomic and bioinformatic technologies to better understand parasites and control parasitic diseases. PMID:23711194
Too New for Textbooks: The Biotechnology Discoveries & Applications Guidebook
ERIC Educational Resources Information Center
Loftin, Madelene; Lamb, Neil E.
2013-01-01
The "Biotechnology Discoveries and Applications" guidebook aims to provide teachers with an overview of the recent advances in genetics and biotechnology, allowing them to share these findings with their students. The annual guidebook introduces a wealth of modern genomic discoveries and provides teachers with tools to integrate exciting…
Comparison of three web-scale discovery services for health sciences research.
Hanneke, Rosie; O'Brien, Kelly K
2016-04-01
The purpose of this study was to investigate the relative effectiveness of three web-scale discovery (WSD) tools in answering health sciences search queries. Simple keyword searches, based on topics from six health sciences disciplines, were run at multiple real-world implementations of EBSCO Discovery Service (EDS), Ex Libris's Primo, and ProQuest's Summon. Each WSD tool was evaluated in its ability to retrieve relevant results and in its coverage of MEDLINE content. All WSD tools returned between 50%-60% relevant results. Primo returned a higher number of duplicate results than the other 2 WSD products. Summon results were more relevant when search terms were automatically mapped to controlled vocabulary. EDS indexed the largest number of MEDLINE citations, followed closely by Summon. Additionally, keyword searches in all 3 WSD tools retrieved relevant material that was not found with precision (Medical Subject Headings) searches in MEDLINE. None of the 3 WSD products studied was overwhelmingly more effective in returning relevant results. While difficult to place the figure of 50%-60% relevance in context, it implies a strong likelihood that the average user would be able to find satisfactory sources on the first page of search results using a rudimentary keyword search. The discovery of additional relevant material beyond that retrieved from MEDLINE indicates WSD tools' value as a supplement to traditional resources for health sciences researchers.
Comparison of three web-scale discovery services for health sciences research*
Hanneke, Rosie; O'Brien, Kelly K.
2016-01-01
Objective The purpose of this study was to investigate the relative effectiveness of three web-scale discovery (WSD) tools in answering health sciences search queries. Methods Simple keyword searches, based on topics from six health sciences disciplines, were run at multiple real-world implementations of EBSCO Discovery Service (EDS), Ex Libris's Primo, and ProQuest's Summon. Each WSD tool was evaluated in its ability to retrieve relevant results and in its coverage of MEDLINE content. Results All WSD tools returned between 50%–60% relevant results. Primo returned a higher number of duplicate results than the other 2 WSD products. Summon results were more relevant when search terms were automatically mapped to controlled vocabulary. EDS indexed the largest number of MEDLINE citations, followed closely by Summon. Additionally, keyword searches in all 3 WSD tools retrieved relevant material that was not found with precision (Medical Subject Headings) searches in MEDLINE. Conclusions None of the 3 WSD products studied was overwhelmingly more effective in returning relevant results. While difficult to place the figure of 50%–60% relevance in context, it implies a strong likelihood that the average user would be able to find satisfactory sources on the first page of search results using a rudimentary keyword search. The discovery of additional relevant material beyond that retrieved from MEDLINE indicates WSD tools' value as a supplement to traditional resources for health sciences researchers. PMID:27076797
IDENTIFYING TOXIC LEADERSHIP BEHAVIORS AND TOOLS TO FACILITATE THEIR DISCOVERY
2016-01-31
AIR WAR COLLEGE, AIR UNIVERSITY. IDENTIFYING TOXIC LEADERSHIP BEHAVIORS AND TOOLS TO FACILITATE THEIR DISCOVERY, by Michael Boger, Lt Col... released investigations for specific, observable traits relating to toxic behavior. 3) Discuss indicators and concerns in steps one and two with... subordinates, which will aid in validating the specific observable behaviors from the lenses of each of these positions. The application of their input
Bigger data, collaborative tools and the future of predictive drug discovery
NASA Astrophysics Data System (ADS)
Ekins, Sean; Clark, Alex M.; Swamidass, S. Joshua; Litterman, Nadia; Williams, Antony J.
2014-10-01
Over the past decade we have seen a growth in the provision of chemistry data and cheminformatics tools as either free websites or software-as-a-service commercial offerings. These have transformed how we find molecule-related data and use such tools in our research. There have also been efforts to improve collaboration between researchers, either openly or through secure transactions using commercial tools. A major challenge in the future will be how such databases and software approaches handle larger amounts of data as they accumulate from high-throughput screening, and how they enable the user to draw insights, make predictions and move projects forward. We now discuss how information from some drug discovery datasets can be made more accessible and how privacy of data should not overwhelm the desire to share it at an appropriate time with collaborators. We also discuss additional software tools that could be made available and provide our thoughts on the future of predictive drug discovery in this age of big data. We use some examples from our own research on neglected diseases, collaborations, mobile apps and algorithm development to illustrate these ideas.
Bringing your tools to CyVerse Discovery Environment using Docker
Devisetty, Upendra Kumar; Kennedy, Kathleen; Sarando, Paul; Merchant, Nirav; Lyons, Eric
2016-01-01
Docker has become a very popular container-based virtualization platform for software distribution that has revolutionized the way in which scientific software and software dependencies (software stacks) can be packaged, distributed, and deployed. Docker makes the complex and time-consuming installation procedures needed for scientific software a one-time process. Because it enables platform-independent installation, versioning of software environments, and easy redeployment and reproducibility, Docker is an ideal candidate for the deployment of identical software stacks on different compute environments such as XSEDE and Amazon AWS. CyVerse’s Discovery Environment also uses Docker for integrating its powerful, community-recommended software tools into CyVerse’s production environment for public use. This paper will help users bring their tools into the CyVerse Discovery Environment (DE), which not only allows users to integrate their tools with relative ease compared to the earlier method of tool deployment in the DE but also helps users share their apps with collaborators and release them for public use. PMID:27803802
Bringing your tools to CyVerse Discovery Environment using Docker.
Devisetty, Upendra Kumar; Kennedy, Kathleen; Sarando, Paul; Merchant, Nirav; Lyons, Eric
2016-01-01
Docker has become a very popular container-based virtualization platform for software distribution that has revolutionized the way in which scientific software and software dependencies (software stacks) can be packaged, distributed, and deployed. Docker makes the complex and time-consuming installation procedures needed for scientific software a one-time process. Because it enables platform-independent installation, versioning of software environments, and easy redeployment and reproducibility, Docker is an ideal candidate for the deployment of identical software stacks on different compute environments such as XSEDE and Amazon AWS. CyVerse's Discovery Environment also uses Docker for integrating its powerful, community-recommended software tools into CyVerse's production environment for public use. This paper will help users bring their tools into the CyVerse Discovery Environment (DE), which not only allows users to integrate their tools with relative ease compared to the earlier method of tool deployment in the DE but also helps users share their apps with collaborators and release them for public use.
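The container-based integration pattern described here can be sketched with the Docker SDK for Python. The image tag, build directory, and command below are placeholders; the snippet illustrates the general packaging-and-running pattern rather than CyVerse's actual integration procedure.

    import docker

    client = docker.from_env()

    # Build an image from a local directory containing a Dockerfile
    # (hypothetical tag and path).
    image, _ = client.images.build(path=".", tag="mytool:0.1")

    # Run the containerized tool; the same image runs identically on any
    # host with Docker installed, which is what makes redeployment easy.
    output = client.containers.run("mytool:0.1", command="mytool --help",
                                   remove=True)
    print(output.decode())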
SNPServer: a real-time SNP discovery tool.
Savage, David; Batley, Jacqueline; Erwin, Tim; Logan, Erica; Love, Christopher G; Lim, Geraldine A C; Mongin, Emmanuel; Barker, Gary; Spangenberg, German C; Edwards, David
2005-07-01
SNPServer is a real-time flexible tool for the discovery of SNPs (single nucleotide polymorphisms) within DNA sequence data. The program uses BLAST, to identify related sequences, and CAP3, to cluster and align these sequences. The alignments are parsed to the SNP discovery software autoSNP, a program that detects SNPs and insertion/deletion polymorphisms (indels). Alternatively, lists of related sequences or pre-assembled sequences may be entered for SNP discovery. SNPServer and autoSNP use redundancy to differentiate between candidate SNPs and sequence errors. For each candidate SNP, two measures of confidence are calculated, the redundancy of the polymorphism at a SNP locus and the co-segregation of the candidate SNP with other SNPs in the alignment. SNPServer is available at http://hornbill.cspp.latrobe.edu.au/snpdiscovery.html.
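The redundancy measure used to separate candidate SNPs from sequencing errors can be illustrated directly. The sketch below scans a toy alignment for polymorphic columns and reports the minor-allele count at each; it mimics the spirit of autoSNP rather than its actual implementation, and the aligned reads are invented.

    from collections import Counter

    # Toy alignment of five reads (invented).
    alignment = ["ACGTA",
                 "ACGTA",
                 "ACCTA",
                 "ACCTA",
                 "ACGTA"]

    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        if len(counts) > 1:  # polymorphic column: candidate SNP
            allele, minor = counts.most_common()[-1]
            # Higher minor-allele redundancy -> less likely a sequencing error.
            print(f"pos {col}: alleles={dict(counts)} "
                  f"minor-allele redundancy={minor}")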
On the Growth of Scientific Knowledge: Yeast Biology as a Case Study
He, Xionglei; Zhang, Jianzhi
2009-01-01
The tempo and mode of human knowledge expansion is an enduring yet poorly understood topic. Through a temporal network analysis of three decades of discoveries of protein interactions and genetic interactions in baker's yeast, we show that the growth of scientific knowledge is exponential over time and that important subjects tend to be studied earlier. However, expansions of different domains of knowledge are highly heterogeneous and episodic such that the temporal turnover of knowledge hubs is much greater than expected by chance. Familiar subjects are preferentially studied over new subjects, leading to a reduced pace of innovation. While research is increasingly done in teams, the number of discoveries per researcher is greater in smaller teams. These findings reveal collective human behaviors in scientific research and help design better strategies in future knowledge exploration. PMID:19300476
On the growth of scientific knowledge: yeast biology as a case study.
He, Xionglei; Zhang, Jianzhi
2009-03-01
The tempo and mode of human knowledge expansion is an enduring yet poorly understood topic. Through a temporal network analysis of three decades of discoveries of protein interactions and genetic interactions in baker's yeast, we show that the growth of scientific knowledge is exponential over time and that important subjects tend to be studied earlier. However, expansions of different domains of knowledge are highly heterogeneous and episodic such that the temporal turnover of knowledge hubs is much greater than expected by chance. Familiar subjects are preferentially studied over new subjects, leading to a reduced pace of innovation. While research is increasingly done in teams, the number of discoveries per researcher is greater in smaller teams. These findings reveal collective human behaviors in scientific research and help design better strategies in future knowledge exploration.
Lötsch, Jörn; Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred
2017-01-01
Genes causally involved in human insensitivity to pain provide a unique molecular source of studying the pathophysiology of pain and the development of novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic target in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, the biological processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified sharing important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence. PMID:28848388
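At its core, the drug-repurposing step compares sets of biological-process annotations. A minimal sketch of such a set-similarity comparison is shown below with invented annotation sets; the published analysis used emergent self-organizing maps over thousands of drugs rather than a single pairwise score.

    def jaccard(a, b):
        """Jaccard index of two annotation sets."""
        return len(a & b) / len(a | b)

    # Invented process annotations; real ones would come from databases.
    pain_processes = {"nervous system development", "ceramide signaling",
                      "sphingosine signaling"}
    drug_processes = {"ceramide signaling", "apoptosis",
                      "nervous system development"}

    # Drugs whose annotated processes overlap strongly with the gene set
    # become repurposing candidates.
    print(f"similarity = {jaccard(pain_processes, drug_processes):.2f}")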
Structure-Based Virtual Screening for Drug Discovery: Principles, Applications and Recent Advances
Lionta, Evanthia; Spyrou, George; Vassilatis, Demetrios K.; Cournia, Zoe
2014-01-01
Structure-based drug discovery (SBDD) is becoming an essential tool in assisting fast and cost-efficient lead discovery and optimization. The application of rational, structure-based drug design is proven to be more efficient than the traditional way of drug discovery since it aims to understand the molecular basis of a disease and utilizes the knowledge of the three-dimensional structure of the biological target in the process. In this review, we focus on the principles and applications of Virtual Screening (VS) within the context of SBDD and examine different procedures ranging from the initial stages of the process, which include receptor and library pre-processing, to docking, scoring and post-processing of top-scoring hits. Recent improvements in structure-based virtual screening (SBVS) efficiency through ensemble docking, induced fit and consensus docking are also discussed. The review highlights advances in the field within the framework of several success stories that have led to nM inhibition directly from VS and provides recent trends in library design as well as discusses limitations of the method. Applications of SBVS in the design of substrates for engineered proteins that enable the discovery of new metabolic and signal transduction pathways and the design of inhibitors of multifunctional proteins are also reviewed. Finally, we contribute two promising VS protocols recently developed by us that aim to increase inhibitor selectivity. In the first protocol, we describe the discovery of micromolar inhibitors through SBVS designed to inhibit the mutant H1047R PI3Kα kinase. Second, we discuss a strategy for the identification of selective binders for the RXRα nuclear receptor. In this protocol, a set of target structures is constructed for ensemble docking based on binding site shape characterization and clustering, aiming to enhance the hit rate of selective inhibitors for the desired protein target through the SBVS process. PMID:25262799
Yu, Ming; Cao, Qi-chen; Su, Yu-xi; Sui, Xin; Yang, Hong-jun; Huang, Lu-qi; Wang, Wen-ping
2015-08-01
Malignant tumors are one of the main causes of death in the world at present, as well as a major disease that seriously harms human health and life and restricts social and economic development. There are many kinds of reports about traditional Chinese medicine patent prescriptions, empirical prescriptions and self-made prescriptions for treating cancer, and prescription rules have often been analyzed based on medication frequency. Such methods are suitable for capturing dominant experience but rarely yield innovative discoveries or knowledge. In this paper, based on the traditional Chinese medicine inheritance assistance system, software integrating the improved mutual information method, complex system entropy clustering and unsupervised entropy-level clustering data mining methods was adopted to analyze the rules of traditional Chinese medicine prescriptions for cancer. In total, 114 prescriptions were selected, the frequency of herbs in the prescriptions was determined, and 85 core combinations and 13 new prescriptions were identified. The traditional Chinese medicine inheritance assistance system, as a valuable research-supporting tool, can be used to record, manage, query and analyze prescription data.
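The frequency analysis that precedes rule mining on prescription data is straightforward to sketch. The toy example below counts individual herbs and herb pairs across invented prescriptions; frequent pairs play the role of candidate core combinations, though the published analysis relied on mutual information and entropy-clustering methods.

    from collections import Counter
    from itertools import combinations

    # Invented toy prescriptions (herb lists); the study analyzed 114.
    prescriptions = [["astragalus", "ginseng", "licorice"],
                     ["ginseng", "licorice", "atractylodes"],
                     ["astragalus", "licorice"]]

    herb_freq = Counter(h for p in prescriptions for h in p)
    pair_freq = Counter(frozenset(c) for p in prescriptions
                        for c in combinations(sorted(p), 2))

    print(herb_freq.most_common(3))
    # Frequently co-occurring pairs are candidate "core combinations".
    print(pair_freq.most_common(3))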
Risk factors for autism: translating genomic discoveries into diagnostics.
Scherer, Stephen W; Dawson, Geraldine
2011-07-01
Autism spectrum disorders (ASDs) are a group of conditions characterized by impairments in communication and reciprocal social interaction, and the presence of restricted and repetitive behaviors. The spectrum of autistic features is variable, with severity of symptoms ranging from mild to severe, sometimes with poor clinical outcomes. Twin and family studies indicate a strong genetic basis for ASD susceptibility. Recent progress in defining rare highly penetrant mutations and copy number variations as ASD risk factors has prompted early uptake of these research findings into clinical diagnostics, with microarrays becoming a 'standard of care' test for any ASD diagnostic work-up. The ever-changing landscape of the generation of genomic data coupled with the vast heterogeneity in cause and expression of ASDs (further influenced by issues of penetrance, variable expressivity, multigenic inheritance and ascertainment) creates complexity that demands careful consideration of how to apply this knowledge. Here, we discuss the scientific, ethical, policy and communication aspects of translating the new discoveries into clinical and diagnostic tools for promoting the well-being of individuals and families with ASDs.
CUAHSI Data Services: Tools and Cyberinfrastructure for Water Data Discovery, Research and Collaboration
NASA Astrophysics Data System (ADS)
Seul, M.; Brazil, L.; Castronova, A. M.
2017-12-01
Enabling research surrounding interdisciplinary topics often requires a combination of finding, managing, and analyzing large data sets and models from multiple sources. This challenge has led the National Science Foundation to make strategic investments in developing community data tools and cyberinfrastructure that focus on water data, a central need for many of these research topics. CUAHSI (the Consortium of Universities for the Advancement of Hydrologic Science, Inc.) is a non-profit organization funded by the National Science Foundation to aid students, researchers, and educators in using and managing data and models to support research and education in the water sciences. This presentation will focus on open-source, CUAHSI-supported tools that enable enhanced data discovery online using advanced searching capabilities and computational analysis run in virtual environments pre-designed for educators and scientists, so they can focus their efforts on data analysis rather than IT set-up.
Applying Knowledge Discovery in Databases in Public Health Data Set: Challenges and Concerns
Volrathongchia, Kanittha
2003-01-01
In attempting to apply Knowledge Discovery in Databases (KDD) to generate a predictive model from a health care dataset that is currently available to the public, the first step is to pre-process the data to overcome the challenges of missing data, redundant observations, and records containing inaccurate data. This study will demonstrate how to use simple pre-processing methods to improve the quality of input data. PMID:14728545
Big, Deep, and Smart Data in Scanning Probe Microscopy
Kalinin, Sergei V.; Strelcov, Evgheni; Belianinov, Alex; ...
2016-09-27
Scanning probe microscopy techniques open the door to nanoscience and nanotechnology by enabling imaging and manipulation of the structure and functionality of matter on nanometer and atomic scales. We analyze the discovery process in SPM in terms of the information flow from the tip-surface junction to knowledge adoption by the scientific community. Furthermore, we discuss the challenges and opportunities offered by the merging of SPM with advanced data mining, visual analytics, and knowledge discovery technologies.
Exploiting Early Intent Recognition for Competitive Advantage
2009-01-01
basketball [Bhandari et al., 1997; Jug et al., 2003], and Robocup soccer simulations [Riley and Veloso, 2000; 2002; Kuhlmann et al., 2006] and non... actions (e.g. before, after, around). Jug et al. [2003] used a similar framework for offline basketball game analysis. More recently, Hess et al... and K. Ramanujam. Advanced Scout: Data mining and knowledge discovery in NBA data. Data Mining and Knowledge Discovery, 1(1):121–125, 1997. [Chang
ERIC Educational Resources Information Center
Fyfe, Emily R.; DeCaro, Marci S.; Rittle-Johnson, Bethany
2013-01-01
An emerging consensus suggests that guided discovery, which combines discovery and instruction, is a more effective educational approach than either one in isolation. The goal of this study was to examine two specific forms of guided discovery, testing whether conceptual instruction should precede or follow exploratory problem solving. In both…
ERIC Educational Resources Information Center
Liu, Chen-Chung; Don, Ping-Hsing; Chung, Chen-Wei; Lin, Shao-Jun; Chen, Gwo-Dong; Liu, Baw-Jhiune
2010-01-01
While Web discovery is usually undertaken as a solitary activity, Web co-discovery may transform Web learning activities from the isolated individual search process into interactive and collaborative knowledge exploration. Recent studies have proposed Web co-search environments on a single computer, supported by multiple one-to-one technologies.…
Knowledge Management in Higher Education: A Knowledge Repository Approach
ERIC Educational Resources Information Center
Wedman, John; Wang, Feng-Kwei
2005-01-01
One might expect higher education, where the discovery and dissemination of new and useful knowledge is vital, to be among the first to implement knowledge management practices. Surprisingly, higher education has been slow to implement knowledge management practices (Townley, 2003). This article describes an ongoing research and development effort…
GUASOM Analysis of the ALHAMBRA Survey
NASA Astrophysics Data System (ADS)
Garabato, Daniel; Manteiga, Minia; Dafonte, Carlos; Álvarez, Marco A.
2017-10-01
GUASOM is a data mining tool designed for knowledge discovery in large astronomical spectrophotometric archives, developed in the framework of the Gaia DPAC (Data Processing and Analysis Consortium). Our tool is based on a type of unsupervised-learning artificial neural network called the self-organizing map (SOM). SOMs permit the grouping and visualization of large amounts of data for which there is no a priori knowledge, and hence they are very useful for analyzing the huge amount of information present in modern spectrophotometric surveys. SOMs are used to organize the information in clusters of objects, as homogeneously as possible according to their spectral energy distributions, and to project them onto a 2D grid where the data structure can be visualized. Each cluster has a representative, called a prototype, which is a virtual pattern that best represents or resembles the set of input patterns belonging to that cluster. Prototypes ease the task of determining the physical nature and properties of the objects populating each cluster. Our algorithm has been tested on the ALHAMBRA survey spectrophotometric observations; here we present our results concerning survey segmentation, visualization of the data structure, separation between types of objects (stars and galaxies), data homogeneity of neurons, cluster prototypes, redshift distribution and crossmatching with other databases (Simbad).
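The SOM workflow described here can be sketched with the minisom package, using random vectors in place of real ALHAMBRA spectral energy distributions; this illustrates the clustering-and-prototype idea, not the GUASOM tool itself.

    import numpy as np
    from minisom import MiniSom

    rng = np.random.default_rng(0)
    # 200 objects x 23 photometric bands (toy stand-in for real SEDs).
    seds = rng.random((200, 23))

    som = MiniSom(10, 10, 23, sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(seds, 1000)

    # Each object maps to its best-matching unit on the 2D grid; the unit's
    # weight vector plays the role of the cluster "prototype".
    bmu = som.winner(seds[0])
    print("best-matching unit:", bmu)
    print("prototype:", som.get_weights()[bmu])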
Crowdsourcing Knowledge Discovery and Innovations in Medicine
2014-01-01
Clinicians face difficult treatment decisions in contexts that are not well addressed by available evidence as formulated based on research. The digitization of medicine provides an opportunity for clinicians to collaborate with researchers and data scientists on solutions to previously ambiguous and seemingly insolvable questions. But these groups tend to work in isolated environments, and do not communicate or interact effectively. Clinicians are typically buried in the weeds and exigencies of daily practice such that they do not recognize or act on ways to improve knowledge discovery. Researchers may not be able to identify the gaps in clinical knowledge. For data scientists, the main challenge is discerning what is relevant in a domain that is both unfamiliar and complex. Each type of domain expert can contribute skills unavailable to the other groups. “Health hackathons” and “data marathons”, in which diverse participants work together, can leverage the current ready availability of digital data to discover new knowledge. Utilizing the complementary skills and expertise of these talented, but functionally divided groups, innovations are formulated at the systems level. As a result, the knowledge discovery process is simultaneously democratized and improved, real problems are solved, cross-disciplinary collaboration is supported, and innovations are enabled. PMID:25239002
Crowdsourcing knowledge discovery and innovations in medicine.
Celi, Leo Anthony; Ippolito, Andrea; Montgomery, Robert A; Moses, Christopher; Stone, David J
2014-09-19
Clinicians face difficult treatment decisions in contexts that are not well addressed by available evidence as formulated based on research. The digitization of medicine provides an opportunity for clinicians to collaborate with researchers and data scientists on solutions to previously ambiguous and seemingly insolvable questions. But these groups tend to work in isolated environments, and do not communicate or interact effectively. Clinicians are typically buried in the weeds and exigencies of daily practice such that they do not recognize or act on ways to improve knowledge discovery. Researchers may not be able to identify the gaps in clinical knowledge. For data scientists, the main challenge is discerning what is relevant in a domain that is both unfamiliar and complex. Each type of domain expert can contribute skills unavailable to the other groups. "Health hackathons" and "data marathons", in which diverse participants work together, can leverage the current ready availability of digital data to discover new knowledge. Utilizing the complementary skills and expertise of these talented, but functionally divided groups, innovations are formulated at the systems level. As a result, the knowledge discovery process is simultaneously democratized and improved, real problems are solved, cross-disciplinary collaboration is supported, and innovations are enabled.
A Drupal-Based Collaborative Framework for Science Workflows
NASA Astrophysics Data System (ADS)
Pinheiro da Silva, P.; Gandara, A.
2010-12-01
Cyber-infrastructure combines technical infrastructure with organizational practices and social norms to support scientific teams that work together, or depend on each other, to conduct scientific research. Such cyber-infrastructure enables the sharing of information and data so that scientists can leverage knowledge and expertise through automation. Scientific workflow systems have been used to build automated scientific systems that scientists use to conduct research and, as a result, create artifacts in support of scientific discoveries. These complex systems are often developed by teams of scientists who are located in different places, e.g., in distinct buildings, and sometimes in different time zones, e.g., in distinct national laboratories. The sharing of workflow specifications is currently supported by version control systems such as CVS or Subversion. Discussions about the design, improvement, and testing of these specifications, however, often happen elsewhere, e.g., through the exchange of email messages and IM chatting. Carrying on a discussion about these specifications is challenging because comments and specifications are not necessarily connected. For instance, a person reading a comment about a given workflow specification may not be able to see the workflow, and even then may not know to which part of the workflow the comment applies. In this paper, we discuss the design, implementation and use of CI-Server, a Drupal-based infrastructure, to support the collaboration of both local and distributed teams of scientists using scientific workflows. CI-Server has three primary goals: to enable information sharing by providing tools that scientists can use within their research to process data and to publish and share artifacts; to build community by providing tools that support discussions between scientists about artifacts used or created through scientific processes; and to leverage the knowledge collected within the artifacts and scientific collaborations to support scientific discoveries.
An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.
Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K
2014-01-01
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of a command line interface, massive computational resources, and bioinformatics expertise, which is a daunting prospect for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline called Integrated SNP Mining and Utilization (ISMU) has been developed by integrating several open source next generation sequencing (NGS) tools with a graphical user interface, for SNP discovery and utilization in genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction methods (SAMtools/SOAPsnp/CNS2snp and CbCC), and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of the genotypes analyzed, in addition to SNPs against the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and of errors, if any. The pipeline also provides a confidence score or polymorphism information content value, with flanking sequences, for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data, at high speed. It is very useful for the plant genetics and breeding community with no computational expertise, enabling them to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge next generation sequencing datasets. It is written in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software.
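The align-then-call core that such pipelines wrap in a GUI can be sketched as a thin driver over standard open-source tools. A hedged sketch only: it uses bwa, samtools and bcftools rather than ISMU's actual components, the file names are placeholders, and the flags follow common usage that may differ between tool versions.

```python
import subprocess

def call_snps(reference, reads, out_prefix):
    """Align reads to a reference and call SNPs (bwa, samtools and
    bcftools must be installed and on PATH)."""
    bam = f"{out_prefix}.sorted.bam"
    # 1. align reads and sort the resulting alignment into a BAM file
    subprocess.run(f"bwa mem {reference} {reads} | samtools sort -o {bam} -",
                   shell=True, check=True)
    subprocess.run(["samtools", "index", bam], check=True)
    # 2. pile up bases over the reference and emit variant calls
    subprocess.run(f"bcftools mpileup -f {reference} {bam} "
                   f"| bcftools call -mv -o {out_prefix}.vcf",
                   shell=True, check=True)

call_snps("genome.fa", "reads.fq", "sample1")
```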
Virtual Observatories, Data Mining, and Astroinformatics
NASA Astrophysics Data System (ADS)
Borne, Kirk
The historical, current, and future trends in knowledge discovery from data in astronomy are presented here. The story begins with a brief history of data gathering and data organization. A description of the development of new information science technologies for astronomical discovery is then presented. Among these are e-Science and the virtual observatory, with its data discovery, access, display, and integration protocols; astroinformatics and data mining for exploratory data analysis, information extraction, and knowledge discovery from distributed data collections; new sky surveys' databases, including rich multivariate observational parameter sets for large numbers of objects; and the emerging discipline of data-oriented astronomical research, called astroinformatics. Astroinformatics is described as the fourth paradigm of astronomical research, following the three traditional research methodologies: observation, theory, and computation/modeling. Astroinformatics research areas include machine learning, data mining, visualization, statistics, semantic science, and scientific data management. Each of these areas is now an active research discipline, with significant science-enabling applications in astronomy. Research challenges and sample research scenarios are presented in these areas, in addition to sample algorithms for data-oriented research. These information science technologies enable scientific knowledge discovery from the increasingly large and complex data collections in astronomy. The education and training of the modern astronomy student must consequently include skill development in these areas, whose practitioners have traditionally been limited to applied mathematicians, computer scientists, and statisticians. Modern astronomical researchers must cross these traditional discipline boundaries, thereby borrowing the best-of-breed methodologies from multiple disciplines. In the era of large sky surveys and numerous large telescopes, the potential for astronomical discovery is equally large, and so the data-oriented research methods, algorithms, and techniques that are presented here will enable the greatest discovery potential from the ever-growing data and information resources in astronomy.
Building Learning Modules for Undergraduate Education Using LEAD Technology
NASA Astrophysics Data System (ADS)
Clark, R. D.; Yalda, S.
2006-12-01
Linked Environments for Atmospheric Discovery (LEAD) has as its goal to make meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves. LEAD advances through the development and beta-deployment of Integrated Test Beds (ITBs), which are technology build-outs that are the fruition of collaborative IT and meteorological research. As the ITBs mature, opportunities emerge for the integration of this new technological capability into the education arena. The LEAD education and outreach initiative is aimed at bringing new capabilities into the classroom from the middle school level to graduate education and beyond, and at ensuring the congruency of this technology with curricula. One of the principal goals of LEAD is to democratize the availability of advanced weather technologies for research and education. The degree of democratization is tied to the growth of student knowledge and skills, and is correlated with education level (though not for every student in the same way). The average high school student may experience LEAD through an environment that retains a higher level of instructor control compared to the undergraduate and graduate student. This is necessary to accommodate not only differences in knowledge and skills, but also the computer capabilities in the classroom, such that the "teachable moment" is not lost. Undergraduates will have the opportunity to query observation data and model output, explore and discover relationships through concept mapping using an ontology service, select domains of interest based on current weather, and employ an experiment builder within the LEAD portal as an interface to configure and launch the WRF model, monitor the workflow, and visualize results using Unidata's Integrated Data Viewer (IDV), whether on a local server or across the TeraGrid. Such a robust and comprehensive suite of tools and services can create new paradigms for embedding students in an authentic, contextualized environment where the knowledge domain is an extension of, yet integral supplement to, the classroom experience. This presentation describes two different approaches for the use of LEAD in undergraduate education: 1) a use-case for integrating LEAD technology into undergraduate subject material; and 2) making LEAD capability available to a select group of students participating in the National Collegiate Forecasting Contest (NCFC). The use-case (1) is designed to have students explore a particular weather phenomenon (e.g., a frontal boundary, jet streak, or lake effect snow event) through self-guided inquiry, and is intended as a supplement to classroom instruction. Students will use interactive, Web-based, LEAD-to-Learn modules created specifically to build conceptual knowledge of the phenomenon, adjoin germane terminology, explore relationships between concepts and similar phenomena using the LEAD ontology, and guide them through the experiment builder and workflow orchestration process in order to establish a high-resolution WRF run over a region that exhibits the characteristics of the phenomenon they wish to study. The results of the experiment will be stored in the student's MyLEAD workspace, from which they can be retrieved, visualized, and analyzed for atmospheric signatures characteristic of the phenomenon. The learning process is authentic in that students will be exposed to the same process of investigation, and will have available many of the same tools, as researchers.
The modules serve to build content knowledge, guide discovery, and provide assessment while the LEAD portal opens the gateway to real-time observations, model accessibility, and a variety of tools, services, and resources.
Which are the greatest recent discoveries and the greatest future challenges in nutrition?
Katan, M B; Boekschoten, M V; Connor, W E; Mensink, R P; Seidell, J; Vessby, B; Willett, W
2009-01-01
Nutrition science aims to create new knowledge, but scientists rarely sit back to reflect on what nutrition research has achieved in recent decades. We report the outcome of a 1-day symposium at which the audience was asked to vote on the greatest discoveries in nutrition since 1976 and on the greatest challenges for the coming 30 years. Most of the 128 participants were Dutch scientists working in nutrition or related biomedical and public health fields. Candidate discoveries and challenges were nominated by five invited speakers and by members of the audience. Ballot forms were then prepared on which participants selected one discovery and one challenge. A total of 15 discoveries and 14 challenges were nominated. The audience elected "Folic acid prevents birth defects" as the greatest discovery in nutrition science since 1976. "Controlling obesity and insulin resistance through activity and diet" was elected as the greatest challenge for the coming 30 years. This selection was probably biased by the interests and knowledge of the speakers and the audience. For the present review, we therefore added 12 discoveries from the period 1976 to 2006 that we judged worthy of consideration, but that had not been nominated at the meeting. The meeting did not represent an objective selection process, but it did demonstrate that the past 30 years have yielded major new discoveries in nutrition and health.
Translating three states of knowledge--discovery, invention, and innovation
2010-01-01
Background: Knowledge Translation (KT) has historically focused on the proper use of knowledge in healthcare delivery. A knowledge base has been created through empirical research and resides in scholarly literature. Some knowledge is amenable to direct application by stakeholders who are engaged during or after the research process, as shown by the Knowledge to Action (KTA) model. Other knowledge requires multiple transformations before achieving utility for end users. For example, conceptual knowledge generated through science or engineering may become embodied as a technology-based invention through development methods. The invention may then be integrated within an innovative device or service through production methods. To what extent is KT relevant to these transformations? How might the KTA model accommodate these additional development and production activities while preserving the KT concepts? Discussion: Stakeholders adopt and use knowledge that has perceived utility, such as a solution to a problem. Achieving a technology-based solution involves three methods that generate knowledge in three states, analogous to the three classic states of matter. Research activity generates discoveries that are intangible and highly malleable like a gas; development activity transforms discoveries into inventions that are moderately tangible yet still malleable like a liquid; and production activity transforms inventions into innovations that are tangible and immutable like a solid. The paper demonstrates how the KTA model can accommodate all three types of activity and address all three states of knowledge. Linking the three activities in one model also illustrates the importance of engaging the relevant stakeholders prior to initiating any knowledge-related activities. Summary: Science and engineering focused on technology-based devices or services change the state of knowledge through three successive activities. Achieving knowledge implementation requires methods that accommodate these three activities and knowledge states. Accomplishing beneficial societal impacts from technology-based knowledge involves the successful progression through all three activities, and the effective communication of each successive knowledge state to the relevant stakeholders. The KTA model appears suitable for structuring and linking these processes. PMID:20205873
The web server of IBM's Bioinformatics and Pattern Discovery group.
Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo
2003-07-01
We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.
The web server of IBM's Bioinformatics and Pattern Discovery group
Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo
2003-01-01
We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/. PMID:12824385
NASA Astrophysics Data System (ADS)
Kurtz, N.; Marks, N.; Cooper, S. K.
2014-12-01
Scientific ocean drilling through the International Ocean Discovery Program (IODP) has contributed extensively to our knowledge of Earth systems science. However, many of its methods and discoveries can seem abstract and complicated to students. Collaborations between scientists and educators/artists to create accurate yet engaging demonstrations and activities have been crucial to increasing understanding and stimulating interest in fascinating geological topics. One such collaboration, which came out of Expedition 345 to the Hess Deep Rift, resulted in an interactive lab that explores sampling rocks from the usually inaccessible lower oceanic crust, offering insight into the geological processes that form the structure of the Earth's crust. This Hess Deep Interactive Lab aims to explain several significant discoveries made by ocean drilling, utilizing images of actual thin sections and core samples recovered from IODP expeditions. Participants can interact with a physical model to learn about the coring and drilling processes, and gain an understanding of seafloor structures. The lab grew out of the need to explain fundamental notions of the ocean crust formed at fast-spreading ridges. A complementary interactive online lab can be accessed at www.joidesresolution.org for students to engage further with these concepts. This project explores the relationship between physical and on-line models to further understanding, including what we can learn from the pros and cons of each.
Using insects for STEM outreach: Development and evaluation of the UA Insect Discovery Program
NASA Astrophysics Data System (ADS)
Beal, Benjamin D.
Science and technology impact most aspects of modern daily life. It is therefore important to create a scientifically literate society. Since the majority of Americans do not take college-level science courses, strong K-12 science education is essential. At the K-5 level, however, many teachers lack the time, resources and background for effective science teaching. Elementary teachers and students may benefit from scientist-led outreach programs created by Cooperative Extension or other institutions. One example is the University of Arizona Insect Discovery Program, which provides short-duration programming that uses insects to support science content learning, teach critical thinking and spark interest in science. We conducted evaluations of the Insect Discovery programming to determine whether the activities offered were accomplishing program goals. Pre-post tests, post-program questionnaires for teachers, and novel assessments of children's drawings were used as assessment tools. Assessments were complicated by the short duration of the program interactions with the children as well as their limited literacy. In spite of these difficulties, results of the pre-post tests indicated a significant impact on content knowledge and critical thinking skills. Based on post-program teacher questionnaires, positive impacts on interest in science learning were noted as much as a month after the children participated in the program. New programming and resources developed to widen the potential for impact are also described.
Knowledge Discovery/A Collaborative Approach, an Innovative Solution
NASA Technical Reports Server (NTRS)
Fitts, Mary A.
2009-01-01
Collaboration between Medical Informatics and Healthcare Systems (MIHCS) at NASA/Johnson Space Center (JSC) and the Texas Medical Center (TMC) Library was established to investigate technologies for facilitating knowledge discovery across multiple life sciences research disciplines in multiple repositories. After reviewing 14 potential Enterprise Search System (ESS) solutions, Collexis was determined to best meet the expressed needs. A three month pilot evaluation of Collexis produced positive reports from multiple scientists across 12 research disciplines. The joint venture and a pilot-phased approach achieved the desired results without the high cost of purchasing software, hardware or additional resources to conduct the task. Medical research is highly compartmentalized by discipline, e.g. cardiology, immunology, neurology. The medical research community at large, as well as at JSC, recognizes the need for cross-referencing relevant information to generate best evidence. Cross-discipline collaboration at JSC is specifically required to close knowledge gaps affecting space exploration. To facilitate knowledge discovery across these communities, MIHCS combined expertise with the TMC library and found Collexis to best fit the needs of our researchers including:
Big, Deep, and Smart Data in Scanning Probe Microscopy.
Kalinin, Sergei V; Strelcov, Evgheni; Belianinov, Alex; Somnath, Suhas; Vasudevan, Rama K; Lingerfelt, Eric J; Archibald, Richard K; Chen, Chaomei; Proksch, Roger; Laanait, Nouamane; Jesse, Stephen
2016-09-27
Scanning probe microscopy (SPM) techniques have opened the door to nanoscience and nanotechnology by enabling imaging and manipulation of the structure and functionality of matter at nanometer and atomic scales. Here, we analyze the scientific discovery process in SPM by following the information flow from the tip-surface junction, to knowledge adoption by the wider scientific community. We further discuss the challenges and opportunities offered by merging SPM with advanced data mining, visual analytics, and knowledge discovery technologies.
Building Scalable Knowledge Graphs for Earth Science
NASA Technical Reports Server (NTRS)
Ramachandran, Rahul; Maskey, Manil; Gatlin, Patrick; Zhang, Jia; Duan, Xiaoyi; Miller, J. J.; Bugbee, Kaylin; Christopher, Sundar; Freitag, Brian
2017-01-01
Knowledge Graphs link key entities in a specific domain with other entities via relationships. From these relationships, researchers can query knowledge graphs for probabilistic recommendations to infer new knowledge. Scientific papers are an untapped resource which knowledge graphs could leverage to accelerate research discovery. Goal: Develop an end-to-end (semi) automated methodology for constructing Knowledge Graphs for Earth Science.
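The core data structure is simple to sketch: entities linked by typed relationships, with recommendations read off shared edges. A toy illustration only, not the paper's methodology; the entity and relation names below are invented.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Toy triple store: (subject, relation, object) edges, plus a naive
    recommender that ranks entities by the edges they share."""
    def __init__(self):
        self.edges = defaultdict(set)          # subject -> {(relation, object)}

    def add(self, subj, rel, obj):
        self.edges[subj].add((rel, obj))

    def recommend(self, entity):
        mine = self.edges[entity]
        scores = {e: len(mine & out) for e, out in self.edges.items() if e != entity}
        return sorted(scores.items(), key=lambda kv: -kv[1])

kg = KnowledgeGraph()
kg.add("GPM", "measures", "precipitation")     # hypothetical mission facts
kg.add("GPM", "carries", "radar")
kg.add("TRMM", "measures", "precipitation")
print(kg.recommend("GPM"))    # TRMM ranks first: it shares an edge with GPM
```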
Genetic discoveries and nursing implications for complex disease prevention and management.
Frazier, Lorraine; Meininger, Janet; Halsey Lea, Dale; Boerwinkle, Eric
2004-01-01
The purpose of this article is to examine the management of patients with complex diseases, in light of recent genetic discoveries, and to explore how these genetic discoveries will impact nursing practice and nursing research. The nursing science processes discussed are not comprehensive of all nursing practice but, instead, are concentrated in areas where genetics will have the greatest influence. Advances in genetic science will revolutionize our approach to patients and to health care in the prevention, diagnosis, and treatment of disease, raising many issues for nursing research and practice. As the scope of genetics expands to encompass multifactorial disease processes, a continuing reexamination of the knowledge base is required for nursing practice, with incorporation of genetic knowledge into the repertoire of every nurse, and with advanced knowledge for nurses who select specialty roles in the genetics area. This article explores the impact of this revolution on nursing science and practice as well as the opportunities for nursing science and practice to participate fully in this revolution. Because of the high proportion of the population at risk for complex diseases and because nurses are occupied every day in the prevention, assessment, treatment, and therapeutic intervention of patients with such diseases in practice and research, there is great opportunity for nurses to improve health care through the application (nursing practice) and discovery (nursing research) of genetic knowledge.
Rethinking the learning of belief network probabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Musick, R.
Belief networks are a powerful tool for knowledge discovery that provide concise, understandable probabilistic models of data. There are methods grounded in probability theory to incrementally update the relationships described by the belief network when new information is seen, to perform complex inferences over any set of variables in the data, to incorporate domain expertise and prior knowledge into the model, and to automatically learn the model from data. This paper concentrates on part of the belief network induction problem: learning the quantitative structure (the conditional probabilities), given the qualitative structure. In particular, the current practice of rote learning the probabilities in belief networks can be significantly improved upon. We advance the idea of applying any learning algorithm to the task of conditional probability learning in belief networks, discuss potential benefits, and show results of applying neural networks and other algorithms to a medium-sized car insurance belief network. The results demonstrate 10 to 100% improvements in model error rates over the current approaches.
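For context, the "rote learning" baseline the abstract wants to improve upon is plain frequency counting of each conditional probability table (CPT) from data, given the qualitative structure. A minimal sketch of that baseline with Laplace smoothing; the variable names and toy records are assumptions.

```python
import numpy as np
from collections import Counter

def learn_cpt(samples, child, parents, arity, alpha=1.0):
    """Rote CPT estimation: count co-occurrences of child and parent
    states, then normalize (alpha adds Laplace smoothing)."""
    counts = Counter()
    for s in samples:                          # s maps variable -> state index
        counts[(tuple(s[p] for p in parents), s[child])] += 1
    cpt = {}
    for ps in {tuple(s[p] for p in parents) for s in samples}:
        row = np.array([counts[(ps, v)] + alpha for v in range(arity)])
        cpt[ps] = row / row.sum()              # P(child | parents = ps)
    return cpt

data = [{"age": 0, "claim": 1}, {"age": 0, "claim": 0}, {"age": 1, "claim": 1}]
print(learn_cpt(data, child="claim", parents=["age"], arity=2))
```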
Jin, Rui; Lin, Zhi-jian; Xue, Chun-miao; Zhang, Bing
2013-09-01
Knowledge Discovery in Databases is gaining attention and raising new hopes among traditional Chinese medicine (TCM) researchers. It is a useful tool for understanding and deciphering TCM theories. Aiming for a better understanding of Chinese herbal property theory (CHPT), this paper applied improved association rule learning to analyze semistructured text in the book entitled Shennong's Classic of Materia Medica. The text was first annotated and transformed into well-structured multidimensional data. Subsequently, an Apriori algorithm was employed to produce association rules after a sensitivity analysis of its parameters. From the 120 confirmed resulting rules that described the intrinsic relationships between herbal property (qi, flavor and their combinations) and herbal efficacy, two novel fundamental principles underlying CHPT were acquired and further elucidated: (1) the many-to-one mapping of herbal efficacy to herbal property; (2) the nonrandom overlap between the related efficacy of qi and flavor. This work provides innovative knowledge about CHPT that should be helpful for its modern research.
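Apriori itself is compact enough to sketch: frequent itemsets are grown level by level, and each level is pruned by minimum support. A textbook toy, not the paper's improved variant; the herb "transactions" below are invented placeholders for the annotated property/efficacy records.

```python
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Return frequent itemsets and their support via level-wise search."""
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n
    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if support(s) >= min_support}
    freq, k = {}, 1
    while level:
        freq.update({s: support(s) for s in level})
        # join k-itemsets into (k+1)-candidates, then prune by support
        cands = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        level = {c for c in cands if support(c) >= min_support}
        k += 1
    return freq

records = [frozenset(t) for t in
           [{"warm", "pungent", "dispel-cold"}, {"warm", "sweet", "tonify"},
            {"cold", "bitter", "clear-heat"}, {"warm", "pungent", "dispel-cold"}]]
print(apriori(records))   # e.g. {warm, pungent, dispel-cold} has support 0.5
```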
Medical data mining: knowledge discovery in a clinical data warehouse.
Prather, J. C.; Lobach, D. F.; Goodwin, L. K.; Hales, J. W.; Hage, M. L.; Hammond, W. E.
1997-01-01
Clinical databases have accumulated large quantities of information about patients and their medical conditions. Relationships and patterns within this data could provide new medical knowledge. Unfortunately, few methodologies have been developed and applied to discover this hidden knowledge. In this study, the techniques of data mining (also known as Knowledge Discovery in Databases) were used to search for relationships in a large clinical database. Specifically, data accumulated on 3,902 obstetrical patients were evaluated for factors potentially contributing to preterm birth using exploratory factor analysis. Three factors were identified by the investigators for further exploration. This paper describes the processes involved in mining a clinical database including data warehousing, data query and cleaning, and data analysis. PMID:9357597
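Exploratory factor analysis of the kind used in the study is a one-liner with modern libraries; the sketch below runs it on synthetic stand-in data (the real study used 3,902 obstetrical records, which are not reproduced here).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))            # 3 hidden "risk factors"
loadings = rng.normal(size=(3, 12))           # 12 observed clinical variables
X = latent @ loadings + 0.5 * rng.normal(size=(300, 12))

fa = FactorAnalysis(n_components=3, random_state=0).fit(X)
# Each row of components_ shows how strongly every observed variable loads
# on one factor; high-magnitude groups flag variables worth clinical review.
print(np.round(fa.components_, 2))
```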
NASA Astrophysics Data System (ADS)
Rose, K.; Rowan, C.; Rager, D.; Dehlin, M.; Baker, D. V.; McIntyre, D.
2015-12-01
Multi-organizational research teams working jointly on projects often encounter problems with discovery, access to relevant existing resources, and data sharing due to large file sizes, inappropriate file formats, or other inefficient options that make collaboration difficult. The Energy Data eXchange (EDX) from the Department of Energy's (DOE) National Energy Technology Laboratory (NETL) is an evolving online research environment designed to overcome these challenges in support of DOE's fossil energy goals, while offering improved access to data-driven products of fossil energy R&D such as datasets, tools, and web applications. Development of EDX began in 2011; it offers i) a means of better preserving NETL's research and development products for future access and re-use, ii) efficient, discoverable access to authoritative, relevant, external resources, and iii) an improved approach and tools to support secure, private collaboration and coordination between multi-organizational teams to meet DOE mission and goals. EDX presently supports fossil energy and SubTER Crosscut research activities, with an ever-growing user base. EDX is built on a heavily customized instance of the open source platform Comprehensive Knowledge Archive Network (CKAN). It connects users to relevant external data and tools by linking to repositories built on CKAN and on other platforms (e.g., Data.gov). EDX does not download and repost data or tools that already have an online presence, since reposting leads to redundancy and even error. If a relevant resource is already hosted online by another entity, EDX points users to that external host using web services, inventoried URLs, and other methods. EDX also offers users private, secure collaboration capabilities custom built into the system. The team is presently working on version 3 of EDX, which will incorporate big data analytical capabilities among other advanced features.
Mouse Models for Drug Discovery. Can New Tools and Technology Improve Translational Power?
Zuberi, Aamir; Lutz, Cathleen
2016-01-01
The use of mouse models in biomedical research and preclinical drug evaluation is on the rise. The advent of new molecular genome-altering technologies such as CRISPR/Cas9 allows for genetic mutations to be introduced into the germ line of a mouse faster and less expensively than previous methods. In addition, the rapid progress in the development and use of somatic transgenesis using viral vectors, as well as manipulations of gene expression with siRNAs and antisense oligonucleotides, allow for even greater exploration into genomics and systems biology. These technological advances come at a time when cost reductions in genome sequencing have led to the identification of pathogenic mutations in patient populations, providing unprecedented opportunities in the use of mice to model human disease. The ease of genetic engineering in mice also offers a potential paradigm shift in resource sharing and the speed by which models are made available in the public domain. Predictively, the knowledge alone that a model can be quickly remade will provide relief to resources encumbered by licensing and Material Transfer Agreements. For decades, mouse strains have provided an exquisite experimental tool to study the pathophysiology of the disease and assess therapeutic options in a genetically defined system. However, a major limitation of the mouse has been the limited genetic diversity associated with common laboratory mice. This has been overcome with the recent development of the Collaborative Cross and Diversity Outbred mice. These strains provide new tools capable of replicating genetic diversity to that approaching the diversity found in human populations. The Collaborative Cross and Diversity Outbred strains thus provide a means to observe and characterize toxicity or efficacy of new therapeutic drugs for a given population. The combination of traditional and contemporary mouse genome editing tools, along with the addition of genetic diversity in new modeling systems, are synergistic and serve to make the mouse a better model for biomedical research, enhancing the potential for preclinical drug discovery and personalized medicine. PMID:28053071
Discovery informatics in biological and biomedical sciences: research challenges and opportunities.
Honavar, Vasant
2015-01-01
New discoveries in biological, biomedical and health sciences are increasingly being driven by our ability to acquire, share, integrate and analyze data, and to construct and simulate predictive models of biological systems. While much attention has focused on automating routine aspects of management and analysis of "big data", realizing the full potential of "big data" to accelerate discovery calls for automating many other aspects of the scientific process that have so far largely resisted automation: identifying gaps in the current state of knowledge; generating and prioritizing questions; designing studies; designing, prioritizing, planning, and executing experiments; interpreting results; forming hypotheses; drawing conclusions; replicating studies; validating claims; documenting studies; communicating results; reviewing results; and integrating results into the larger body of knowledge in a discipline. Against this background, the PSB workshop on Discovery Informatics in Biological and Biomedical Sciences explores the opportunities and challenges of automating discovery, or of assisting humans in discovery, through advances in (i) understanding, formalizing, and building information-processing accounts of the entire scientific process; (ii) the design, development, and evaluation of computational artifacts (representations, processes) that embody such understanding; and (iii) the application of the resulting artifacts and systems to advance science (by augmenting individual or collective human efforts, or by fully automating science).
Revelations from the Literature: How Web-Scale Discovery Has Already Changed Us
ERIC Educational Resources Information Center
Richardson, Hillary A. H.
2013-01-01
For nearly a decade now, librarians have discussed and deliberated ways, for the sake of convenience, to integrate internet-like searching into their own catalogs to mimic what academic library patrons have been using outside the library. Discovery tools and web-scale discovery services (WSDS) attempt to provide users with a similar one-stop shop…
Computer-Aided Drug Discovery: Molecular Docking of Diminazene Ligands to DNA Minor Groove
ERIC Educational Resources Information Center
Kholod, Yana; Hoag, Erin; Muratore, Katlynn; Kosenkov, Dmytro
2018-01-01
The reported project-based laboratory unit introduces upper-division undergraduate students to the basics of computer-aided drug discovery as a part of a computational chemistry laboratory course. The students learn to perform model binding of organic molecules (ligands) to the DNA minor groove with computer-aided drug discovery (CADD) tools. The…
ERIC Educational Resources Information Center
Calvert, Kristin
2015-01-01
Despite the prevalence of academic libraries adopting web-scale discovery tools, few studies have quantified their effect on the use of library collections. This study measures the impact that EBSCO Discovery Service has had on use of library resources through circulation statistics, use of electronic resources, and interlibrary loan requests.…
Promise Fulfilled? An EBSCO Discovery Service Usability Study
ERIC Educational Resources Information Center
Williams, Sarah C.; Foster, Anita K.
2011-01-01
Discovery tools are the next phase of library search systems. Illinois State University's Milner Library implemented EBSCO Discovery Service in August 2010. The authors conducted usability studies on the system in the fall of 2010. The aims of the study were twofold: first, to determine how Milner users set about using the system in order to…
ERIC Educational Resources Information Center
Asher, Andrew D.; Duke, Lynda M.; Wilson, Suzanne
2013-01-01
In 2011, researchers at Bucknell University and Illinois Wesleyan University compared the search efficacy of Serial Solutions Summon, EBSCO Discovery Service, Google Scholar, and conventional library databases. Using a mixed-methods approach, qualitative and quantitative data were gathered on students' usage of these tools. Regardless of the…
Teaching Slope of a Line Using the Graphing Calculator as a Tool for Discovery Learning
ERIC Educational Resources Information Center
Nichols, Fiona Costello
2012-01-01
Discovery learning is one of the instructional strategies sometimes used to teach Algebra I. However, little research is available that includes investigation of the effects of incorporating the graphing calculator technology with discovery learning. This study was initiated to investigate two instructional approaches for teaching slope of a line…
Homology modeling a fast tool for drug discovery: current perspectives.
Vyas, V K; Ukawala, R D; Ghate, M; Chintha, C
2012-01-01
A major goal of structural biology is the characterization of protein-ligand complexes, in which the protein molecules act energetically in the course of binding. Understanding protein-ligand interactions is therefore very important for structure-based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with proteins. With improvements in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling represents the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. Recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, handling distant homologues, modeling loops and side chains, and detecting errors in a model, have contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focuses on the features and role of homology modeling in predicting protein structure and describes current developments in the field, with successful applications at different stages of drug design and discovery.
Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives
Vyas, V. K.; Ukawala, R. D.; Ghate, M.; Chintha, C.
2012-01-01
A major goal of structural biology is the characterization of protein-ligand complexes, in which the protein molecules act energetically in the course of binding. Understanding protein-ligand interactions is therefore very important for structure-based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with proteins. With improvements in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling represents the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. Recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, handling distant homologues, modeling loops and side chains, and detecting errors in a model, have contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focuses on the features and role of homology modeling in predicting protein structure and describes current developments in the field, with successful applications at different stages of drug design and discovery. PMID:23204616
Schmalhofer, F J; Tschaitschian, B
1998-11-01
In this paper, we perform a cognitive analysis of knowledge discovery processes. As a result of this analysis, the construction-integration theory is proposed as a general framework for developing cooperative knowledge evolution systems. We thus suggest that for the acquisition of new domain knowledge in medicine, one should first construct pluralistic views on a given topic, which may contain inconsistencies as well as redundancies. Only thereafter does this knowledge become consolidated into a situation-specific circumscription, with the early inconsistencies eliminated. As proof of the viability of such knowledge acquisition processes in medicine, we present the IDEAS system, which can be used for the intelligent documentation of adverse events in clinical studies. This system provides better documentation of the side-effects of medical drugs. Thereby, knowledge evolution occurs by achieving consistent explanations in increasingly larger contexts (i.e., more cases and more pharmaceutical substrates). Finally, it is shown how prototypes, model-based approaches and cooperative knowledge evolution systems can be distinguished as different classes of knowledge-based systems.
Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit
2016-03-01
Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of the biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science GeneCards suite, including GeneCards®, the human gene database; MalaCards, the human diseases database; and PathCards, the biological pathways database. Expression-based analysis in GeneAnalytics relies on LifeMap Discovery®, the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling an advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analysis and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics, and others yet to emerge on the postgenomics horizon.
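A generic way to score gene set enrichment, which tools in this family build upon, is the hypergeometric tail test. This sketch is illustrative only; GeneAnalytics uses its own evidence-based scoring, and the gene symbols and universe size here are assumptions.

```python
from scipy.stats import hypergeom

def enrichment_p(query_genes, gene_set, universe_size):
    """P(overlap >= observed) when drawing len(query_genes) genes at
    random from the universe: small values indicate enrichment."""
    overlap = len(query_genes & gene_set)
    return hypergeom.sf(overlap - 1, universe_size, len(gene_set), len(query_genes))

query = {"TP53", "BRCA1", "ATM", "CHEK2"}
dna_repair = {"BRCA1", "ATM", "CHEK2", "RAD51", "XRCC1"}
print(enrichment_p(query, dna_repair, universe_size=20000))
```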
A Comprehensive Tool and Analytical Pathway for Differential Molecular Profiling and Biomarker Discovery
2014-10-20
AFRL-RH-WP-TR-2014-0131; Distribution A, approved for public release. The report describes a tool for plotting molecular profiles by sample attributes such as Strain (AKR, B6, BALB_B) and MUP Protein state (Intact, Denatured).
Building Better Decision-Support by Using Knowledge Discovery.
ERIC Educational Resources Information Center
Jurisica, Igor
2000-01-01
Discusses knowledge-based decision-support systems that use artificial intelligence approaches. Addresses the issue of how to create an effective case-based reasoning system for complex and evolving domains, focusing on automated methods for system optimization and domain knowledge evolution that can supplement knowledge acquired from domain…
NASA Astrophysics Data System (ADS)
Demir, I.; Krajewski, W. F.
2013-12-01
As geoscientists are confronted with increasingly massive datasets, from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data and communicate that understanding to stakeholders. Recent developments in web technologies make it easy to manage, visualize and share large data sets with the general public. Novel visualization techniques and dynamic user interfaces allow users to interact with data and modify parameters to create custom views, gaining insight from simulations and environmental observations. This requires developing new data models and intelligent knowledge discovery techniques to explore and extract information from complex computational simulations or large data repositories. Scientific visualization will be an increasingly important component in building comprehensive environmental information platforms. This presentation provides an overview of the trends and challenges in the field of scientific visualization, and demonstrates information visualization and communication tools developed in light of these challenges.
BEAM web server: a tool for structural RNA motif discovery.
Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2018-03-15
RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable of handling tens of thousands of input RNAs, with a motif discovery procedure limited only by current secondary structure prediction accuracies. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated with a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding, which transforms each RNA secondary structure into a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice of their preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, a graphic representation, and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and is implemented in NodeJS and Python, with all major browsers supported. marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.
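The key trick, encoding secondary structure as a string so that fast string algorithms apply, can be shown with a deliberately tiny alphabet. The real BEAR encoding distinguishes many more element types and lengths, and BEAM adds a substitution matrix and a significance model; none of that is reproduced in this toy.

```python
from collections import Counter

def encode(dot_bracket):
    """Toy two-letter encoding: paired positions -> S (stem),
    unpaired -> L (loop)."""
    return "".join("S" if c in "()" else "L" for c in dot_bracket)

def top_kmers(structures, k=4, top=5):
    """Count k-mers over encoded structures; recurring k-mers are
    candidate structural motifs (no statistics attached)."""
    counts = Counter()
    for s in structures:
        enc = encode(s)
        counts.update(enc[i:i + k] for i in range(len(enc) - k + 1))
    return counts.most_common(top)

print(top_kmers(["((((....))))", "(((..)))..((..))", "..(((....)))"]))
```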
The Discovery Dome: A Tool for Increasing Student Engagement
NASA Astrophysics Data System (ADS)
Brevik, Corinne
2015-04-01
The Discovery Dome is a portable full-dome theater that plays professionally-created science films. Developed by the Houston Museum of Natural Science and Rice University, this inflatable planetarium offers a state-of-the-art visual learning experience that can address many different fields of science for any grade level. It surrounds students with roaring dinosaurs, fascinating planets, and explosive storms - all immersive, engaging, and realistic. Dickinson State University has chosen to utilize its Discovery Dome to address Earth Science education at two levels. University courses across the science disciplines can use the Discovery Dome as part of their curriculum. The digital shows immerse the students in various topics ranging from astronomy to geology to weather and climate. The dome has proven to be a valuable tool for introducing new material to students as well as for reinforcing concepts previously covered in lectures or laboratory settings. The Discovery Dome also serves as an amazing science public-outreach tool. University students are trained to run the dome, and they travel with it to schools and libraries around the region. During the 2013-14 school year, our Discovery Dome visited over 30 locations. Many of the schools visited are in rural settings which offer students few opportunities to experience state-of-the-art science technology. The school kids are extremely excited when the Discovery Dome visits their community, and they will talk about the experience for many weeks. Traveling with the dome is also very valuable for the university students who get involved in the program. They become very familiar with the science content, and they gain experience working with teachers as well as the general public. They get to share their love of science, and they get to help inspire a new generation of scientists.
Bigger Data, Collaborative Tools and the Future of Predictive Drug Discovery
Clark, Alex M.; Swamidass, S. Joshua; Litterman, Nadia; Williams, Antony J.
2014-01-01
Over the past decade we have seen a growth in the provision of chemistry data and cheminformatics tools as either free websites or software as a service (SaaS) commercial offerings. These have transformed how we find molecule-related data and use such tools in our research. There have also been efforts to improve collaboration between researchers either openly or through secure transactions using commercial tools. A major challenge in the future will be how such databases and software approaches handle larger amounts of data as it accumulates from high throughput screening and enables the user to draw insights, enable predictions and move projects forward. We now discuss how information from some drug discovery datasets can be made more accessible and how privacy of data should not overwhelm the desire to share it at an appropriate time with collaborators. We also discuss additional software tools that could be made available and provide our thoughts on the future of predictive drug discovery in this age of big data. We use some examples from our own research on neglected diseases, collaborations, mobile apps and algorithm development to illustrate these ideas. PMID:24943138
Helping science to succeed: improving processes in R&D.
Sewing, Andreas; Winchester, Toby; Carnell, Pauline; Hampton, David; Keighley, Wilma
2008-03-01
Bringing drugs to the market remains a costly and, until now, often unpredictable challenge. Although understanding the underlying science is key to further progress, our imperfect knowledge of disease and complex biological systems leaves excellence in execution as the most tangible lever to sustain our serendipitous approach to drug discovery. The problems encountered in pharmaceutical R&D are not unique, but to learn from other industries it is important to recognise similarity, rather than differences, and to advance industrialisation of R&D beyond technology and automation. Tools like Lean and Six Sigma, already applied to increase business excellence across diverse organisations, can equally be introduced to pharmaceutical R&D and offer the potential to transform operations without large-scale investment.
Computer Animations as Astronomy Educational Tool: Immanuel Kant and the Island Universes Hypothesis
NASA Astrophysics Data System (ADS)
Mijic, M.; Park, D.; Zumaeta, J.; Simonian, V.; Levitin, S.; Sullivan, A.; Kang, E. Y. E.; Longson, T.
2008-11-01
Development of astronomy is based on well-defined watershed moments, when an individual or a group of individuals makes a discovery or a measurement that expands, and sometimes dramatically improves, our knowledge of the Universe. The purpose of the Scientific Visualization project at Cal State Los Angeles is to bring these moments to life with the use of computer animations, the medium of the 21st century that appeals to the generations which grew up in the Internet age. Our first story describes Immanuel Kant's remarkable Island Universes hypothesis. Using elementary principles of the then-new Newtonian mechanics, Kant made a bold and ultimately correct interpretation of the Milky Way and the objects that we now call galaxies.
Computer Animations as Astronomy Educational Tool: Immanuel Kant and The Island Universes Hypothesis
NASA Astrophysics Data System (ADS)
Mijic, Milan; Park, D.; Zumaeta, J.; Dong, H.; Simonian, V.; Levitin, S.; Sullivan, A.; Kang, E. Y. E.; Longson, T.; State LA SciVi Project, Cal
2008-05-01
Development of astronomy is based on well-defined watershed moments, when an individual or a group of individuals makes a discovery or a measurement that expands, and sometimes dramatically improves, our knowledge of the Universe. The purpose of the Scientific Visualization project at Cal State LA is to bring these moments to life with the use of computer animations, the medium of the 21st century that appeals to the generations which grew up in the Internet age. Our first story describes Immanuel Kant's remarkable Island Universes hypothesis. Using elementary principles of the then-new Newtonian mechanics, Kant made a bold and ultimately correct interpretation of the Milky Way and the objects that we now call galaxies.
Mining Hierarchies and Similarity Clusters from Value Set Repositories.
Peterson, Kevin J; Jiang, Guoqian; Brue, Scott M; Shen, Feichen; Liu, Hongfang
2017-01-01
A value set is a collection of permissible values used to describe a specific conceptual domain for a given purpose. By helping to establish a shared semantic understanding across use cases, these artifacts are important enablers of interoperability and data standardization. As repositories cataloging these value sets expand, knowledge management challenges become more pronounced. Specifically, discovering the value sets applicable to a given use case can be difficult in a large repository. In this study, we describe methods to extract implicit relationships between value sets, and we utilize these relationships to overlay organizational structure onto value set repositories. We successfully extract two different structurings, hierarchy and clustering, and show how tooling can leverage these structures to enable more effective value set discovery.
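The two structurings can be derived from nothing more than set algebra over the value sets: containment suggests hierarchy, and high overlap suggests a similarity cluster. A simple sketch of that idea, not the paper's method; the value set names and codes are invented.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

def infer_structure(value_sets, sim_threshold=0.3):
    """Return (hierarchy, similar): proper-subset pairs as (broader,
    narrower) edges, and non-nested pairs with high Jaccard overlap."""
    names = list(value_sets)
    hierarchy, similar = [], []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if value_sets[a] < value_sets[b]:
                hierarchy.append((b, a))
            elif value_sets[b] < value_sets[a]:
                hierarchy.append((a, b))
            elif jaccard(value_sets[a], value_sets[b]) >= sim_threshold:
                similar.append((a, b))
    return hierarchy, similar

vs = {"dm-all": {"E10", "E11", "E13"},        # hypothetical ICD-10 value sets
      "dm-type2": {"E11", "E13"},
      "dm-pregnancy": {"E11", "O24"}}
print(infer_structure(vs))   # dm-all contains dm-type2; type2 ~ pregnancy
```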
A survey of automated methods for sensemaking support
NASA Astrophysics Data System (ADS)
Llinas, James
2014-05-01
Complex, dynamic problems present a general challenge for the design of analysis support systems and tools, largely because there is limited reliable a priori procedural knowledge describing the dynamic processes in the environment. Problem domains that are non-cooperative or adversarial introduce added difficulties involving suboptimal observational data and/or data containing the effects of deception or covertness. The fundamental nature of analysis in these environments is based on composite approaches involving mining or foraging over the evidence, discovery and learning processes, and the synthesis of fragmented hypotheses; together, these can be labeled sensemaking procedures. This paper reviews and analyzes the features, benefits, and limitations of a variety of automated techniques that offer possible support to sensemaking processes in these problem domains.
ERIC Educational Resources Information Center
Guajardo, Richard; Brett, Kelsey; Young, Frederick
2017-01-01
For the past several years academic libraries have been adopting discovery systems to provide a search experience that reflects user expectations and improves access to electronic resources. University of Houston Libraries has kept pace with this evolving trend by pursuing various discovery options; these include an open-source tool, a federated…
Role of Open Source Tools and Resources in Virtual Screening for Drug Discovery.
Karthikeyan, Muthukumarasamy; Vyas, Renu
2015-01-01
Advances in chemoinformatics research, in parallel with the availability of high-performance computing platforms, have made it easier to handle large-scale, multi-dimensional scientific data for high-throughput drug discovery. In this study we have explored publicly available molecular databases with the help of open-source, integrated in-house molecular informatics tools for virtual screening. The virtual screening literature of the past decade has been extensively investigated and thoroughly analyzed to reveal interesting patterns with respect to the drug, target, scaffold and disease space. The review also focuses on integrated chemoinformatics tools that are capable of harvesting chemical data from textual literature and transforming it into truly computable chemical structures, identifying unique fragments and scaffolds from a class of compounds, automatically generating focused virtual libraries, computing molecular descriptors for structure-activity relationship studies, and applying conventional filters used in lead discovery along with in-house developed exhaustive PTC (Pharmacophore, Toxicophores and Chemophores) filters and machine learning tools for the design of potential disease-specific inhibitors. A case study on kinase inhibitors is provided as an example.
Story Telling With Storyboards: Enhancements and Experiences
NASA Astrophysics Data System (ADS)
King, T. A.; Grayzeck, E. J.; Galica, C.; Erickson, K. J.
2016-12-01
A year ago, a tool to help tell stories, called the Planetary Data Storyboard, was introduced. This tool is designed to use today's technologies to tell stories that are rich multi-media experiences, blending text, animations, movies and infographics. The Storyboard tool presents a set of panels that contain representative images of an event with associated notes or instructions. The panels are arranged in a timeline that allows a user to experience a discovery or event in the same way it occurred. Each panel can link to a more detailed source such as a publication, the data that were collected, or items derived from the research (like movies or animations). A storyboard can be used to make science discovery more accessible by presenting events in an easy-to-follow layout. A storyboard can also help to teach the scientific method, by following the experiences of researchers as they investigate a phenomenon or try to understand a new set of observations. We present the new features of the Storyboard tool and show example stories for scientific discoveries.
Multi-Stage Hybrid Rocket Conceptual Design for Micro-Satellites Launch using Genetic Algorithm
NASA Astrophysics Data System (ADS)
Kitagawa, Yosuke; Kitagawa, Koki; Nakamiya, Masaki; Kanazaki, Masahiro; Shimada, Toru
The multi-objective genetic algorithm (MOGA) is applied to the multi-disciplinary conceptual design problem for a three-stage launch vehicle (LV) with a hybrid rocket engine (HRE). MOGA is an optimization tool used for multi-objective problems. The parallel coordinate plot (PCP), which is a data mining method, is employed in the post-process in MOGA for design knowledge discovery. A rocket that can deliver observing micro-satellites to the sun-synchronous orbit (SSO) is designed. It consists of an oxidizer tank containing liquid oxidizer, a combustion chamber containing solid fuel, a pressurizing tank and a nozzle. The objective functions considered in this study are to minimize the total mass of the rocket and to maximize the ratio of the payload mass to the total mass. To calculate the thrust and the engine size, the regression rate is estimated based on an empirical model for a paraffin (FT-0070) propellant. Several non-dominated solutions are obtained using MOGA, and design knowledge is discovered for the present hybrid rocket design problem using a PCP analysis. As a result, substantial knowledge on the design of an LV with an HRE is obtained for use in space transportation.
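As a concrete illustration of the non-dominated solutions that MOGA-style optimizers return, the following Python sketch filters a set of candidate designs to the Pareto front for the two objectives named in the abstract (minimize total mass, maximize payload-to-total-mass ratio). The design values are fabricated for illustration; this is not the authors' code.

# Sketch: extract the non-dominated front from candidate designs.
designs = [
    {"total_mass": 48.0, "payload_ratio": 0.010},
    {"total_mass": 52.0, "payload_ratio": 0.014},
    {"total_mass": 45.0, "payload_ratio": 0.008},
    {"total_mass": 50.0, "payload_ratio": 0.009},  # dominated by the first design
]

def dominates(a, b):
    # a dominates b if it is no worse in both objectives and strictly better in one.
    no_worse = (a["total_mass"] <= b["total_mass"]
                and a["payload_ratio"] >= b["payload_ratio"])
    better = (a["total_mass"] < b["total_mass"]
              or a["payload_ratio"] > b["payload_ratio"])
    return no_worse and better

front = [d for d in designs if not any(dominates(o, d) for o in designs)]
print(front)  # the trade-off set a parallel coordinate plot would visualize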
A benchmark study of scoring methods for non-coding mutations.
Drubay, Damien; Gautheret, Daniel; Michiels, Stefan
2018-05-15
Detailed knowledge of coding sequences has led to different candidate models for pathogenic variant prioritization. Several deleteriousness scores have been proposed for the non-coding part of the genome, but no large-scale comparison has been performed to date to assess their performance. We compared the leading scoring tools (CADD, FATHMM-MKL, Funseq2 and GWAVA) and some recent competitors (DANN, SNP and SOM scores) for their ability to discriminate assumed pathogenic variants from assumed benign variants (using the ClinVar, COSMIC and 1000 Genomes Project databases). Using the ClinVar benchmark, CADD was the best tool for detecting the pathogenic variants that are mainly located in protein-coding gene regions. Using the COSMIC benchmark, FATHMM-MKL, GWAVA and SOMliver outperformed the other tools for pathogenic variants that are typically located in lincRNAs, pseudogenes and other parts of the non-coding genome. However, all tools had low precision, which could potentially be improved by future non-coding genome feature discoveries. These results may have been influenced by the presence of potential benign variants in the COSMIC database. The development of a gold standard as consistent as ClinVar for these regions will be necessary to confirm our tool ranking. The Snakemake, C++ and R codes are freely available from https://github.com/Oncostat/BenchmarkNCVTools and supported on Linux. damien.drubay@gustaveroussy.fr or stefan.michiels@gustaveroussy.fr. Supplementary data are available at Bioinformatics online.
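A benchmark of this kind is typically scored with discrimination metrics such as ROC AUC. The sketch below (assuming scikit-learn is installed; the labels and per-tool scores are fabricated placeholders) shows the general shape of such a comparison, not the authors' actual pipeline.

# Sketch: compare scoring tools by ROC AUC on a labeled variant set.
from sklearn.metrics import roc_auc_score

labels = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = assumed pathogenic, 0 = assumed benign
scores = {
    "CADD":  [4.2, 3.8, 1.1, 0.5, 2.0, 0.2, 3.1, 1.0],
    "GWAVA": [0.9, 0.6, 0.4, 0.3, 0.5, 0.1, 0.8, 0.2],
}

for tool, s in scores.items():
    print(tool, round(roc_auc_score(labels, s), 3))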
To ontologise or not to ontologise: An information model for a geospatial knowledge infrastructure
NASA Astrophysics Data System (ADS)
Stock, Kristin; Stojanovic, Tim; Reitsma, Femke; Ou, Yang; Bishr, Mohamed; Ortmann, Jens; Robertson, Anne
2012-08-01
A geospatial knowledge infrastructure consists of a set of interoperable components, including software, information, hardware, procedures and standards, that work together to support advanced discovery and creation of geoscientific resources, including publications, data sets and web services. The focus of the work presented is the development of such an infrastructure for resource discovery. Advanced resource discovery is intended to support scientists in finding resources that meet their needs, and focuses on representing the semantic details of the scientific resources, including the detailed aspects of the science that led to the resource being created. This paper describes an information model for a geospatial knowledge infrastructure that uses ontologies to represent these semantic details, including knowledge about domain concepts, the scientific elements of the resource (analysis methods, theories and scientific processes) and web services. This semantic information can be used to enable more intelligent search over scientific resources, and to support new ways to infer and visualise scientific knowledge. The work describes the requirements for semantic support of a knowledge infrastructure, and analyses the different options for information storage based on the twin goals of semantic richness and syntactic interoperability to allow communication between different infrastructures. Such interoperability is achieved by the use of open standards, and the architecture of the knowledge infrastructure adopts such standards, particularly from the geospatial community. The paper then describes an information model that uses a range of different types of ontologies, explaining those ontologies and their content. The information model was successfully implemented in a working geospatial knowledge infrastructure, but the evaluation identified some issues in creating the ontologies.
Kang, Lifeng; Chung, Bong Geun; Langer, Robert; Khademhosseini, Ali
2009-01-01
Microfluidic technologies' ability to miniaturize assays and increase experimental throughput has generated significant interest in the drug discovery and development domain. These characteristics make microfluidic systems a potentially valuable tool for many drug discovery and development applications. Here, we review the recent advances of microfluidic devices for drug discovery and development and highlight their applications in different stages of the process, including target selection, lead identification, preclinical tests, clinical trials, chemical synthesis, formulation studies, and product management. PMID:18190858
NASA SMD and DPS Resources for Higher Education Faculty
NASA Astrophysics Data System (ADS)
Buxner, Sanlyn; Grier, Jennifer; Meinke, Bonnie; Schneider, Nick; Low, Rusty; Schultz, Greg; Manning, James; Fraknoi, Andrew; Gross, Nicholas
2015-11-01
The NASA Education and Public Outreach Forums have developed and provided resources for higher education for the past six years through a cooperative agreement with NASA's Science Mission Directorate. Collaborations with science organizations, including AAS's Division of Planetary Sciences, have resulted in more tools, professional training opportunities, and dissemination of resources for teaching in the undergraduate classroom. Resources have been developed through needs assessments of the community and with input from scientists and undergraduate instructors. All resources are freely available. NASA Wavelength (nasawavelength.org) is a collection of digital peer-reviewed Earth and space science resources for formal and informal educators of all levels. All resources were developed through funding of the NASA Science Mission Directorate and have undergone a peer-review process through which educators and scientists ensure the content is accurate and useful in an educational setting. Within NASA Wavelength are specific lists of activities and resources for higher education faculty. Additionally, several resources have been developed for introductory college classrooms. The DPS Discovery slide sets are 3-slide presentations that can be incorporated into college lectures to keep classes apprised of the fast-moving field of planetary science (http://dps.aas.org/education/dpsdisc). The "Astro 101 slide sets", developed by the Astro Forum, are presentations of 5-7 slides on new developments or discoveries from NASA Astrophysics missions that are relevant to topics in introductory astronomy courses but not yet in textbooks. Additional resource guides, covering cosmology and exoplanets, are available for Astro 101 courses (https://www.astrosociety.org/education/resources-for-the-higher-education-audience/). Professional development opportunities are available to faculty to increase content knowledge and pedagogical tools. These include workshops at scientific meetings and online webinars that are archived for later viewing. For more information, visit the SMD E/PO community workspace at http://smdepo.org.
Search Engines for Tomorrow's Scholars
ERIC Educational Resources Information Center
Fagan, Jody Condit
2011-01-01
Today's scholars face an outstanding array of choices when choosing search tools: Google Scholar, discipline-specific abstracts and index databases, library discovery tools, and more recently, Microsoft's re-launch of their academic search tool, now dubbed Microsoft Academic Search. What are these tools' strengths for the emerging needs of…
Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi
2014-01-01
With the remarkable increase in genomic sequence data across a wide range of species, novel tools are needed for comprehensive analyses of these big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on one map. By modifying the conventional SOM, we have previously developed the Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, depending solely on oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes, and then the compositions in the human and mouse genomes, in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a "genome signature," as well as regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
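The BLSOM input described here is an oligonucleotide composition vector per sequence fragment. The Python sketch below (the function name and toy sequence are ours) shows how a pentanucleotide frequency vector might be computed for one fragment, yielding the 4^5 = 1024-dimensional input such a map would cluster.

# Sketch: k-mer composition vector for a genome fragment.
from itertools import product
from collections import Counter

def kmer_composition(seq, k=5):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1
    # A fixed ordering over all 4**k possible k-mers gives comparable vectors.
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]

vec = kmer_composition("ACGTACGTGGCCAATT" * 100)
print(len(vec))  # 1024 dimensions for k = 5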
S-MART, a software toolbox to aid RNA-Seq data analysis.
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments, but the analysis of the millions of sequences generated is often beyond the expertise of wet labs that have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool that performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by the whole biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour for most queries, even for gigabytes of data. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses of their RNA-Seq data on their own computers, from the mapped data to the discovery of important loci.
S-MART, A Software Toolbox to Aid RNA-seq Data Analysis
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments, but the analysis of the millions of sequences generated is often beyond the expertise of wet labs that have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool that performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by the whole biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour for most queries, even for gigabytes of data. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses of their RNA-Seq data on their own computers, from the mapped data to the discovery of important loci. PMID:21998740
Great Originals of Modern Physics
ERIC Educational Resources Information Center
Decker, Fred W.
1972-01-01
European travel can provide an intimate view of the implements and locales of great discoveries in physics for the knowledgeable traveler. The four museums at Cambridge, London, Remscheid-Lennep, and Munich display a full range of discovery apparatus in modern physics as outlined here. (Author/TS)
ERIC Educational Resources Information Center
MacKenzie, Marion
1983-01-01
Scientific research leading to the discovery of female plants of the red alga Palmaria palmata (dulse) is described. This discovery has not only advanced knowledge of marine organisms and taxonomic relationships but also has practical implications. The complete life cycle of this organism is included. (JN)
43 CFR 4.1132 - Scope of discovery.
Code of Federal Regulations, 2014 CFR
2014-10-01
..., the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature, custody... persons having knowledge of any discoverable matter. (b) It is not ground for objection that information...
43 CFR 4.1132 - Scope of discovery.
Code of Federal Regulations, 2012 CFR
2012-10-01
..., the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature, custody... persons having knowledge of any discoverable matter. (b) It is not ground for objection that information...
43 CFR 4.1132 - Scope of discovery.
Code of Federal Regulations, 2013 CFR
2013-10-01
..., the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature, custody... persons having knowledge of any discoverable matter. (b) It is not ground for objection that information...
43 CFR 4.1132 - Scope of discovery.
Code of Federal Regulations, 2011 CFR
2011-10-01
..., the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature, custody... persons having knowledge of any discoverable matter. (b) It is not ground for objection that information...
Anekthanakul, Krittima; Hongsthong, Apiradee; Senachak, Jittisak; Ruengjitchatchawalya, Marasri
2018-04-20
Bioactive peptides, peptides derived from biological sources that exhibit various biological activities, are protein fragments that influence the functions or conditions of organisms, in particular humans and animals. Conventional methods of identifying bioactive peptides are time-consuming and costly. To speed up the process, several bioinformatics tools have recently been used to facilitate screening of potential peptides prior to their activity assessment in vitro and/or in vivo. In this study, we developed an efficient computational method, SpirPep, which offers many advantages over the currently available tools. The SpirPep web application is a one-stop analysis and visualization facility to assist bioactive peptide discovery. The tool is equipped with 15 customized enzymes and 1-3 miscleavage options, which allows in silico digestion of protein sequences encoded by protein-coding genes, from single or multiple proteins up to genome-wide scale, and then directly classifies the peptides by bioactivity using an in-house database that contains bioactive peptides collected from 13 public databases. With this tool, the resulting peptides are categorized by each selected enzyme and shown in a tabular format in which the peptide sequences can be tracked back to their original proteins. The tool and webpages are coded in PHP and HTML with CSS/JavaScript. Moreover, the tool allows protein-peptide alignment visualization by the Generic Genome Browser (GBrowse) to display the region and details of the proteins and peptides within each parameter, while considering digestion design for the desired bioactivity. SpirPep is efficient; it takes less than 20 min to digest 3000 proteins (751,860 amino acids) with 15 enzymes and three miscleavages per enzyme, and only a few seconds for single-enzyme digestion. The tool identified more bioactive peptides than the benchmarked tool; an example of a validated pentapeptide (FLPIL) from LC-MS/MS is demonstrated. The web and database server are available at http://spirpepapp.sbi.kmutt.ac.th. SpirPep, a web-based bioactive peptide discovery application, is an in silico tool that provides an overview of the results. The platform is a one-stop analysis and visualization facility and offers advantages over the currently available tools. This tool may be useful for further bioactivity analysis and the quantitative discovery of desirable peptides.
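The core in silico digestion step can be pictured with a short Python sketch: a simplified trypsin rule (cleave after K or R, but not before P) with configurable miscleavages, in the spirit of SpirPep's 15-enzyme, 1-3 miscleavage design. The function and cleavage rule are our simplification, not the tool's actual code.

# Sketch: in silico digestion with miscleavages.
import re

def digest(protein, max_miscleavages=2):
    # Simplified trypsin: split after K or R when not followed by P.
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    peptides = set()
    for i in range(len(fragments)):
        for j in range(i, min(i + max_miscleavages + 1, len(fragments))):
            peptides.add("".join(fragments[i:j + 1]))
    return peptides

print(sorted(digest("MKWVTFISLLFLFSSAYSRGVFRRDAHK", max_miscleavages=1)))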
Going Virtual… or Not: Development and Testing of a 3D Virtual Astronomy Environment
NASA Astrophysics Data System (ADS)
Ruzhitskaya, L.; Speck, A.; Ding, N.; Baldridge, S.; Witzig, S.; Laffey, J.
2013-04-01
We present our preliminary results of a pilot study of students' knowledge transfer of an astronomy concept into a new environment. We also share our discoveries about which aspects of a 3D environment students consider motivational or discouraging for their learning. This study was conducted among 64 non-science major students enrolled in an astronomy laboratory course. During the course, students learned the concept and applications of Kepler's laws using a 2D interactive environment. Later in the semester, the students were placed in a 3D environment in which they were asked to conduct observations and to answer a set of questions pertaining to Kepler's laws of planetary motion. In this study, we were interested in observing, scrutinizing, and assessing students' behavior: from the choices they made while creating their avatars (virtual representations), to the tools they chose to use, to their navigational patterns, to their levels of discourse in the environment. These observations helped us identify which features of the 3D environment our participants found helpful and interesting and which tools created unnecessary clutter and distraction. The students' social behavior patterns in the virtual environment, together with their answers to the questions, helped us determine how well they understood Kepler's laws, how well they could transfer the concepts to a new situation, and at what point a motivational tool such as a 3D environment becomes a disruption to constructive learning. Our findings confirmed that students construct deeper knowledge of a concept when they are fully immersed in the environment.
NASA Reverb: Standards-Driven Earth Science Data and Service Discovery
NASA Astrophysics Data System (ADS)
Cechini, M. F.; Mitchell, A.; Pilone, D.
2011-12-01
NASA's Earth Observing System Data and Information System (EOSDIS) is a core capability in NASA's Earth Science Data Systems Program. NASA's EOS ClearingHOuse (ECHO) is a metadata catalog for the EOSDIS, providing a centralized catalog of data products and a registry of related data services. Working closely with the EOSDIS community, the ECHO team identified a need to develop the next-generation EOS data and service discovery tool. This development effort relied on the following principles:
+ Metadata-Driven User Interface - Users should be presented with data and service discovery capabilities based on dynamic processing of metadata describing the targeted data.
+ Integrated Data & Service Discovery - Users should be able to discover data and associated data services that facilitate their research objectives.
+ Leverage Common Standards - Users should be able to discover and invoke services that utilize common interface standards.
Metadata plays a vital role in facilitating data discovery and access. As data providers enhance their metadata, more advanced search capabilities become available, enriching a user's search experience. Maturing metadata formats such as ISO 19115 provide the depth of metadata that facilitates advanced data discovery capabilities. Data discovery and access is not limited simply to the retrieval of data granules, but is growing into the more complex discovery of data services. These services include, but are not limited to, services facilitating additional data discovery, subsetting, reformatting, and re-projecting. The discovery and invocation of these data services is made significantly simpler through the use of consistent and interoperable standards. By adopting a standard, standard-specific adapters can be developed to communicate with multiple services implementing a specific protocol. The emergence of metadata standards such as ISO 19119 plays a similarly important role in service discovery as ISO 19115 does in data discovery. After a yearlong design, development, and testing process, the ECHO team successfully released "Reverb - The Next Generation Earth Science Discovery Tool." Reverb relies heavily on the information contained in dataset and granule metadata, such as ISO 19115, to provide a dynamic experience to users based on search facet values extracted from science metadata. Such an approach allows users to perform cross-dataset correlation and searches, discovering additional data that they may not previously have been aware of. In addition to data discovery, Reverb users may discover services associated with their data of interest. When services utilize supported standards and/or protocols, Reverb can facilitate the invocation of both synchronous and asynchronous data processing services. This greatly enhances a user's ability to discover data of interest and accomplish their research goals. Extrapolating from the current movement towards interoperable standards and the increase in available services, data service invocation and chaining will become a natural part of data discovery. Reverb is one example of a discovery tool that provides a mechanism for transforming the Earth science data discovery paradigm.
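The "metadata-driven user interface" principle can be pictured as deriving search facets dynamically from whatever fields the catalog records carry, rather than from a hard-coded schema. The Python sketch below uses invented field names to show the idea; it is not ECHO or Reverb code.

# Sketch: derive facet value counts dynamically from metadata records.
from collections import defaultdict

records = [
    {"platform": "Terra", "instrument": "MODIS", "processing_level": "L2"},
    {"platform": "Aqua", "instrument": "MODIS", "processing_level": "L3"},
    {"platform": "Terra", "instrument": "ASTER"},
]

facets = defaultdict(lambda: defaultdict(int))
for rec in records:
    for field, value in rec.items():
        facets[field][value] += 1  # facet value counts drive the search UI

for field, values in facets.items():
    print(field, dict(values))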
Aptamer-Based Multiplexed Proteomic Technology for Biomarker Discovery
Gold, Larry; Ayers, Deborah; Bertino, Jennifer; Bock, Christopher; Bock, Ashley; Brody, Edward N.; Carter, Jeff; Dalby, Andrew B.; Eaton, Bruce E.; Fitzwater, Tim; Flather, Dylan; Forbes, Ashley; Foreman, Trudi; Fowler, Cate; Gawande, Bharat; Goss, Meredith; Gunn, Magda; Gupta, Shashi; Halladay, Dennis; Heil, Jim; Heilig, Joe; Hicke, Brian; Husar, Gregory; Janjic, Nebojsa; Jarvis, Thale; Jennings, Susan; Katilius, Evaldas; Keeney, Tracy R.; Kim, Nancy; Koch, Tad H.; Kraemer, Stephan; Kroiss, Luke; Le, Ngan; Levine, Daniel; Lindsey, Wes; Lollo, Bridget; Mayfield, Wes; Mehan, Mike; Mehler, Robert; Nelson, Sally K.; Nelson, Michele; Nieuwlandt, Dan; Nikrad, Malti; Ochsner, Urs; Ostroff, Rachel M.; Otis, Matt; Parker, Thomas; Pietrasiewicz, Steve; Resnicow, Daniel I.; Rohloff, John; Sanders, Glenn; Sattin, Sarah; Schneider, Daniel; Singer, Britta; Stanton, Martin; Sterkel, Alana; Stewart, Alex; Stratford, Suzanne; Vaught, Jonathan D.; Vrkljan, Mike; Walker, Jeffrey J.; Watrobka, Mike; Waugh, Sheela; Weiss, Allison; Wilcox, Sheri K.; Wolfson, Alexey; Wolk, Steven K.; Zhang, Chi; Zichi, Dom
2010-01-01
Background: The interrogation of proteomes (“proteomics”) in a highly multiplexed and efficient manner remains a coveted and challenging goal in biology and medicine. Methodology/Principal Findings: We present a new aptamer-based proteomic technology for biomarker discovery capable of simultaneously measuring thousands of proteins from small sample volumes (15 µL of serum or plasma). Our current assay measures 813 proteins with low limits of detection (1 pM median), 7 logs of overall dynamic range (∼100 fM–1 µM), and 5% median coefficient of variation. This technology is enabled by a new generation of aptamers that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected. Proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding signature of DNA aptamer concentrations, which is quantified on a DNA microarray. Our assay takes advantage of the dual nature of aptamers as both folded protein-binding entities with defined shapes and unique nucleotide sequences recognizable by specific hybridization probes. To demonstrate the utility of our proteomics biomarker discovery technology, we applied it to a clinical study of chronic kidney disease (CKD). We identified two well known CKD biomarkers as well as an additional 58 potential CKD biomarkers. These results demonstrate the potential utility of our technology to rapidly discover unique protein signatures characteristic of various disease states. Conclusions/Significance: We describe a versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations. This unbiased and highly multiplexed search engine will enable the discovery of novel biomarkers in a manner that is unencumbered by our incomplete knowledge of biology, thereby helping to advance the next generation of evidence-based medicine. PMID:21165148
Aptamer-based multiplexed proteomic technology for biomarker discovery.
Gold, Larry; Ayers, Deborah; Bertino, Jennifer; Bock, Christopher; Bock, Ashley; Brody, Edward N; Carter, Jeff; Dalby, Andrew B; Eaton, Bruce E; Fitzwater, Tim; Flather, Dylan; Forbes, Ashley; Foreman, Trudi; Fowler, Cate; Gawande, Bharat; Goss, Meredith; Gunn, Magda; Gupta, Shashi; Halladay, Dennis; Heil, Jim; Heilig, Joe; Hicke, Brian; Husar, Gregory; Janjic, Nebojsa; Jarvis, Thale; Jennings, Susan; Katilius, Evaldas; Keeney, Tracy R; Kim, Nancy; Koch, Tad H; Kraemer, Stephan; Kroiss, Luke; Le, Ngan; Levine, Daniel; Lindsey, Wes; Lollo, Bridget; Mayfield, Wes; Mehan, Mike; Mehler, Robert; Nelson, Sally K; Nelson, Michele; Nieuwlandt, Dan; Nikrad, Malti; Ochsner, Urs; Ostroff, Rachel M; Otis, Matt; Parker, Thomas; Pietrasiewicz, Steve; Resnicow, Daniel I; Rohloff, John; Sanders, Glenn; Sattin, Sarah; Schneider, Daniel; Singer, Britta; Stanton, Martin; Sterkel, Alana; Stewart, Alex; Stratford, Suzanne; Vaught, Jonathan D; Vrkljan, Mike; Walker, Jeffrey J; Watrobka, Mike; Waugh, Sheela; Weiss, Allison; Wilcox, Sheri K; Wolfson, Alexey; Wolk, Steven K; Zhang, Chi; Zichi, Dom
2010-12-07
The interrogation of proteomes ("proteomics") in a highly multiplexed and efficient manner remains a coveted and challenging goal in biology and medicine. We present a new aptamer-based proteomic technology for biomarker discovery capable of simultaneously measuring thousands of proteins from small sample volumes (15 µL of serum or plasma). Our current assay measures 813 proteins with low limits of detection (1 pM median), 7 logs of overall dynamic range (~100 fM-1 µM), and 5% median coefficient of variation. This technology is enabled by a new generation of aptamers that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected. Proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding signature of DNA aptamer concentrations, which is quantified on a DNA microarray. Our assay takes advantage of the dual nature of aptamers as both folded protein-binding entities with defined shapes and unique nucleotide sequences recognizable by specific hybridization probes. To demonstrate the utility of our proteomics biomarker discovery technology, we applied it to a clinical study of chronic kidney disease (CKD). We identified two well known CKD biomarkers as well as an additional 58 potential CKD biomarkers. These results demonstrate the potential utility of our technology to rapidly discover unique protein signatures characteristic of various disease states. We describe a versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations. This unbiased and highly multiplexed search engine will enable the discovery of novel biomarkers in a manner that is unencumbered by our incomplete knowledge of biology, thereby helping to advance the next generation of evidence-based medicine.
Realising the knowledge spiral in healthcare: the role of data mining and knowledge management.
Wickramasinghe, Nilmini; Bali, Rajeev K; Gibbons, M Chris; Schaffer, Jonathan
2008-01-01
Knowledge Management (KM) is an emerging business approach aimed at solving current problems, such as competitiveness and the need to innovate, that are faced by businesses today. The premise for the need for KM is a paradigm shift in the business environment where knowledge is central to organizational performance. Organizations trying to embrace KM have many tools, techniques and strategies at their disposal. A vital technique in KM is data mining, which enables critical knowledge to be gained from the analysis of large amounts of data and information. The healthcare industry is a very information-rich industry. The collection of data and information permeates most, if not all, areas of this industry; however, the healthcare industry has yet to fully embrace KM, let alone the newly evolving techniques of data mining. In this paper, we demonstrate the ubiquitous benefits of data mining and KM to healthcare by highlighting their potential to enable and facilitate superior clinical practice and administrative management. Specifically, we show how data mining can realize the knowledge spiral by effecting the four key transformations identified by Nonaka: turning (1) existing explicit knowledge into new explicit knowledge, (2) existing explicit knowledge into new tacit knowledge, (3) existing tacit knowledge into new explicit knowledge and (4) existing tacit knowledge into new tacit knowledge. This is done through the establishment of theoretical models that respectively identify the function of the knowledge spiral and the powers of data mining, both exploratory and predictive, in the knowledge discovery process. Our models are then applied to a healthcare data set to demonstrate the potential of this approach and its implications for the clinical and administrative aspects of healthcare. Further, we demonstrate how these techniques can help hospitals address the six healthcare quality dimensions identified by the Committee for Quality Healthcare.
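To picture the "exploratory and predictive" powers of data mining invoked here, the toy Python sketch below (assuming scikit-learn; the table, fields, and labels are invented) contrasts the two modes on a small synthetic healthcare table: clustering to surface groupings, and classification to score a new case.

# Sketch: exploratory vs. predictive mining on a toy healthcare table.
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

X = [[45, 1], [50, 3], [62, 8], [38, 0], [70, 9], [55, 4]]  # [age, prior admissions]
y = [0, 0, 1, 0, 1, 1]                                      # readmitted within 30 days?

print(KMeans(n_clusters=2, n_init=10).fit_predict(X))        # exploratory: discover groupings
print(DecisionTreeClassifier().fit(X, y).predict([[60, 7]])) # predictive: score a new patient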
Optogenetic Approaches to Drug Discovery in Neuroscience and Beyond.
Zhang, Hongkang; Cohen, Adam E
2017-07-01
Recent advances in optogenetics have opened new routes to drug discovery, particularly in neuroscience. Physiological cellular assays probe functional phenotypes that connect genomic data to patient health. Optogenetic tools, in particular tools for all-optical electrophysiology, now provide a means to probe cellular disease models with unprecedented throughput and information content. These techniques promise to identify functional phenotypes associated with disease states and to identify compounds that improve cellular function regardless of whether the compound acts directly on a target or through a bypass mechanism. This review discusses opportunities and unresolved challenges in applying optogenetic techniques throughout the discovery pipeline - from target identification and validation, to target-based and phenotypic screens, to clinical trials. Copyright © 2017 Elsevier Ltd. All rights reserved.
From Information Center to Discovery System: Next Step for Libraries?
ERIC Educational Resources Information Center
Marcum, James W.
2001-01-01
Proposes a discovery system model to guide technology integration in academic libraries that fuses organizational learning, systems learning, and knowledge creation techniques with constructivist learning practices to suggest possible future directions for digital libraries. Topics include accessing visual and continuous media; information…
Foreword to "The Secret of Childhood."
ERIC Educational Resources Information Center
Stephenson, Margaret E.
2000-01-01
Discusses the basic discoveries of Montessori's Casa dei Bambini. Considers principles of Montessori's organizing theory: the absorbent mind, the unfolding nature of life, the spiritual embryo, self-construction, acquisition of culture, creativity of life, repetition of exercise, freedom within limits, children's discovery of knowledge, the secret…
NASA Astrophysics Data System (ADS)
Harwit, Martin
1984-04-01
In the remarkable opening section of this book, a well-known Cornell astronomer gives precise thumbnail histories of the 43 basic cosmic discoveries - stars, planets, novae, pulsars, comets, gamma-ray bursts, and the like - that form the core of our knowledge of the universe. Many of them, he points out, were made accidentally and outside the mainstream of astronomical research and funding. This observation leads him to speculate on how many more major phenomena there might be and how they might be most effectively sought out in a field now dominated by large instruments and complex investigative modes and observational conditions. The book also examines discovery in terms of its political, financial, and sociological context - the role of new technologies and of industry and the military in revealing new knowledge; and methods of funding, of peer review, and of allotting time on our largest telescopes. It concludes with specific recommendations for organizing astronomy in ways that will best lead to the discovery of the many - at least sixty - phenomena that Harwit estimates are still waiting to be found.
The discovery of medicines for rare diseases
Swinney, David C; Xia, Shuangluo
2015-01-01
There is a pressing need for new medicines (new molecular entities; NMEs) for rare diseases, as few of the 6800 rare diseases (according to the NIH) have approved treatments. Drug discovery strategies for the 102 orphan NMEs approved by the US FDA between 1999 and 2012 were analyzed to learn from past success: 46 NMEs were first in class; 51 were followers; and five were imaging agents. First-in-class medicines were discovered with phenotypic assays (15), target-based approaches (12) and biologic strategies (18). Identification of genetic causes in areas with more basic and translational research, such as cancer and inborn errors of metabolism, contributed to success regardless of discovery strategy. In conclusion, greater knowledge increases the chance of success, and empirical solutions can be effective when knowledge is incomplete. PMID:25068983
The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery
2014-01-01
The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level comprised of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a creative commons by attribution license. See website for further information: http://sio.semanticscience.org. PMID:24602174
Antanaviciute, Agne; Watson, Christopher M; Harrison, Sally M; Lascelles, Carolina; Crinnion, Laura; Markham, Alexander F; Bonthron, David T; Carr, Ian M
2015-12-01
Exome sequencing has become a de facto standard method for Mendelian disease gene discovery in recent years, yet identifying disease-causing mutations among thousands of candidate variants remains a non-trivial task. Here we describe a new variant prioritization tool, OVA (ontology variant analysis), in which user-provided phenotypic information is exploited to infer deeper biological context. OVA combines a knowledge-based approach with a variant-filtering framework. It reduces the number of candidate variants by considering genotype and predicted effect on protein sequence, and scores the remainder on biological relevance to the query phenotype. We take advantage of several ontologies in order to bridge knowledge across multiple biomedical domains and facilitate computational analysis of annotations pertaining to genes, diseases, phenotypes, tissues and pathways. In this way, OVA combines information regarding molecular and physical phenotypes and integrates both human and model organism data to effectively prioritize variants. By assessing performance on both known and novel disease mutations, we show that OVA performs biologically meaningful candidate variant prioritization and can be more accurate than another recently published candidate variant prioritization tool. OVA is freely accessible at http://dna2.leeds.ac.uk:8080/OVA/index.jsp. Supplementary data are available at Bioinformatics online. umaan@leeds.ac.uk. © The Author 2015. Published by Oxford University Press.
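As a reading of the two OVA stages described above, hard filtering followed by phenotype-relevance ranking, here is a hypothetical Python sketch; the variant records, thresholds, and phenotype_score field are stand-ins for illustration, not OVA's actual logic.

# Sketch: filter candidate variants, then rank survivors by phenotype relevance.
variants = [
    {"gene": "ABCA4", "effect": "missense",   "af": 0.0001, "phenotype_score": 0.92},
    {"gene": "TTN",   "effect": "synonymous", "af": 0.002,  "phenotype_score": 0.30},
    {"gene": "RPE65", "effect": "frameshift", "af": 0.0,    "phenotype_score": 0.88},
]

def passes_filters(v, max_af=0.001, damaging=("missense", "frameshift", "nonsense")):
    return v["af"] <= max_af and v["effect"] in damaging

ranked = sorted((v for v in variants if passes_filters(v)),
                key=lambda v: v["phenotype_score"], reverse=True)
for v in ranked:
    print(v["gene"], v["phenotype_score"])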
Computational tools for comparative phenomics; the role and promise of ontologies
Gkoutos, Georgios V.; Schofield, Paul N.; Hoehndorf, Robert
2012-01-01
A major aim of the biological sciences is to gain an understanding of human physiology and disease. One important step towards this goal is the discovery of gene function, which will lead to a better understanding of the physiology and pathophysiology of organisms, ultimately enabling better understanding, diagnosis, and therapy. Our increasing ability to phenotypically characterise genetic variants of model organisms, coupled with systematic and hypothesis-driven mutagenesis, is resulting in a wealth of information that could potentially provide insight into the functions of all genes in an organism. The challenge we now face is to develop computational methods that can integrate and analyse such data. The introduction of formal ontologies that make their semantics explicit and accessible to automated reasoning promises the tantalizing possibility of standardizing biomedical knowledge, allowing for novel, powerful queries that bridge multiple domains, disciplines, species and levels of granularity. We review recent computational approaches that facilitate the integration of experimental data from model organisms with clinical observations in humans. These methods foster novel cross-species analysis approaches, thereby enabling comparative phenomics and the potential to translate basic discoveries from model systems into diagnostic and therapeutic advances at the clinical level. PMID:22814867
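One common building block for such cross-species phenotype comparison (a sketch of the general approach, not necessarily the specific methods reviewed) is semantic similarity over an ontology, for example Jaccard similarity of ancestor sets in the term DAG, as in the Python sketch below with a toy ontology fragment.

# Sketch: shared-ancestor similarity between two phenotype terms.
parents = {  # toy ontology fragment: term -> direct parents
    "small_eye": {"eye_morphology"},
    "microphthalmia": {"eye_morphology"},
    "eye_morphology": {"head_morphology"},
    "head_morphology": set(),
}

def ancestors(term):
    out, stack = set(), [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in out:
                out.add(p)
                stack.append(p)
    return out | {term}

a, b = ancestors("small_eye"), ancestors("microphthalmia")
print(len(a & b) / len(a | b))  # 0.5 for this toy fragment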
Pseudotargeted MS Method for the Sensitive Analysis of Protein Phosphorylation in Protein Complexes.
Lyu, Jiawen; Wang, Yan; Mao, Jiawei; Yao, Yating; Wang, Shujuan; Zheng, Yong; Ye, Mingliang
2018-05-15
In this study, we present an enrichment-free approach for the sensitive analysis of protein phosphorylation in minute amounts of sample, such as purified protein complexes. This method takes advantage of the high sensitivity of parallel reaction monitoring (PRM). Specifically, low-confidence phosphopeptides identified from the data-dependent acquisition (DDA) data set were used to build a pseudotargeted list for PRM analysis, allowing the identification of additional phosphopeptides with high confidence. The development of this targeted approach is straightforward, as the same sample and the same LC system are used for the discovery and targeted analysis phases. No sample fractionation or enrichment is required for the discovery phase, which allows this method to analyze minute amounts of sample. We applied this pseudotargeted MS method to quantitatively examine phosphopeptides in affinity-purified endogenous Shc1 protein complexes at four temporal stages of EGF signaling and identified 82 phospho-sites. To our knowledge, this is the highest number of phospho-sites identified from these protein complexes. This pseudotargeted MS method is highly sensitive in the identification of low-abundance phosphopeptides and could be a powerful tool for studying phosphorylation-regulated assembly of protein complexes.
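The pseudotargeted step as described, turning low-confidence DDA identifications into a PRM inclusion list, might look like the following Python sketch; the score window, record fields, and retention-time tolerance are illustrative assumptions, not the authors' parameters.

# Sketch: build a PRM inclusion list from low-confidence DDA identifications.
dda_psms = [
    {"peptide": "SVSFSLK(ph)",   "mz": 512.27, "z": 2, "rt": 34.2, "score": 0.42},
    {"peptide": "T(ph)PPRDLPTK", "mz": 498.76, "z": 2, "rt": 21.7, "score": 0.91},
    {"peptide": "AAS(ph)PAPK",   "mz": 401.19, "z": 2, "rt": 15.3, "score": 0.35},
]

LOW, HIGH = 0.2, 0.8  # keep IDs too weak to report but plausible enough to target
targets = [p for p in dda_psms if LOW <= p["score"] < HIGH]

for t in targets:  # rows for the instrument's PRM inclusion list
    print(f'{t["mz"]:.2f}\t{t["z"]}\t{t["rt"] - 2:.1f}-{t["rt"] + 2:.1f}\t{t["peptide"]}')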
Anniversary of the discovery/isolation of the yeast centromere by Clarke and Carbon.
Bloom, Kerry
2015-05-01
The first centromere was isolated 35 years ago by Louise Clarke and John Carbon from budding yeast. They embarked on their journey with rudimentary molecular tools (by today's standards) and little knowledge of the structure of a chromosome, much less the nature of a centromere. Their discovery opened up a new field, as centromeres have now been isolated from fungi and numerous plants and animals, including mammals. Budding yeast and several other fungi have small centromeres with short, well-defined sequences, known as point centromeres, whereas regional centromeres span several kilobases up to megabases and do not seem to have DNA sequence specificity. Centromeres are at the heart of artificial chromosomes, and we have seen the birth of synthetic centromeres in budding and fission yeast and mammals. The diversity in centromeres throughout phylogeny belies conserved functions that are only beginning to be understood. © 2015 Bloom. This article is distributed by The American Society for Cell Biology under license from the author(s). Two months after publication it is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
Quantitative mass spectrometry: an overview
NASA Astrophysics Data System (ADS)
Urban, Pawel L.
2016-10-01
Mass spectrometry (MS) is a mainstream chemical analysis technique in the twenty-first century. It has contributed to numerous discoveries in chemistry, physics and biochemistry. Hundreds of research laboratories scattered all over the world use MS every day to investigate fundamental phenomena on the molecular level. MS is also widely used by industry-especially in drug discovery, quality control and food safety protocols. In some cases, mass spectrometers are indispensable and irreplaceable by any other metrological tools. The uniqueness of MS is due to the fact that it enables direct identification of molecules based on the mass-to-charge ratios as well as fragmentation patterns. Thus, for several decades now, MS has been used in qualitative chemical analysis. To address the pressing need for quantitative molecular measurements, a number of laboratories focused on technological and methodological improvements that could render MS a fully quantitative metrological platform. In this theme issue, the experts working for some of those laboratories share their knowledge and enthusiasm about quantitative MS. I hope this theme issue will benefit readers, and foster fundamental and applied research based on quantitative MS measurements. This article is part of the themed issue 'Quantitative mass spectrometry'.
Heterogeneity and Heterarchy: How far can network analyses go in Earth and space sciences?
NASA Astrophysics Data System (ADS)
Prabhu, A.; Fox, P. A.; Eleish, A.; Li, C.; Pan, F.; Zhong, H.
2017-12-01
The vast majority of explorations of Earth systems are limited in their ability to effectively explore the most important (often most difficult) problems because they are forced to interconnect at the data-element, or syntactic, level rather than at a higher scientific, or conceptual/semantic, level. Recent successes in the application of complex network theory and algorithms to mineralogy, fossils and proteins over billions of years of Earth's history raise expectations that more general graph-based approaches offer the opportunity for new discoveries: needles instead of haystacks. In the past 10 years there has been substantial progress in the natural sciences in providing both specialists and non-specialists the ability to describe, in machine-readable form, geophysical quantities and the relations among them in meaningful and natural ways, effectively breaking the prior syntax barrier. The corresponding open-world semantics and reasoning provide higher-level interconnections. That is, semantics provided around the data structures, using open-source tools, allow for discovery at the knowledge level. This presentation will cover the fundamentals of data-rich network analyses for the geosciences, provide illustrative examples in mineral evolution, and offer future paths for consideration.
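The mineral-evolution network analyses alluded to here are typically bipartite: minerals linked to localities, then projected into a mineral co-occurrence graph. The Python sketch below (assuming the networkx library; the minerals and sites are invented) shows that construction.

# Sketch: bipartite mineral-locality network and its mineral projection.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
minerals = ["quartz", "pyrite", "calcite"]
localities = ["site_A", "site_B"]
B.add_nodes_from(minerals, bipartite=0)
B.add_nodes_from(localities, bipartite=1)
B.add_edges_from([("quartz", "site_A"), ("pyrite", "site_A"),
                  ("quartz", "site_B"), ("calcite", "site_B")])

G = bipartite.projected_graph(B, minerals)  # minerals sharing a locality
print(list(G.edges()))  # quartz co-occurs with pyrite and calcite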
Knowledge discovery from structured mammography reports using inductive logic programming.
Burnside, Elizabeth S; Davis, Jesse; Costa, Victor Santos; Dutra, Inês de Castro; Kahn, Charles E; Fine, Jason; Page, David
2005-01-01
The development of large mammography databases provides an opportunity for knowledge discovery and data mining techniques to recognize patterns not previously appreciated. Using a database from a breast imaging practice containing patient risk factors, imaging findings, and biopsy results, we tested whether inductive logic programming (ILP) could discover interesting hypotheses that could subsequently be tested and validated. The ILP algorithm discovered two hypotheses from the data that were 1) judged as interesting by a subspecialty trained mammographer and 2) validated by analysis of the data itself.
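ILP searches a space of logical rules and scores each candidate against the records; as a rough picture of that scoring (not the actual ILP algorithm or the study's data), the Python sketch below evaluates one invented rule's coverage and precision over toy mammography records.

# Sketch: score a candidate rule against records by coverage and precision.
records = [
    {"mass_margin": "spiculated",    "age_over_60": True,  "malignant": True},
    {"mass_margin": "circumscribed", "age_over_60": False, "malignant": False},
    {"mass_margin": "spiculated",    "age_over_60": False, "malignant": True},
    {"mass_margin": "spiculated",    "age_over_60": True,  "malignant": False},
]

def rule(r):  # candidate hypothesis: malignant IF spiculated margin AND age > 60
    return r["mass_margin"] == "spiculated" and r["age_over_60"]

covered = [r for r in records if rule(r)]
precision = sum(r["malignant"] for r in covered) / len(covered)
print(len(covered), precision)  # 2 covered, precision 0.5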
A Metadata based Knowledge Discovery Methodology for Seeding Translational Research.
Kothari, Cartik R; Payne, Philip R O
2015-01-01
In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.
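The path-exploration idea can be pictured as graph search over a researcher-keyword network, where short keyword-mediated paths suggest collaboration candidates. The Python sketch below (assuming networkx; the nodes are invented) illustrates this; the semantic metrics the paper postulates would replace plain shortest-path length for ranking.

# Sketch: find a keyword-mediated chain between two researchers.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Dr_A", "machine_learning"), ("Dr_B", "machine_learning"),
    ("Dr_B", "genomics"), ("Dr_C", "genomics"), ("Dr_C", "clinical_trials"),
])

path = nx.shortest_path(G, "Dr_A", "Dr_C")
print(path)  # alternating researcher/keyword chain linking the two researchers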
Parikh, Priti P; Minning, Todd A; Nguyen, Vinh; Lalithsena, Sarasi; Asiaee, Amir H; Sahoo, Satya S; Doshi, Prashant; Tarleton, Rick; Sheth, Amit P
2012-01-01
Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that may provide biologists the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge. We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which has the ability to query across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answers to these queries involve looking up multiple sources of data, linking them together and presenting the results. The SPSE facilitates parasitologists in leveraging the growing, but disparate, parasite data resources by offering an integrative platform that utilizes Semantic Web techniques, while keeping their workload increase minimal.
The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space.
Nicolaou, Christos A; Watson, Ian A; Hu, Hong; Wang, Jibo
2016-07-25
Venturing into the immensity of the small-molecule universe to identify novel chemical structures is a much-discussed objective of many methods proposed by the chemoinformatics community. To this end, numerous approaches using techniques from the fields of computational de novo design, virtual screening and reaction informatics, among others, have been proposed. Although in principle this objective is commendable, in practice there are several obstacles to useful exploitation of the chemical space. Prime among them are the sheer number of theoretically feasible compounds and the practical concern regarding the synthesizability of chemical structures conceived using in silico methods. We present the Proximal Lilly Collection (PLC) initiative implemented at Eli Lilly and Co., with the aims to (i) define the chemical space of small, drug-like compounds that could be synthesized using in-house resources and (ii) facilitate access to compounds in this large space for the purposes of ongoing drug discovery efforts. The implementation of the PLC relies on coupling access to available synthetic knowledge and resources with chemo/reaction informatics techniques and tools developed for this purpose. We describe in detail the computational framework supporting this initiative and elaborate on the characteristics of the PLC virtual collection of compounds. As an example of the opportunities provided to drug discovery researchers by easy access to a large, realistically feasible virtual collection such as the PLC, we describe a recent application of the technology that led to the discovery of selective kinase inhibitors.
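Reaction-informatics enumeration of a feasible virtual space generally works by applying encoded reactions to available reagent sets. The sketch below (assuming the RDKit library; the SMARTS pattern and reagents are our illustrative choices, not Lilly's encoded chemistry) enumerates a tiny amide library.

# Sketch: enumerate a small virtual library by applying a coded reaction.
from rdkit import Chem
from rdkit.Chem import AllChem

# Amide coupling: carboxylic acid + amine (with at least one H) -> amide.
amide_coupling = AllChem.ReactionFromSmarts(
    "[C:1](=O)[OH1].[N!H0:2]>>[C:1](=O)[N:2]")

acids = [Chem.MolFromSmiles(s) for s in ("CC(=O)O", "OC(=O)c1ccccc1")]
amines = [Chem.MolFromSmiles(s) for s in ("NCC", "C1CCNC1")]

for acid in acids:
    for amine in amines:
        for products in amide_coupling.RunReactants((acid, amine)):
            product = products[0]
            Chem.SanitizeMol(product)
            print(Chem.MolToSmiles(product))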
ERIC Educational Resources Information Center
Kraft, Donald H., Ed.
The 2000 ASIS (American Society for Information Science) conference explored knowledge innovation. The tracks in the conference program included knowledge discovery, capture, and creation; classification and representation; information retrieval; knowledge dissemination; and social, behavioral, ethical, and legal aspects. This proceedings is…
Evaluating the Science of Discovery in Complex Health Systems
ERIC Educational Resources Information Center
Norman, Cameron D.; Best, Allan; Mortimer, Sharon; Huerta, Timothy; Buchan, Alison
2011-01-01
Complex health problems such as chronic disease or pandemics require knowledge that transcends disciplinary boundaries to generate solutions. Such transdisciplinary discovery requires researchers to work and collaborate across boundaries, combining elements of basic and applied science. At the same time, calls for more interdisciplinary health…
29 CFR 18.14 - Scope of discovery.
Code of Federal Regulations, 2014 CFR
2014-07-01
... administrative law judge in accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the... things and the identity and location of persons having knowledge of any discoverable matter. (b) It is...
49 CFR 386.38 - Scope of discovery.
Code of Federal Regulations, 2011 CFR
2011-10-01
... accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature... location of persons having knowledge of any discoverable matter. (b) It is not ground for objection that...
49 CFR 386.38 - Scope of discovery.
Code of Federal Regulations, 2012 CFR
2012-10-01
... accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature... location of persons having knowledge of any discoverable matter. (b) It is not ground for objection that...
29 CFR 18.14 - Scope of discovery.
Code of Federal Regulations, 2012 CFR
2012-07-01
... administrative law judge in accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the... things and the identity and location of persons having knowledge of any discoverable matter. (b) It is...
49 CFR 386.38 - Scope of discovery.
Code of Federal Regulations, 2013 CFR
2013-10-01
... accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature... location of persons having knowledge of any discoverable matter. (b) It is not ground for objection that...
29 CFR 18.14 - Scope of discovery.
Code of Federal Regulations, 2011 CFR
2011-07-01
... administrative law judge in accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the... things and the identity and location of persons having knowledge of any discoverable matter. (b) It is...
29 CFR 18.14 - Scope of discovery.
Code of Federal Regulations, 2013 CFR
2013-07-01
... administrative law judge in accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the... things and the identity and location of persons having knowledge of any discoverable matter. (b) It is...
49 CFR 386.38 - Scope of discovery.
Code of Federal Regulations, 2014 CFR
2014-10-01
... accordance with these rules, the parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the proceeding, including the existence, description, nature... location of persons having knowledge of any discoverable matter. (b) It is not ground for objection that...
NASA Astrophysics Data System (ADS)
McGibbney, L. J.; Jiang, Y.; Burgess, A. B.
2017-12-01
Big Earth observation data have been produced, archived and made available online, but discovering the right data in a manner that precisely and efficiently satisfies user needs presents a significant challenge to the Earth Science (ES) community. An emerging trend in the information retrieval community is to utilize knowledge graphs to assist users in quickly finding desired information from across knowledge sources. This is particularly prevalent within the fields of social media and complex multimodal information processing, to name but a few; however, building a domain-specific knowledge graph is labour-intensive and hard to keep up-to-date. In this work, we update our progress on the Earth Science Knowledge Graph (ESKG) project; an ESIP-funded testbed project which provides an automatic approach to building a dynamic knowledge graph for ES to improve interdisciplinary data discovery by leveraging implicit, latent knowledge present across several U.S. Federal Agencies, e.g., NASA, NOAA and USGS. ESKG strengthens ties between observations and user communities by: 1) developing a knowledge graph derived from various sources, e.g., Web pages, Web services, etc., via natural language processing and knowledge extraction techniques; 2) allowing users to traverse, explore, query, reason and navigate ES data via knowledge graph interaction. ESKG has the potential to revolutionize the way in which ES communities interact with ES data in the open world through the entity, spatial and temporal linkages and characteristics that make it up. This project enables the advancement of ESIP collaboration areas including both Discovery and Semantic Technologies by putting graph information right at our fingertips in an interactive, modern manner and reducing the effort of constructing ontologies. To illustrate the ESKG concept, we will demonstrate use of our framework across NASA JPL's PO.DAAC, NOAA's Earth Observation Requirements Evaluation System (EORES) and various USGS systems.
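The ESKG abstract describes extracting triples from heterogeneous sources and letting users traverse the resulting graph. As a hedged illustration of that idea, the sketch below loads a handful of invented (subject, predicate, object) triples into a networkx multigraph and runs two toy queries; the triples and entity names are placeholders, not output of the actual ESKG pipeline.

```python
# Minimal sketch of the knowledge-graph idea behind ESKG: load extracted
# (subject, predicate, object) triples into a directed multigraph and
# traverse it for data discovery. The triples below are invented examples.
import networkx as nx

triples = [
    ("PO.DAAC", "hosts", "sea surface temperature"),
    ("sea surface temperature", "observed_by", "MODIS"),
    ("MODIS", "flies_on", "Aqua"),
    ("EORES", "tracks_requirements_for", "sea surface temperature"),
]

kg = nx.MultiDiGraph()
for subj, pred, obj in triples:
    kg.add_edge(subj, obj, relation=pred)

# Example query: which entities connect a dataset archive to a satellite?
path = nx.shortest_path(kg, "PO.DAAC", "Aqua")
print(" -> ".join(path))

# Example exploration: everything directly linked to a concept.
for _, obj, data in kg.out_edges("sea surface temperature", data=True):
    print("sea surface temperature", data["relation"], obj)
```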
Trends in Modern Drug Discovery.
Eder, Jörg; Herrling, Paul L
2016-01-01
Drugs discovered by the pharmaceutical industry over the past 100 years have dramatically changed the practice of medicine and impacted on many aspects of our culture. For many years, drug discovery was a target- and mechanism-agnostic approach that was based on ethnobotanical knowledge often fueled by serendipity. With the advent of modern molecular biology methods and based on knowledge of the human genome, drug discovery has now largely changed into a hypothesis-driven target-based approach, a development which was paralleled by significant environmental changes in the pharmaceutical industry. Laboratories became increasingly computerized and automated, and geographically dispersed research sites are now more and more clustered into large centers to capture technological and biological synergies. Today, academia, the regulatory agencies, and the pharmaceutical industry all contribute to drug discovery, and, in order to translate the basic science into new medical treatments for unmet medical needs, pharmaceutical companies have to have a critical mass of excellent scientists working in many therapeutic fields, disciplines, and technologies. The imperative for the pharmaceutical industry to discover breakthrough medicines is matched by the increasing numbers of first-in-class drugs approved in recent years and reflects the impact of modern drug discovery approaches, technologies, and genomics.
Perspectives on NMR in drug discovery: a technique comes of age
Pellecchia, Maurizio; Bertini, Ivano; Cowburn, David; Dalvit, Claudio; Giralt, Ernest; Jahnke, Wolfgang; James, Thomas L.; Homans, Steve W.; Kessler, Horst; Luchinat, Claudio; Meyer, Bernd; Oschkinat, Hartmut; Peng, Jeff; Schwalbe, Harald; Siegal, Gregg
2009-01-01
In the past decade, the potential of harnessing the ability of nuclear magnetic resonance (NMR) spectroscopy to monitor intermolecular interactions as a tool for drug discovery has been increasingly appreciated in academia and industry. In this Perspective, we highlight some of the major applications of NMR in drug discovery, focusing on hit and lead generation, and provide a critical analysis of its current and potential utility. PMID:19172689
SPARTAN-201 satellite begins separation from Shuttle Discovery
1994-09-12
STS064-111-041 (12 Sept. 1994) ---- Backdropped against New England's coast, the Shuttle Pointed Autonomous Research Tool for Astronomy (SPARTAN-201) satellite begins its separation from the space shuttle Discovery. The free-flying spacecraft, 130 nautical miles above Cape Cod at frame center, remained some 40 miles away from Discovery until the crew retrieved it two days later. Photo credit: NASA or National Aeronautics and Space Administration
Target assessment for antiparasitic drug discovery
Frearson, Julie A.; Wyatt, Paul G.; Gilbert, Ian H.; Fairlamb, Alan H.
2010-01-01
Drug discovery is a high-risk, expensive and lengthy process taking at least 12 years and costing upwards of US$500 million per drug to reach the clinic. For neglected diseases, the drug discovery process is driven by medical need and guided by pre-defined target product profiles. Assessment and prioritisation of the most promising targets for entry into screening programmes is crucial for maximising chances of success. Here we describe criteria used in our drug discovery unit for target assessment and introduce the ‘traffic light’ system as a prioritisation and management tool. We hope this brief review will stimulate basic scientists to acquire additional information necessary for drug discovery. PMID:17962072
mHealth Visual Discovery Dashboard.
Fang, Dezhi; Hohman, Fred; Polack, Peter; Sarker, Hillol; Kahng, Minsuk; Sharmin, Moushumi; al'Absi, Mustafa; Chau, Duen Horng
2017-09-01
We present Discovery Dashboard, a visual analytics system for exploring large volumes of time series data from mobile medical field studies. Discovery Dashboard offers interactive exploration tools and a data mining motif discovery algorithm to help researchers formulate hypotheses, discover trends and patterns, and ultimately gain a deeper understanding of their data. Discovery Dashboard emphasizes user freedom and flexibility during the data exploration process and enables researchers to do things previously challenging or impossible to do - in the web-browser and in real time. We demonstrate our system visualizing data from a mobile sensor study conducted at the University of Minnesota that included 52 participants who were trying to quit smoking.
mHealth Visual Discovery Dashboard
Fang, Dezhi; Hohman, Fred; Polack, Peter; Sarker, Hillol; Kahng, Minsuk; Sharmin, Moushumi; al'Absi, Mustafa; Chau, Duen Horng
2018-01-01
We present Discovery Dashboard, a visual analytics system for exploring large volumes of time series data from mobile medical field studies. Discovery Dashboard offers interactive exploration tools and a data mining motif discovery algorithm to help researchers formulate hypotheses, discover trends and patterns, and ultimately gain a deeper understanding of their data. Discovery Dashboard emphasizes user freedom and flexibility during the data exploration process and enables researchers to do things previously challenging or impossible to do — in the web-browser and in real time. We demonstrate our system visualizing data from a mobile sensor study conducted at the University of Minnesota that included 52 participants who were trying to quit smoking. PMID:29354812
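Both records above mention a data-mining motif discovery algorithm for time-series data. The sketch below shows the general idea with a brute-force motif finder: it reports the closest pair of non-overlapping, z-normalized subsequences, the same quantity that matrix-profile-style methods compute efficiently. It is a generic illustration, not the Dashboard's own algorithm.

```python
# Brute-force time-series motif discovery: find the pair of z-normalized
# subsequences with the smallest Euclidean distance. This illustrates the
# general idea of motif discovery; it is not the Dashboard's own algorithm.
import numpy as np

def znorm(x):
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def find_motif(series, m):
    """Return (i, j, dist) for the closest non-overlapping pair of length-m windows."""
    n = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n)]
    best = (None, None, np.inf)
    for i in range(n):
        for j in range(i + m, n):          # enforce non-overlap (no trivial matches)
            d = np.linalg.norm(windows[i] - windows[j])
            if d < best[2]:
                best = (i, j, d)
    return best

rng = np.random.default_rng(0)
ts = rng.normal(size=300)
ts[40:60] = ts[200:220] = np.sin(np.linspace(0, 3 * np.pi, 20))  # planted motif
print(find_motif(ts, 20))   # expected to report positions near 40 and 200
```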
CRISPR/Cas9: From Genome Engineering to Cancer Drug Discovery
Luo, Ji
2016-01-01
Advances in translational research are often driven by new technologies. The advent of microarrays, next-generation sequencing, proteomics and RNA interference (RNAi) have led to breakthroughs in our understanding of the mechanisms of cancer and the discovery of new cancer drug targets. The discovery of the bacterial clustered regularly interspaced palindromic repeat (CRISPR) system and its subsequent adaptation as a tool for mammalian genome engineering has opened up new avenues for functional genomics studies. This review will focus on the utility of CRISPR in the context of cancer drug target discovery. PMID:28603775
How to learn from patients: Fuller Albright's exploration of adrenal function.
Schwartz, T B
1995-08-01
Fuller Albright (1900-1969) was acknowledged as the preeminent clinical and investigative endocrinologist of his day by many of his contemporaries, but his many achievements are all but unknown to the present generation of physicians. This article describes how he used his clinical knowledge and a few tools--the measurement of urinary 17-ketosteroid excretion and the administration of methyltestosterone--to elucidate the major hormonal functions of the adrenal cortex and to clarify the pathophysiology of the Cushing syndrome. In addition, in a tour de force of clinical reasoning, he predicted, 5 years before the event, the discovery of a hormone that would reverse the endocrinologic abnormalities of congenital adrenal hyperplasia. Fittingly, he and pioneer pediatric endocrinologist Lawson Wilkins were the first to treat this disease successfully with cortisone.
A critical review of the arsenic uptake mechanisms and phytoremediation potential of Pteris vittata.
Danh, Luu Thai; Truong, Paul; Mammucari, Raffaella; Foster, Neil
2014-01-01
The discovery of the arsenic hyperaccumulator, Pteris vittata (Chinese brake fern), has contributed to the promotion of its application as a means of phytoremediation for arsenic removal from contaminated soils and water. Understanding the mechanisms involved in arsenic tolerance and accumulation of this plant provides valuable tools to improve the phytoremediation efficiency. In this review, the current knowledge about the physiological and molecular mechanisms of arsenic tolerance and accumulation in P. vittata is summarized, and an attempt has been made to clarify some of the unresolved questions related to these mechanisms. In addition, the capacity of P. vittata for remediation of arsenic-contaminated soils is evaluated under field conditions for the first time, and possible solutions to improve the remediation capacity of Pteris vittata are also discussed.
Extracting nursing practice patterns from structured labor and delivery data sets.
Hall, Eric S; Thornton, Sidney N
2007-10-11
This study was designed to demonstrate the feasibility of a computerized care process model that provides real-time case profiling and outcome forecasting. A methodology was defined for extracting nursing practice patterns from structured point-of-care data collected using the labor and delivery information system at Intermountain Healthcare. Data collected during January 2006 were retrieved from Intermountain Healthcare's enterprise data warehouse for use in the study. The knowledge discovery in databases process provided a framework for data analysis including data selection, preprocessing, data-mining, and evaluation. Development of an interactive data-mining tool and construction of a data model for stratification of patient records into profiles supported the goals of the study. Five benefits of the practice pattern extraction capability, which extend to other clinical domains, are listed with supporting examples.
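The abstract names the four classic knowledge-discovery-in-databases steps: selection, preprocessing, data mining, and evaluation. A minimal sketch of that loop, on synthetic records with hypothetical column names (the study's actual variables are not given here), might look like this:

```python
# Minimal sketch of the knowledge-discovery-in-databases (KDD) steps named
# above -- selection, preprocessing, data mining, evaluation -- applied to
# synthetic labor-and-delivery-style records. Column names are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# 1. Selection: pull the fields of interest from the (synthetic) warehouse.
records = pd.DataFrame({
    "labor_hours":    [6.0, 14.5, 8.2, None, 22.0, 5.5, 9.0, 18.3],
    "cervical_exams": [3, 8, 4, 5, 11, 2, 4, 9],
    "oxytocin_used":  [0, 1, 0, 1, 1, 0, 0, 1],
})

# 2. Preprocessing: impute missing values and standardize scales.
records = records.fillna(records.median())
X = StandardScaler().fit_transform(records)

# 3. Data mining: stratify cases into practice-pattern profiles.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
records["profile"] = kmeans.labels_

# 4. Evaluation: check cluster quality and inspect each profile.
print("silhouette:", silhouette_score(X, kmeans.labels_))
print(records.groupby("profile").mean())
```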
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi
2016-01-01
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961
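Enrichr's update highlights its improved application programming interface. A minimal sketch of calling it from Python is shown below; the endpoint names (addList, enrich), the userListId field, and the KEGG_2016 library identifier follow Enrichr's public documentation at the time of this update, but should be treated as assumptions that may have changed.

```python
# Minimal sketch of submitting a gene list to Enrichr over its REST API and
# retrieving enrichment results. Endpoint names and the library identifier
# ("KEGG_2016") follow Enrichr's public documentation; treat as assumptions.
import requests

BASE = "http://amp.pharm.mssm.edu/Enrichr"
genes = ["TP53", "BRCA1", "EGFR", "MYC", "PTEN"]

# Step 1: upload the gene list; the response carries a userListId.
resp = requests.post(
    f"{BASE}/addList",
    files={"list": (None, "\n".join(genes)), "description": (None, "demo list")},
)
resp.raise_for_status()
user_list_id = resp.json()["userListId"]

# Step 2: run enrichment against one gene-set library.
resp = requests.get(
    f"{BASE}/enrich",
    params={"userListId": user_list_id, "backgroundType": "KEGG_2016"},
)
resp.raise_for_status()
for term in resp.json()["KEGG_2016"][:5]:
    # Each entry is [rank, term name, p-value, z-score, combined score, genes, ...]
    print(term[1], "p =", term[2])
```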
PRO-Elicere: A Study for Create a New Process of Dependability Analysis of Space Computer Systems
NASA Astrophysics Data System (ADS)
da Silva, Glauco; Netto Lahoz, Carlos Henrique
2013-09-01
This paper presents a new approach to computer system dependability analysis, called PRO-ELICERE, which introduces data mining concepts and intelligent decision-support mechanisms to analyze the potential hazards and failures of a critical computer system. It also presents some techniques and tools that support traditional dependability analysis and briefly discusses the concept of knowledge discovery and intelligent databases for critical computer systems. It then introduces the PRO-ELICERE process, an intelligent approach to automating ELICERE, a process created to extract non-functional requirements for critical computer systems. PRO-ELICERE can be used in the V&V activities of the projects of the Institute of Aeronautics and Space, such as the Brazilian Satellite Launcher (VLS-1).
Hepcidin modulation in human diseases: From research to clinic
Piperno, Alberto; Mariani, Raffaella; Trombini, Paola; Girelli, Domenico
2009-01-01
By modulating hepcidin production, an organism controls intestinal iron absorption, iron uptake and mobilization from stores to meet body iron need. In recent years there has been important advancement in our knowledge of hepcidin regulation that also has implications for understanding the physiopathology of some human disorders. Since the discovery of hepcidin and the demonstration of its pivotal role in iron homeostasis, there has been a substantial interest in developing a reliable assay of the hormone in biological fluids. Measurement of hepcidin in biological fluids can improve our understanding of iron diseases and be a useful tool for diagnosis and clinical management of these disorders. We reviewed the literature and our own research on hepcidin to give an updated status of the situation in this rapidly evolving field. PMID:19195055
The arrival of JAK inhibitors: advancing the treatment of immune and hematologic disorders
Furumoto, Yasuko; Gadina, Massimo
2013-01-01
Altered production of cytokines can result in pathologies ranging from autoimmune diseases to malignancies. The Janus Kinases family is a small group of receptor-associated signaling molecules that is essential to the signal cascade originating from type I and type II cytokine receptors. Inhibition of tyrosine kinases enzymatic activity using small molecules has recently become a powerful tool for treatment of several malignancies. Twenty years after the discovery of these enzymes, two inhibitors for this class of kinases have been approved for clinical use and others are currently in the final stage of development. Here we review the principles of cytokines signaling, we summarize our current knowledge of the approved inhibitors, and briefly introduce some of the inhibitors that are currently under development. PMID:23743669
Metagenomics of Thermophiles with a Focus on Discovery of Novel Thermozymes
DeCastro, María-Eugenia; Rodríguez-Belmonte, Esther; González-Siso, María-Isabel
2016-01-01
Microbial populations living in environments with temperatures above 50°C (thermophiles) have been widely studied, increasing our knowledge in the composition and function of these ecological communities. Since these populations express a broad number of heat-resistant enzymes (thermozymes), they also represent an important source for novel biocatalysts that can be potentially used in industrial processes. The integrated study of the whole-community DNA from an environment, known as metagenomics, coupled with the development of next generation sequencing (NGS) technologies, has allowed the generation of large amounts of data from thermophiles. In this review, we summarize the main approaches commonly utilized for assessing the taxonomic and functional diversity of thermophiles through metagenomics, including several bioinformatics tools and some metagenome-derived methods to isolate their thermozymes. PMID:27729905
Geoinformatics 2007: data to knowledge
Brady, Shailaja R.; Sinha, A. Krishna; Gundersen, Linda C.
2007-01-01
Geoinformatics is the term used to describe a variety of efforts to promote collaboration between the computer sciences and the geosciences to solve complex scientific questions. It refers to the distributed, integrated digital information system and working environment that provides innovative means for the study of the Earth systems, as well as other planets, through use of advanced information technologies. Geoinformatics activities range from major research and development efforts creating new technologies to provide high-quality, sustained production-level services for data discovery, integration and analysis, to small, discipline-specific efforts that develop earth science data collections and data analysis tools serving the needs of individual communities. The ultimate vision of Geoinformatics is a highly interconnected data system populated with high-quality, freely available data, as well as a robust set of software for analysis, visualization, and modeling.
Zhang, Wenming
2017-09-19
One of the greatest global challenges is to feed the ever-increasing world population. The agrochemical tools growers currently utilize are also under continuous pressure, due to a number of factors that contribute to the loss of existing products. Mesoionic pyrido[1,2-a]pyrimidinones are an unusual yet very intriguing class of compounds. Known for several decades, this class of compounds had not been systematically studied until we started our insecticide discovery program. This Account provides an overview of the efforts on mesoionic pyrido[1,2-a]pyrimidinone insecticide discovery, from the initial high-throughput screen (HTS) discovery to the ultimate identification of triflumezopyrim (4, DuPont Pyraxalt) and dicloromezotiaz (5) for commercialization as novel insecticides. Mesoionic pyrido[1,2-a]pyrimidinones with an n-propyl group at the 1-position, such as compound 1, were initially isolated as undesired byproducts from reactions for a fungicide discovery program at DuPont Crop Protection. Such compounds showed interesting insecticidal activity in a follow-up screen and against an expanded insect species list. The area became an insecticide hit area for exploration and then a lead area for optimization. At the lead optimization stage, variations at three regions of compound 1, i.e., the side chain (n-propyl group), substituents on the 3-phenyl group, and substitutions on the pyrido- moiety, were explored, with many analogues prepared and evaluated. Breakthrough discoveries included replacing the n-propyl group with a 2,2,2-trifluoroethyl group to generate compound 2, and then with a 2-chlorothiazol-5-ylmethyl group to form compound 3. Compound 3 possesses potent insecticidal activity not only against a group of hopper species, including corn planthopper (Peregrinus maidis (Ashmead), CPH) and potato leafhopper (Empoasca fabae (Harris), PLH), as well as two key rice hopper species, namely, brown planthopper (Nilaparvata lugens (Stål), BPH) and rice green leafhopper (Nephotettix virescens (Distant), GLH), but also against the representative lepidoptera species diamondback moth (Plutella xylostella (Linnaeus), DBM) and fall armyworm (Spodoptera frugiperda (J.E. Smith), FAW). Further optimization based on 3 led to the discovery of triflumezopyrim (4), with a 5-pyrimidinylmethyl group, as a potent hopper insecticide for rice usage. Optimization of the substituents on the pyrido- moiety of 3 resulted in the discovery of dicloromezotiaz (5) as a lepidoptera insecticide. In this Account, we present the discovery and optimization of mesoionic pyrido[1,2-a]pyrimidinone insecticides toward the identification of triflumezopyrim (4) and dicloromezotiaz (5). We hope that the knowledge and lessons derived from this discovery program will provide valuable information for future agrochemical and drug discovery. Our successful discovery and commercial development of two novel insecticides based on mesoionic pyrido[1,2-a]pyrimidinones may also stimulate the interest of scientists from other disciplines in adopting this uncommon yet intriguing heterocyclic ring system in pharmaceutical and other material-science discovery research.
Reuniting Virtue and Knowledge
ERIC Educational Resources Information Center
Culham, Tom
2015-01-01
Einstein held that intuition is more important than rational inquiry as a source of discovery. Further, he explicitly and implicitly linked the heart, the sacred, devotion and intuitive knowledge. The raison d'être of universities is the advance of knowledge; however, they have primarily focused on developing students' skills in working with…
Standardized plant disease evaluations will enhance resistance gene discovery
USDA-ARS?s Scientific Manuscript database
Gene discovery and marker development using DNA-based tools require plant populations with well documented phenotypes. If dissimilar phenotype evaluation methods or data scoring techniques are employed with different crops, or at different labs for the same crops, then data mining for genetic marker...
The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update
Huynh, Tien; Rigoutsos, Isidore
2004-01-01
In this report, we provide an update on the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server, which is operational around the clock, provides access to a large number of methods that have been developed and published by the group's members. There is an increasing number of problems that these tools can help tackle; these problems range from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences, the identification—directly from sequence—of structural deviations from α-helicity and the annotation of amino acid sequences for antimicrobial activity. Additionally, annotations for more than 130 archaeal, bacterial, eukaryotic and viral genomes are now available on-line and can be searched interactively. The tools and code bundles continue to be accessible from http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/. PMID:15215340
The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update.
Huynh, Tien; Rigoutsos, Isidore
2004-07-01
In this report, we provide an update on the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server, which is operational around the clock, provides access to a large number of methods that have been developed and published by the group's members. There is an increasing number of problems that these tools can help tackle; these problems range from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences, the identification--directly from sequence--of structural deviations from alpha-helicity and the annotation of amino acid sequences for antimicrobial activity. Additionally, annotations for more than 130 archaeal, bacterial, eukaryotic and viral genomes are now available on-line and can be searched interactively. The tools and code bundles continue to be accessible from http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.
Gomez, Gabriel; Adams, Leslie G.; Rice-Ficht, Allison; Ficht, Thomas A.
2013-01-01
Vaccination is the most important approach to counteract infectious diseases. Thus, the development of new and improved vaccines for existing, emerging, and re-emerging diseases is an area of great interest to the scientific community and general public. Traditional approaches to subunit antigen discovery and vaccine development lack consideration for the critical aspects of public safety and activation of relevant protective host immunity. The availability of genomic sequences for pathogenic Brucella spp. and their hosts have led to development of systems-wide analytical tools that have provided a better understanding of host and pathogen physiology while also beginning to unravel the intricacies at the host-pathogen interface. Advances in pathogen biology, host immunology, and host-agent interactions have the potential to serve as a platform for the design and implementation of better-targeted antigen discovery approaches. With emphasis on Brucella spp., we probe the biological aspects of host and pathogen that merit consideration in the targeted design of subunit antigen discovery and vaccine development. PMID:23720712
Integrative Systems Biology for Data Driven Knowledge Discovery
Greene, Casey S.; Troyanskaya, Olga G.
2015-01-01
Integrative systems biology is an approach that brings together diverse high throughput experiments and databases to gain new insights into biological processes or systems at molecular through physiological levels. These approaches rely on diverse high-throughput experimental techniques that generate heterogeneous data by assaying varying aspects of complex biological processes. Computational approaches are necessary to provide an integrative view of these experimental results and enable data-driven knowledge discovery. Hypotheses generated from these approaches can direct definitive molecular experiments in a cost effective manner. Using integrative systems biology approaches, we can leverage existing biological knowledge and large-scale data to improve our understanding of yet unknown components of a system of interest and how its malfunction leads to disease. PMID:21044756
Automated Knowledge Discovery from Simulators
NASA Technical Reports Server (NTRS)
Burl, Michael C.; DeCoste, D.; Enke, B. L.; Mazzoni, D.; Merline, W. J.; Scharenbroich, L.
2006-01-01
In this paper, we explore one aspect of knowledge discovery from simulators, the landscape characterization problem, where the aim is to identify regions in the input/parameter/model space that lead to a particular output behavior. Large-scale numerical simulators are in widespread use by scientists and engineers across a range of government agencies, academia, and industry; in many cases, simulators provide the only means to examine processes that are infeasible or impossible to study otherwise. However, the cost of simulation studies can be quite high, both in terms of the time and computational resources required to conduct the trials and the manpower needed to sift through the resulting output. Thus, there is strong motivation to develop automated methods that enable more efficient knowledge extraction.
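A hedged sketch of landscape characterization: sample the input space, label each run by whether the behavior of interest occurred, and fit an interpretable classifier whose rules describe the region. The toy simulator and the 0.8 behavior threshold below are invented; the paper's actual simulators and methods are not reproduced here.

```python
# Minimal sketch of the landscape-characterization idea: sample a simulator's
# parameter space, label each run by whether the target behavior occurred,
# and fit an interpretable classifier to describe that region. The toy
# "simulator" and the behavior threshold are invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def simulator(x, y):
    """Stand-in for an expensive numerical simulation."""
    return np.sin(3 * x) * np.cos(2 * y) + 0.1 * x * y

rng = np.random.default_rng(1)
params = rng.uniform(-2, 2, size=(500, 2))             # sampled input space
outputs = simulator(params[:, 0], params[:, 1])
behavior = (outputs > 0.8).astype(int)                 # "interesting" runs

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(params, behavior)
# The tree's rules are a compact, human-readable map of the landscape.
print(export_text(tree, feature_names=["x", "y"]))
```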
18 CFR 385.402 - Scope of discovery (Rule 402).
Code of Federal Regulations, 2010 CFR
2010-04-01
... 18 Conservation of Power and Water Resources 1 2010-04-01 2010-04-01 false Scope of discovery (Rule 402). 385.402 Section 385.402 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY... persons having any knowledge of any discoverable matter. It is not ground for objection that the...
Doors to Discovery[TM]. What Works Clearinghouse Intervention Report
ERIC Educational Resources Information Center
What Works Clearinghouse, 2013
2013-01-01
"Doors to Discovery"]TM] is a preschool literacy curriculum that uses eight thematic units of activities to help children build fundamental early literacy skills in oral language, phonological awareness, concepts of print, alphabet knowledge, writing, and comprehension. The eight thematic units cover topics such as nature, friendship,…
78 FR 12933 - Proceedings Before the Commodity Futures Trading Commission
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-26
... proceedings. These new amendments also provide that Judgment Officers may conduct sua sponte discovery in... discovery; (4) sound risk management practices; and (5) other public interest considerations. The amendments... representative capacity, it was done with full power and authority to do so; (C) To the best of his knowledge...
76 FR 64803 - Rules of Adjudication and Enforcement
Federal Register 2010, 2011, 2012, 2013, 2014
2011-10-19
...) is also amended to clarify the limits on discovery when the Commission orders the ALJ to consider the... that the complainant identify, to the best of its knowledge, the ``like or directly competitive... the taking of discovery by the parties shall be at the discretion of the presiding ALJ. The ITCTLA...
78 FR 63253 - Davidson Kempner Capital Management LLC; Notice of Application
Federal Register 2010, 2011, 2012, 2013, 2014
2013-10-23
... employees of the Adviser other than the Contributor have any knowledge of the Contribution prior to its discovery by the Adviser on November 2, 2011. The Contribution was discovered by the Adviser's compliance... names of employees. After discovery of the Contribution, the Adviser and Contributor obtained the...
Application of statistical mining in healthcare data management for allergic diseases
NASA Astrophysics Data System (ADS)
Wawrzyniak, Zbigniew M.; Martínez Santolaya, Sara
2014-11-01
The paper discusses data mining techniques based on statistical tools for medical data management in the case of long-term diseases. The data collected from a population survey are the source for reasoning and identifying the disease processes responsible for a patient's illness and its symptoms, and for prescribing knowledge and decisions on a course of action to correct the patient's condition. The case considered, as a sample of this constructive approach to data management, is the dependence of allergic diseases of a chronic nature on certain symptoms and environmental conditions. The knowledge summarized in a systematic way as accumulated experience constitutes an experiential, simplified model of the diseases, with a feature space constructed from a small set of indicators. We have presented the disease-symptom-opinion model with knowledge discovery for data management in healthcare. Notably, the model is purely data-driven in evaluating knowledge of disease processes and the probability dependence of future disease events on symptoms and other attributes. The example, drawn from the outcomes of a survey of long-term (chronic) disease, shows that a small set of core indicators, such as four or more symptoms and opinions, can be very helpful in reflecting health status change over disease causes. Furthermore, a data-driven understanding of the mechanisms of diseases gives physicians the basis for choices of treatment, which outlines the need for data governance in this research domain of knowledge discovered from surveys.
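As a concrete, hedged illustration of the probability dependence described above, the sketch below fits a Bernoulli naive Bayes model to a few invented binary survey indicators; the four symptom columns and all values are fabricated for the example.

```python
# Minimal sketch of estimating the probability of an allergic-disease event
# from a handful of binary survey indicators, as the abstract describes.
# The four symptom columns and the survey values are invented.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Rows: respondents; columns: sneezing, itchy eyes, wheezing, pet exposure.
X = np.array([
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = allergic disease reported

model = BernoulliNB().fit(X, y)
new_case = np.array([[1, 1, 0, 0]])
print("P(disease | symptoms) =", model.predict_proba(new_case)[0, 1])
```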
Bench-to-bedside review: Future novel diagnostics for sepsis - a systems biology approach
2013-01-01
The early, accurate diagnosis and risk stratification of sepsis remains an important challenge in the critically ill. Since traditional biomarker strategies have not yielded a gold standard marker for sepsis, focus is shifting towards novel strategies that improve assessment capabilities. The combination of technological advancements and information generated through the human genome project positions systems biology at the forefront of biomarker discovery. Although these technologies were previously available, developments in those focusing on DNA, gene expression, gene regulatory mechanisms, protein and metabolite discovery have made these tools more feasible to implement and less costly, and they have taken on an enhanced capacity such that they are ripe for utilization as tools to advance our knowledge and clinical research. Medicine is in a genome-level era that can leverage the assessment of thousands of molecular signals beyond simply measuring selected circulating proteins. Genomics is the study of the entire complement of genetic material of an individual. Epigenetics is the regulation of gene activity by reversible modifications of the DNA. Transcriptomics is the quantification of the relative levels of messenger RNA for a large number of genes in specific cells or tissues to measure differences in the expression levels of different genes, and the utilization of patterns of differential gene expression to characterize different biological states of a tissue. Proteomics is the large-scale study of proteins. Metabolomics is the study of the small molecule profiles that are the terminal downstream products of the genome and consists of the total complement of all low-molecular-weight molecules that cellular processes leave behind. Taken together, these individual fields of study may be linked through a systems biology approach. There remains a valuable opportunity to deploy these technologies further in human research. The techniques described in this paper not only have the potential to increase the spectrum of diagnostic and prognostic biomarkers in sepsis, but they may also enable the discovery of new disease pathways. This may in turn lead us to improved therapeutic targets. The objective of this paper is to provide an overview and basic framework for clinicians and clinical researchers to better understand the 'omics technologies' to enhance further use of these valuable tools. PMID:24093155
A Cybernetic Design Methodology for 'Intelligent' Online Learning Support
NASA Astrophysics Data System (ADS)
Quinton, Stephen R.
The World Wide Web (WWW) provides learners and knowledge workers convenient access to vast stores of information, so much so that present methods for refinement of a query or search result are inadequate - there is far too much potentially useful material. The problem often encountered is that users usually do not recognise what may be useful until they have progressed some way through the discovery, learning, and knowledge acquisition process. Additional support is needed to structure and identify potentially relevant information, and to provide constructive feedback. In short, support for learning is needed. The learning envisioned here is not simply the capacity to recall facts or to recognise objects. The focus is on learning that results in the construction of knowledge. Although most online learning platforms are efficient at delivering information, most do not provide tools that support learning as envisaged in this chapter. It is conceivable that Web-based learning environments can incorporate software systems that assist learners to form new associations between concepts and synthesise information to create new knowledge. This chapter details the rationale and theory behind a research study that aims to evolve Web-based learning environments into 'intelligent thinking' systems that respond to natural language human input. Rather than functioning simply as a means of delivering information, it is argued that online learning solutions will one day interact directly with students to support their conceptual thinking and cognitive development.
Zhang, Guang Lan; Riemer, Angelika B.; Keskin, Derin B.; Chitkushev, Lou; Reinherz, Ellis L.; Brusic, Vladimir
2014-01-01
High-risk human papillomaviruses (HPVs) are the causes of many cancers, including cervical, anal, vulvar, vaginal, penile and oropharyngeal. To facilitate diagnosis, prognosis and characterization of these cancers, it is necessary to make full use of the immunological data on HPV available through publications, technical reports and databases. These data vary in granularity, quality and complexity. The extraction of knowledge from the vast amount of immunological data using data mining techniques remains a challenging task. To support integration of data and knowledge in virology and vaccinology, we developed a framework called KB-builder to streamline the development and deployment of web-accessible immunological knowledge systems. The framework consists of seven major functional modules, each facilitating a specific aspect of the knowledgebase construction process. Using KB-builder, we constructed the Human Papillomavirus T cell Antigen Database (HPVdb). It contains 2781 curated antigen entries of antigenic proteins derived from 18 genotypes of high-risk HPV and 18 genotypes of low-risk HPV. The HPVdb also catalogs 191 verified T cell epitopes and 45 verified human leukocyte antigen (HLA) ligands. Primary amino acid sequences of HPV antigens were collected and annotated from the UniProtKB. T cell epitopes and HLA ligands were collected from data mining of scientific literature and databases. The data were subject to extensive quality control (redundancy elimination, error detection and vocabulary consolidation). A set of computational tools for an in-depth analysis, such as sequence comparison using BLAST search, multiple alignments of antigens, classification of HPV types based on cancer risk, T cell epitope/HLA ligand visualization, T cell epitope/HLA ligand conservation analysis and sequence variability analysis, has been integrated within the HPVdb. Predicted Class I and Class II HLA binding peptides for 15 common HLA alleles are included in this database as putative targets. HPVdb is a knowledge-based system that integrates curated data and information with tailored analysis tools to facilitate data mining for HPV vaccinology and immunology. To our best knowledge, HPVdb is a unique data source providing a comprehensive list of HPV antigens and peptides. Database URL: http://cvc.dfci.harvard.edu/hpv/ PMID:24705205
Zhang, Guang Lan; Riemer, Angelika B; Keskin, Derin B; Chitkushev, Lou; Reinherz, Ellis L; Brusic, Vladimir
2014-01-01
High-risk human papillomaviruses (HPVs) are the causes of many cancers, including cervical, anal, vulvar, vaginal, penile and oropharyngeal. To facilitate diagnosis, prognosis and characterization of these cancers, it is necessary to make full use of the immunological data on HPV available through publications, technical reports and databases. These data vary in granularity, quality and complexity. The extraction of knowledge from the vast amount of immunological data using data mining techniques remains a challenging task. To support integration of data and knowledge in virology and vaccinology, we developed a framework called KB-builder to streamline the development and deployment of web-accessible immunological knowledge systems. The framework consists of seven major functional modules, each facilitating a specific aspect of the knowledgebase construction process. Using KB-builder, we constructed the Human Papillomavirus T cell Antigen Database (HPVdb). It contains 2781 curated antigen entries of antigenic proteins derived from 18 genotypes of high-risk HPV and 18 genotypes of low-risk HPV. The HPVdb also catalogs 191 verified T cell epitopes and 45 verified human leukocyte antigen (HLA) ligands. Primary amino acid sequences of HPV antigens were collected and annotated from the UniProtKB. T cell epitopes and HLA ligands were collected from data mining of scientific literature and databases. The data were subject to extensive quality control (redundancy elimination, error detection and vocabulary consolidation). A set of computational tools for an in-depth analysis, such as sequence comparison using BLAST search, multiple alignments of antigens, classification of HPV types based on cancer risk, T cell epitope/HLA ligand visualization, T cell epitope/HLA ligand conservation analysis and sequence variability analysis, has been integrated within the HPVdb. Predicted Class I and Class II HLA binding peptides for 15 common HLA alleles are included in this database as putative targets. HPVdb is a knowledge-based system that integrates curated data and information with tailored analysis tools to facilitate data mining for HPV vaccinology and immunology. To our best knowledge, HPVdb is a unique data source providing a comprehensive list of HPV antigens and peptides. Database URL: http://cvc.dfci.harvard.edu/hpv/.
2007-09-10
KENNEDY SPACE CENTER, FLA. -- In bay 3 of the Orbiter Processing Facility, a tool storage assembly unit is being moved for storage in Discovery's payload bay. The tools may be used on a spacewalk, yet to be determined, during mission STS-120. In an unusual operation, the payload bay doors had to be reopened after closure to accommodate the storage. Space shuttle Discovery is targeted to launch Oct. 23 to the International Space Station. It will carry the U.S. Node 2, a connecting module, named Harmony, for assembly on the space station. Photo credit: NASA/Amanda Diller
Network-based approaches to climate knowledge discovery
NASA Astrophysics Data System (ADS)
Budich, Reinhard; Nyberg, Per; Weigel, Tobias
2011-11-01
Climate Knowledge Discovery Workshop; Hamburg, Germany, 30 March to 1 April 2011 Do complex networks combined with semantic Web technologies offer the next generation of solutions in climate science? To address this question, a first Climate Knowledge Discovery (CKD) Workshop, hosted by the German Climate Computing Center (Deutsches Klimarechenzentrum (DKRZ)), brought together climate and computer scientists from major American and European laboratories, data centers, and universities, as well as representatives from industry, the broader academic community, and the semantic Web communities. The participants, representing six countries, were concerned with large-scale Earth system modeling and computational data analysis. The motivation for the meeting was the growing problem that climate scientists generate data faster than it can be interpreted and the need to prepare for further exponential data increases. Current analysis approaches are focused primarily on traditional methods, which are best suited for large-scale phenomena and coarse-resolution data sets. The workshop focused on the open discussion of ideas and technologies to provide the next generation of solutions to cope with the increasing data volumes in climate science.
Mott, Meghan; Koroshetz, Walter
2015-07-01
The mission of the National Institute of Neurological Disorders and Stroke (NINDS) is to seek fundamental knowledge about the brain and nervous system and to use that knowledge to reduce the burden of neurological disease. NINDS supports early- and late-stage therapy development funding programs to accelerate preclinical discovery and the development of new therapeutic interventions for neurological disorders. The NINDS Office of Translational Research facilitates and funds the movement of discoveries from the laboratory to patients. Its grantees include academics, often with partnerships with the private sector, as well as small businesses, which, by Congressional mandate, receive > 3% of the NINDS budget for small business innovation research. This article provides an overview of NINDS-funded therapy development programs offered by the NINDS Office of Translational Research.
Full, Robert J; Dudley, Robert; Koehl, M A R; Libby, Thomas; Schwab, Cheryl
2015-11-01
Experiencing the thrill of an original scientific discovery can be transformative to students unsure about becoming a scientist, yet few courses offer authentic research experiences. Increasingly, cutting-edge discoveries require an interdisciplinary approach not offered in current departmental-based courses. Here, we describe a one-semester, learning laboratory course on organismal biomechanics offered at our large research university that enables interdisciplinary teams of students from biology and engineering to grow intellectually, collaborate effectively, and make original discoveries. To attain this goal, we avoid traditional "cookbook" laboratories by training 20 students to use a dozen research stations. Teams of five students rotate to a new station each week where a professor, graduate student, and/or team member assists in the use of equipment, guides students through stages of critical thinking, encourages interdisciplinary collaboration, and moves them toward authentic discovery. Weekly discussion sections that involve the entire class offer exchange of discipline-specific knowledge, advice on experimental design, methods of collecting and analyzing data, a statistics primer, and best practices for writing and presenting scientific papers. The building of skills in concert with weekly guided inquiry facilitates original discovery via a final research project that can be presented at a national meeting or published in a scientific journal. © The Author 2015. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.
DiscoverySpace: an interactive data analysis application
Robertson, Neil; Oveisi-Fordorei, Mehrdad; Zuyderduyn, Scott D; Varhol, Richard J; Fjell, Christopher; Marra, Marco; Jones, Steven; Siddiqui, Asim
2007-01-01
DiscoverySpace is a graphical application for bioinformatics data analysis. Users can seamlessly traverse references between biological databases and draw together annotations in an intuitive tabular interface. Datasets can be compared using a suite of novel tools to aid in the identification of significant patterns. DiscoverySpace is of broad utility and its particular strength is in the analysis of serial analysis of gene expression (SAGE) data. The application is freely available online. PMID:17210078
Balancing the risks and the benefits.
Klopack
2000-04-01
Pharmaceutical research organizations can benefit from outsourcing discovery activities that are not core competencies of the organization. The core competencies for a discovery operation are the expertise and systems that give the organization an advantage over its competition. Successful outsourcing ventures result in cost reduction, increased operation efficiency and optimization of resource allocation. While there are pitfalls to outsourcing, including poor partner selection and inadequate implementation, outsourcing can be a powerful tool for enhancing drug discovery operations.
Quantum mechanics implementation in drug-design workflows: does it really help?
Arodola, Olayide A; Soliman, Mahmoud Es
2017-01-01
The pharmaceutical industry is progressively operating in an era where development costs are constantly under pressure, higher percentages of drugs are demanded, and the drug-discovery process is a trial-and-error run. The profit that flows in with the discovery of new drugs has always been the motivation for the industry to keep up the pace and keep abreast with the endless demand for medicines. The process of finding a molecule that binds to the target protein using in silico tools has made computational chemistry a valuable tool in drug discovery, in both academic research and the pharmaceutical industry. However, the complexity of many protein-ligand interactions challenges the accuracy and efficiency of the commonly used empirical methods. The usefulness of quantum mechanics (QM) in describing drug-protein interactions cannot be overemphasized; however, this approach plays little role in some empirical methods. In this review, we discuss recent developments in, and application of, QM to medically relevant biomolecules. We critically discuss the different types of QM-based methods and proposals for incorporating them into drug-design and -discovery workflows while trying to answer a critical question: are QM-based methods of real help in drug-design and -discovery research and industry?
Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N
2013-03-15
The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
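The key Promzea step, filtering each tool's motifs and then combining the survivors, can be illustrated with a small sketch. The scores, thresholds, and consensus strings below are invented, and real motif merging would need a similarity measure rather than exact consensus matching, so this is only the shape of the computation, not Promzea's actual filters.

```python
# Minimal sketch of Promzea's filter-then-combine idea: drop each tool's
# low-confidence motifs with a per-tool statistical filter, then rank the
# survivors by how many tools recovered them. Scores, thresholds and the
# consensus strings are invented.
from collections import defaultdict

predictions = {
    "BioProspector": [("TGACGT", 9.1), ("CCAAAT", 3.2), ("GATAAG", 7.5)],
    "Weeder":        [("TGACGT", 0.81), ("GATAAG", 0.62), ("ACGTCA", 0.20)],
    "MEME":          [("TGACGT", 1e-8), ("CCAAAT", 0.04), ("GATAAG", 1e-5)],
}
# Per-tool filters (each tool reports a different kind of score).
passes = {
    "BioProspector": lambda s: s >= 5.0,      # motif score threshold
    "Weeder":        lambda s: s >= 0.5,      # quality threshold
    "MEME":          lambda s: s <= 1e-3,     # E-value threshold
}

support = defaultdict(list)
for tool, motifs in predictions.items():
    for consensus, score in motifs:
        if passes[tool](score):
            support[consensus].append(tool)

# Rank combined motifs by the number of tools that recovered them.
for consensus, tools in sorted(support.items(), key=lambda kv: -len(kv[1])):
    print(consensus, len(tools), tools)
```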
Tools for Observation: Art and the Scientific Process
NASA Astrophysics Data System (ADS)
Pettit, E. C.; Coryell-Martin, M.; Maisch, K.
2015-12-01
Art can support the scientific process during different phases of a scientific discovery. Art can help explain and extend the scientific concepts for the general public; in this way art is a powerful tool for communication. Art can aid the scientist in processing and interpreting the data towards an understanding of the concepts and processes; in this way art is a powerful - if often subconscious - tool to inform the process of discovery. Less often acknowledged, art can help engage students and inspire scientists during the initial development of ideas, observations, and questions; in this way art is a powerful tool to develop scientific questions and hypotheses. When we use art as a tool for communication of scientific discoveries, it helps break down barriers and makes science concepts less intimidating and more accessible and understandable for the learner. Scientists themselves use artistic concepts and processes - directly or indirectly - to help deepen their understanding. Teachers are following suit by using art more to stimulate students' creative thinking and problem solving. We show the value of teaching students to use the artistic "way of seeing" to develop their skills in observation, questioning, and critical thinking. In this way, art can be a powerful tool to engage students (from elementary to graduate) in the beginning phase of a scientific discovery, which is catalyzed by inquiry and curiosity. Through qualitative assessment of the Girls on Ice program, we show that many of the specific techniques taught by art teachers are valuable for science students to develop their observation skills. In particular, the concepts of contour drawing, squinting, gesture drawing, inverted drawing, and others can provide valuable training for student scientists. These art techniques encourage students to let go of preconceptions and "see" the world (the "data") in new ways; they help students focus on both large-scale patterns and small-scale details.
Asymmetric threat data mining and knowledge discovery
NASA Astrophysics Data System (ADS)
Gilmore, John F.; Pagels, Michael A.; Palk, Justin
2001-03-01
Asymmetric threats differ from the conventional force-on-force military encounters that the Defense Department has historically been trained to engage. Terrorism by its nature is now an operational activity that is neither easily detected nor countered, as its very existence depends on small covert attacks exploiting the element of surprise. But terrorism does have defined forms, motivations, tactics and organizational structure. Exploiting a terrorism taxonomy provides the opportunity to discover and assess knowledge of terrorist operations. This paper describes the Asymmetric Threat Terrorist Assessment, Countering, and Knowledge (ATTACK) system. ATTACK has been developed to (a) data mine open source intelligence (OSINT) information from web-based newspaper sources, video news web casts, and actual terrorist web sites, (b) evaluate this information against a terrorism taxonomy, (c) exploit country/region specific social, economic, political, and religious knowledge, and (d) discover and predict potential terrorist activities and association links. Details of the asymmetric threat structure and the ATTACK system architecture are presented with results of an actual terrorist data mining and knowledge discovery test case shown.
Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Rebholz-Schuhmann, Dietrich; Schofield, Paul N; Gkoutos, Georgios V
2011-01-01
Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.
Big data management challenges in health research-a literature review.
Wang, Xiaoming; Williams, Carolyn; Liu, Zhen Hua; Croghan, Joe
2017-08-07
Big data management for information centralization (i.e., making data of interest findable) and integration (i.e., making related data connectable) in health research is a defining challenge in biomedical informatics. While such management is essential to create a foundation for knowledge discovery, optimized solutions that deliver high-quality and easy-to-use information resources have not been thoroughly explored. In this review, we identify the gaps between current data management approaches and the need for new capacity to manage the big data generated in advanced health research. Focusing on these unmet needs and well-recognized problems, we introduce state-of-the-art concepts, approaches, and technologies for data management from computing academia and industry in order to explore improvement solutions. We explain the potential and significance of these advances for biomedical informatics. In addition, we discuss specific issues that have a great impact on technical solutions for developing the next generation of digital products (tools and data) to facilitate the raw-data-to-knowledge process in health research. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Large-Scale Discovery of Disease-Disease and Disease-Gene Associations
Gligorijevic, Djordje; Stojanovic, Jelena; Djuric, Nemanja; Radosavljevic, Vladan; Grbovic, Mihajlo; Kulathinal, Rob J.; Obradovic, Zoran
2016-01-01
Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently yielded benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data are used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements in disease phenotyping over current computational approaches. In addition, the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set, where it again reveals very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations have not been studied previously. Our approach thus provides biomedical researchers with new tools to filter genes of interest, reducing the need for costly lab studies. PMID:27578529
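The embedding model itself is not specified in the abstract; as a rough illustration of learning disease representations from comorbidity counts, the sketch below factorizes a positive pointwise-mutual-information (PPMI) matrix, a common baseline, using only NumPy. The co-occurrence matrix, dimensionality, and toy data are assumptions.

    # Minimal sketch: PPMI + truncated SVD over a disease co-occurrence matrix.
    # cooc[i, j] counts how often diseases i and j co-occur in the same patient.
    import numpy as np

    def disease_embeddings(cooc, dim=50):
        total = cooc.sum()
        p_i = cooc.sum(axis=1) / total                 # marginal probabilities
        p_ij = cooc / total                            # joint probabilities
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log(p_ij / np.outer(p_i, p_i))
        pmi = np.maximum(pmi, 0.0)                     # positive PMI is standard
        pmi[~np.isfinite(pmi)] = 0.0
        U, S, _ = np.linalg.svd(pmi, hermitian=True)   # matrix is symmetric
        return U[:, :dim] * np.sqrt(S[:dim])           # low-rank embedding

    def nearest(emb, i, k=5):
        sims = emb @ emb[i] / (np.linalg.norm(emb, axis=1)
                               * np.linalg.norm(emb[i]) + 1e-12)
        return np.argsort(-sims)[1:k + 1]              # skip the disease itself

    cooc = np.random.default_rng(0).integers(0, 50, (20, 20))
    cooc = (cooc + cooc.T) // 2                        # symmetrize toy counts
    emb = disease_embeddings(cooc, dim=5)
    print(nearest(emb, 0))                             # most related diseases to 0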
Oak Ridge Graph Analytics for Medical Innovation (ORiGAMI)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roberts, Larry W.; Lee, Sangkeun
2016-01-01
In this era of data-driven decisions and discovery, where Big Data is producing Bigger Data, data scientists at the Oak Ridge National Laboratory are leveraging unique leadership infrastructure (e.g., Urika XA and Urika GD appliances) to develop scalable algorithms for semantic, logical, and statistical reasoning with Big Data (i.e., data stored in databases as well as unstructured data in documents). ORiGAMI is a next-generation knowledge-discovery framework that is: (a) knowledge nurturing (i.e., evolves seamlessly with newer knowledge and data), (b) smart and curious (i.e., uses information-foraging and reasoning algorithms to digest content), and (c) synergistic (i.e., interfaces computers with what they do best to help subject-matter experts do their best). ORiGAMI has been demonstrated using the National Library of Medicine's SEMANTIC MEDLINE (an archive of medical knowledge since 1994).
nRC: non-coding RNA Classifier based on structural features.
Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Rizzo, Riccardo; Urso, Alfonso
2017-01-01
Non-coding RNAs (ncRNAs) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to developing methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms of regulative processes, together with the development of high-throughput technologies, has made bioinformatics tools necessary for providing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on feature extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. We tested our approach on the classification of 13 different ncRNA classes and evaluated it using the most common statistical measures, reaching an accuracy and sensitivity of about 74%. The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. The nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code is also available at https://github.com/IcarPA-TBlab/nrc.
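The published nRC architecture is not reproduced here; the sketch below shows, under assumed input encoding and layer sizes, how a small 1D convolutional network in PyTorch can classify dot-bracket secondary-structure strings into 13 classes.

    # Minimal sketch: one-hot encode dot-bracket structures, apply a 1D CNN.
    import torch
    import torch.nn as nn

    VOCAB = {".": 0, "(": 1, ")": 2}       # dot-bracket alphabet

    def one_hot(structure, max_len=200):
        x = torch.zeros(len(VOCAB), max_len)
        for i, ch in enumerate(structure[:max_len]):
            x[VOCAB[ch], i] = 1.0
        return x

    class NcRNACNN(nn.Module):
        def __init__(self, n_classes=13):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(len(VOCAB), 32, kernel_size=7, padding=3), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveMaxPool1d(1),   # global max pooling over positions
            )
            self.classifier = nn.Linear(64, n_classes)

        def forward(self, x):              # x: (batch, 3, max_len)
            return self.classifier(self.features(x).squeeze(-1))

    model = NcRNACNN()
    batch = torch.stack([one_hot("((..((...))..))"), one_hot("..((...)).." )])
    logits = model(batch)                  # (2, 13) unnormalized class scores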
PubMed and beyond: a survey of web tools for searching biomedical literature
Lu, Zhiyong
2011-01-01
The past decade has witnessed modern advances in high-throughput technology and rapid growth of research capacity in producing large-scale biological data, both of which were concomitant with an exponential growth of biomedical literature. This wealth of scholarly knowledge is of significant importance for researchers making scientific discoveries and healthcare professionals managing health-related matters. However, the acquisition of such information is becoming increasingly difficult due to its large volume and rapid growth. In response, the National Center for Biotechnology Information (NCBI) is continuously making changes to its PubMed Web service for improvement. Meanwhile, different entities have devoted themselves to developing Web tools to help users quickly and efficiently search and retrieve relevant publications. These practices, together with maturity in the field of text mining, have led to an increase in the number and quality of Web tools that provide a literature search service comparable to PubMed. In this study, we review 28 such tools, highlight their respective innovations, compare them to the PubMed system and one another, and discuss directions for future development. Furthermore, we have built a website dedicated to tracking existing systems and future advances in the field of biomedical literature search. Taken together, our work helps information seekers choose tools suited to their needs, and helps service providers and developers keep current in the field. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/search PMID:21245076
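For readers who prefer programmatic access over the web tools surveyed, NCBI's public E-utilities API offers a simple entry point; the sketch below queries the esearch endpoint for PMIDs. The query string and retmax value are examples only.

    # Minimal sketch: search PubMed via NCBI E-utilities and print PMIDs.
    import json
    import urllib.parse
    import urllib.request

    def pubmed_search(query, retmax=10):
        params = urllib.parse.urlencode({
            "db": "pubmed", "term": query, "retmode": "json", "retmax": retmax,
        })
        url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        return data["esearchresult"]["idlist"]   # list of PMIDs as strings

    print(pubmed_search("causal discovery biomedical"))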
An automated framework for hypotheses generation using literature.
Abedi, Vida; Zand, Ramin; Yeasin, Mohammed; Faisal, Fazle Elahi
2012-08-29
In biomedicine, exploratory studies and hypothesis generation often begin with researching existing literature to identify a set of factors and their associations with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a disease when they plan to generate a new hypothesis or study a biological phenomenon. The situation is even worse for junior investigators, who often find it difficult to formulate new hypotheses or, more importantly, corroborate whether their hypothesis is consistent with existing literature. It is a daunting task to keep abreast of so much being published and to remember all combinations of direct and indirect associations. Fortunately, there is a growing trend of using literature mining and knowledge discovery tools in biomedical research. However, there is still a large gap between the huge amount of effort and resources invested in disease research and the little effort invested in harvesting the published knowledge. The proposed hypothesis generation framework (HGF) finds "crisp semantic associations" among entities of interest, a step towards bridging such gaps. The HGF shares end goals similar to those of SWAN but is more holistic in nature; it was designed and implemented using scalable and efficient computational models of disease-disease interaction. The integration of mapping ontologies with latent semantic analysis is critical in capturing domain-specific direct and indirect "crisp" associations, and in making assertions about entities (such as disease X is associated with a set of factors Z). Pilot studies were performed using two diseases. A comparative analysis of the computed "associations" and "assertions" against curated expert knowledge was performed to validate the results. It was observed that the HGF is able to capture "crisp" direct and indirect associations and provide knowledge discovery on demand. The proposed framework is fast, efficient, and robust in generating new hypotheses to identify factors associated with a disease. A fully integrated Web service application is being developed for wide dissemination of the HGF. A large-scale study by domain experts and associated researchers is underway to validate the associations and assertions computed by the HGF.
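The HGF couples ontology mapping with latent semantic analysis; the sketch below illustrates only the LSA ingredient, under a toy corpus assumption, scoring a latent (possibly indirect) association between two terms with scikit-learn.

    # Minimal sketch: LSA term vectors from TF-IDF + truncated SVD, then a
    # cosine similarity between two terms as a crude association score.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    abstracts = [                           # toy stand-in for retrieved abstracts
        "migraine is associated with magnesium deficiency",
        "magnesium levels modulate vascular tone",
        "vascular tone abnormalities are implicated in migraine",
    ]

    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(abstracts)    # documents x terms
    svd = TruncatedSVD(n_components=2).fit(tfidf)
    term_vecs = svd.components_.T           # terms x latent concepts

    i, j = vec.vocabulary_["migraine"], vec.vocabulary_["magnesium"]
    print(cosine_similarity(term_vecs[[i]], term_vecs[[j]])[0, 0])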
RSAT: regulatory sequence analysis tools.
Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques
2008-07-01
The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning, and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL), and 682 organisms are currently supported. Since 1998, the tools have been used by several hundred researchers from all over the world. Several predictions made with RSAT have been validated experimentally and published.
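As a flavor of word-based pattern discovery in the oligo-analysis spirit (this is not the RSAT implementation), the sketch below scores k-mer overrepresentation in a set of sequences against a uniform Bernoulli background using a binomial test; sequences, k, and the background model are illustrative assumptions.

    # Minimal sketch: rank k-mers by binomial p-value against p = 0.25**k.
    from collections import Counter
    from scipy.stats import binom

    def overrepresented_kmers(seqs, k=6, top=5):
        counts = Counter()
        positions = 0
        for s in seqs:
            positions += max(len(s) - k + 1, 0)
            for i in range(len(s) - k + 1):
                counts[s[i:i + k]] += 1
        p = 0.25 ** k                                       # Bernoulli background
        scored = [(kmer, n, binom.sf(n - 1, positions, p))  # P(count >= n)
                  for kmer, n in counts.items()]
        return sorted(scored, key=lambda t: t[2])[:top]

    promoters = ["ACGTGGGATTACGTGA", "TTACGTGACCACGTGT", "GGACGTGATTT"]
    for kmer, n, pval in overrepresented_kmers(promoters):
        print(kmer, n, f"{pval:.3g}")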
NASA Technical Reports Server (NTRS)
Casas, Joseph
2017-01-01
Within the IARPC Collaboration Team activities of 2016, Arctic in-situ and remote earth observations advanced topics such as: 1) exploring the role of new and innovative autonomous observing technologies in the Arctic; 2) advancing catalytic national and international community-based observing efforts in support of the National Strategy for the Arctic Region; and 3) enhancing the use of discovery tools for observing-system collaboration, such as the U.S. National Oceanic and Atmospheric Administration (NOAA) Arctic Environmental Response Management Application (ERMA) and the U.S. National Aeronautics and Space Administration (NASA) Arctic Collaborative Environment (ACE) project's georeferenced visualization, decision support, and exploitation internet-based tools. Critical to the success of these earth observations, for both in-situ and remote systems, is the emergence of new and innovative data collection technologies and comprehensive modeling, as well as enhanced communications and cyberinfrastructure capabilities that effectively assimilate and disseminate many environmental intelligence products in a timely manner. The Arctic Collaborative Environment (ACE) project is well positioned to greatly enhance user capabilities for accessing, organizing, visualizing, sharing, and producing collaborative knowledge for the Arctic.
Web-based Tool Suite for Plasmasphere Information Discovery
NASA Astrophysics Data System (ADS)
Newman, T. S.; Wang, C.; Gallagher, D. L.
2005-12-01
A suite of tools that enables discovery of terrestrial plasmasphere characteristics from NASA IMAGE Extreme Ultraviolet (EUV) images is described. The tool suite is web-accessible, allowing easy remote access without the need for any software installation on the user's computer. The features supported by the tool include reconstruction of the plasmasphere plasma density distribution from a short sequence of EUV images, semi-automated selection of the plasmapause boundary in an EUV image, and mapping of the selected boundary to the geomagnetic equatorial plane. EUV image upload and result download are also supported. The tool suite's plasmapause mapping feature is achieved via the Roelof and Skinner (2000) Edge Algorithm. The plasma density reconstruction is achieved through a tomographic technique that exploits physical constraints to allow for a moderate-resolution result. The tool suite's software architecture uses Java Server Pages (JSP) and Java applets on the front end for user-software interaction and Java servlets on the server side for task execution. The compute-intensive components of the tool suite are implemented in C++ and invoked by the server via the Java Native Interface (JNI).
Improve Data Mining and Knowledge Discovery Through the Use of MatLab
NASA Technical Reports Server (NTRS)
Shaykhian, Gholam Ali; Martin, Dawn (Elliott); Beil, Robert
2011-01-01
Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern-based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques, and methods used to mine data, including neural networks, genetic algorithms, decision trees, the nearest neighbor method, rule induction, association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques, and methods for detecting patterns in a dataset have been used in the development of numerous open source and commercially available products and technologies for data mining. Data mining is best realized when latent information in a large quantity of stored data is discovered. No one technique solves all data mining problems; the challenge is to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia, and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations, lacking exploratory data mining and discovery of latent information. This presentation introduces MatLab(R) (MATrix LABoratory), an engineering and scientific data analysis tool, for performing data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds of built-in standard functions and numerous available toolboxes. MatLab's ease of data processing and visualization and its enormous collection of built-in functionalities and toolboxes make it suitable for numerical computations and simulations as well as for data mining. Engineers and scientists can take advantage of the readily available functions/toolboxes to gain wider insight in their respective data mining experiments.
New Trends in E-Science: Machine Learning and Knowledge Discovery in Databases
NASA Astrophysics Data System (ADS)
Brescia, Massimo
2012-11-01
Data mining, or Knowledge Discovery in Databases (KDD), while being the main methodology to extract the scientific information contained in Massive Data Sets (MDS), needs to tackle crucial problems, since it has to orchestrate the complex challenges posed by transparent access to different computing environments, scalability of algorithms, and reusability of resources. To achieve a leap forward for the progress of e-science in the data avalanche era, the community needs to implement an infrastructure capable of performing data access, processing, and mining in a distributed but integrated context. The increasing complexity of modern technologies has brought about a huge production of data, whose warehouse management, together with the need to optimize analysis and mining procedures, has led to a conceptual change in modern science. Classical data exploration, based on local user-owned data storage and limited computing infrastructures, is no longer efficient in the case of MDS spread worldwide over inhomogeneous data centres and requiring teraflop processing power. In this context, modern experimental and observational science requires a good understanding of computer science, network infrastructures, data mining, etc., i.e., of all those techniques which fall into the domain of the so-called e-science (recently assessed also by the Fourth Paradigm of Science). Such understanding is almost completely absent in the older generations of scientists, and this is reflected in the inadequacy of most academic and research programs. A paradigm shift is needed: statistical pattern recognition, object-oriented programming, distributed computing, and parallel programming need to become an essential part of the scientific background. A possible practical solution is to provide the research community with easy-to-understand, easy-to-use tools based on Web 2.0 technologies and machine learning methodology; tools where almost all the complexity is hidden from the final user, but which are still flexible and able to produce efficient and reliable scientific results. All these considerations will be described in detail in the chapter. Moreover, examples of modern applications offering a wide variety of e-science communities a large spectrum of computational facilities to exploit the wealth of available massive data sets and powerful machine learning and statistical algorithms will also be introduced.
García-Alonso, Carlos; Pérez-Naranjo, Leonor
2009-01-01
Introduction: Knowledge management, based on information transfer between experts and analysts, is crucial for the validity and usability of data envelopment analysis (DEA). Aim: To design and develop a methodology i) to assess the technical efficiency of small health areas (SHA) in an uncertain environment, and ii) to transfer information between experts and operational models, in both directions, to improve experts' knowledge. Method: A procedure derived from knowledge discovery from data (KDD) is used to select, interpret, and weigh DEA inputs and outputs. Based on the KDD results, an expert-driven Monte-Carlo DEA model has been designed to assess the technical efficiency of SHA in Andalusia. Results: In terms of probability, SHA 29 is the most efficient, whereas SHA 22 is very inefficient. 73% of the analysed SHA have a probability of being efficient (Pe) >0.9 and 18% <0.5. Conclusions: Expert knowledge is necessary to design and validate any operational model. KDD techniques ease the transfer of information from experts to any operational model, and the results obtained from the latter improve the experts' knowledge.
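As an illustration of the Monte-Carlo DEA idea (not the expert-driven model of the study), the sketch below solves the multiplier-form CCR linear program with SciPy and estimates each unit's probability of being efficient under multiplicative noise; the toy data, noise level, and efficiency threshold are assumptions.

    # Minimal sketch: input-oriented CCR DEA + Monte-Carlo probability of efficiency.
    import numpy as np
    from scipy.optimize import linprog

    def dea_efficiency(X, Y, o):
        """CCR multiplier-form score of unit o (X: units x inputs, Y: units x outputs)."""
        n, m = X.shape
        s = Y.shape[1]
        c = np.concatenate([-Y[o], np.zeros(m)])             # maximize u . y_o
        A_ub = np.hstack([Y, -X])                            # u.y_j - v.x_j <= 0
        A_eq = np.concatenate([np.zeros(s), X[o]])[None, :]  # v . x_o = 1
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (s + m))
        return -res.fun if res.success else 0.0

    def prob_efficient(X, Y, noise=0.05, runs=200, seed=0):
        rng = np.random.default_rng(seed)
        hits = np.zeros(len(X))
        for _ in range(runs):
            Xp = X * rng.normal(1.0, noise, X.shape)         # perturb the data
            Yp = Y * rng.normal(1.0, noise, Y.shape)
            for o in range(len(X)):
                hits[o] += dea_efficiency(Xp, Yp, o) >= 0.999
        return hits / runs

    X = np.array([[2., 1.], [3., 2.], [4., 5.]])             # inputs (e.g., staff, beds)
    Y = np.array([[3.], [4.], [5.]])                         # outputs (e.g., visits)
    print(prob_efficient(X, Y))                              # Pe estimate per unit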
RNA interference for functional genomics and improvement of cotton (Gossypium species)
USDA-ARS?s Scientific Manuscript database
RNA interference (RNAi) is a powerful new technology for discovering the functions of genetic sequences and has become a valuable tool for functional genomics of cotton (Gossypium ssp.). The rapid adoption of RNAi has displaced previous antisense technology. RNAi has aided in the discovery of function ...
The metagenomic approach and causality in virology
Castrignano, Silvana Beres; Nagasse-Sugahara, Teresa Keico
2015-01-01
Nowadays, the metagenomic approach has become a very important tool in the discovery of new viruses in environmental and biological samples. Here we discuss how these discoveries may help to elucidate the etiology of diseases and the criteria necessary to establish a causal association between a virus and a disease. PMID:25902566
Standardized Plant Disease Evaluations will Enhance Resistance Gene Discovery
USDA-ARS?s Scientific Manuscript database
Gene discovery and marker development using DNA-based tools require plant populations with well-documented phenotypes. Related crops such as apples and pears may share a number of genes, for example resistance to common diseases, and data mining in one crop may reveal genes for the other. However, u...
Session #1: Exploration and Discovery through Maps: Teaching Science with Technology (elementary school). EnviroAtlas is a tool developed by the U.S. Environmental Protection Agency and its partners that empowers anyone with internet access to be a highly informed local decision-ma...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vixie, Kevin R.
This is the final report for the project "Geometric Analysis for Data Reduction and Structure Discovery," in which insights and tools from geometric analysis were developed and exploited for their potential to address large-scale data challenges.
Impact of computational structure-based methods on drug discovery.
Reynolds, Charles H
2014-01-01
Structure-based drug design has become an indispensable tool in drug discovery. The emergence of structure-based design is due to gains in structural biology that have provided exponential growth in the number of protein crystal structures, new computational algorithms and approaches for modeling protein-ligand interactions, and the tremendous growth of raw computer power in the last 30 years. Computer modeling and simulation have made major contributions to the discovery of many groundbreaking drugs in recent years. Examples are presented that highlight the evolution of computational structure-based design methodology and the impact of that methodology on drug discovery.
18 CFR 385.403 - Methods of discovery; general provisions (Rule 403).
Code of Federal Regulations, 2010 CFR
2010-04-01
Title 18, Conservation of Power and Water Resources: Methods of discovery; general provisions (Rule 403), § 385.403, Federal Energy Regulatory Commission. Among its provisions, the rule requires that a response be true and accurate to the best of that person's knowledge, information, and belief...
ERIC Educational Resources Information Center
Heffernan, Bernadette M.
1998-01-01
Describes work done to provide staff of the Sandy Point Discovery Center with methods for evaluating exhibits and interpretive programming. Quantitative and qualitative evaluation measures were designed to assess the program's objective of estuary education. Pretest-posttest questionnaires and interviews are used to measure subjects' knowledge and…
The Prehistory of Discovery: Precursors of Representational Change in Solving Gear System Problems.
ERIC Educational Resources Information Center
Dixon, James A.; Bangert, Ashley S.
2002-01-01
This study investigated whether the process of representational change undergoes developmental change or whether different processes occupy different niches in the course of knowledge acquisition. Subjects (college, third-grade, and sixth-grade students) solved gear system problems over two sessions. Findings indicated that for all grades, discovery of the…
40 CFR 300.300 - Phase I-Discovery or notification.
Code of Federal Regulations, 2010 CFR
2010-07-01
Title 40, Protection of Environment: Phase I, Discovery or notification, § 300.300, Environmental Protection Agency (Superfund). The section provides that the person in charge of a vessel or a facility shall, as soon as he or she has knowledge of any discharge...
The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology
Gardner, Eugene J.; Lam, Vincent K.; Harris, Daniel N.; Chuang, Nelson T.; Scott, Emma C.; Pittard, W. Stephen; Mills, Ryan E.; Devine, Scott E.
2017-01-01
Mobile element insertions (MEIs) represent ∼25% of all structural variants in human genomes. Moreover, when they disrupt genes, MEIs can influence human traits and diseases. Therefore, MEIs should be fully discovered along with other forms of genetic variation in whole genome sequencing (WGS) projects involving population genetics, human diseases, and clinical genomics. Here, we describe the Mobile Element Locator Tool (MELT), which was developed as part of the 1000 Genomes Project to perform MEI discovery on a population scale. Using both Illumina WGS data and simulations, we demonstrate that MELT outperforms existing MEI discovery tools in terms of speed, scalability, specificity, and sensitivity, while also detecting a broader spectrum of MEI-associated features. Several run modes were developed to perform MEI discovery on local and cloud systems. In addition to using MELT to discover MEIs in modern humans as part of the 1000 Genomes Project, we also used it to discover MEIs in chimpanzees and ancient (Neanderthal and Denisovan) hominids. We detected diverse patterns of MEI stratification across these populations that likely were caused by (1) diverse rates of MEI production from source elements, (2) diverse patterns of MEI inheritance, and (3) the introgression of ancient MEIs into modern human genomes. Overall, our study provides the most comprehensive map of MEIs to date spanning chimpanzees, ancient hominids, and modern humans and reveals new aspects of MEI biology in these lineages. We also demonstrate that MELT is a robust platform for MEI discovery and analysis in a variety of experimental settings. PMID:28855259
Information Discovery and Retrieval Tools
2004-12-01
This session will focus on the various Internet search engines and directories, and on how to improve the user experience through the use of such techniques as metadata, meta-search engines, subject-specific search tools, and other developing technologies.
Information Discovery and Retrieval Tools
2003-04-01
This session will focus on the various Internet search engines and directories, and on how to improve the user experience through the use of such techniques as metadata, meta-search engines, subject-specific search tools, and other developing technologies.
An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data
Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.
2014-01-01
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of a command line interface, massive computational resources, and expertise, which is a daunting combination for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools with a graphical user interface, called Integrated SNP Mining and Utilization (ISMU), for SNP discovery and utilization in developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction methods (SAMtools/SOAPsnp/CNS2snp and CbCC), and interfaces for developing genotyping assays. The pipeline outputs a list of high-quality SNPs between all pairwise combinations of the genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and of errors, if any. The pipeline also provides a confidence score or polymorphism information content value, with flanking sequences, for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole genome re-sequencing, restriction site associated DNA sequencing, and transcriptome sequencing data, at high speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise, enabling them to discover SNPs and utilize them in genomics, genetics, and breeding studies. It has been parallelized to process huge next generation sequencing datasets, has been developed in Java, and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software. PMID:25003610
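Pipelines such as ISMU delegate variant calling to established tools (e.g., SAMtools); purely to illustrate the core idea, the sketch below implements a toy SNP caller over reads already aligned to reference coordinates. The depth and allele-fraction thresholds are illustrative, not those of any real caller.

    # Minimal sketch: pileup-style SNP calling from pre-aligned reads.
    from collections import Counter

    def call_snps(reference, aligned_reads, min_depth=4, min_frac=0.8):
        """aligned_reads: list of (start_position, read_sequence) pairs."""
        snps = []
        for pos, ref_base in enumerate(reference):
            pileup = Counter()
            for start, seq in aligned_reads:
                if start <= pos < start + len(seq):
                    pileup[seq[pos - start]] += 1      # base covering this position
            depth = sum(pileup.values())
            if depth >= min_depth:
                allele, n = pileup.most_common(1)[0]
                if allele != ref_base and n / depth >= min_frac:
                    snps.append((pos, ref_base, allele, n, depth))
        return snps

    ref = "ACGTACGTAC"
    reads = [(0, "ACGTACTTAC"), (2, "GTACTTAC"), (4, "ACTTAC"), (0, "ACGTACTT")]
    print(call_snps(ref, reads))   # expect a T call at position 6 (ref G)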
An Ensemble Approach to Building Mercer Kernels with Prior Information
NASA Technical Reports Server (NTRS)
Srivastava, Ashok N.; Schumann, Johann; Fischer, Bernd
2005-01-01
This paper presents a new methodology for automatic knowledge-driven data mining based on the theory of Mercer kernels, which are highly nonlinear, symmetric, positive definite mappings from the original image space to a very high, possibly infinite, dimensional feature space. We describe a new method called Mixture Density Mercer Kernels to learn kernel functions directly from data, rather than using pre-defined kernels. These data-adaptive kernels can encode prior knowledge in the kernel using a Bayesian formulation, thus allowing physical information to be encoded in the model. Specifically, we demonstrate the use of the algorithm in situations with extremely small samples of data. We compare the results with existing algorithms on data from the Sloan Digital Sky Survey (SDSS) and demonstrate the method's superior performance against standard methods. The code for these experiments has been generated with the AUTOBAYES tool, which automatically generates efficient and documented C/C++ code from abstract statistical model specifications. The core of the system is a schema library which contains templates for learning and knowledge discovery algorithms, such as different versions of EM, and for numeric optimization methods such as conjugate gradient methods. The template instantiation is supported by symbolic-algebraic computations, which allows AUTOBAYES to find closed-form solutions and, where possible, to integrate them into the code.
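A minimal sketch of the ensemble idea behind Mixture Density Mercer Kernels, under assumed model counts and toy data: fit a bagged ensemble of Gaussian mixtures and take the kernel as the agreement of posterior membership probabilities, which yields a positive semidefinite Gram matrix by construction (it is Phi @ Phi.T).

    # Minimal sketch: ensemble-of-mixtures kernel via posterior responsibilities.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def mixture_density_kernel(X, n_models=10, n_components=3, seed=0):
        rng = np.random.default_rng(seed)
        feats = []
        for m in range(n_models):
            idx = rng.integers(0, len(X), len(X))          # bootstrap resample
            gmm = GaussianMixture(n_components=n_components,
                                  random_state=m).fit(X[idx])
            feats.append(gmm.predict_proba(X))             # cluster responsibilities
        Phi = np.hstack(feats) / np.sqrt(n_models)
        return Phi @ Phi.T                                 # Mercer kernel matrix

    X = np.random.default_rng(1).normal(size=(100, 5))
    K = mixture_density_kernel(X)
    print(K.shape, np.all(np.linalg.eigvalsh(K) > -1e-9))  # (100, 100) True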
Khajouei, Hamid; Khajouei, Reza
2017-12-01
Appropriate knowledge, correct information, and relevant data are vital in medical diagnosis and treatment systems. Knowledge Management (KM), through its tools/techniques, provides a pertinent framework for decision-making in healthcare systems. The objective of this study was to identify and prioritize the KM tools/techniques that apply to the hospital setting. This is a descriptive survey study. Data were collected using a researcher-made questionnaire that was developed based on experts' opinions to select the appropriate tools/techniques from the 26 tools/techniques of the Asian Productivity Organization (APO) model. Questions were categorized into the five steps of KM (identifying, creating, storing, sharing, and applying knowledge) according to this model. The study population consisted of middle and senior managers of hospitals and managing directors of the Vice-Chancellor for Curative Affairs at Kerman University of Medical Sciences in Kerman, Iran. The data were analyzed in SPSS v.19 using a one-sample t-test. Twelve out of 26 tools/techniques of the APO model were identified as applicable in hospitals. "Knowledge café" and the "APO knowledge management assessment tool", with respective means of 4.23 and 3.7, were the most and the least applicable tools in the knowledge identification step. The "mentor-mentee scheme" and "voice and Voice over Internet Protocol (VOIP)", with respective means of 4.20 and 3.52, were the most and the least applicable tools/techniques in the knowledge creation step. "Knowledge café" and "voice and VOIP", with respective means of 3.85 and 3.42, were the most and the least applicable tools/techniques in the knowledge storage step. "Peer assist" and "voice and VOIP", with respective means of 4.14 and 3.38, were the most and the least applicable tools/techniques in the knowledge sharing step. Finally, the "knowledge worker competency plan" and "knowledge portal", with respective means of 4.38 and 3.85, were the most and the least applicable tools/techniques in the knowledge application step. The results showed that 12 out of 26 tools in the APO model are appropriate for hospitals, of which 11 are significantly applicable and "storytelling" is marginally applicable. In this study, the preferred tools/techniques for implementing each of the five KM steps in hospitals are introduced. Copyright © 2017 Elsevier B.V. All rights reserved.
Lesourd, Mathieu; Budriesi, Carla; Osiurak, François; Nichelli, Paolo F; Bartolo, Angela
2017-12-20
In the literature on apraxia of tool use, it is now accepted that using familiar tools requires semantic and mechanical knowledge. However, mechanical knowledge is nearly always assessed with production tasks, so one may assume that mechanical knowledge and familiar tool use are associated only because of their common motor mechanisms. This notion may be challenged by demonstrating that familiar tool use depends on an alternative tool selection task assessing mechanical knowledge, where alternative uses of tools are assumed according to their physical properties but where actual use of tools is not needed. We tested 21 left brain-damaged patients and 21 matched controls with familiar tool use tasks (pantomime and single tool use), semantic tasks and an alternative tool selection task. The alternative tool selection task accounted for a large amount of variance in the single tool use task and was the best predictor among all the semantic tasks. Concerning the pantomime of tool use task, group and individual results suggested that the integrity of the semantic system and preserved mechanical knowledge are neither necessary nor sufficient to produce pantomimes. These results corroborate the idea that mechanical knowledge is essential when we use tools, even when tasks assessing mechanical knowledge do not require the production of any motor action. Our results also confirm the value of pantomime of tool use, which can be considered as a complex activity involving several cognitive abilities (e.g., communicative skills) rather than the activation of gesture engrams. © 2017 The British Psychological Society.
Mouse Models for Drug Discovery. Can New Tools and Technology Improve Translational Power?
Zuberi, Aamir; Lutz, Cathleen
2016-12-01
The use of mouse models in biomedical research and preclinical drug evaluation is on the rise. The advent of new molecular genome-altering technologies such as CRISPR/Cas9 allows genetic mutations to be introduced into the germ line of a mouse faster and less expensively than previous methods. In addition, rapid progress in the development and use of somatic transgenesis using viral vectors, as well as manipulations of gene expression with siRNAs and antisense oligonucleotides, allows even greater exploration into genomics and systems biology. These technological advances come at a time when cost reductions in genome sequencing have led to the identification of pathogenic mutations in patient populations, providing unprecedented opportunities for the use of mice to model human disease. The ease of genetic engineering in mice also offers a potential paradigm shift in resource sharing and the speed with which models are made available in the public domain. Predictably, the knowledge alone that a model can be quickly remade will provide relief to resources encumbered by licensing and Material Transfer Agreements. For decades, mouse strains have provided an exquisite experimental tool for studying the pathophysiology of disease and assessing therapeutic options in a genetically defined system. However, a major limitation of the mouse has been the limited genetic diversity associated with common laboratory mice. This has been overcome with the recent development of the Collaborative Cross and Diversity Outbred mice. These strains provide new tools capable of replicating genetic diversity approaching that found in human populations. The Collaborative Cross and Diversity Outbred strains thus provide a means to observe and characterize the toxicity or efficacy of new therapeutic drugs across a population. The combination of traditional and contemporary mouse genome editing tools, along with the addition of genetic diversity in new modeling systems, is synergistic and serves to make the mouse a better model for biomedical research, enhancing the potential for preclinical drug discovery and personalized medicine. © The Author 2016. Published by Oxford University Press.
Serendipity: Accidental Discoveries in Science
NASA Astrophysics Data System (ADS)
Roberts, Royston M.
1989-06-01
Many of the things discovered by accident are important in our everyday lives: Teflon, Velcro, nylon, x-rays, penicillin, safety glass, sugar substitutes, and polyethylene and other plastics. And we owe a debt to accident for some of our deepest scientific knowledge, including Newton's theory of gravitation, the Big Bang theory of Creation, and the discovery of DNA. Even the Rosetta Stone, the Dead Sea Scrolls, and the ruins of Pompeii came to light through chance. This book tells the fascinating stories of these and other discoveries and reveals how the inquisitive human mind turns accident into discovery. Written for the layman, yet scientifically accurate, this illuminating collection of anecdotes portrays invention and discovery as quintessentially human acts, due in part to curiosity, perseverance, and luck.
Closed-Loop Multitarget Optimization for Discovery of New Emulsion Polymerization Recipes
2015-01-01
Self-optimization of chemical reactions enables faster optimization of reaction conditions and discovery of molecules with required target properties. The technology of self-optimization has been expanded to the discovery of new process recipes for the manufacture of complex functional products. A new machine-learning algorithm, specifically designed for multiobjective target optimization with the explicit aim of minimizing the number of "expensive" experiments, guides the discovery process. This "black-box" approach assumes no a priori knowledge of the chemical system and is hence particularly suited to the rapid development of processes to manufacture specialist low-volume, high-value products. The approach was demonstrated in the discovery of process recipes for a semibatch emulsion copolymerization, targeting a specific particle size and full conversion. PMID:26435638
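The published algorithm is not reproduced here; the sketch below illustrates the closed-loop pattern with a Gaussian-process surrogate and expected improvement, scalarizing the two targets (particle-size error plus conversion shortfall) into one objective. The simulated "experiment", variable ranges, and kernel choice are stand-ins.

    # Minimal sketch: surrogate-guided closed-loop optimization of a recipe.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def run_experiment(x):                  # toy stand-in for one real batch run
        size_error = (x[0] - 0.3) ** 2 + 0.5 * (x[1] - 0.7) ** 2
        conversion_shortfall = 0.2 * abs(1.0 - x[1])
        return size_error + conversion_shortfall

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(4, 2))      # a few initial recipes
    y = np.array([run_experiment(x) for x in X])

    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for it in range(15):                    # each iteration = one "expensive" run
        gp.fit(X, y)
        cand = rng.uniform(0, 1, size=(500, 2))
        mu, sd = gp.predict(cand, return_std=True)
        best = y.min()
        z = (best - mu) / np.maximum(sd, 1e-9)
        ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
        x_next = cand[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, run_experiment(x_next))

    print("best recipe:", X[np.argmin(y)], "objective:", y.min())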
Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer
2015-01-01
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like from nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool which can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble, and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared using ten different measures; then, the ten best-performing algorithms were selected based on principal component and hierarchical cluster analysis results. Besides classification, this application also has the ability to create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available at www.biosoft.hacettepe.edu.tr/MLViS/. PMID:25928885
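The sketch below illustrates the comparison idea behind such a tool, benchmarking one representative from each algorithm family named above with ten-fold cross-validation in scikit-learn; the random descriptor matrix and synthetic labels stand in for real molecular descriptors.

    # Minimal sketch: compare classifier families on a drug-like/nondrug-like task.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 20))                    # molecular descriptors (toy)
    y = (X[:, :3].sum(axis=1) > 0).astype(int)        # drug-like vs nondrug-like

    models = {
        "discriminant": LinearDiscriminantAnalysis(),
        "tree":         DecisionTreeClassifier(random_state=0),
        "kernel (SVM)": SVC(kernel="rbf"),
        "ensemble":     RandomForestClassifier(random_state=0),
    }
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=10).mean()   # ten-fold accuracy
        print(f"{name:14s} {acc:.3f}")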
Application of bioinformatics tools and databases in microbial dehalogenation research (a review).
Satpathy, R; Konkimalla, V B; Ratha, J
2015-01-01
Microbial dehalogenation is a biochemical process in which halogenated substances are enzymatically converted into their non-halogenated forms. Microorganisms have a wide range of organohalogen degradation abilities, both specific and non-specific in nature. Most of these halogenated organic compounds are pollutants and need to be remediated; therefore, current approaches explore the potential of microbes at the molecular level for effective biodegradation of these substances. Several microorganisms with dehalogenation activity have been identified and characterized. In this respect, bioinformatics plays a key role in gaining deeper knowledge in the field of dehalogenation. To facilitate data mining, many tools have been developed to annotate the data held in databases. Therefore, with the discovery of a microorganism, one can predict a gene/protein and perform sequence analysis, structural modelling, metabolic pathway analysis, biodegradation studies, and so on. This review highlights bioinformatics approaches and describes the application of various databases and specific tools in the microbial dehalogenation field, with special focus on dehalogenase enzymes. Attempts have also been made to describe some recent applications of in silico modeling methods, comprising gene finding, protein modelling, Quantitative Structure Biodegradability Relationship (QSBR) study, and reconstruction of the metabolic pathways employed in dehalogenation research.
Genome Editing: A New Approach to Human Therapeutics.
Porteus, Matthew
2016-01-01
The ability to manipulate the genome with precise spatial and nucleotide resolution (genome editing) has been a powerful research tool. In the past decade, the tools and expertise for using genome editing in human somatic cells and pluripotent cells have increased to such an extent that the approach is now being developed widely as a strategy to treat human disease. The fundamental process depends on creating a site-specific DNA double-strand break (DSB) in the genome and then allowing the cell's endogenous DSB repair machinery to fix the break such that precise nucleotide changes are made to the DNA sequence. With the development and discovery of several different nuclease platforms and increasing knowledge of the parameters affecting different genome editing outcomes, genome editing frequencies now reach therapeutic relevance for a wide variety of diseases. Moreover, there is a series of complementary approaches to assessing the safety and toxicity of any genome editing process, irrespective of the underlying nuclease used. Finally, the development of genome editing has raised the issue of whether it should be used to engineer the human germline. Although such an approach could clearly prevent the birth of people with devastating and destructive genetic diseases, questions remain about whether human society is morally responsible enough to use this tool.
NASA Astrophysics Data System (ADS)
Ganzert, Steven; Guttmann, Josef; Steinmann, Daniel; Kramer, Stefan
Lung protective ventilation strategies reduce the risk of ventilator-associated lung injury. To develop such strategies, knowledge about the mechanical properties of the mechanically ventilated human lung is essential. This study was designed to develop an equation discovery system to identify mathematical models of the respiratory system in time-series data obtained from mechanically ventilated patients. Two techniques were combined: (i) the use of declarative bias to reduce search-space complexity while inherently supporting the processing of background knowledge, and (ii) a newly developed heuristic for traversing the hypothesis space with a greedy, randomized strategy analogous to the GSAT algorithm. In 96.8% of all runs, the equation discovery system was able to detect the well-established equation-of-motion model of the respiratory system in the provided data. We see the potential of this semi-automatic approach to detect more complex mathematical descriptions of the respiratory system from respiratory data.
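For reference, the equation-of-motion model recovered by the system is, in its standard single-compartment form (a textbook relation, not a result specific to this study):

    P_{aw}(t) = R\,\dot{V}(t) + \frac{1}{C}\,V(t) + P_0

where P_aw is airway pressure, R resistance, \dot{V} flow, C compliance, V volume above functional residual capacity, and P_0 the end-expiratory baseline pressure.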
100 years of elementary particles [Beam Line, vol. 27, issue 1, Spring 1997
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pais, Abraham; Weinberg, Steven; Quigg, Chris
1997-04-01
This issue of Beam Line commemorates the 100th anniversary of the April 30, 1897 report of the discovery of the electron by J.J. Thomson and the ensuing discovery of other subatomic particles. In the first three articles, theorists Abraham Pais, Steven Weinberg, and Chris Quigg provide their perspectives on the discoveries of elementary particles as well as the implications and future directions resulting from these discoveries. In the following three articles, Michael Riordan, Wolfgang Panofsky, and Virginia Trimble apply our knowledge about elementary particles to high-energy research, electronics technology, and understanding the origin and evolution of our Universe.
100 years of Elementary Particles [Beam Line, vol. 27, issue 1, Spring 1997
DOE R&D Accomplishments Database
Pais, Abraham; Weinberg, Steven; Quigg, Chris; Riordan, Michael; Panofsky, Wolfgang K. H.; Trimble, Virginia
1997-04-01
This issue of Beam Line commemorates the 100th anniversary of the April 30, 1897 report of the discovery of the electron by J.J. Thomson and the ensuing discovery of other subatomic particles. In the first three articles, theorists Abraham Pais, Steven Weinberg, and Chris Quigg provide their perspectives on the discoveries of elementary particles as well as the implications and future directions resulting from these discoveries. In the following three articles, Michael Riordan, Wolfgang Panofsky, and Virginia Trimble apply our knowledge about elementary particles to high-energy research, electronics technology, and understanding the origin and evolution of our Universe.
Computational medicinal chemistry in fragment-based drug discovery: what, how and when.
Rabal, Obdulia; Urbano-Cuadrado, Manuel; Oyarzabal, Julen
2011-01-01
The use of fragment-based drug discovery (FBDD) has increased in the last decade due to the encouraging results obtained to date. In this scenario, computational approaches, together with experimental information, play an important role in guiding and speeding up the process. By default, FBDD is generally considered a constructive approach. However, such additive behavior is not always present; therefore, simple fragment maturation will not always deliver the expected results. In this review, computational approaches utilized in FBDD are reported together with real case studies that exemplify their applicability domains, in order to analyze these approaches and maximize their performance and reliability. Proper use of these computational tools can minimize misleading conclusions, maintaining confidence in the FBDD strategy, and achieve higher impact in the drug discovery process. FBDD goes one step beyond a simple constructive approach. A broad set of computational tools (docking, R-group quantitative structure-activity relationships, fragmentation tools, fragment management tools, patent analysis, and fragment hopping, for example) can be utilized in FBDD, providing a clear positive impact if they are utilized in the proper scenario: what, how, and when. An initial assessment of additive/non-additive behavior is a critical point in defining the most convenient approach for fragment elaboration.
MRMaid, the web-based tool for designing multiple reaction monitoring (MRM) transitions.
Mead, Jennifer A; Bianco, Luca; Ottone, Vanessa; Barton, Chris; Kay, Richard G; Lilley, Kathryn S; Bond, Nicholas J; Bessant, Conrad
2009-04-01
Multiple reaction monitoring (MRM) of peptides uses tandem mass spectrometry to quantify selected proteins of interest, such as those previously identified in differential studies. Using this technique, the specificity of precursor-to-product transitions is harnessed for quantitative analysis of multiple proteins in a single sample. The design of transitions is critical for the success of MRM experiments, but predicting the signal intensity of peptides and their fragmentation patterns ab initio is challenging with existing methods. The tool presented here, MRMaid (pronounced "mermaid"), offers a novel alternative for rapid design of MRM transitions by the proteomics researcher. The program uses a combination of knowledge of the properties of optimal MRM transitions, taken from expert practitioners and the literature, with MS/MS evidence derived from interrogation of a database of peptide identifications and their associated mass spectra. The tool also predicts retention time using a published model, allowing ordering of transition candidates. By exploiting available knowledge and resources to generate the most reliable transitions, this approach negates the need for theoretical prediction of fragmentation and for prior "discovery" MS studies. MRMaid is a modular tool built around the Genome Annotating Proteomic Pipeline framework, providing a web-based solution with both descriptive and graphical visualizations of transitions. Predicted transition candidates are ranked by a novel transition scoring system, and users may filter the results by selecting optional stringency criteria, such as omitting frequently modified residues, constraining the length of peptides, or omitting missed cleavages. Comparison with published transitions showed that MRMaid successfully predicted the peptide and product ion pairs in the majority of cases, with appropriate retention time estimates. As the data content of the Genome Annotating Proteomic Pipeline repository increases, the coverage and reliability of MRMaid are set to increase further. MRMaid is freely available over the internet as an executable web-based service at www.mrmaid.info.
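Underlying any transition designer is the fragment-ion arithmetic; the sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses. It illustrates the arithmetic only and is not MRMaid's scoring or retention-time model.

    # Minimal sketch: candidate b/y fragment m/z values for a tryptic peptide.
    MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
            "V": 99.06841, "T": 101.04768, "I": 113.08406, "L": 113.08406,
            "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
            "F": 147.06841, "R": 156.10111, "Y": 163.06333}
    WATER, PROTON = 18.01056, 1.00728

    def by_ions(peptide):
        """Singly charged b_i and y_i m/z values, i = 1 .. len-1."""
        masses = [MONO[aa] for aa in peptide]
        ions = []
        for i in range(1, len(peptide)):
            b = sum(masses[:i]) + PROTON             # b-ion: N-terminal fragment
            y = sum(masses[i:]) + WATER + PROTON     # y-ion: C-terminal fragment
            ions.append((f"b{i}", round(b, 4), f"y{len(peptide) - i}", round(y, 4)))
        return ions

    for row in by_ions("PEPTIDEK"):
        print(row)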
The Requirements and Design of the Rapid Prototyping Capabilities System
NASA Astrophysics Data System (ADS)
Haupt, T. A.; Moorhead, R.; O'Hara, C.; Anantharaj, V.
2006-12-01
The Rapid Prototyping Capabilities (RPC) system will provide the capability to rapidly evaluate innovative methods of linking science observations. To this end, the RPC will provide the capability to integrate the software components and tools needed to evaluate the use of a wide variety of current and future NASA sensors, numerical models, research results, model outputs, and knowledge, collectively referred to as "resources". It is assumed that the resources are geographically distributed, and thus the RPC will provide support for location transparency of the resources. The RPC system requires support for: (1) discovery, semantic understanding, and secure access and transport mechanisms for data products available from known data providers; (2) data assimilation and geo-processing tools for all data transformations needed to match given data products to model input requirements; (3) model management, including catalogs of models and model metadata, and mechanisms for creating environments for model execution; and (4) tools for model output analysis and model benchmarking. The challenge involves developing a cyberinfrastructure, a coordinated aggregate of software, hardware, and other technologies necessary to facilitate RPC experiments, together with the human expertise needed to provide an integrated, "end-to-end" platform supporting the RPC objectives. Such aggregation is to be achieved through horizontal integration of loosely coupled services. The cyberinfrastructure comprises several software layers. At the bottom, the Grid fabric encompasses network protocols, optical networks, computational resources, storage devices, and sensors. At the top, applications use workload managers to coordinate their access to physical resources. Applications are not tightly bound to a single physical resource; instead, they bind dynamically to resources (i.e., they are provisioned) via a common grid infrastructure layer. For the RPC system, the cyberinfrastructure must support organizing computations (or "data transformations" in general) into complex workflows with resource discovery, automatic resource allocation, monitoring, and provenance preservation, as well as aggregating heterogeneous, distributed data into knowledge databases. Such service orchestration is the responsibility of the "collective services" layer. For RPC, this layer will be based on the Java Business Integration (JBI, [JSR-208]) specification, a standards-based integration platform that combines messaging, web services, data transformation, and intelligent routing to reliably connect and coordinate the interaction of significant numbers of diverse applications (plug-in components) across organizational boundaries. The JBI concept is a new approach to integration that can provide the underpinnings for a loosely coupled, highly distributed integration network that can scale beyond the limits of currently used hub-and-spoke brokers. This presentation discusses the requirements, design, and early prototype of the NASA-sponsored RPC system under development at Mississippi State University, demonstrating the integration of data provisioning mechanisms, data transformation tools, and computational models into a single interoperable system enabling rapid execution of RPC experiments.
Computer applications making rapid advances in high throughput microbial proteomics (HTMP).
Anandkumar, Balakrishna; Haga, Steve W; Wu, Hui-Fen
2014-02-01
The last few decades have seen the rise of widely available proteomics tools. From new data acquisition devices, such as MALDI-MS and 2DE, to new database searching software, these new products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism and are opening up new areas of study, such as protein-protein interaction (interactomics) discovery. Computer software is a key part of these emerging fields. This review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest; 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Genomics Resource Center; and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools allow for research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering, and drug discovery.
Development of a knowledge acquisition tool for an expert system flight status monitor
NASA Technical Reports Server (NTRS)
Disbrow, J. D.; Duke, E. L.; Regenie, V. A.
1986-01-01
Two of the main issues in artificial intelligence today are knowledge acquisition and knowledge representation. The Dryden Flight Research Facility of NASA's Ames Research Center is presently involved in the design and implementation of an expert system flight status monitor that will provide expertise and knowledge to aid the flight systems engineer in monitoring today's advanced high-performance aircraft. The flight status monitor can be divided into two sections: the expert system itself and the knowledge acquisition tool. The knowledge acquisition tool, the means it uses to extract knowledge from the domain expert, and how that knowledge is represented for computer use are discussed. An actual aircraft system has been codified by this tool with great success. Future real-time use of the expert system has been facilitated by using the knowledge acquisition tool to easily generate a logically consistent and complete knowledge base.